On behalf of IVIX, SD Solutions is looking for a talented Backend Developer (Crawler) to join a strong engineering team.
SD Solutions is a staffing company operating globally. Contact us to get more details about the benefits we offer.
Responsibilities:
- Conduct research using a state-of-the-art technology stack.
- Extract required information from web pages.
Requirements:
- Python Programming: 3+ years of experience with Python, including familiarity with its syntax, data structures (lists, dictionaries, sets), and control flow mechanisms (conditionals, loops).
- Data Storage: Knowledge of databases, Docker, and cloud infrastructure, along with a basic understanding of web crawling. Familiarity with data storage solutions, whether flat files (e.g., CSV, JSON) or databases.
- HTML and CSS: Understanding of HTML (the standard markup language for creating web pages) and CSS (Cascading Style Sheets, used for styling).
- Web Scraping Libraries: Familiarity with Python libraries used for web scraping, such as requests (for making HTTP requests), Scrapy (a crawling framework), and BeautifulSoup or lxml (for parsing HTML and XML documents).
- Knowledge of APIs: Understanding of how to work with APIs (Application Programming Interfaces). Some websites offer APIs for accessing their data in a structured format, which can be a more reliable method of data retrieval than parsing HTML.
- Basic Understanding of HTTP/HTTPS: This includes request methods (GET, POST), status codes (200 OK, 404 Not Found), and the concepts of headers, cookies, and proxies.
- Familiarity with Regex (Regular Expressions): Regex can be incredibly useful for extracting specific patterns of text from web pages, although it requires careful use to avoid complex and brittle patterns.
- Rate Limiting and Handling Pagination: Knowing how to manage the rate of your requests to avoid overwhelming the server is important, as is handling pagination to navigate efficiently through multiple pages of content (see the sketch after this list).
- Basic knowledge of using the Postman tool.
- Strong understanding of browser developer tools for Chrome and Firefox, specifically the network monitor.
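As a rough illustration of the day-to-day skills listed above, the sketch below combines requests, BeautifulSoup, a small regex, pagination, and a simple sleep-based rate limit. It is a minimal, hypothetical example: the URL, CSS selectors, and pagination scheme are placeholders invented for illustration, not a description of the actual work.

```python
# Illustrative sketch only: fetch paginated listing pages, parse them,
# extract a simple pattern with a regex, and rate-limit between requests.
import re
import time

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://example.com/businesses"  # placeholder URL
HEADERS = {"User-Agent": "example-crawler/0.1"}


def scrape_page(page: int) -> list[dict]:
    """Fetch one listing page and extract name/phone pairs."""
    response = requests.get(BASE_URL, params={"page": page}, headers=HEADERS, timeout=10)
    response.raise_for_status()  # surfaces non-200 status codes (e.g. 404)

    soup = BeautifulSoup(response.text, "html.parser")
    results = []
    for card in soup.select("div.listing"):  # placeholder selector
        name = card.select_one("h2")
        text = card.get_text(" ", strip=True)
        # Regex example: pull a simple phone-number-like pattern out of the card text.
        phone = re.search(r"\+?\d[\d\s\-()]{7,}\d", text)
        results.append({
            "name": name.get_text(strip=True) if name else None,
            "phone": phone.group(0) if phone else None,
        })
    return results


def crawl(max_pages: int = 5, delay_seconds: float = 1.0) -> list[dict]:
    """Walk through paginated results, sleeping between requests (rate limiting)."""
    all_rows = []
    for page in range(1, max_pages + 1):
        rows = scrape_page(page)
        if not rows:  # empty page: assume we ran past the last page
            break
        all_rows.extend(rows)
        time.sleep(delay_seconds)  # avoid overwhelming the server
    return all_rows


if __name__ == "__main__":
    print(crawl(max_pages=2))
```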
Advantages:
- Experience/knowledge of web networking.
- Experience/knowledge of HTTP crawling.
- Experience working with databases or cloud storage services.
- Skills in debugging and testing Python code.
About the company:
Powered by artificial intelligence and machine learning, IVIX gathers and enriches publicly available business activity data to accurately identify businesses, their revenue, and the taxpayer entity.
By applying for this position, you agree to the terms outlined in our Privacy Policy. Please take a moment to review our Privacy Policy at https://sd-solutions.breezy.hr/privacy-notice and make sure you understand its contents. If you have any questions or concerns regarding our Privacy Policy, please feel free to contact us.