Software Engineer II
At H1, we believe access to the best healthcare information is a basic human right. Our mission is to provide a platform that can optimally inform every doctor interaction globally. This promotes health equity and builds needed trust in healthcare systems. To accomplish this, our teams harness the power of data and AI technology to unlock groundbreaking medical insights and convert those insights into actions that result in optimal patient outcomes and accelerate an equitable and inclusive drug development lifecycle. Visit h1.co to learn more about us.
Data Engineering at H1 is responsible for the development and delivery of our most important asset: our data. With thousands of data sources from around the world, the team ensures that data is accurate, normalized, and delivered at a velocity that keeps up with real-world changes. As we expand our markets and the scope of data we provide to our customers, our team must scale to meet that demand.
WHAT YOU'LL DO AT H1
We are hiring a Backend Software Engineer II (Data Harvesting) to help build and scale the systems that power how we collect and process data from the web. This role is ideal for an engineer who has hands-on experience building data pipelines and working with web data, and is looking to grow into deeper ownership of distributed systems and data platforms. You will work closely with senior engineers and cross-functional partners to design, build, and improve systems that capture, process, and deliver high-quality data at scale.
You will:
- Contribute to building systems and frameworks that capture web data at scale, including working with structured and unstructured data sources
- Design and develop data extraction components using tools such as APIs, scraping frameworks, and parsing logic
- Build and maintain ETL/ELT pipelines using technologies like Apache Spark and cloud platforms (preferably AWS)
- Write clean, efficient Python code to support data ingestion, transformation, and processing workflows
- Help improve the reliability and performance of data pipelines through monitoring, debugging, and optimization
- Work with senior engineers to enhance systems that handle data quality and normalization, large-scale data ingestion, and pipeline scalability
- Troubleshoot issues related to data inconsistencies, pipeline failures, and source data changes (e.g., website structure updates)
- Collaborate with product, data, and engineering teams to ensure data is usable and aligned with business needs
- Contribute to documentation and participate in code reviews to support engineering best practices
ABOUT YOU
- You are an early-to-mid level engineer who has built data pipelines or backend systems and is eager to deepen your expertise in large-scale data systems and web data processing.
- You enjoy solving complex data problems and working with real-world, messy datasets
- You are comfortable writing production-level code and debugging systems
- You are collaborative and open to feedback, with a strong desire to learn from senior engineers
- You have a strong foundation in data structures, system design fundamentals, and backend development
- You are interested in working on systems that interact with external data sources (e.g., APIs, web data)
REQUIREMENTS
- 3–5 years of professional experience in backend or data engineering
- Strong proficiency in Python
- Experience working with large-scale data ingestion systems
- Experience building and maintaining data pipelines or backend services
- Familiarity with web data extraction concepts such as APIs, web scraping (Selenium, Playwright, or similar), and handling structured and unstructured data
- Strong SQL skills (PostgreSQL or similar databases)
- Experience with Apache Spark or similar data processing frameworks
- Experience working in an AWS cloud environment
- Familiarity with Docker or containerization
Nice to Have:
- Exposure to web scraping at scale, including challenges like rate limiting or dynamic content
- Familiarity with Airflow, Argo, or similar orchestration tools
- Basic understanding of HTTP/HTTPS and web protocols
- Exposure to LLMs or NLP-based data extraction workflows
Working Hours
- This role is fully integrated with our global team and requires daily collaboration with US-based engineers.
- Working hours: 1:00 PM – 9:00 PM IST (Monday–Friday)
- This schedule ensures strong overlap with our US teams and enables real-time collaboration
Not meeting all the requirements but still feel like you’d be a great fit? Tell us how you can contribute to our team in a cover letter!