Senior AI Data Engineer
This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Senior AI Data Engineer based in the United States.
This is a high-impact, AI-first engineering role focused on building and operating the data infrastructure that powers large-scale public data aggregation and insight generation. You will own end-to-end systems spanning data acquisition, transformation, serving, and reporting, with a strong emphasis on automation, resilience, and self-healing pipelines. Rather than manually maintaining brittle scrapers, you will design intelligent systems that leverage LLMs and agentic workflows to detect, diagnose, and repair failures autonomously. The role combines deep data engineering ownership with modern AI-native development practices, including LLM-driven parsing, anomaly detection, and natural language data interfaces. You will also contribute to building scalable reporting layers and production-grade data services that power real-time insights. Working closely with senior engineering and product leadership, you will help shape a system where AI and data infrastructure operate seamlessly together in production.
Accountabilities
- Own the end-to-end design, development, and reliability of large-scale data acquisition systems, including web scraping infrastructure and automated data pipelines.
- Build and maintain self-healing scraper systems that use LLMs and agentic workflows to detect, diagnose, and automatically recover from failures.
- Ensure daily data ingestion pipelines remain stable through monitoring, alerting, retry logic, and robust failure handling mechanisms.
- Develop AI-assisted parsing and entity extraction systems to handle unstructured or frequently changing web data.
- Own the data serving layer and ETL/ELT pipelines powering analytics and BigQuery-based data warehouses.
- Design and implement reporting systems, including data models, transformations, dashboards, and AI-driven narrative insights.
- Apply rule-based and ML/LLM-based techniques for data quality monitoring, anomaly detection, and system reliability.
- Build and maintain production-grade MCP servers and agentic workflows for internal and AI-driven data consumption.
- Collaborate with engineering, product, and leadership teams to define system architecture and ensure long-term maintainability.
- Document systems, best practices, and operational workflows to support scalable human-in-the-loop AI operations.
Requirements
- 6+ years of experience in data engineering with ownership of production-grade, mission-critical systems.
- Strong proficiency in Python with hands-on experience building and maintaining large-scale web scraping systems (Scrapy, Playwright, Selenium, BeautifulSoup).
- Proven experience designing and deploying LLM-powered or agentic systems in production environments.
- Strong understanding of prompt engineering, LLM evaluation, observability, and AI system performance trade-offs (latency, cost, quality, reliability).
- Experience with data modeling, transformation pipelines (e.g., dbt), and BI/reporting layers.
- Strong expertise in SQL and hands-on experience with the GCP ecosystem (BigQuery, Cloud Composer, Cloud Storage, Cloud Run/GKE).
- Familiarity with Docker and production system design for scalable data infrastructure.
- Strong reliability mindset with proven ownership of uptime, incident response, and production system stability.
- Understanding of legal and ethical considerations in large-scale web scraping and data acquisition.
- Experience working with AI-assisted development tools (e.g., Claude, Cursor) is highly desirable.
- Bonus: experience with ML model deployment, distributed systems, Terraform, Pub/Sub, or large-scale data processing frameworks.
Benefits
- Remote position based in the United States.
- Competitive compensation package with base salary and bonus ($190K–$210K base range, depending on experience).
- Full benefits package including medical, dental, vision, life, and disability insurance.
- 401(k) retirement plan and paid time off.
- Opportunity to work on cutting-edge AI-first data systems at scale.
- High ownership role with direct impact on production infrastructure and product outcomes.
- Exposure to modern LLM-driven engineering practices and agentic system design.
- Fast-paced, high-growth environment combining startup innovation with enterprise stability.
Data & ML pay context
Based on 1,537 disclosed Data & ML salaries on RoleSuite, the role pays a median of $162K/year, with most offers between $127K and $202K (10th–90th percentile: $102K–$246K).
This posting lists $190K–$210K, above the $162K market median.