This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Lead Data Engineer with AI experience, based in India.
This role sits at the core of modern AI and data transformation initiatives, building the foundational infrastructure that powers next-generation intelligent systems. You will design and operate scalable data pipelines, retrieval systems, and ML/LLMOps frameworks that enable advanced AI applications, including conversational agents, RAG systems, and predictive models. The work spans both classical data engineering and cutting-edge AI infrastructure, requiring strong architectural thinking and hands-on execution.
You will collaborate with cross-functional engineering and AI teams to translate reference architectures into production-grade systems that are reliable, scalable, and efficient.
Your contributions will directly influence the performance, accuracy, and scalability of AI-driven products used in real-world enterprise environments.
The role offers exposure to agentic systems, semantic data layers, and advanced retrieval architectures at scale.
It is a highly technical and impact-driven position where engineering excellence and AI innovation intersect.
Accountabilities
- Data Pipeline Engineering: Build, optimize, and maintain robust batch and streaming data pipelines using modern cloud-native tools such as Snowflake, PySpark, Delta Lake, and Kafka, ensuring reliability, scalability, and performance.
- RAG & Retrieval Infrastructure: Design and implement end-to-end retrieval systems including embedding pipelines, vector databases, hybrid search, chunking strategies, and ranking mechanisms to optimize AI context relevance.
- Semantic & Knowledge Layer Development: Develop ontologies, entity mappings, and knowledge graphs while maintaining semantic contracts, metadata systems, and lineage tracking for AI and ML use cases.
- ML/LLMOps Enablement: Support ML and LLM lifecycle workflows including dataset curation, feature engineering, model evaluation, experiment tracking, and production monitoring.
- Agentic Data Systems: Build APIs, context stores, and tool interfaces that enable autonomous agents, along with observability for reasoning traces, tool calls, and contextual outputs.
- Governance & Data Quality: Implement robust data governance frameworks including RBAC, PII handling, schema validation, data quality monitoring, and compliance-ready audit logging systems.
Requirements
This role requires a highly experienced data engineering professional with strong cloud, distributed systems, and AI infrastructure expertise. The ideal candidate combines deep technical execution with architectural thinking and hands-on experience building production-grade AI-enabled data systems.
- 7+ years of experience in data engineering with strong exposure to cloud-based data platforms.
- 2+ years of experience building production AI/ML or LLM-related data infrastructure at scale.
- Strong expertise in Python, SQL, PySpark, Snowflake, Delta Lake, Kafka, and Spark Structured Streaming.
- Hands-on experience with vector databases, embedding pipelines, and retrieval systems in production RAG environments.
- Solid understanding of MLOps practices and tooling, including MLflow, CI/CD for ML systems, and automated evaluation frameworks.
- Strong knowledge of data governance, security, compliance, and data quality frameworks.
- Experience working with cloud ecosystems such as AWS or Azure and containerized environments (Docker, Kubernetes).
- Familiarity with AI/LLM tooling such as LangChain, LlamaIndex, OpenAI/Claude/Bedrock APIs, and FastAPI is a plus.
- Strong problem-solving mindset with the ability to design scalable systems and operate in fast-moving AI environments.
Benefits
- Competitive compensation package aligned with experience and market standards
- Remote-friendly or hybrid work flexibility depending on team structure
- Opportunity to work on cutting-edge AI, LLM, and agentic systems
- Exposure to global engineering teams and enterprise-scale AI transformation projects
- Health insurance and wellness benefits (as per policy and location)
- Learning and development support for advanced AI and data engineering skills
- Access to modern cloud-native and AI-first technology stacks
- Collaborative, engineering-driven culture focused on innovation and impact