Job Overview
As a Machine Learning Engineer, AI & ML — Data Collection, you will contribute to building and scaling the company’s Unified AI/ML Data Collection Platform, enabling standardized, reliable, and scalable machine learning capabilities across the organization. This role will focus on developing and supporting AI/ML and LLM-driven data systems that power data pipelines, model lifecycle management, evaluation frameworks, and production deployment.
This position requires hands-on experience in machine learning engineering, LLM-based systems, ML platform development, and MLOps. You will work closely with ML engineers, product managers, researchers, and business stakeholders to deliver production-ready AI/ML capabilities aligned with broader business objectives and AI/ML strategy.
You will be involved in the design, development, testing, deployment, and support of platform components, including data ingestion, feature management, model training and evaluation, scalable inference systems, and model observability capabilities.
You will help build AI/ML systems that are production-ready, observable, maintainable, and cost-efficient, with an emphasis on reliability, performance, governance, and developer productivity. You will work with technologies and patterns related to large language models (LLMs), retrieval-augmented generation (RAG), embeddings, vector databases, distributed systems, cloud-native architectures, and ML Operations (MLOps).
You will contribute to the end-to-end lifecycle of ML systems, from experimentation and prototyping to deployment, monitoring, optimization, and continuous improvement, while working with peers and contributing to strong engineering practices across the team.
Team Overview
You will be part of a multidisciplinary team of ML engineers responsible for building and maintaining the Unified AI/ML Data Collection Platform. The team focuses on developing scalable systems that support data pipelines, model lifecycle management, LLM-based workflows, and evaluation frameworks, enabling downstream teams to build and deploy AI-driven data collection solutions.
Outline of Duties and Responsibilities
- AI-Powered Data Collection Systems: Develop and support scalable AI-driven data collection and enrichment workflows across structured and unstructured data sources.
- LLM & Generative AI Workflows: Build and maintain LLM-based capabilities including RAG systems, prompt orchestration, entity extraction, summarization, classification, and automated validation workflows.
- Agentic Frameworks & Model Context Integration: Contribute to agentic workflows and model-to-tool integrations that connect AI models with internal tools, APIs, knowledge stores, data sources, and workflow systems.
- Model Deployment & Lifecycle Management: Support deployment, maintenance, and optimization of ML and LLM models in production, including model versioning, CI/CD, experiment tracking, model registry, rollout strategies, and rollback mechanisms.
- Data Quality & Evaluation: Implement evaluation frameworks for extraction quality, model performance, hallucination risks, grounding, consistency, latency, coverage, and overall data reliability.
- Observability & Operational Excellence: Build and maintain monitoring, logging, tracing, alerting, cost tracking, model performance monitoring, drift detection, and reliability dashboards for production AI/ML systems.
- Scalable Platform Engineering: Develop distributed, event-driven, and cloud-native systems using asynchronous processing, message queues, containerization, and orchestration patterns to support high-volume workloads.
- Innovation & Continuous Improvement: Evaluate and apply emerging AI/ML technologies, LLM frameworks, orchestration tools, vector databases, and model deployment approaches to improve automation capabilities and developer productivity.
- Company Values: Model company values and contribute to a culture of innovation, accountability, collaboration, inclusion, and continuous improvement.
Experience, Skills, and Qualifications
- Bachelor’s or Master’s degree in Computer Science, Data Science, Mathematics, or a related technical field.
- 3+ years of experience in machine learning engineering, data science, software engineering, ML platforms, or distributed systems.
- Experience building, deploying, or maintaining production ML systems, including model deployment, inference services, or lifecycle management.
- Hands-on experience with MLOps tools and practices, including CI/CD, model monitoring, experiment tracking, automated testing, or deployment automation.
- Strong programming skills in Python and SQL, or similar languages.
- Experience with cloud platforms and containerization technologies such as AWS, GCP, Azure, Docker, or Kubernetes.
- Experience with LLM-based systems or related capabilities, including RAG pipelines, embeddings, vector databases, prompt orchestration, or model evaluation.
- Understanding of distributed systems, scalability, data pipelines, and system design trade-offs.
- Ability to solve technical challenges and deliver reliable, maintainable, and scalable solutions.
- Strong communication and collaboration skills, with experience working across product, engineering, data, or business teams.
- Experience working in fast-paced, data-driven environments.
Working Conditions
This position is based in a standard office setting. Employees in this position use PCs and phones on an ongoing basis throughout the day. Limited corporate travel to remote offices or other business meetings and events may be required.
Morningstar's hybrid work environment gives you the opportunity to collaborate in person each week, as we've found that we're at our best when we're purposely together on a regular basis. In most of our locations, our hybrid work model is four days in-office each week. A range of other benefits are also available to enhance flexibility as needs change. No matter where you are, you'll have tools and resources to engage meaningfully with your global colleagues.
Legal Entity: Morningstar India Private Ltd. (Delhi)