AI Engineer (Managed Services)

AvePoint · Singapore

We are looking for a highly skilled AI Engineer specializing in Large Language Models (LLMs) and Agentic AI. You will architect, build, and deploy production-grade LLM applications, from intelligent knowledge bases and RAG systems to autonomous multi-agent workflows. You will work hands-on with open-source Chinese and international LLMs (DeepSeek, Qwen, Kimi, etc.), implementing everything from model deployment and inference optimization to prompt engineering and agent orchestration. This is a builder role for someone who thrives at the intersection of research and engineering.

KEY RESPONSIBILITIES

LLM Application Development

Design and develop enterprise LLM-powered applications: intelligent Q&A systems, enterprise knowledge base assistants, AI copilots, document analysis tools, and automated customer service agents.
Architect and implement end-to-end RAG (Retrieval-Augmented Generation) systems: document parsing and chunking (recursive, semantic, agentic), embedding generation (BGE, M3E, GTE), vector retrieval (dense + sparse hybrid search), reranking (bge-reranker, Cohere Rerank), and response synthesis with source attribution.
Develop and optimize Prompt Engineering strategies: chain-of-thought, tree-of-thought, few-shot prompting, structured output parsing (JSON mode / Pydantic), prompt templates (LangChain/LangSmith), and prompt version management.
Apply harness engineering and context management to keep LLM interactions and AI agents reliable and deterministic.

AI Agent & Multi-Agent Systems

Design and build AI Agent systems using ReAct, Plan-and-Execute, Reflection, and multi-agent collaboration patterns.
Implement Function Calling and tool-use capabilities, enabling agents to interact with external APIs, databases, and enterprise systems.
Develop multi-agent orchestration using LangGraph, AutoGen, CrewAI, and other agent frameworks to solve complex enterprise tasks through agent collaboration.
Design MCP (Model Context Protocol) integrations for standardized LLM tool interoperability.

Open-Source LLM Deployment & Optimization

Deploy and optimize the latest versions of open-source Chinese LLMs (DeepSeek, Qwen, and Kimi) for on-premise and private cloud environments.
Implement model inference optimization: quantization (GGUF/llama.cpp, GPTQ, AWQ, AutoAWQ, FP8/INT8), KV Cache optimization, continuous batching (vLLM, TensorRT-LLM, TGI, SGLang), speculative decoding, and tensor parallelism for high-throughput serving.
Build and maintain model serving infrastructure using vLLM, TensorRT-LLM, Text Generation Inference (TGI), Ollama, Xinference, and SGLang; configure GPU resource scheduling with Kubernetes + GPU operators. Work with AI gateway tools for routing, model tracking, and load balancing, such as TrueFoundry, Kubeflow, or LiteLLM, and with Ray for heavy deep learning workloads.

Model Fine-Tuning & Customization

Implement efficient fine-tuning pipelines using LoRA, QLoRA, DoRA, and full-parameter fine-tuning on proprietary domain-specific datasets.
Prepare and curate instruction-following datasets, RLHF/RLAIF datasets, and evaluation benchmarks for domain adaptation.
Evaluate fine-tuned models using automated benchmarks and LLM-as-a-Judge methodologies.

Evaluation & Production Operations

Build and maintain LLM evaluation frameworks: LLM-as-a-Judge, RAGAS, DeepEval, ARES, and custom task-specific metrics for continuous quality monitoring.
Implement production monitoring for LLM systems: output quality tracking, latency/throughput metrics, cost monitoring, drift detection, and guardrail compliance.
Design A/B testing frameworks for model comparison and prompt iteration.
Implement LLM security guardrails: input/output filtering, PII detection, prompt injection defense, content moderation, and safety alignment.

Research & Technical Leadership

Track frontier AI research and evaluate emerging technologies (new model architectures, training techniques, inference methods) for enterprise adoption.
Contribute to internal knowledge sharing: tech talks, documentation, and best-practice guides on LLM development.

REQUIRED QUALIFICATIONS

Bachelor's degree or above in Computer Science, Artificial Intelligence, Machine Learning, or a related technical field. Master's or PhD in AI/ML preferred.
2+ years of professional experience in AI/ML engineering with demonstrated production deployment of LLM-based systems at scale.
Deep understanding of Transformer architecture, attention mechanisms (MHA, GQA, MQA), and LLM pre-training / fine-tuning / inference paradigms.
Expert proficiency in LLM application frameworks: LangChain, LlamaIndex, Haystack, or equivalent production-grade tools.
Hands-on experience with RAG system development: vector databases (Milvus, ChromaDB, Qdrant, Weaviate, Pinecone, pgvector), embedding models (BGE, M3E, GTE, OpenAI, Cohere), reranking (bge-reranker, Cohere Rerank, cross-encoders), and advanced retrieval techniques (hybrid search, query expansion, HyDE).
Practical experience deploying and tuning open-source Chinese LLMs (DeepSeek, Qwen, Kimi) or international models (Llama 3.x, Mistral, Mixtral, Gemma, Phi).
Strong experience with model deployment and serving infrastructure: vLLM, TensorRT-LLM, TGI, Ollama, Xinference, SGLang; GPU resource scheduling (Kubernetes + GPU operators).
Proficiency in model quantization and inference optimization: GGUF (llama.cpp), GPTQ, AWQ, AutoAWQ, FP8/INT8; knowledge of KV Cache optimization and memory-efficient attention (FlashAttention, FlashInfer, PagedAttention).
Solid programming skills in Python; experience with PyTorch, TensorFlow, or JAX; familiarity with FastAPI/Flask for building LLM API services.
Experience with LLM evaluation methodologies, A/B testing frameworks, and production monitoring of AI systems.

PREFERRED QUALIFICATIONS

Experience with agent frameworks: LangGraph, AutoGen, CrewAI, OpenAI Assistants API, and multi-agent orchestration patterns.
Familiarity with MCP (Model Context Protocol), OpenAI API specification, and multi-modal LLM capabilities (vision, audio).
Experience with prompt optimization tools: DSPy, PromptLayer, LangSmith for systematic prompt engineering.
Knowledge of model distillation and efficient transfer learning from large teacher models to smaller student models.
Contributions to open-source AI projects or publications in NLP/LLM research venues.
Experience with cloud GPU providers and cost optimization for LLM inference at scale.
Access to high-performance GPU computing resources for model development and experimentation.

Any personal data you share with us during the application process will be processed strictly in compliance with applicable data protection laws and our Privacy Notice.

AI Engineering pay context

Based on 592 disclosed AI Engineering salaries on RoleSuite, the role pays a median of $200K/year, with most offers between $164K and $236K (10th–90th percentile: $131K–$272K).
