This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Senior Cloud Database Engineer based in India.
This is a senior, hands-on engineering role focused on ensuring the reliability, scalability, and operational excellence of large-scale cloud database systems powering mission-critical SaaS platforms. You will take ownership of production MySQL environments in AWS, working at the intersection of infrastructure, software engineering, and operations. The role involves solving complex production incidents, improving database performance, and building automation that reduces manual operational work. You will collaborate closely with software engineers to improve system observability, stability, and deployment safety. This position is ideal for someone who enjoys deep technical problem-solving, thrives in high-responsibility environments, and is motivated by improving how large-scale database systems operate. You will also play a key role in shaping more efficient workflows, better tooling, and stronger operational practices across the engineering organization.
Accountabilities:
- Own the health, availability, and day-to-day operations of production MySQL databases running on AWS RDS, ensuring high reliability for global SaaS systems.
- Lead incident response efforts, including troubleshooting, root cause analysis, service restoration, and long-term prevention of recurring issues.
- Design and implement automation to eliminate repetitive operational tasks and reduce manual intervention, risk, and engineering overhead.
- Improve database workflows, including change management, data fixes, and operational processes to ensure scalability, safety, and auditability.
- Collaborate with software engineering teams to investigate production issues, analyze logs, and improve system behavior and performance.
- Optimize database performance through query tuning, indexing strategies, capacity planning, and proactive monitoring.
- Contribute to high availability, disaster recovery, and resilience strategies for cloud database infrastructure.
- Participate in on-call rotations and provide timely support for critical production incidents.
Requirements:
- 5–8 years of hands-on experience managing production databases in AWS cloud environments.
- Strong expertise in Amazon RDS and solid production experience with MySQL database administration and optimization.
- Experience working in cloud-native environments, with a strong understanding of managed database services.
- Proficiency in automation and scripting (e.g., Python, SQL, or similar), with a focus on building and maintaining operational tooling.
- Proven experience in production operations, incident management, and on-call support for business-critical systems.
- Strong database performance engineering skills, including query optimization, indexing, and root cause analysis.
- Good understanding of AWS networking fundamentals and infrastructure concepts relevant to database troubleshooting.
- Strong collaboration skills with software engineering teams and familiarity with modern software development lifecycles.
- Excellent problem-solving ability, ownership mindset, and strong communication skills.
- Ability to learn quickly, work independently, and continuously improve systems and processes.
- Nice to have: Experience with Amazon Aurora, MariaDB, Linux environments, Datadog, AWS Performance Insights, or prior software engineering experience.
Benefits:
- Remote-first role based in India, with a preference for Bangalore.
- Opportunity to work on large-scale, mission-critical cloud database systems in a high-growth environment.
- Exposure to modern AWS database technologies and evolving cloud-native architectures.
- High-impact role with ownership over production reliability and operational excellence.
- Collaborative engineering culture, with close day-to-day work alongside experienced software and infrastructure teams.
- Strong focus on automation, tooling, and continuous improvement of engineering processes.
- Opportunity to shape database operations and scalability practices across a global platform.