Veeam is the Data and AI Trust Company, specializing in helping organizations ensure their data and AI are fully understood, secured, and resilient to enable the acceleration of safe AI at scale. As the market leader in both data resilience and data security posture management, Veeam is built for the convergence of identity, data, security, and AI risk. Headquartered in Seattle with offices in more than 30 countries, Veeam protects over 550,000 customers worldwide, who trust Veeam to keep their businesses running. Join us as we go fearlessly forward together, growing, learning, and making a real impact for some of the world’s biggest brands.
As a Production Engineer, you will play a key role in supporting reliable, scalable systems for Veeam's Data Cloud platform. You will own production efficiency, automation and documentation projects, contribute to reliability and observability improvements, and own or participate in the full incident lifecycle — from on-call response, through mitigation, to leading post-incident reviews and driving improvements across support and development teams.
You will work as part of a team of skilled engineers, collaborating with support and development as a bridge and driving force for change. You will communicate with product managers and security professionals to ensure our services are production-ready, performant, and fault-tolerant, and that we rapidly incorporate user feedback into improvements.
Own complex and escalated production issues from support, and drive long-term fixes in collaboration with engineering, including code, configuration, and architecture changes.
Proactively identify and address risks uncovered during the problem-solving process.
Lead production efficiency initiatives, develop and maintain processes and runbooks, and preserve knowledge base integrity.
Define, build, and maintain production monitoring systems.
Continuously improve alerting to minimize noise and ensure alerts are actionable and backed by well-documented runbooks.
Define and maintain SLIs/SLOs for key services, and use error budgets to guide operational and product decisions.
Automate manual processes.
Own and drive the post-mortem review process and the actions arising from incident analysis.
Collaborate with the support organization as an escalation point, and feed back knowledge and improvement recommendations.
Collaborate with developers throughout the lifecycle of changes, from design through rollout and patch delivery, ensuring safe deployments and efficient incident mitigation.
Participate in design reviews to ensure services are operable with minimal manual intervention in production (automation, safe deployments, clear runbooks), and share learnings through documentation and feedback.
3–5 years of experience in software engineering, site reliability, production engineering, or senior technical support roles operating distributed systems.
Experience with log analysis and advanced troubleshooting.
Basic programming experience (e.g., JavaScript, Go, TypeScript, Java, or C#).
Experience deploying and troubleshooting systems on public cloud platforms (Azure preferred).
Familiarity with observability tooling (e.g., Elastic, Prometheus, Grafana, OpenTelemetry).
Understanding of distributed systems, networking, automation, and CI/CD.
Prior on-call or incident response experience.
Background in automation, performance testing, or service scalability.
Familiarity with compliance or security best practices.
Make a high-impact contribution to the architecture and reliability of Veeam's first global SaaS product suite.
Help shape a modern SRE organization from the ground up, influencing best practices, tooling, and culture.
Collaborate with highly skilled teams across product, cloud engineering, security, and support.
Access professional development resources including internal mentorship, technical training platforms, and volunteer days.
Enjoy competitive compensation and benefits tailored to local markets in the US, Czechia, India, and Australia.
Join us and help define the future of cloud-native data protection.
What you'll get
Compensation Transparency
Veeam is committed to pay transparency and equitable compensation. For this role, the compensation range below reflects the expected total target compensation (TTC), inclusive of base pay and a competitive performance-based bonus. For roles with a commission plan, the compensation range represents On Target Earnings (OTE), which includes base salary plus variable commission. When determining compensation, Veeam takes into consideration factors such as experience, education, skills, and geographic zone. Offers are typically made below the midpoint of the range.
In addition to compensation, Veeam provides a comprehensive benefits package, including health coverage, retirement plans, and unlimited time off.
Veeam Software is an equal opportunity employer and does not tolerate discrimination in any form on the basis of race, color, religion, gender, age, national origin, citizenship, disability, veteran status or any other classification protected by federal, state or local law. All your information will be kept confidential.
Personal data collected during the recruitment process will be processed in accordance with our Recruiting Privacy Notice, which explains how your information is collected, used, and handled in connection with hiring activities. By applying for this position, you consent to this processing.
By submitting your application, you confirm that the information provided, including any supporting documents, is complete and accurate to the best of your knowledge. Any misrepresentation, omission, or falsification may result in disqualification from consideration or, if discovered after employment begins, termination of employment.