This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Senior / Staff DevOps Engineer based in the United States.
This is a high-impact opportunity for an experienced DevOps professional to shape the reliability, scalability, and security of a mission-critical technology platform used by organizations worldwide. Working in a fully remote environment, you will lead infrastructure strategy, automation initiatives, and operational excellence efforts while partnering closely with engineering, product, and security teams. The role combines hands-on technical leadership with strategic influence, enabling you to drive cloud architecture decisions, modernize deployment workflows, and strengthen compliance and security practices. You will play a key role in building resilient systems, improving developer productivity, and leveraging AI-powered tools to accelerate infrastructure and platform engineering. Ideal candidates thrive in fast-moving environments, embrace ownership, and are passionate about creating secure, scalable systems that support continuous growth and innovation.
Accountabilities
- Design, build, and maintain secure, scalable, and cost-efficient cloud infrastructure with a strong focus on automation, reliability, and operational excellence.
- Lead the development and continuous improvement of infrastructure-as-code frameworks, deployment pipelines, and platform automation capabilities.
- Own and optimize CI/CD processes, ensuring rapid, reliable, and secure software delivery across the development lifecycle.
- Establish and enhance observability practices through monitoring, logging, tracing, alerting, and dashboarding solutions that help the team proactively identify and resolve issues.
- Drive infrastructure security initiatives, including identity and access management, encryption, secrets management, network security, and vulnerability remediation.
- Partner with cross-functional teams to maintain compliance with industry standards and regulatory frameworks while automating controls and audit-readiness processes.
- Lead incident response efforts, participate in on-call operations, facilitate post-incident reviews, and implement long-term improvements that enhance system reliability.
- Define service reliability objectives, support capacity planning, and perform performance optimization to ensure platform scalability and availability.
- Leverage AI-powered engineering tools to accelerate infrastructure development, automate operational workflows, improve troubleshooting, and enhance team productivity.
- Mentor engineers, establish platform engineering best practices, and contribute to a culture of continuous improvement, shared accountability, and technical excellence.
Requirements
- 6+ years of experience in DevOps, Site Reliability Engineering, Platform Engineering, or Infrastructure Engineering within a software development environment.
- Deep expertise with major cloud platforms, preferably Google Cloud Platform (GCP), along with strong knowledge of networking, distributed systems, and cloud architecture.
- Extensive experience with infrastructure-as-code tools such as Terraform, Pulumi, CloudFormation, or similar technologies.
- Strong hands-on experience with containerization and orchestration technologies including Docker, Kubernetes, ECS, or equivalent platforms.
- Proven ability to build, manage, and optimize CI/CD pipelines using tools such as GitHub Actions, CircleCI, or comparable solutions.
- Experience implementing and maintaining observability platforms, monitoring frameworks, and operational dashboards.
- Strong scripting and programming skills in languages such as Python, Go, Bash, TypeScript, or similar.
- Advanced knowledge of cloud security best practices, including IAM, least-privilege access controls, encryption, secrets management, and vulnerability management.
- Hands-on experience supporting compliance frameworks such as SOC 2, ISO 27001, GDPR, HIPAA, or related standards.
- Demonstrated proficiency using AI-powered development and productivity tools, including GitHub Copilot, Cursor, Claude Code, or equivalent technologies.
- Experience leading production incident response, participating in on-call rotations, and driving operational improvements based on postmortem analysis.
- Excellent communication, documentation, and collaboration skills, with the ability to work effectively across distributed teams.
- Prior experience working within startup environments and rapidly evolving organizations is required.
- Familiarity with identity verification technologies, AI-driven platforms, or machine learning-based products is considered a plus.
Benefits
- Competitive base salary ranging from $200,000 to $225,000 USD annually, depending on experience and level.
- Equity compensation as part of the overall rewards package.
- Fully remote work environment with flexibility to work from anywhere within the United States.
- Self-managed paid time off policy supporting work-life balance.
- 11+ paid company holidays annually.
- 401(k) retirement savings plan.
- Comprehensive medical, dental, and vision insurance coverage.
- Employee Assistance Program (EAP) and additional wellness resources.
- Access to wellness and healthcare programs, including virtual health services and preventative care support.
- Parental leave benefits.
- Inclusive and collaborative culture focused on innovation, diversity, and professional growth.
- Opportunity to work on cutting-edge infrastructure, security, and AI-enabled engineering initiatives within a rapidly growing technology environment.