Software Engineering Architect - Platform
We are seeking a highly experienced Software Engineering Architect to lead the design and scaling of one of the world's largest Kubernetes deployments. This critical role involves architecting a robust, secure, and highly reliable container platform that powers thousands of microservices and other services across diverse environments.
The ideal candidate possesses a profound technical understanding of distributed systems, container orchestration, and infrastructure development, coupled with a passion for designing platforms that other software engineers find easy to build, test, and operate on. You will work on real-world, massive-scale problems, collaborate with top-tier engineers, and directly influence the strategic direction of our core container platform across multiple substrates.
Key Responsibilities:
- Platform Strategy & Design: Lead the architectural design and evolution of our large-scale, enterprise-grade Kubernetes platform to ensure it meets requirements for scalability, reliability, security, and performance.
- Software Development Lifecycle (SDLC) Integration: Define and implement platform tooling and APIs to optimize the SDLC for thousands of microservices, with a focus on application development and deployment pipelines.
- Scale and Performance: Architect solutions to handle massive, ever-increasing service and infrastructure scale, ensuring high availability and low latency across the deployment, with close attention to performance tuning.
- Technical Leadership: Act as a subject matter expert and technical leader, guiding platform implementation teams and ensuring alignment with best practices in platform and software engineering.
- Microservices Architecture: Define and evangelize resilient software design patterns and best practices for building, deploying, and managing thousands of microservices on the container platform.
- Cross-Functional Partnership: Partner closely with infrastructure, security, and application development teams to integrate platform components seamlessly and define clear interfaces for engineering efficiency.
- System Reliability: Design systems that are inherently resilient, self-healing, and easy to monitor and troubleshoot, driving down operational complexity for our application engineers.
- Build and ship high-quality, production-grade software using modern engineering practices, with AI as a core part of your development workflow. Push the boundaries of AI development tools to deliver secure, optimized code.
- Design and orchestrate complex systems where AI agents integrate seamlessly into human workflows, driving efficiency and innovation at scale.
- Contribute to building and maintaining the shared system context, an explicit repository of system designs, constraints, and standards that enables AI to operate accurately and reliably.
- Critically evaluate code (human or AI-generated) for correctness, quality, security, and performance.
Essential Qualifications:
- Experience: 15+ years of progressive experience in hands-on software engineering and/or platform engineering, with a significant focus on building and scaling complex, high-volume distributed systems.
- Deep Kubernetes Expertise:
  - Expert-level understanding of Kubernetes internals, architecture, networking, security, and operation at extreme scale.
  - Proven experience in designing and scaling Kubernetes deployments supporting thousands of services.
- Programming Skills: Deep proficiency in Go (Golang), which is required for developing and extending infrastructure systems, APIs, and platform tooling.
- Infrastructure Systems: Extensive background in infrastructure development, including cloud environments, networking, storage, and infrastructure-as-code principles.
- Microservices: Expert knowledge of microservices architecture, service mesh technologies, API design principles, and inter-service communication patterns.
- Security & Reliability: A strong track record of designing platforms that prioritize security, observability (logging, metrics, tracing), and operational reliability for both the platform and the applications it hosts.
Why Join Us?
- Real Scale: Work on platform challenges that few organizations ever encounter, powering mission-critical software and services globally.
- Influence: Directly shape the future and direction of our core container platform strategy.
- Talent: Collaborate daily with some of the industry's most talented and passionate software and platform engineers.
AI Qualifications:
- A demonstrated, genuine AI-first approach to engineering: using AI to move faster, build fluency across the stack, and contribute well beyond your core specialty.
- Experience using AI tools (e.g., Claude Code, GitHub Copilot, Codex, Cursor) in development workflows.
- Advanced prompt engineering skills, including the ability to write precise, structured prompts and to cultivate the system context that makes AI outputs reliable, secure, and production-ready.