Software Engineering Architect - Platform

Salesforce · San Francisco, California

We are seeking a highly experienced Software Engineering Architect to lead the design and scaling of one of the world's largest Kubernetes deployments. In this critical role, you will architect a robust, secure, and highly reliable container platform that powers thousands of microservices and other services across diverse environments.

The ideal candidate has a deep technical understanding of distributed systems, container orchestration, and infrastructure development, coupled with a passion for designing platforms that other software engineers can easily build, test, and operate on. You will work on real-world, massive-scale problems, collaborate with top-tier engineers, and directly influence the strategic direction of our core container platform across multiple substrates.

Key Responsibilities:

  • Platform Strategy & Design: Lead the architectural design and evolution of our large-scale, enterprise-grade Kubernetes platform to ensure it meets requirements for scalability, reliability, security, and performance.
  • Software Development Lifecycle (SDLC) Integration: Define and implement platform tooling and APIs to optimize the SDLC for thousands of microservices, with a focus on application development and deployment pipelines.
  • Scale and Performance: Architect solutions to handle massive, ever-increasing service and infrastructure scale, ensuring high availability and low latency across the deployment, with close attention to performance tuning.
  • Technical Leadership: Act as a subject matter expert and technical leader, guiding platform implementation teams and ensuring alignment with best practices in platform and software engineering.
  • Microservices Architecture: Define and evangelize resilient software design patterns and best practices for building, deploying, and managing thousands of microservices on the container platform.
  • Cross-Functional Partnership: Partner closely with infrastructure, security, and application development teams to integrate platform components seamlessly and define clear interfaces for engineering efficiency.
  • System Reliability: Design systems that are inherently resilient, self-healing, and easy to monitor and troubleshoot, driving down operational complexity for our application engineers.
  • Build and ship high-quality, production-grade software using modern engineering practices, with AI as a core part of your development workflow. Push the boundaries of AI development tools to deliver secure, optimized, high-quality code.
  • Design and orchestrate complex systems where AI agents integrate seamlessly into human workflows, driving efficiency and innovation at scale.
  • Contribute to building and maintaining the shared system context, an explicit repository of system designs, constraints, and standards that enables AI to operate accurately and reliably.
  • Critically evaluate code (human or AI-generated) for correctness, quality, security, and performance.

Essential Qualifications:

  • Experience: 15+ years of progressive experience in hands-on software engineering and/or platform engineering, with a significant focus on building and scaling complex, high-volume distributed systems.
  • Deep Kubernetes Expertise:
    • Expert-level understanding of Kubernetes internals, architecture, networking, security, and operation at extreme scale.
    • Proven experience in designing and scaling Kubernetes deployments supporting thousands of services.
  • Programming Skills: Deep proficiency in Go (Golang), required for developing and extending infrastructure systems, APIs, and platform tooling.
  • Infrastructure Systems: Extensive background in infrastructure development, including cloud environments, networking, storage, and infrastructure-as-code principles.
  • Microservices: Expert knowledge of microservices architecture, service mesh technologies, API design principles, and inter-service communication patterns.
  • Security & Reliability: A strong track record of designing platforms that prioritize security, observability (logging, metrics, tracing), and operational reliability for both the platform and the applications it hosts.
  • AI-First Engineering: A demonstrated, genuine AI-first approach to engineering, using AI to move faster, build fluency across the stack, and contribute well beyond your core specialty.
  • AI Tooling: Experience using AI tools (e.g., Claude Code, GitHub Copilot, Codex, Cursor) in development workflows.
  • Prompt Engineering: Advanced prompt engineering skills, including the ability to write precise, structured prompts and cultivate the system context that makes AI outputs reliable, secure, and production-ready.

Why Join Us?

  • Real Scale: Work on platform challenges that few organizations ever encounter, powering mission-critical software and services globally.
  • Influence: Directly shape the future and direction of our core container platform strategy.
  • Talent: Collaborate daily with some of the industry's most talented and passionate software and platform engineers.
