Updated 2026-06-27 20:00 UTC · © 2025–2026 RoleSuite

Site Reliability Engineer - Insurance Platform (Remote, China)

Bjakcareer · China

BJAK’s automation systems power end-to-end insurance journeys across quote generation, policy issuance, renewals, endorsements, claims, payments and insurer integrations. These systems are business-critical: uptime, reliability and performance directly affect customers and operations.

We're looking for a Site Reliability Engineer based in China to ensure the stability, scalability and resilience of BJAK’s insurance automation platform, bridging software engineering and infrastructure operations to keep systems running reliably at scale.

This is a fully remote position where you'll collaborate closely with our Malaysia-based engineering, product and operations teams to operate and improve production systems.

The Mission

Ensure BJAK’s insurance automation platform is reliable, scalable and observable by building strong operational systems, improving incident response and driving engineering practices that prevent failures before they happen.

What You’ll Own

  • Own reliability and operational stability of BJAK’s production systems.

  • Design and improve monitoring, alerting, logging and observability across services.

  • Lead incident response, troubleshooting and structured root cause analysis.

  • Improve system resilience through redundancy, failover and recovery strategies.

  • Work with engineers to design systems that are reliable, scalable and operable in production.

  • Improve deployment safety through CI/CD pipelines, release strategies and automation.

  • Reduce recurring incidents by identifying root causes and driving long-term fixes.

  • Manage and optimize cloud infrastructure supporting business-critical workflows.

  • Strengthen operational practices including on-call processes, incident playbooks and SLAs.

  • Continuously improve system uptime, performance and operational maturity.

What We're Looking For

  • Experience in Site Reliability Engineering, DevOps, platform engineering or infrastructure roles.

  • Strong understanding of distributed systems, cloud infrastructure and production operations.

  • Experience with monitoring, alerting and observability tools.

  • Strong troubleshooting skills for production incidents and system failures.

  • Ability to design for reliability, scalability and fault tolerance.

  • Experience working with CI/CD pipelines and deployment automation.

  • Strong understanding of system performance, capacity planning and risk management.

  • Hands-on ownership mindset during incidents and operational issues.

  • Calm, structured and disciplined approach to production environments.

  • Strong collaboration with engineering teams in fast-paced environments.

Bonus Points

  • Experience with AWS, GCP, Azure or similar cloud platforms.

  • Experience with Kubernetes, Docker or container orchestration systems.

  • Experience with infrastructure-as-code tools (Terraform, Ansible, etc.).

  • Experience with observability stacks (Prometheus, Grafana, ELK, Datadog, etc.).

  • Experience with incident management tools and on-call systems.

  • Experience with zero-downtime deployments and progressive delivery strategies.

  • Experience working in fintech, insurance or regulated industries.

  • Experience building reliability frameworks or SRE best practices in scaling systems.

  • Contributions to platform reliability or infrastructure resilience initiatives.

The Kind of Builder We Want

  • Calm and structured under pressure, especially during production incidents.

  • Hands-on engineer who understands both code and infrastructure deeply.

  • Thinks in failure modes, system risks and recovery strategies.

  • Strong focus on reliability, observability and long-term system health.

  • Proactive in preventing incidents, not just responding to them.

  • Careful and deliberate when making production changes.

  • Builds systems engineers can trust in high-pressure environments.

This Role Is Not For

  • Engineers who only react to incidents instead of preventing them.

  • People who are careless with production systems or access control.

  • Individuals who ignore monitoring, alerting or operational discipline.

  • Engineers who make risky changes without proper analysis or safeguards.

  • Candidates who cannot stay calm during incidents or outages.

Success in This Role

You'll be successful if you can:

  • Improve platform uptime, reliability and operational stability.

  • Reduce production incidents and recurring system failures.

  • Strengthen observability, monitoring and incident response maturity.

  • Enable engineers to deploy safely with minimal operational risk.

  • Improve overall resilience of BJAK’s insurance automation platform.

Why Join BJAK

  • Build Reliable Insurance Systems – Support mission-critical automation at scale.

  • High-Impact Engineering – Solve real-world reliability and distributed systems challenges.

  • Global Engineering Team – Work with experienced engineers across multiple countries.

  • Fully Remote – Work remotely from China while collaborating with our Malaysia-based teams.

  • International Exposure – Build systems used across Southeast Asian markets.

  • Learning & Development Budget – Support continuous technical growth and certifications.

  • High Ownership Environment – Strong autonomy over reliability and operational design.

  • Modern Engineering Culture – Focus on stability, observability and engineering excellence.

  • Competitive Compensation – Attractive salary package based on experience and impact.

Interview Process

We assess reliability engineering depth, incident handling capability and production systems thinking. The process usually includes application review, two interviews and a technical scenario or systems discussion.

DevOps pay context

Based on 1,254 disclosed DevOps salaries on RoleSuite, DevOps roles pay a median of $141K/year, with most offers between $115K and $173K (10th–90th percentile: $100K–$210K).


Other roles at Bjakcareer

  • Test Automation Engineer - API & Web (Remote, China) · China
  • Software QA Engineer - Policy & Claims Automation (Remote, China) · China
  • Test Automation Engineer - Customer Process Automation (Remote, China) · China
  • QA Automation Engineer - Product Reliability (Remote, China) · China
  • QA Automation Engineer - Release Quality (Remote, China) · China
  • QA Automation Engineer - AI Workflow Systems (Remote, China) · China
  • DevOps Engineer - CI/CD & Monitoring (Remote, China) · China
  • Platform Engineer - AI Workflow Systems (Remote, China) · China
  • Cloud Engineer - Automation Infrastructure (Remote, China) · China
  • DevOps Engineer - Platform Reliability (Remote, China) · China

More DevOps roles

  • Cloud Platform Engineer · Accenture · Bengaluru
  • Senior Lead Site Reliability Engineer · JPMorgan Chase · Palo Alto, CA, United States
  • Senior Director of Site Reliability Engineering · JPMorgan Chase · Palo Alto, CA, United States
  • Senior Lead Site Reliability Engineer - Manager-AI/ML and Data Platforms · JPMorgan Chase · Jersey City, NJ, United States
  • Site Reliability Engineer III · JPMorgan Chase · Jersey City, NJ, United States
  • Network Dev Engineer II, Corporate Network Engineering · Amazon · Bengaluru, Karnataka, IND
  • ASE Senior Site Reliability Engineer · Apple · Cupertino
  • Sr. Site Reliability Engineer (Application Software) · SpaceX · Hawthorne, CA
  • Staff Forward Deployed Platform Engineer · Charliehealth · New York, NY
  • Senior Data Platform Engineer · Medium · Remote - US