Senior Software Engineer - Reliability
About Us
Nu is one of the largest digital financial platforms in the world, with more than 127 million customers across Brazil, Mexico, and Colombia. Guided by our mission to fight complexity and empower people, we are redefining financial services in Latin America, and this is still just the beginning of the purple future we're building.
Listed on the New York Stock Exchange (NYSE: NU), we combine proprietary technology, data intelligence, and an efficient operating model to deliver financial products that are simple, accessible, and human.
Our impact has been recognized by global rankings such as Time 100 Companies, Fast Company’s Most Innovative Companies, and Forbes World’s Best Bank. Visit our institutional page: https://international.nubank.com.br/careers/
About the role
The U.S. Market team is launching a differentiated financial product in the largest and most demanding financial market in the world. We’re iterating quickly on real customer signals while building systems that will eventually serve customers at Nubank scale. That combination of early-stage velocity, regulatory weight, and high reliability expectations requires an engineer whose primary mandate is reliability, scale, and operational excellence.
This role exists to make sure the systems we’re building today can be trusted in production tomorrow, and to set the bar for what “production-ready” means on this team. The engineer in this role delivers their mandate by writing production code, shaping architecture, and engineering the systems themselves — not by absorbing operational load.
You'll be responsible for
- Define and operate against SLOs. Establish meaningful SLIs and SLOs with product and engineering partners, manage error budgets, and use them as real inputs to prioritization rather than dashboards no one reads.
- Build the observability layer. Improve metrics, logs, traces, and alerting so issues are detected early, attributed precisely, and debugged with code-level confidence. Push instrumentation upstream into the services we own.
- Lead incident response. Act as incident commander when needed, drive blameless postmortems, and turn findings into concrete engineering work that lands. Build the muscle in the team so this isn’t centralized in any one person.
- Reduce toil through engineering. Identify repetitive operational work and eliminate it with software — automation, self-healing behavior, better defaults, better tooling — rather than absorbing it as ongoing overhead.
- Harden production. Stress-test designs for partial failure, dependency degradation, traffic spikes, and adversarial inputs. Run capacity and performance work before incidents arise. Ensure resiliency primitives are tuned and working correctly.
- Make change safe and fast. Improve release safety through progressive delivery, feature flags, canaries, rollbacks, and tested migrations. Help the squad ship faster and with lower blast radius.
- Improve developer experience, especially where it removes operational friction or improves change safety. Where internal tooling or platform gaps slow the team down, build or contribute the fix. Prefer leverage over heroics.
- Partner across disciplines. Work closely with product, platform, security, compliance, and other engineering teams. Translate reliability and risk tradeoffs into language each audience can act on.
- Raise the engineering bar. Mentor engineers, review hard designs and PRs, and shape technical standards across the squad. Lead through clarity and judgment, not authority.
We are looking for a person who has
- Track record of owning services in production: not just shipping them, but being the engineer responsible for how they behave under real load and real failure.
- Experience defining and operating against SLOs/SLIs, and using error budgets to influence engineering and product decisions.
- Experience leading incident response and writing postmortems that produced durable improvements.
- Hands-on experience with observability tooling (metrics, structured logging, distributed tracing) and using it to diagnose nontrivial production issues.
- Deep system design experience: distributed services, asynchronous messaging, storage tradeoffs, API design, idempotency, consistency, backpressure, and graceful degradation.
- Significant industry experience building and operating production software systems in a high-ownership engineering environment.
- Comfort operating in modern cloud environments (e.g., AWS/GCP), containerized workloads, and CI/CD pipelines, and reasoning about their failure modes.
- Demonstrated technical leadership: influencing architecture across teams, mentoring strong engineers, and making the people around you better.
- Pragmatism. You can hold a high reliability bar while still helping a fast-moving squad ship.
Location for this opportunity (City, Country)
- Miami, United States
Our Benefits
- Opportunity to earn equity at Nu
- Medical Insurance
- Dental and Vision Insurance
- Life Insurance and AD&D
- Extended maternity and paternity leave
- Nucleo - Our learning platform of courses
- NuLanguage - Our language learning program
- NuCare - Our mental health and wellness assistance program
- 401(k)
- Savings Plans - Health Savings Account and Flexible Spending Account
- Work-from-home Allowance
- Relocation Assistance Package, if applicable.
Work Model for this Role
Hybrid 2-3 times/week: Our hybrid work model brings us to the office at least twice a week, on strategic days designed to maximize team connection and collaboration. For more details, visit https://building.nubank.com/nu-hybrid-work-model/
Explore how we build technology at Nubank:
🎥 youtube.com/@building.nubank
🎧 Listen to our stories on Spotify