Product Data Scientist — AI Evaluation & Quality

PNL · Berlin

About Finom

Finom is a European tech startup headquartered in Amsterdam, and we’re on a journey towards revolutionizing the financial landscape for entrepreneurs worldwide. Our mission is to develop an all-in-one financial B2B solution that integrates banking functions, accounting, financial management, and invoicing into a seamless, mobile-first platform.

We recently closed a €115 million Series C equity round (around $133 million), bringing our total funding to approximately $346 million. This significant investment follows a $105 million growth funding round from General Catalyst, a long-term backer since 2021 known for supporting companies like Airbnb, HubSpot, KAYAK, and Stripe.

Finom's platform goes beyond traditional banking, offering invoicing and a growing suite of features, including AI-enabled accounting, aiming to simplify financial management for entrepreneurs. We're actively expanding our reach across key EU markets like Germany, France, the Netherlands, Italy, and Spain.

At Finom, we’re not just redefining the entrepreneurial experience; we’re empowering our employees to make a real difference. Your work matters, and your impact extends far beyond product metrics. We nurture innovation and an inspiring work environment where bold ideas thrive.

Maintaining our start-up spirit, we prioritize thorough research, swift implementation of solutions, and ensuring that every effort we make benefits our users, employees, partners, and, of course, our business.

You'll join the AI Team — the group driving all AI products and technology at Finom

We build and ship AI across the company: AI financial co-pilot, voice agent, and internal AI-powered processes

Our belief: your AI agent is only as good as your eval loop, so the AI we build can only be as good as the evals we run on it

Your mission: own that eval loop across every AI product we ship — pre-launch quality gates, post-launch monitoring, continuous improvement

You'll work directly with our AI Quality lead, Igor Kolodkin

Close collaboration with AI engineers, Product, and domain experts across the company

Core stack: Databricks, DeepEval, Claude Code

What You Will Be Doing

Own and extend our offline eval suite across products — datasets (capability + regression), judges, metrics

Build and maintain online quality dashboards: resolution rate, CSAT, thumbs up/down, LLM-as-judge signals, error rate, latency

Close the production feedback loop: mine failure patterns from real traffic → turn them into regression cases → propose fixes to Product and domain experts

Harden methodology: judge stability, non-determinism handling

Translate numbers into decisions: weekly syncs, clear trade-offs, no dashboards for their own sake

Must-Haves

Python and SQL — you can build an analysis end-to-end

Solid foundation in statistics: sampling, hypothesis testing, variance, and recognizing a noisy metric when you see one

Analytical mindset — you start from the business question, not from the tool

3+ years in analyst / data scientist roles, at least one in a product context

Nice-to-Haves

Experience in quality analytics for ML systems — ranking, recommendations, classification, etc.

Hands-on experience evaluating LLM applications (RAG, agents, tool use, judges)

Experience building LLM agents — side projects, toy builds, personal experiments all count

How we work: one thing we take seriously

AI-assisted coding is our default authoring environment, not a bonus

Claude Code is our main tool — you'll reach for it for SQL, Python, analyses, dashboards, and internal scripts

We're looking for analysts who are already curious and fluent with AI coding — or genuinely excited to become fluent fast

We care about what you ship and how clearly you think

If this idea excites you rather than worries you, you'll feel at home here

Data & ML pay context

Based on 1,462 disclosed Data & ML salaries on RoleSuite, the role pays a median of $162K/year, with most offers between $127K and $204K (10th–90th percentile: $102K–$245K).
