Data Engineer II

Amazon · Bengaluru, Karnataka, IND

Fulfillment by Amazon (FBA) enables sellers to scale their businesses globally by leveraging Amazon's world-class fulfillment network. The WW FBA Central Analytics team builds and operates scalable, enterprise-grade data infrastructure, tools, and analytics solutions that power the WW FBA business. We partner across global product, program, and operations teams to unify diverse datasets, deliver self-service analytics, and develop next-generation capabilities using LLMs to unlock insights.

We are building a GenAI-powered insights assistant and data infrastructure that enables leaders to query complex FBA data using natural language and receive accurate, contextual answers in seconds. This initiative spans multiple business domains and requires a robust, scalable data platform that delivers fresh, validated, and well-documented data at enterprise scale.

We are seeking a Data Engineer II to own and scale the data platform powering this project. You will design, build, and operate high-reliability ETL pipelines across multiple FBA business domains, drive the dbt migration strategy, and establish monitoring and data quality frameworks. You will partner with Data Engineers, Business Analysts, and SMEs to ensure the data foundation meets the strict accuracy, freshness, and documentation standards that AI-driven insights require.

Key job responsibilities
- Design and build scalable ETL pipelines in Spark/PySpark to ingest, transform, and load FBA metrics across multiple business domains into the Data Lakehouse.
- Own the dbt migration strategy: architect the dbt project structure, define semantic models, and migrate existing pipelines from legacy orchestration to dbt with MWAA/Airflow.
- Build aggregate tables at daily, weekly, monthly, quarterly, and yearly grains from source tables using Maestro and dbt, ensuring the business logic aligns with WBR/MBR/QBR metrics.
- Implement data validation frameworks including automated pre-built queries to cross-validate data across multiple source systems (US 3P, EU, CNGS).
- Design and deploy monitoring and alerting systems for all data pipelines. Automate ticketing on job failures, SLA breach notifications, and data freshness checks.
- Define and enforce data quality contracts: schema evolution policies, null-rate thresholds, row-count variance alerts, and backfill integrity checks.
- Develop and maintain documentation for all table schemas, column descriptions, business definitions, and data lineage.
- Optimize table structures and query patterns for fast, cost-efficient access by AI systems generating SQL from natural language.
- Orchestrate pipeline dependencies across domains and support regional expansion (EU, IN, JP) with minimal code duplication.
- Mentor junior Data Engineers on pipeline design patterns, code review standards, and operational best practices.

Basic qualifications
- Bachelor's degree
- 3+ years of data engineering experience.
- Experience architecting and operationalizing ETL/ELT pipelines with Spark, PySpark, or SparkSQL at scale.
- Proficiency in SQL and at least one scripting language (Python preferred).
- Experience with data modeling, warehousing, and building batch and near-real-time pipelines.
- Experience with workflow orchestration tools (Airflow, MWAA, Step Functions, or equivalent).
- Experience building data quality frameworks and monitoring/alerting for production data systems.

Preferred qualifications
- Experience with AWS technologies like Redshift, S3, AWS Glue, EMR, Kinesis, Firehose, Lambda, and IAM roles and permissions.
- Experience with non-relational databases / data stores (object storage, document or key-value stores, graph databases, column-family databases)
- Experience with dbt (data build tool) for transformation layer management and semantic modeling.
- Familiarity with data lakehouse architectures (Iceberg, Delta Lake, Hudi).
- Experience operating data platforms serving AI/ML downstream consumers.
- Track record of cross-functional collaboration with business analysts and product teams to define metric schemas.
- Experience with CI/CD for data pipelines (CodePipeline, GitHub Actions, or equivalent).

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.

Data & ML pay context

Based on 1,359 disclosed Data & ML salaries on RoleSuite, roles in this category pay a median of $165K/year, with most offers between $128K and $209K (10th–90th percentile: $106K–$246K).
