IN_Senior Associate_Data Engineer_Emerging Businesses_Advisory_Bangalore

PwC · Bengaluru Millenia

Line of Service

Advisory

Industry/Sector

Not Applicable

Specialism

Operations

Management Level

Senior Associate

Job Description & Summary

At PwC, our people in software and product innovation focus on developing cutting-edge software solutions and driving product innovation to meet the evolving needs of clients. These individuals combine technical experience with creative thinking to deliver innovative software products and solutions.

In emerging technology at PwC, you will focus on exploring and implementing cutting-edge technologies to drive innovation and transformation for clients. You will work in areas such as artificial intelligence, blockchain, and the internet of things (IoT).

Why PwC

At PwC, you will be part of a vibrant community of solvers that leads with trust and creates distinctive outcomes for our clients and communities. This purpose-led and values-driven work, powered by technology in an environment that drives innovation, will enable you to make a tangible impact in the real world. We reward your contributions, support your wellbeing, and offer inclusive benefits, flexibility programmes and mentorship that will help you thrive in work and life. Together, we grow, learn, care, collaborate, and create a future of infinite experiences for each other. Learn more about us.

At PwC, we believe in providing equal employment opportunities, without any discrimination on the grounds of gender, ethnic background, age, disability, marital status, sexual orientation, pregnancy, gender identity or expression, religion or other beliefs, perceived differences and status protected by law. We strive to create an environment where each one of our people can bring their true selves and contribute to their personal growth and the firm’s growth. To enable this, we have zero tolerance for any discrimination and harassment based on the above considerations.

Job Description & Summary: A career within ...............................

Responsibilities:

• Design, build, and maintain robust ETL/ELT pipelines on Azure using Apache Spark (PySpark/Scala) on Databricks/HDInsight
• Orchestrate complex data workflows with Azure Data Factory (pipelines, triggers, integration runtimes) and Databricks Jobs/Workflows
• Develop and manage data lakes on Azure Blob Storage/ADLS Gen2, including naming, partitioning, lifecycle policies, and schema evolution
• Implement curated layers (raw, staged, curated), leveraging columnar formats (Parquet/ORC/Avro) and table formats (Delta Lake); manage metastore/Unity Catalog
• Optimize Spark jobs and cluster configurations for performance and cost (autoscaling, spot VMs, Photon, adaptive query execution, caching, partition tuning)
• Operationalize jobs with monitoring, logging, and alerting via Azure Monitor, Log Analytics, and Databricks metrics; build runbooks and dashboards
• Implement data quality, testing, and observability for pipelines (unit/integration tests, Great Expectations, SLAs, lineage)
• Collaborate with Analytics, Data Science, and Product to deliver modeled, trustworthy datasets for BI, ML, and applications
• Enforce security and governance best practices (AAD RBAC, Managed Identities, ACLs, Key Vault, Private Endpoints, VNet integration, encryption with CMK/SSE)
• Contribute to infrastructure as code and CI/CD (Azure DevOps or GitHub Actions) including Databricks objects deployment
• Participate in on-call rotations, incident response, and postmortems; drive continuous improvement and documentation

Mandatory skill sets:

• 6+ years as a Data Engineer (or similar) with a strong focus on Azure data services
• Expert-level experience with Apache Spark (PySpark and/or Scala) and distributed data processing
• Hands-on experience with Databricks and Azure HDInsight for large-scale batch processing
• Proficient in Azure Data Factory for orchestration (pipelines, data flows, triggers)
• Strong Python skills and solid SQL (window functions, performance tuning, optimization)
• Practical experience with Azure Blob Storage/ADLS Gen2, Azure Key Vault, Azure Monitor/Log Analytics
• Understanding of Hadoop ecosystem fundamentals (HDFS, YARN, Hive/Metastore)
• Strong grasp of data modeling, file formats (Parquet/ORC/Avro), partitioning, and performance best practices
• Experience building production-grade pipelines with testing, monitoring, and alerting
• Version control with Git and collaborative development practices
• Excellent communication and cross-functional collaboration skills

Preferred skill sets:

• Stream processing and event-driven architectures (Kafka, Azure Event Hubs, Azure Functions)
• Lakehouse technologies (Delta Lake, Unity Catalog) and query engines (Synapse Serverless, Databricks SQL)
• Governance and lineage tools (Microsoft Purview, OpenLineage)
• Cost optimization on Azure (cluster policies, Photon, spot VMs, storage tiers hot/cool/archive)
• Infrastructure as code (Terraform/Bicep) and CI/CD for data workflows (Azure DevOps/GitHub Actions)
• Containerization and orchestration (Docker, AKS) and packaging for Databricks (dbx, DABs)
• Experience integrating with warehouses (Synapse Dedicated SQL Pools, Snowflake) and BI tools (Power BI)
• Security/compliance exposure (PII handling, least-privilege, network isolation)

Years of experience required

5-7 years

Education qualification:

BE/B.Tech/MBA/MCA

Education (if blank, degree and/or field of study not specified)

Degrees/Field of Study required: Bachelor of Engineering, MBA (Master of Business Administration)

Degrees/Field of Study preferred:

Certifications (if blank, certifications not specified)

Required Skills

Data Engineering

Optional Skills

Accepting Feedback, Active Listening, Analytical Thinking, Artificial Intelligence, Business Planning and Simulation (BW-BPS), Communication, Competitive Advantage, Conducting Research, Creativity, Digital Transformation, Embracing Change, Emotional Regulation, Empathy, Implementing Technology, Inclusion, Innovation Processes, Intellectual Curiosity, Internet of Things (IoT), Learning Agility, Optimism, Product Development, Product Testing, Prototyping, Quality Assurance Process Management

Desired Languages (If blank, desired languages not specified)

Travel Requirements

Available for Work Visa Sponsorship?

Government Clearance Required?

Job Posting End Date

July 6, 2026

