Line of Service
Advisory
Industry/Sector
Not Applicable
Specialism
Operations
Management Level
Senior Associate
Job Description & Summary
At PwC, our people in software and product innovation focus on developing cutting-edge software solutions and driving product innovation to meet the evolving needs of clients. These individuals combine technical experience with creative thinking to deliver innovative software products and solutions.
Responsibilities:
• Design, build, and maintain robust ETL/ELT pipelines on Azure using Apache Spark (PySpark/Scala) on Databricks/HDInsight
• Orchestrate complex data workflows with Azure Data Factory (pipelines, triggers, integration runtimes) and Databricks Jobs/Workflows
• Develop and manage data lakes on Azure Blob Storage/ADLS Gen2, including naming, partitioning, lifecycle policies, and schema evolution
• Implement curated layers (raw, staged, curated), leveraging columnar formats (Parquet/ORC/Avro) and table formats (Delta Lake); manage metastore/Unity Catalog
• Optimize Spark jobs and cluster configurations for performance and cost (autoscaling, spot VMs, Photon, adaptive query execution, caching, partition tuning)
• Operationalize jobs with monitoring, logging, and alerting via Azure Monitor, Log Analytics, and Databricks metrics; build runbooks and dashboards
• Implement data quality, testing, and observability for pipelines (unit/integration tests, Great Expectations, SLAs, lineage)
• Collaborate with Analytics, Data Science, and Product to deliver modeled, trustworthy datasets for BI, ML, and applications
• Enforce security and governance best practices (AAD RBAC, Managed Identities, ACLs, Key Vault, Private Endpoints, VNet integration, encryption with CMK/SSE)
• Contribute to infrastructure as code and CI/CD (Azure DevOps or GitHub Actions), including deployment of Databricks objects
• Participate in on-call rotations, incident response, and postmortems; drive continuous improvement and documentation
Mandatory skill sets:
• 6+ years as a Data Engineer (or similar) with a strong focus on Azure data services
• Expert-level experience with Apache Spark (PySpark and/or Scala) and distributed data processing
• Hands-on experience with Databricks and Azure HDInsight for large-scale batch processing
• Proficient in Azure Data Factory for orchestration (pipelines, data flows, triggers)
• Strong Python skills and solid SQL (window functions, performance tuning, optimization)
• Practical experience with Azure Blob Storage/ADLS Gen2, Azure Key Vault, Azure Monitor/Log Analytics
• Understanding of Hadoop ecosystem fundamentals (HDFS, YARN, Hive/Metastore)
• Strong grasp of data modeling, file formats (Parquet/ORC/Avro), partitioning, and performance best practices
• Experience building production-grade pipelines with testing, monitoring, and alerting
• Version control with Git and collaborative development practices
• Excellent communication and cross-functional collaboration skills
Preferred skill sets:
• Stream processing and event-driven architectures (Kafka, Azure Event Hubs, Azure Functions)
• Lakehouse technologies (Delta Lake, Unity Catalog) and query engines (Synapse Serverless, Databricks SQL)
• Governance and lineage tools (Microsoft Purview, OpenLineage)
• Cost optimization on Azure (cluster policies, Photon, spot VMs, storage tiers hot/cool/archive)
• Infrastructure as code (Terraform/Bicep) and CI/CD for data workflows (Azure DevOps/GitHub Actions)
• Containerization and orchestration (Docker, AKS) and packaging for Databricks (dbx, DABs)
• Experience integrating with warehouses (Synapse Dedicated SQL Pools, Snowflake) and BI tools (Power BI)
• Security/compliance exposure (PII handling, least-privilege, network isolation)
Years of experience required
5-7 years
Education qualification:
BE/B.Tech/MBA/MCA
Education (if blank, degree and/or field of study not specified)
Degrees/Field of Study required: Bachelor of Engineering, MBA (Master of Business Administration)
Degrees/Field of Study preferred:
Certifications (if blank, certifications not specified)
Required Skills
Data Engineering
Optional Skills
Accepting Feedback, Active Listening, Analytical Thinking, Artificial Intelligence, Business Planning and Simulation (BW-BPS), Communication, Competitive Advantage, Conducting Research, Creativity, Digital Transformation, Embracing Change, Emotional Regulation, Empathy, Implementing Technology, Inclusion, Innovation Processes, Intellectual Curiosity, Internet of Things (IoT), Learning Agility, Optimism, Product Development, Product Testing, Prototyping, Quality Assurance Process Management {+ 10 more}
Desired Languages (If blank, desired languages not specified)
Travel Requirements
Available for Work Visa Sponsorship?
Government Clearance Required?
Job Posting End Date
July 6, 2026
Based on 1,425 disclosed Data & ML salaries on RoleSuite, the role pays a median of $162K/year, with most offers between $127K and $202K (10th–90th percentile: $105K–$245K).