This position is listed on behalf of a partner company, which manages all applications and next steps. Our partner is looking for a Data Engineer based in Ireland.
This role focuses on rebuilding trust in a complex, regulated data environment where existing pipelines are not yet reliable, reproducible, or fully validated. You will be responsible for transforming a newly centralized data lake into a robust, analytics-ready foundation that supports downstream data science and risk modelling use cases. Working within a regulated credit and lending context, you will design and enforce strong data quality, lineage, and governance standards across multiple source systems. The role requires deep hands-on engineering across AWS, Spark, and modern data tooling, with a strong emphasis on correctness, auditability, and reproducibility. You will collaborate closely with data science and engineering stakeholders to define harmonized data models and prepare feature-ready datasets. This is a high-impact foundational role where your work directly enables reliable decision-making in a financial risk environment.
Accountabilities:
- Rebuild and validate data pipelines to ensure full reproducibility of reporting and descriptive statistics across all datasets
- Profile, reconcile, and harmonize heterogeneous source schemas across multiple business entities into a unified data model
- Design and implement dbt-based data models (staging, intermediate, and marts) with strong testing and validation layers
- Develop and maintain data quality frameworks using tools such as Great Expectations and dbt tests to enforce reliability
- Design and implement entity resolution and record linkage logic across fragmented customer and account datasets
- Ensure robust anonymization and pseudonymization processes that meet regulatory and compliance requirements
- Optimize large-scale Spark-based processing jobs, including partitioning strategies, file formats, and cost-efficient compute usage
- Orchestrate production-grade pipelines using tools such as Airflow or AWS Step Functions
- Deliver clean, documented, and feature-ready datasets for downstream data science and risk modelling teams
- Create clear technical documentation and runbooks to support operational handover and long-term maintainability
Requirements:
- 4+ years of professional experience in data engineering with strong exposure to large-scale AWS and Spark environments
- Advanced proficiency in SQL and Python for data processing and transformation at scale
- Strong experience with AWS data services including S3, Glue, Athena, Redshift, EMR, and orchestration tools
- Proven experience building and maintaining data models using dbt or similar frameworks
- Hands-on experience with data quality, validation, and testing frameworks such as Great Expectations
- Strong understanding of data governance, lineage, and reproducibility in production environments
- Experience with entity resolution, deduplication, or record linkage across multiple data sources
- Familiarity with anonymization and pseudonymization techniques in regulated environments
- Experience working in regulated industries such as banking, financial services and insurance (BFSI), healthcare, or government is highly valued
- Ability to work independently or as a lead engineer within a small, fast-moving delivery team
- Strong written and verbal communication skills in English, with the ability to document and explain complex systems clearly
Benefits:
- Competitive compensation package aligned with experience and impact
- Remote-friendly working arrangements within Europe
- Opportunity to work on a high-impact, regulated data transformation project
- Exposure to modern AWS data architecture and large-scale Spark processing environments
- Direct collaboration with data science and engineering leadership on meaningful analytics use cases
- Strong autonomy in shaping data foundations and engineering standards
- Opportunity to build robust, production-grade systems from an early-stage data estate
- International, collaborative environment with distributed teams