Axle is a bioscience and information technology company that offers advancements in translational research, biomedical informatics, and data science applications to research centers and healthcare organizations nationally and abroad. With experts in biomedical science, software engineering, and program management, we focus on developing and applying research tools and techniques to empower decision-making and accelerate research discoveries. We work with some of the top research organizations and facilities in the country including multiple institutes at the National Institutes of Health (NIH).
We are seeking a Data Engineer to support biomedical science, clinical research data integration, and advanced data analysis initiatives. In this role, you will design, build, optimize, and maintain data pipelines and data workflows that support the ingestion, transformation, harmonization, validation, and delivery of complex biomedical datasets. You will collaborate closely with scientists, researchers, data scientists, bioinformaticians, application developers, and technical stakeholders to ensure data is accessible, well-structured, secure, documented, and reusable for biomedical research, analytics, reporting, and discovery.
The ideal candidate will have strong experience with Python, SQL, ETL/ELT development, data modeling, data quality practices, and research data lifecycle support. This role requires the ability to work with complex multi-source datasets, support analytics and application-facing data products, and contribute to scalable, well-governed data solutions that align with the Data Science Client Services branch priorities for data accessibility, interoperability, reproducibility, modernization, and secure research enablement.
Key Responsibilities
Data Pipeline Development: Design, build, test, and maintain data pipelines to ingest, transform, harmonize, and integrate diverse biomedical and research data sources, including clinical, genomic, experimental, imaging, biospecimen, operational, and other scientific datasets. Develop reusable transformation logic and curated datasets that support analytics, reporting, dashboards, applications, APIs, and downstream research workflows.
Data Integration and Lifecycle Support: Support the full research data lifecycle by enabling reliable data movement from source systems and storage environments into structured, analysis-ready formats. Assist with data ingestion, curation, metadata capture, data refreshes, source-to-target mapping, schema management, and long-term maintainability of data products and workflows.
Collaboration: Work closely with data scientists, bioinformaticians, researchers, application developers, project managers, and government stakeholders to gather requirements and deliver practical data solutions. Translate scientific and operational data needs into technical specifications, data models, transformation logic, and reusable datasets that accelerate biomedical research workflows and support informed decision-making.
Quality & Governance: Implement data validation checks, reconciliation routines, testing practices, and monitoring processes to ensure data accuracy, completeness, consistency, and integrity. Follow data governance and security best practices, including documentation of transformations, lineage, assumptions, access requirements, and compliance considerations related to sensitive, regulated, de-identified, or access-controlled research data.
Dashboarding & Integration: Create or support interactive dashboards, reporting layers, APIs, and application-ready datasets that allow researchers and stakeholders to visualize, explore, and analyze data. Support integration between data pipelines, databases, cloud platforms, analytics environments, and approved application platforms to enable scalable and secure data access.
Operational Support and Modernization: Troubleshoot data pipeline failures, source system inconsistencies, data quality issues, schema changes, access issues, and performance bottlenecks. Contribute to modernization efforts by improving automation, documentation, scalability, reproducibility, and platform readiness across environments.
Required Qualifications
Education & Background: Bachelor’s degree in Computer Science, Data Science, Bioinformatics, Biomedical Informatics, Information Systems, Engineering, or a related field, or equivalent practical experience. Proven experience as a Data Engineer, Analytics Engineer, Data Integration Developer, Bioinformatics Engineer, or similar data-intensive role, preferably supporting analytics, biomedical research, healthcare, scientific computing, or research data teams.
Data Engineering Expertise: Strong proficiency in Python and SQL for data manipulation, transformation, scripting, automation, and analysis. Hands-on experience building ETL/ELT processes and data pipelines to support large, complex, multi-source datasets. Familiarity with scalable data processing approaches, including Spark/PySpark or similar frameworks, for high-volume or complex transformations is required.
Analytical Skills: Solid understanding of data modeling, relational databases, data warehouses, data lakes, metadata, and database concepts. Ability to work with complex, multi-modal datasets, including structured, semi-structured, and unstructured data, and optimize data workflows for reliability, performance, usability, and long-term maintainability.
Best Practices: Knowledge of software engineering and data engineering best practices, including version control using Git, code review, automated testing, documentation, peer review, and change management. Experience ensuring data quality and using lineage, provenance tracking, audit trails, or documentation practices to support transparency, reproducibility, and data flow traceability.
Collaboration & Communication: Excellent problem-solving skills and the ability to communicate effectively with both technical and non-technical stakeholders. Comfortable working in an interdisciplinary environment with biomedical researchers, analysts, developers, and project teams. Capable of translating domain-specific needs into technical solutions and explaining technical risks, limitations, and dependencies in clear stakeholder-focused language.
Domain Alignment: Strong interest in biomedical science, clinical research, healthcare data, and scientific discovery. Ability to quickly learn domain-specific concepts, data structures, terminology, and research workflows. Demonstrated awareness of sensitive data handling, privacy, access control, data governance, and regulatory or compliance expectations associated with biomedical and clinical research data.
Preferred Qualifications (Plus Skills)
Platform-as-a-Service and Data Platform Experience: Hands-on experience building data solutions in modern data platforms or platform-as-a-service environments such as Snowflake, Databricks, Palantir, cloud data warehouses, data lakes, or similar platforms. Experience supporting integrations across databases, cloud storage, APIs, analytics platforms, dashboards, and application environments is preferred.
Research and Application Enablement: Experience preparing curated datasets for dashboards, APIs, web applications, reporting tools, notebooks, or scientific computing environments. Familiarity with research-facing tools and platforms such as Posit Connect, R/Shiny, Streamlit, Jupyter, Galaxy, Code Ocean, or similar analytics and application delivery environments is a plus.
Cloud, Storage, and Automation Experience: Experience working with cloud or hybrid data environments, object storage such as S3, relational databases such as Postgres, automated data refreshes, scheduled jobs, API-based integrations, and secure data movement across controlled environments.
Biomedical Domain Knowledge: Previous experience in biomedical research, healthcare analytics, clinical research, public health, pharmaceutical research and development, or scientific data management. Familiarity with biomedical data standards or datasets, such as clinical trial data, clinical imaging, laboratory data, biospecimen data, transcriptomics/genomic data, HL7/FHIR, CDISC, OMOP, or related standards, and an understanding of the scientific research process will help you excel in this role.
Governance and Reproducibility: Experience supporting data governance, metadata management, data lineage, reproducible workflows, documentation standards, and secure handling of de-identified, sensitive, or access-controlled research datasets.
Disclaimer: The above description is meant to illustrate the general nature of the work and the level of effort expected of individuals assigned to this position. It is not intended to be a complete list of all skills, responsibilities, duties, and/or assignments required. Individuals may be required to perform duties outside of their position, job description, or responsibilities as needed.
The diversity of Axle’s employees is a tremendous asset. We are firmly committed to providing equal opportunity in all aspects of employment and will not tolerate any illegal discrimination or harassment based on age, race, gender, religion, national origin, disability, marital status, covered veteran status, sexual orientation, status with respect to public assistance, or other characteristics protected under state, federal, or local law. We also seek to deter those who aid, abet, or induce discrimination or coerce others to discriminate.
Accessibility: If you need an accommodation as part of the employment process, please contact: [email protected]
This role has a market-competitive salary with an anticipated base compensation range listed below. Actual salaries will vary depending on a candidate’s experience, qualifications, skills, and location.