Data Integrity Validation Methodologies for High-Volume Healthcare ETL Pipelines: Automated Testing Strategies and Quality Assurance Frameworks
Keywords:
data integrity, healthcare ETL, ELT, data validation, automated testing, data quality, interoperability, dbt testing, auditability, quality assurance
Abstract
The article examines methodologies for validating data integrity in high-volume healthcare ETL/ELT pipelines where heterogeneous source systems, evolving interoperability standards, and strict compliance constraints amplify the consequences of defects. Relevance stems from the operational dependence of clinical analytics, revenue-cycle reporting, and population health workflows on accurate, traceable, and scalable transformed data. Novelty is provided by an integrated validation model that connects data-quality theory, secure-processing guidance, and modern transformation testing practices into a single quality-assurance workflow tailored to healthcare semantics. The work aims to synthesize automated testing strategies that reduce undetected schema drift, mapping errors, and business-rule violations across batch and near-real-time processing. For this purpose, the article applies analytical review, comparative synthesis, and structured mapping of controls to pipeline stages, drawing on recent peer-reviewed research and authoritative standards. The concluding section formulates implementation-ready recommendations for layered checks, evidence logging, and governance linkages. The article will benefit healthcare data engineers, analytics leaders, and compliance stakeholders responsible for reliable data delivery.
License
Copyright (c) 2026 Mehulkumar Joshi

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.