Objective: Harmonized data quality (DQ) assessment terms, methods, and reporting practices can establish a common understanding of the strengths and limitations of electronic health record (EHR) data for operational analytics, quality improvement, and research. Existing published DQ terms were harmonized into a comprehensive unified terminology with definitions and examples, and organized into a conceptual framework to support a common approach to defining whether EHR data are ‘fit’ for specific uses.

Materials and Methods: DQ publications, informatics and analytics experts, managers of established DQ programs, and operational manuals from several mature EHR-based research networks were reviewed to identify potential DQ terms and categories. Two face-to-face stakeholder meetings were used to vet an initial set of DQ terms and definitions, which were grouped into an overall conceptual framework. Feedback from data producers and users was used to construct a draft set of harmonized DQ terms and categories. Multiple rounds of iterative refinement produced a set of terms and an organizing framework consisting of DQ categories, subcategories, terms, definitions, and examples. The inclusiveness of the harmonized terminology and logical framework was evaluated against ten published DQ terminologies.

Results: Existing DQ terms were harmonized and organized into a framework by defining three DQ categories, (1) Conformance, (2) Completeness, and (3) Plausibility, and two DQ assessment contexts, (1) Verification and (2) Validation. The Conformance and Plausibility categories were further divided into subcategories. Each category and subcategory was defined with respect to whether the data may be verified against organizational data or validated against an accepted gold standard, depending on the proposed context and uses. The coverage of the harmonized DQ terminology was validated by successfully aligning it with multiple published DQ terminologies.

Discussion: Existing DQ concepts, community input, and expert review informed the development of a distinct set of terms organized into categories and subcategories. The resulting DQ terms successfully encompassed a wide range of disparate DQ terminologies. Operational definitions were developed to guide the implementation of DQ assessment procedures. The resulting structure is an inclusive DQ framework for standardizing DQ assessment and reporting. While our analysis focused on the DQ issues often found in EHR data, the new terminology may apply to a wide range of electronic health data, such as administrative, research, and patient-reported data.

Conclusion: A consistent, common DQ terminology, organized into a logical framework, is an initial step in enabling data owners and users, patients, and policy makers to evaluate and communicate data quality findings in a well-defined manner with a shared vocabulary. Future work will leverage the framework and terminology to develop reusable data quality assessment and reporting methods.
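To make the three categories concrete, the following is a minimal sketch, assuming a tabular EHR extract with hypothetical column names, of how a Conformance, Completeness, and Plausibility check might each be expressed; the value set and plausibility bounds are illustrative assumptions, not part of the published framework.

```python
import pandas as pd

# Hypothetical EHR extract; column names and values are illustrative.
df = pd.DataFrame({
    "patient_id": ["p1", "p2", "p3", "p4"],
    "sex":        ["F", "M", "X", None],
    "height_cm":  [162.0, 180.5, -5.0, 171.2],
})

# Conformance (verification context): do recorded values conform to the
# expected value set?
allowed_sex_codes = {"F", "M"}  # assumed value set
conformance_fail = ~df["sex"].isin(allowed_sex_codes) & df["sex"].notna()

# Completeness: are expected values present at all?
completeness_fail = df["sex"].isna()

# Plausibility: are recorded values believable for the real-world quantity?
plausibility_fail = ~df["height_cm"].between(20, 250)  # assumed bounds

print(df.assign(
    conformance_fail=conformance_fail,
    completeness_fail=completeness_fail,
    plausibility_fail=plausibility_fail,
))
```

Keeping the three failure flags separate matters in practice: each category points to a different remediation path (mapping fixes, extraction gaps, or upstream measurement errors).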
Introduction: Answers to clinical and public health research questions increasingly require aggregated data from multiple sites. Data from electronic health records and other clinical sources are useful for such studies but require stringent quality assessment. Data quality assessment is particularly important in multisite studies to distinguish true variations in care from data quality problems.

Methods: We propose a “fit-for-use” conceptual model for data quality assessment and a process model for planning and conducting single-site and multisite data quality assessments. These approaches are illustrated with examples from prior multisite studies.

Approach: Critical components of multisite data quality assessment include: thoughtful prioritization of variables and data quality dimensions for assessment; development and use of standardized approaches to data quality assessment that can improve data utility over time; iterative cycles of assessment within and between sites; targeting of assessment toward data domains known to be vulnerable to quality problems; and detailed documentation of the rationale and outcomes of data quality assessments to inform data users. The assessment process requires constant communication among site-level data providers, data coordinating centers, and principal investigators.

Discussion: A conceptually based and systematically executed approach to data quality assessment is essential to achieving the potential of the electronic revolution in health care. High-quality data allow “learning health care organizations” to analyze and act on their own information, to compare their outcomes to those of peers, and to address critical scientific questions from the population perspective.
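The between-site assessment cycle can be illustrated with a small sketch: assuming each site returns simple aggregate counts (the site names, condition, and deviation threshold here are illustrative assumptions, not from the paper), a coordinating center can flag sites whose rates diverge from the pooled rate for follow-up conversation rather than automatic exclusion.

```python
import pandas as pd

# Illustrative per-site aggregates returned to a coordinating center.
site_counts = pd.DataFrame({
    "site":       ["A", "B", "C", "D"],
    "n_patients": [12000, 9500, 14200, 8800],
    "n_diabetes": [980, 770, 160, 730],  # site C looks suspiciously low
})

site_counts["rate"] = site_counts["n_diabetes"] / site_counts["n_patients"]
pooled_rate = site_counts["n_diabetes"].sum() / site_counts["n_patients"].sum()

# Flag sites whose rate deviates markedly from the pooled rate; a flag is a
# prompt to investigate (true practice variation vs. an extraction problem),
# not a verdict. The 50% threshold is an assumption for the example.
site_counts["flagged"] = (site_counts["rate"] / pooled_rate - 1).abs() > 0.5
print(site_counts)
```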
A learning health system (LHS) integrates research done in routine care settings, structured data capture during every encounter, and quality improvement processes to rapidly implement advances in new knowledge, all with active and meaningful patient participation. While disease-specific pediatric LHSs have shown tremendous impact on improving clinical outcomes, a national digital architecture to rapidly implement LHSs across multiple pediatric conditions does not exist. PEDSnet is a clinical data research network that provides the infrastructure to support a national pediatric LHS. A consortium consisting of PEDSnet (which includes eight academic medical centers), two existing disease-specific pediatric networks, and two national data partners forms the initial partnership of the National Pediatric Learning Health System (NPLHS). PEDSnet is implementing a flexible dual data architecture that incorporates two widely used data models and national terminology standards to support multi-institutional data integration, cohort discovery, and advanced analytics that enable rapid learning.
Introduction: Poor data quality can be a serious threat to the validity and generalizability of clinical research findings. The growing availability of electronic administrative and clinical data is accompanied by growing concern about the quality of these data for observational research and other analytic purposes. Currently, there are no widely accepted guidelines for reporting data quality results that would enable investigators and consumers to independently determine whether a data source is fit for use to support analytic inferences and reliable evidence generation.

Model and Methods: We developed a conceptual model that captures the flow of data from the data originator across successive data stewards and finally to the data consumer. This “data lifecycle” model illustrates how data quality issues can result in data being returned to previous data custodians. We highlight the potential risks of poor data quality to clinical practice and research results. Because of the need to ensure transparent reporting of data quality issues, we created a unifying data-quality reporting framework and a complementary set of 20 data-quality reporting recommendations for studies that use observational clinical and administrative data for secondary analysis. We obtained stakeholder input on the perceived value of each recommendation by soliciting public comments via two face-to-face meetings of informatics and comparative-effectiveness investigators, through multiple public webinars targeted at the health services research community, and with an open-access online wiki.

Recommendations: Our recommendations propose reporting on both general and analysis-specific data quality features. The goals of these recommendations are to improve the reporting of data quality measures for studies that use observational clinical and administrative data, to ensure transparency and consistency in computing data quality measures, and to facilitate best practices and trust in new clinical discoveries based on the secondary use of observational data.
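As an illustration of the distinction between general and analysis-specific reporting, the sketch below computes a few features of each kind from a tabular extract. The feature names and function signatures are hypothetical stand-ins, not the paper's actual 20 recommendations.

```python
import pandas as pd

def general_dq_features(df: pd.DataFrame) -> dict:
    """Analysis-independent features describing the data source as a whole."""
    return {
        "n_rows": int(len(df)),
        "n_patients": int(df["patient_id"].nunique()),
        "missing_fraction_by_column": df.isna().mean().round(3).to_dict(),
    }

def analysis_specific_dq_features(df: pd.DataFrame, variables: list) -> dict:
    """Features scoped to the variables a particular analysis relies on."""
    return {
        v: {"missing_fraction": float(df[v].isna().mean()),
            "n_distinct_values": int(df[v].nunique())}
        for v in variables
    }

# Illustrative usage with a tiny hypothetical extract.
df = pd.DataFrame({"patient_id": [1, 1, 2], "ldl": [130.0, None, 98.0]})
print(general_dq_features(df))
print(analysis_specific_dq_features(df, ["ldl"]))
```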
Background: Electronic health information routinely collected during healthcare delivery and reimbursement can help address the need for evidence about the real-world effectiveness, safety, and quality of medical care. Often, distributed networks that combine information from multiple sources are needed to generate this real-world evidence.

Objective: We provide a set of field-tested best practices and recommendations for data quality checking for comparative effectiveness research (CER) in distributed data networks.

Methods: We explore the requirements for data quality checking and describe the data quality approaches undertaken by several existing multi-site networks.

Results: There are no established standards for evaluating the quality of electronic health data for CER within distributed networks. Data checks of increasing complexity are often employed, ranging from consistency with syntactic rules to evaluation of semantics and consistency within and across sites. Temporal trends within and across sites are widely used, as are checks of each data refresh or update. Rates of specific events and exposures by age group, sex, and month are also common.

Discussion: Secondary use of electronic health data for CER holds promise but is complex, especially in distributed data networks that incorporate periodic data refreshes. The viability of a learning health system depends on a robust understanding of the quality, validity, and optimal secondary uses of routinely collected electronic health data within distributed health data networks. Robust data quality checking can strengthen confidence in findings based on distributed data networks.
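As one concrete instance of the common checks listed above, the sketch below computes event rates by age group, sex, and month from a hypothetical visit-level extract (the column names and condition are assumptions). Comparing such stratified rates across data refreshes, months, or sites is how temporal-trend and cross-site checks are typically operationalized.

```python
import pandas as pd

# Hypothetical visit-level extract; column names are assumptions.
visits = pd.DataFrame({
    "visit_date": pd.to_datetime(
        ["2014-01-05", "2014-01-20", "2014-02-03", "2014-02-28"]),
    "sex":       ["F", "M", "F", "M"],
    "age_years": [4, 11, 7, 15],
    "asthma_dx": [0, 1, 0, 1],  # 1 if an asthma diagnosis was recorded
})

# Stratify by age group, sex, and calendar month.
visits["age_group"] = pd.cut(visits["age_years"], bins=[0, 5, 12, 18],
                             labels=["0-5", "6-12", "13-18"])
visits["month"] = visits["visit_date"].dt.to_period("M")

# Event rates per stratum; a sudden shift in these rates between two data
# refreshes is a common signal of an upstream extraction problem.
rates = (visits.groupby(["age_group", "sex", "month"], observed=True)
               ["asthma_dx"]
               .agg(n="size", rate="mean"))
print(rates)
```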
While data quality is recognized as a critical aspect of establishing and using a clinical data research network (CDRN), the findings from data quality assessments are largely unpublished. This paper presents a real-world account of studying and interpreting data quality findings in a pediatric CDRN; the lessons learned can be applied by other CDRNs.
Objective: To evaluate the validity of multi-institutional electronic health record (EHR) data sharing for surveillance and study of childhood obesity.

Methods: We conducted a non-concurrent cohort study of 528,340 children with outpatient visits to six pediatric academic medical centers during 2007–08 and sufficient data in the EHR for body mass index (BMI) assessment. EHR data were compared with data from the 2007–08 National Health and Nutrition Examination Survey (NHANES).

Results: Among children 2–17 years, BMI was evaluable for 1,398,655 visits (56%). The EHR dataset contained over 6,000 BMI measurements per month of age up to 16 years, yielding precise estimates of BMI. In the EHR dataset, 18% of children were obese versus 18% in NHANES, and 35% were obese or overweight versus 34% in NHANES. BMI for an individual was highly reliable over time (intraclass correlation coefficient 0.90 for obese children and 0.97 for all children). Only 14% of visits with measured obesity (BMI ≥95th percentile) had a diagnosis of obesity recorded, and only 20% of children with measured obesity had the diagnosis documented during the study period. Obese children had higher primary care (4.8 versus 4.0 visits, p<0.001) and specialty care (3.7 versus 2.7 visits, p<0.001) utilization than their non-obese counterparts, and a higher prevalence of diverse co-morbidities. The cohort size in the EHR dataset permitted detection of associations with rare diagnoses. Data sharing did not require investment of extensive institutional resources, yet yielded high data quality.

Conclusions: Multi-institutional EHR data sharing is a promising, feasible, and valid approach to population health surveillance. It provides a valuable complement to more resource-intensive national surveys, particularly for iterative surveillance and quality improvement. Low rates of obesity diagnosis present a significant obstacle to surveillance and quality improvement in the care of children with obesity.
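For readers unfamiliar with the BMI-based definitions used in this study, the following minimal sketch shows the BMI computation and the standard CDC pediatric percentile cut points. The percentile lookup itself requires age- and sex-specific growth chart reference data, so it is only assumed here, not implemented.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2

def weight_category(bmi_percentile: float) -> str:
    """Pediatric weight category by BMI-for-age percentile (CDC cut points)."""
    if bmi_percentile >= 95:
        return "obese"
    if bmi_percentile >= 85:
        return "overweight"
    return "not overweight"

# A function such as cdc_bmi_percentile(bmi_value, age_months, sex) would map
# a BMI to an age- and sex-specific percentile from a growth-chart reference
# table; it is a hypothetical placeholder, not implemented here.
print(round(bmi(45.0, 1.40), 1))   # -> 23.0
print(weight_category(96.2))       # -> obese
```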