The US National Library of Medicine regularly collects summary data on direct use of Unified Medical Language System (UMLS) resources. The summary data sources include UMLS user registration data, required annual reports submitted by registered users, and statistics on downloads and application programming interface calls. In 2019, the National Library of Medicine analyzed the summary data on 2018 UMLS use. The library also conducted a scoping review of the literature to provide additional intelligence about the research uses of UMLS as input to a planned 2020 review of UMLS production methods and priorities. 5043 direct users of UMLS data and tools downloaded 4402 copies of the UMLS resources and issued 66 130 951 UMLS application programming interface requests in 2018. The annual reports and the scoping review results agree that the primary UMLS uses are to process and interpret text and facilitate mapping or linking between terminologies. These uses align with the original stated purpose of the UMLS.
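The API requests counted above go to the UMLS Terminology Services (UTS) REST API. As a minimal sketch, the snippet below builds a search request URL for a free-text term against the public UTS search endpoint; the endpoint path and parameters follow the published UTS API, but the API key shown is a placeholder, and no request is actually sent.

```python
from urllib.parse import urlencode

UTS_BASE = "https://uts-ws.nlm.nih.gov/rest"  # public UTS REST base URL


def build_search_url(term: str, api_key: str, version: str = "current") -> str:
    """Build a UMLS search request URL for a free-text term.

    This is the kind of request counted in the annual API usage
    statistics; the api_key value is a placeholder, not a real key.
    """
    query = urlencode({"string": term, "apiKey": api_key})
    return f"{UTS_BASE}/search/{version}?{query}"


url = build_search_url("myocardial infarction", "YOUR-API-KEY")
```

A client would issue an HTTP GET against this URL and receive matching UMLS concepts as JSON.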
As librarians are generally advocates of open access and data sharing, it is a bit surprising that peer-reviewed journals in the field of librarianship have been slow to adopt data sharing policies. Starting October 1, 2019, the Journal of the Medical Library Association (JMLA) is taking a step forward and implementing a firm data sharing policy to increase the rigor and reproducibility of published research, enable data reuse, and promote open science. This editorial explains the data sharing policy, describes how compliance with the policy will fit into the journal’s workflow, and provides further guidance for preparing for data sharing.
Providing access to the data underlying research results in published literature allows others to reproduce those results or analyze the data in new ways. Health sciences librarians and information professionals have long been advocates of data sharing. It is time for us to practice what we preach and share the data associated with our published research. This editorial describes the activity of a working group charged with developing a research data sharing policy for the Journal of the Medical Library Association.
Background: Given the limited supply of the two COVID-19 vaccines, it will be important to choose which risk groups to prioritize for vaccination in order to obtain the most health benefit from that supply. Method: To help decide how to get the maximum health yield from this limited supply, we implemented a logistic regression model to predict COVID-19 death risk by age, race, and sex, and did the same to predict COVID-19 case risk. Results: Our predictive model ranked all demographic groups by COVID-19 death risk. Risk was highly concentrated in some demographic groups; for example, 85+ year old Black, non-Hispanic patients suffered 1,953 deaths per 100,000. Vaccinating the 17 demographic groups at highest COVID-19 death risk as ranked by our logistic model would require only 3.7% of the vaccine supply needed to vaccinate the entire United States, yet would prevent 47% of COVID-19 deaths. Nursing home residents had a higher COVID-19 death risk, at 5,200 deaths per 100,000, than our highest-risk demographic group. The risks of prison residents and health care workers (HCW) were lower than those of our highest-risk demographic groups. COVID-19 case risk was far less concentrated in any demographic group than COVID-19 death risk. While the vaccine supply is low, we should therefore prioritize vaccinations with the goal of reducing deaths, not cases. Conclusion: Vaccine studies show that SARS-CoV-2 vaccines protect against severe COVID-19 infection and thus against COVID-19 death. Allocating at least some of the early vaccine supply to high-risk demographic groups could maximize lives saved. Our model, and the risk estimates it produced, could help states define their vaccine allocation rules.
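The prioritization arithmetic in this abstract can be illustrated with a minimal sketch: given per-group population and death counts, rank groups by death rate and compute what share of the total population the top groups represent versus what share of deaths they account for. The group labels and counts below are synthetic illustrative numbers, not the study's data, and the ranking step stands in for the fitted logistic model's predicted risks.

```python
# Hypothetical (group, population, deaths) tuples -- not the paper's data.
groups = [
    ("85+ Black NH", 100_000, 1_953),
    ("85+ White NH", 900_000, 12_000),
    ("65-74 Hispanic", 2_000_000, 8_000),
    ("18-29 White NH", 20_000_000, 1_200),
]


def prioritize(groups, n_top):
    """Rank groups by death rate; report population and death shares
    covered by vaccinating only the n_top highest-risk groups."""
    ranked = sorted(groups, key=lambda g: g[2] / g[1], reverse=True)
    total_pop = sum(g[1] for g in groups)
    total_deaths = sum(g[2] for g in groups)
    top = ranked[:n_top]
    pop_share = sum(g[1] for g in top) / total_pop
    death_share = sum(g[2] for g in top) / total_deaths
    return ranked, pop_share, death_share


ranked, pop_share, death_share = prioritize(groups, n_top=2)
```

With these illustrative numbers, covering the two highest-risk groups takes only a few percent of the population while averting a majority of deaths, mirroring the concentration effect the abstract reports.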
Objectives: To assess the accuracy of Logical Observation Identifiers Names and Codes (LOINC) mapping to local laboratory test codes, which is crucial to data integration across time and healthcare systems. Materials and Methods: We used software tools and manual reviews to estimate the rate of LOINC mapping errors among 179 million mapped test results from 2 DataMarts in PCORnet. We separately reported unweighted and weighted mapping error rates, overall and by parts of the LOINC term. Results: Of the 179 537 986 included mapped results for 3029 quantitative tests, 95.4% were mapped correctly, implying a 4.6% mapping error rate. Error rates were less than 5% for the more common tests with at least 100 000 mapped test results. Mapping errors varied across LOINC classes. Error rates in the chemistry and hematology classes, which together accounted for 92.0% of the mapped test results, were 0.4% and 7.5%, respectively. About 50% of mapping errors were due to errors in the property part of the LOINC name. Discussion: Mapping errors could be detected automatically through inconsistencies in (1) qualifiers of the analyte, (2) specimen type, (3) property, and (4) method. Among quantitative test results, which are the large majority of reported tests, applying an automatic error detection and correction algorithm could reduce mapping errors further. Conclusions: Overall, the mapping error rate within the PCORnet data was 4.6%. This is nontrivial but less than other published error rates of 20%–40%. The error rate decreased substantially, to 0.1%, after application of the automatic detection and correction algorithm.
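The automatic consistency checks described in the discussion can be sketched as follows: compare attributes of the local test (here, specimen type and the property implied by the reported units) against the parts of the mapped LOINC term. The part values and the units-to-property lookup are simplified illustrations, not the study's full algorithm.

```python
# Hypothetical, simplified units -> LOINC property lookup.
UNIT_TO_PROPERTY = {
    "mg/dL": "MCnc",   # mass concentration
    "mmol/L": "SCnc",  # substance concentration
    "%": "NFr",        # number fraction
}


def find_mapping_errors(local_test, loinc_parts):
    """Return the list of LOINC parts inconsistent with the local test."""
    errors = []
    # Check (2): specimen type must match the LOINC system part.
    if local_test["specimen"] != loinc_parts["system"]:
        errors.append("specimen")
    # Check (3): the property implied by the reported units must match.
    expected = UNIT_TO_PROPERTY.get(local_test["units"])
    if expected and expected != loinc_parts["property"]:
        errors.append("property")
    return errors


glucose = {"name": "Glucose", "specimen": "Ser/Plas", "units": "mg/dL"}
# Mistakenly mapped to a substance-concentration (mmol/L) LOINC term:
bad_map = {"component": "Glucose", "system": "Ser/Plas", "property": "SCnc"}
errs = find_mapping_errors(glucose, bad_map)
```

A property mismatch like this one is exactly the error category that accounted for about half of the mapping errors reported above.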
Introduction: The National Institute of Neurological Disorders and Stroke (NINDS) and the National Library of Medicine (NLM) initiated development of cerebral aneurysms and subarachnoid hemorrhage (SAH)-specific Common Data Elements (CDEs) in 2015 as part of a joint project to develop data standards for funded clinical research in neuroscience. Objective: Through the development of these data standards for clinical research, the NINDS and NLM SAH joint CDE initiative strives to improve SAH data collection by increasing efficiency, improving data quality, reducing study start-up time, facilitating data sharing/meta-analyses, and helping educate new clinical investigators. Methods: The working group consisted of international members with varied fields of expertise related to SAH and was divided into domains such as subject characteristics and assessments and exams. They developed a set of SAH-specific CDE recommendations by selecting among, refining, and adding to existing field-tested data elements. Recommendations, based on a review of the established Stroke CDEs as well as other disease-specific CDEs, were uploaded to the NIH CDE Repository. Following an internal working group review of recommendations, the SAH CDEs will be vetted during a public review on the NINDS website. Results: Version 1.0 of the SAH CDEs will be available in early 2017. New SAH CDEs and recommendations will include those developed for unruptured intracranial aneurysms and long-term therapies. The NINDS CDE website provides uniform names and structures for each data element, as well as guidance documents and template case report forms using the CDEs. Conclusion: The NINDS encourages the use of CDEs by the clinical research community in order to standardize the collection of research data across studies. The NINDS CDEs are a continually evolving resource, requiring updates as research advancements indicate.
These newly developed SAH CDEs will serve as a valuable starting point for researchers and will facilitate streamlining and sharing of data. Information provided at this meeting will include examples of how the SAH CDEs may be used in a research study, demonstrations of navigating the NINDS CDE and NIH CDE Repository websites, and guidance on how users can submit feedback.
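One way a study database might use CDEs as described above is to validate each collected record against the element definitions (data type and permissible values). The element names and permissible values below are simplified illustrations, not actual NINDS SAH CDEs.

```python
# Hypothetical CDE definitions -- illustrative only, not real SAH CDEs.
SAH_CDES = {
    "HuntHessScale": {"type": int, "permissible": {1, 2, 3, 4, 5}},
    "AneurysmLocation": {
        "type": str,
        "permissible": {"ACom", "PCom", "MCA", "Other"},
    },
}


def validate_record(record):
    """Return a list of (field, reason) validation problems."""
    problems = []
    for field, value in record.items():
        cde = SAH_CDES.get(field)
        if cde is None:
            problems.append((field, "not a defined CDE"))
        elif not isinstance(value, cde["type"]):
            problems.append((field, "wrong data type"))
        elif value not in cde["permissible"]:
            problems.append((field, "value not permissible"))
    return problems


problems = validate_record({"HuntHessScale": 6, "AneurysmLocation": "ACom"})
```

Because every study validates against the same shared definitions, records collected at different sites remain directly comparable, which is the point of the CDE effort.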
Data sharing is critical to advancing genomic research: it reduces the need to collect new data by enabling existing data to be reused and combined, and it promotes reproducible research. The Cancer Genome Atlas (TCGA) is a popular resource for individual-level genotype-phenotype cancer-related data. The Database of Genotypes and Phenotypes (dbGaP) contains many datasets similar to those in TCGA. We have created a software pipeline that allows researchers to discover relevant genomic data in dbGaP by matching TCGA metadata. The resulting pipeline provides an easy-to-use tool for connecting these two data sources.
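The metadata-matching idea behind such a pipeline can be sketched as below: score each dbGaP study against a TCGA project by the overlap of their metadata terms and keep the best matches. The Jaccard scoring, the accession identifiers, and the sample records are illustrative assumptions, not the pipeline's actual algorithm or data.

```python
def jaccard(a, b):
    """Jaccard similarity of two term collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def rank_dbgap_studies(tcga_meta, dbgap_studies, min_score=0.25):
    """Rank dbGaP studies by metadata-term similarity to a TCGA project."""
    scored = [
        (jaccard(tcga_meta["terms"], s["terms"]), s["accession"])
        for s in dbgap_studies
    ]
    matches = [(score, acc) for score, acc in scored if score >= min_score]
    return sorted(matches, reverse=True)


# Hypothetical metadata records for illustration.
tcga = {"project": "TCGA-BRCA", "terms": ["breast", "carcinoma", "wxs"]}
studies = [
    {"accession": "phs000001", "terms": ["breast", "carcinoma", "snp"]},
    {"accession": "phs000002", "terms": ["glioma", "wgs"]},
]
hits = rank_dbgap_studies(tcga, studies)
```

A researcher would then follow the returned accessions into dbGaP to request access to the matched datasets.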