Griffin M Weber scite author profile

Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses

We have generated a molecular taxonomy of lung carcinoma, the leading cause of cancer death in the United States and worldwide. Using oligonucleotide microarrays, we analyzed mRNA expression levels corresponding to 12,600 transcript sequences in 186 lung tumor samples, including 139 adenocarcinomas resected from the lung. Hierarchical and probabilistic clustering of expression data defined distinct subclasses of lung adenocarcinoma. Among these were tumors with high relative expression of neuroendocrine genes and of type II pneumocyte genes, respectively. Retrospective analysis revealed a less favorable outcome for the adenocarcinomas with neuroendocrine gene expression. The diagnostic potential of expression profiling is emphasized by its ability to discriminate primary lung adenocarcinomas from metastases of extra-pulmonary origin. These results suggest that integration of expression profile data with clinical parameters could aid in diagnosis of lung cancer patients.

BioNumbers—the database of key numbers in molecular and cell biology

Milo et al. 2009

BioNumbers (http://www.bionumbers.hms.harvard.edu) is a database of key numbers in molecular and cell biology—the quantitative properties of biological systems of interest to computational, systems and molecular cell biologists. Contents of the database range from cell sizes to metabolite concentrations, from reaction rates to generation times, from genome sizes to the number of mitochondria in a cell. While always of importance to biologists, having numbers in hand is becoming increasingly critical for experimenting, modeling, and analyzing biological systems. BioNumbers was motivated by an appreciation of how long it can take to find even the simplest number in the vast biological literature. All numbers are taken directly from a literature source and that reference is provided with the number. BioNumbers is designed to be highly searchable and queries can be performed by keywords or browsed by menus. BioNumbers is a collaborative community platform where registered users can add content and make comments on existing data. All new entries and commentary are curated to maintain high quality. Here we describe the database characteristics and implementation, demonstrate its use, and discuss future directions for its development.

Genomic Analysis of Mouse Retinal Development

et al. 2004

The vertebrate retina is comprised of seven major cell types that are generated in overlapping but well-defined intervals. To identify genes that might regulate retinal development, gene expression in the developing retina was profiled at multiple time points using serial analysis of gene expression (SAGE). The expression patterns of 1,051 genes that showed developmentally dynamic expression by SAGE were investigated using in situ hybridization. A molecular atlas of gene expression in the developing and mature retina was thereby constructed, along with a taxonomic classification of developmental gene expression patterns. Genes were identified that label both temporal and spatial subsets of mitotic progenitor cells. For each developing and mature major retinal cell type, genes selectively expressed in that cell type were identified. The gene expression profiles of retinal Müller glia and mitotic progenitor cells were found to be highly similar, suggesting that Müller glia might serve to produce multiple retinal cell types under the right conditions. In addition, multiple transcripts that were evolutionarily conserved that did not appear to encode open reading frames of more than 100 amino acids in length (“noncoding RNAs”) were found to be dynamically and specifically expressed in developing and mature retinal cell types. Finally, many photoreceptor-enriched genes that mapped to chromosomal intervals containing retinal disease genes were identified. These data serve as a starting point for functional investigations of the roles of these genes in retinal development and physiology.

Serving the enterprise and beyond with informatics for integrating biology and the bedside (i2b2)

Murphy, Weber, Mendis et al. 2010. Journal of the American Medical Informatics Association

Informatics for Integrating Biology and the Bedside (i2b2) is one of seven projects sponsored by the NIH Roadmap National Centers for Biomedical Computing (http://www.ncbcs.org). Its mission is to provide clinical investigators with the tools necessary to integrate medical record and clinical research data in the genomics age, a software suite to construct and integrate the modern clinical research chart. i2b2 software may be used by an enterprise's research community to find sets of interesting patients from electronic patient medical record data, while preserving patient privacy through a query tool interface. Project-specific mini-databases ("data marts") can be created from these sets to make highly detailed data available on these specific patients to the investigators on the i2b2 platform, as reviewed and restricted by the Institutional Review Board. The current version of this software has been released into the public domain and is available at the URL: http://www.i2b2.org/software.

The National COVID Cohort Collaborative (N3C): Rationale, design, infrastructure, and deployment

Haendel, Chute, Bennett et al. 2020

Objective: COVID-19 poses societal challenges that require expeditious data and knowledge sharing. Though organizational clinical data are abundant, these are largely inaccessible to outside researchers. Statistical, machine learning, and causal analyses are most successful with large-scale data beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many centers. Methods: The Clinical and Translational Science Award (CTSA) Program and scientific community created N3C to overcome technical, regulatory, policy, and governance barriers to sharing and harmonizing individual-level clinical data. We developed solutions to extract, aggregate, and harmonize data across organizations and data models, and created a secure data enclave to enable efficient, transparent, and reproducible collaborative analytics. Organized in inclusive workstreams, in two months we created: legal agreements and governance for organizations and researchers; data extraction scripts to identify and ingest positive, negative, and possible COVID-19 cases; a data quality assurance and harmonization pipeline to create a single harmonized dataset; population of the secure data enclave with data, machine learning, and statistical analytics tools; dissemination mechanisms; and a synthetic data pilot to democratize data access. Discussion: The N3C has demonstrated that a multi-site collaborative learning health network can overcome barriers to rapidly build a scalable infrastructure incorporating multi-organizational clinical data for COVID-19 analytics. We expect this effort to save lives by enabling rapid collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care and thereby reduce the immediate and long-term impacts of COVID-19.

Lay summary: COVID-19 poses societal challenges that require expeditious data and knowledge sharing. Though medical records are abundant, they are largely inaccessible to outside researchers. Statistical, machine learning, and causal research are most successful with large datasets beyond what is available in any given organization. Here, we introduce the National COVID Cohort Collaborative (N3C), an open science community focused on analyzing patient-level data from many clinical centers to reveal patterns in COVID-19 patients. To create N3C, the community had to overcome technical, regulatory, policy, and governance barriers to sharing patient-level clinical data. In less than 2 months, we developed solutions to acquire and harmonize data across organizations and created a secure data environment to enable transparent and reproducible collaborative research. We expect the N3C to help save lives by enabling collaboration among clinicians, researchers, and data scientists to identify treatments and specialized care needs and thereby reduce the immediate and long-term impacts of COVID-19.

The Co-Morbidity Burden of Children and Young Adults with Autism Spectrum Disorders

et al. 2012

Objectives: Use electronic health records of individuals with Autism Spectrum Disorder (ASD) to assess the comorbidity burden of ASD in children and young adults. Study Design: A retrospective prevalence study was performed using a distributed query system across three general hospitals and one pediatric hospital. Over 14,000 individuals under age 35 with ASD were characterized by their comorbidities and, conversely, the prevalence of ASD within these comorbidities was measured. The comorbidity prevalence of the younger (age <18 years) and older (age 18–34 years) individuals with ASD was compared. Results: 19.44% of ASD patients had epilepsy as compared to 2.19% in the overall hospital population (95% confidence interval for difference in percentages 13.58–14.69%), 2.43% of ASD with schizophrenia vs. 0.24% in the hospital population (95% CI 1.89–2.39%), inflammatory bowel disease (IBD) 0.83% vs. 0.54% (95% CI 0.13–0.43%), bowel disorders (without IBD) 11.74% vs. 4.5% (95% CI 5.72–6.68%), CNS/cranial anomalies 12.45% vs. 1.19% (95% CI 9.41–10.38%), diabetes mellitus type I (DM1) 0.79% vs. 0.34% (95% CI 0.3–0.6%), muscular dystrophy 0.47% vs. 0.05% (95% CI 0.26–0.49%), sleep disorders 1.12% vs. 0.14% (95% CI 0.79–1.14%). Autoimmune disorders (excluding DM1 and IBD) were not significantly different at 0.67% vs. 0.68% (95% CI −0.14 to 0.13%). Three of the studied comorbidities increased significantly when comparing ages 0–17 vs. 18–34 with p<0.001: schizophrenia (1.43% vs. 8.76%), diabetes mellitus type I (0.67% vs. 2.08%), and IBD (0.68% vs. 1.99%), whereas sleep disorders, bowel disorders (without IBD), and epilepsy did not change significantly. Conclusions: The comorbidities of ASD encompass disease states that are significantly overrepresented in ASD with respect to even the patient populations of tertiary health centers. This burden of comorbidities goes well beyond those routinely managed in developmental medicine centers and requires broad multidisciplinary management that payors and providers will have to plan for.

Biases in electronic health record data due to processes within the healthcare system: retrospective observational study

Agniel, Kohane, Weber 2018. BMJ

Objective: To evaluate on a large scale, across 272 common types of laboratory tests, the impact of healthcare processes on the predictive value of electronic health record (EHR) data. Design: Retrospective observational study. Setting: Two large hospitals in Boston, Massachusetts, with inpatient, emergency, and ambulatory care. Participants: All 669 452 patients treated at the two hospitals over one year between 2005 and 2006. Main outcome measures: The relative predictive accuracy of each laboratory test for three year survival, using the time of the day, day of the week, and ordering frequency of the test, compared to the value of the test result. Results: The presence of a laboratory test order, regardless of any other information about the test result, has a significant association (P<0.001) with the odds of survival in 233 of 272 (86%) tests. Data about the timing of when laboratory tests were ordered were more accurate than the test results in predicting survival in 118 of 174 tests (68%). Conclusions: Healthcare processes must be addressed and accounted for in analysis of observational health data. Without careful consideration to context, EHR data are unsuitable for many research questions. However, if explicitly modeled, the same processes that make EHR data complex can be leveraged to gain insight into patients’ state of health.

Finding the Missing Link for Big Biomedical Data

Weber, Mandl, Kohane 2014. JAMA

their data publicly only to regret it later when those data were used in unanticipated circumstances. To avoid paternalism, is there an effective and affordable mechanism, analogous to consent for participation in a trial, to enable patients to decide how and when their data can be shared with or "mashed up" against other databases? It may therefore be timely to convene a public forum whereby the relevant stakeholders, including citizens, the health care community, and commercial data vendors could meet to frame the policy from which legislation and ultimately technical protections for big biomedical data linkage will devolve.
