Summary
There is increasing concern that most current published research findings are false. The probability that a research claim is true may depend on study power and bias, the number of other studies on the same question, and, importantly, the ratio of true to no relationships among the relationships probed in each scientific field. In this framework, a research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance. Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true. Moreover, for many current scientific fields, claimed research findings may often be simply accurate measures of the prevailing bias. In this essay, I discuss the implications of these problems for the conduct and interpretation of research.
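The arithmetic behind this framework can be sketched directly from its definitions: with pre-study odds R of a true relationship, power 1 − β, significance level α, and bias u, the post-study probability that a claimed finding is true (the positive predictive value, PPV) is ([1 − β]R + uβR) / (R + α − βR + u − uα + uβR). The scenarios below are illustrative parameter choices, not results reported in the essay:

```python
def ppv(power: float, alpha: float, R: float, bias: float = 0.0) -> float:
    """Post-study probability that a claimed research finding is true.

    power: 1 - beta, probability of detecting a true relationship
    alpha: type I error rate (e.g. 0.05)
    R:     pre-study odds that a probed relationship is true
    bias:  fraction u of analyses that would not have yielded a
           finding but are reported as one anyway (0 = no bias)
    """
    beta = 1.0 - power
    u = bias
    num = power * R + u * beta * R
    den = R + alpha - beta * R + u - u * alpha + u * beta * R
    return num / den

# A well-powered study of a plausible hypothesis vs. an underpowered,
# biased exploratory study of a long-shot hypothesis:
print(round(ppv(0.80, 0.05, 1.0, bias=0.10), 2))   # → 0.85
print(round(ppv(0.20, 0.05, 0.10, bias=0.30), 2))  # → 0.12
```

The second scenario shows the paper's central point: with low power, low pre-study odds, and moderate bias, a "positive" claim is far more likely false than true.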
Improving the reliability and efficiency of scientific research will increase the credibility of the published scientific literature and accelerate discovery. Here we argue for the adoption of measures to optimize key elements of the scientific process: methods, reporting and dissemination, reproducibility, evaluation and incentives. There is some evidence from both simulations and empirical studies supporting the likely effectiveness of these measures, but their broad adoption by researchers, institutions, funders and journals will require iterative evaluation and improvement. We discuss the goals of these measures, and how they can be implemented, in the hope that this will facilitate action toward improving the transparency, reproducibility and efficiency of scientific research.
We propose to change the default P-value threshold for statistical significance from 0.05 to 0.005 for claims of new discoveries. The lack of reproducibility of scientific studies has caused growing concern over the credibility of claims of new discoveries based on 'statistically significant' findings. There has been much progress toward documenting and addressing several causes of this lack of reproducibility (for example, multiple testing, P-hacking, publication bias and underpowered studies). However, we believe that a leading cause of non-reproducibility has not yet been adequately addressed: statistical standards of evidence for claiming new discoveries in many fields of science are simply too low. Associating statistically significant findings with P < 0.05 results in a high rate of false positives even in the absence of other experimental, procedural and reporting problems. For fields where the threshold for defining statistical significance for new discoveries is P < 0.05, we propose a change to P < 0.005. This simple step would immediately improve the reproducibility of scientific research in many fields. Results that would currently be called significant but do not meet the new threshold should instead be called suggestive. While statisticians have known the relative weakness of using P ≈ 0.05 as a threshold for discovery and the proposal to lower it to 0.005 is not new [1,2], a critical mass of researchers now endorse this change. We restrict our recommendation to claims of discovery of new effects. We do not address the appropriate threshold for confirmatory or contradictory replications of existing claims. We also do not advocate changes to discovery thresholds in fields that have already adopted more stringent standards (for example, genomics and high-energy physics research; see the 'Potential objections' section below). We also restrict our recommendation to studies that conduct null hypothesis significance tests.
We have diverse views about how best to improve reproducibility, and many of us believe that other ways of summarizing the data, such as Bayes factors or other posterior summaries based on clearly articulated model assumptions, are preferable to P values. However, changing the P value threshold is simple, aligns with the training undertaken by many researchers, and might quickly achieve broad acceptance.
Summary
Background
Major depressive disorder is one of the most common, burdensome, and costly psychiatric disorders worldwide in adults. Pharmacological and non-pharmacological treatments are available; however, because of inadequate resources, antidepressants are used more frequently than psychological interventions. Prescription of these agents should be informed by the best available evidence. Therefore, we aimed to update and expand our previous work to compare and rank antidepressants for the acute treatment of adults with unipolar major depressive disorder.
Methods
We did a systematic review and network meta-analysis. We searched Cochrane Central Register of Controlled Trials, CINAHL, Embase, LILACS database, MEDLINE, MEDLINE In-Process, PsycINFO, the websites of regulatory agencies, and international registers for published and unpublished, double-blind, randomised controlled trials from their inception to Jan 8, 2016. We included placebo-controlled and head-to-head trials of 21 antidepressants used for the acute treatment of adults (≥18 years old and of both sexes) with major depressive disorder diagnosed according to standard operationalised criteria. We excluded quasi-randomised trials and trials that were incomplete or included 20% or more of participants with bipolar disorder, psychotic depression, or treatment-resistant depression; or patients with a serious concomitant medical illness. We extracted data following a predefined hierarchy. In network meta-analysis, we used group-level data. We assessed the studies' risk of bias in accordance with the Cochrane Handbook for Systematic Reviews of Interventions, and certainty of evidence using the Grading of Recommendations Assessment, Development and Evaluation framework. Primary outcomes were efficacy (response rate) and acceptability (treatment discontinuations due to any cause). We estimated summary odds ratios (ORs) using pairwise and network meta-analysis with random effects.
This study is registered with PROSPERO, number CRD42012002291.
Findings
We identified 28 552 citations and of these included 522 trials comprising 116 477 participants. In terms of efficacy, all antidepressants were more effective than placebo, with ORs ranging between 2·13 (95% credible interval [CrI] 1·89–2·41) for amitriptyline and 1·37 (1·16–1·63) for reboxetine. For acceptability, only agomelatine (OR 0·84, 95% CrI 0·72–0·97) and fluoxetine (0·88, 0·80–0·96) were associated with fewer dropouts than placebo, whereas clomipramine was worse than placebo (1·30, 1·01–1·68). When all trials were considered, differences in ORs between antidepressants ranged from 1·15 to 1·55 for efficacy and from 0·64 to 0·83 for acceptability, with wide CrIs on most of the comparative analyses. In head-to-head studies, agomelatine, amitriptyline, escitalopram, mirtazapine, paroxetine, venlafaxine, and vortioxetine were more effective than other antidepressants (range of ORs 1·19–1·96), whereas fluoxetine, fluvoxamine, reboxetine, and trazodone were the least efficacious drugs (0·51–0·84). For acceptabil...
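The pairwise, random-effects arm of such an analysis can be illustrated with standard DerSimonian–Laird pooling of study-level log odds ratios. The trial data below are hypothetical, chosen only to show the mechanics; they are not drawn from this review:

```python
import math

def pool_log_odds_ratios(log_ors, variances):
    """DerSimonian-Laird random-effects pooling of study-level log ORs.

    Returns the pooled OR and an approximate 95% confidence interval.
    """
    w = [1.0 / v for v in variances]                     # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))  # Cochran's Q
    df = len(log_ors) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                        # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return math.exp(pooled), (math.exp(pooled - 1.96 * se),
                              math.exp(pooled + 1.96 * se))

# Hypothetical drug-vs-placebo response data: (log OR, variance) per trial
or_pooled, ci = pool_log_odds_ratios([0.9, 0.2, 0.7], [0.04, 0.05, 0.06])
print(round(or_pooled, 2), tuple(round(x, 2) for x in ci))
```

Random effects widen the interval relative to a fixed-effect pool whenever the trials disagree more than chance allows (Q > df), which is why wide credible intervals are the norm in comparative analyses like those reported here.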
The language and conceptual framework of “research reproducibility” are nonstandard and unsettled across the sciences.
In the last 30 years several organizations have developed protocols for clinical validation of blood pressure measuring devices. An international initiative was recently launched by the US Association for the Advancement of Medical Instrumentation (AAMI), the European Society of Hypertension Working Group on Blood Pressure Monitoring (ESH) and the International Organization for Standardization (ISO), aiming to reach consensus on a universal AAMI/ESH/ISO validation standard. The purpose of this statement by the ESH Working Group on Blood Pressure Monitoring is to provide practical guidance for investigators performing validation studies according to the AAMI/ESH/ISO Universal Standard (ISO 81060-2:2018), to ensure that its stipulations are meticulously implemented and data are fully reported. Thus, this statement provides: (i) a list of key recommendations for validation studies of intermittent non-invasive automated blood pressure measuring devices according to the AAMI/ESH/ISO Universal Standard, (ii) practical stepwise guidance for researchers performing these validation studies, (iii) a checklist for authors and reviewers of such studies.
We have empirically assessed the distribution of published effect sizes and estimated power by analyzing 26,841 statistical records from 3,801 cognitive neuroscience and psychology papers published recently. The reported median effect size was D = 0.93 (interquartile range: 0.64–1.46) for nominally statistically significant results and D = 0.24 (0.11–0.42) for nonsignificant results. Median power to detect small, medium, and large effects was 0.12, 0.44, and 0.73, reflecting no improvement through the past half-century. This is so because sample sizes have remained small. Assuming similar true effect sizes in both disciplines, power was lower in cognitive neuroscience than in psychology. Journal impact factors negatively correlated with power. Assuming a realistic range of prior probabilities for null hypotheses, false report probability is likely to exceed 50% for the whole literature. In light of our findings, the recently reported low replication success in psychology is realistic, and worse performance may be expected for cognitive neuroscience.
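Power figures like those quoted above can be approximated from first principles: for a two-sided two-sample comparison at α = 0.05, power depends only on the standardized effect size (Cohen's d) and the per-group sample size. A sketch using the normal approximation (the sample size n = 20 per group is an illustrative stand-in for the small samples the authors describe, not a figure from the paper):

```python
from math import sqrt
from statistics import NormalDist

def two_sample_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample test via the
    normal approximation (ignores the negligible opposite tail)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ncp = d * sqrt(n_per_group / 2)          # noncentrality under the alternative
    return 1 - NormalDist().cdf(z_crit - ncp)

# Power at conventional small/medium/large effects with n = 20 per group:
for d in (0.2, 0.5, 0.8):
    print(d, round(two_sample_power(d, 20), 2))
```

With such samples, small effects are nearly undetectable and even large effects are missed roughly a quarter of the time, which is the mechanism behind the stagnant power estimates reported in the paper.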
Psychosis is a heterogeneous psychiatric condition for which a multitude of risk and protective factors have been suggested. This umbrella review aimed to classify the strength of evidence for the associations between each factor and psychotic disorders whilst controlling for several biases. The Web of Knowledge database was searched to identify systematic reviews and meta-analyses of observational studies which examined associations between socio-demographic, parental, perinatal, later factors or antecedents and psychotic disorders, and which included a comparison group of healthy controls, published from 1965 to January 31, 2017. The literature search and data extraction followed PRISMA and MOOSE guidelines. The association between each factor and ICD or DSM diagnoses of non-organic psychotic disorders was graded into convincing, highly suggestive, suggestive, weak, or non-significant according to a standardized classification based on: number of psychotic cases, random-effects p value, largest study 95% confidence interval, heterogeneity between studies, 95% prediction interval, small study effect, and excess significance bias. In order to assess evidence for temporality of association, we also conducted sensitivity analyses restricted to data from prospective studies. Fifty-five meta-analyses or systematic reviews were included in the umbrella review, corresponding to 683 individual studies and 170 putative risk or protective factors for psychotic disorders. Only the ultra-high-risk state for psychosis (odds ratio, OR = 9.32, 95% CI: 4.91-17.72) and Black-Caribbean ethnicity in England (OR = 4.87, 95% CI: 3.96-6.00) showed convincing evidence of association.
Six factors were highly suggestive (ethnic minority in low ethnic density area, second generation immigrants, trait anhedonia, premorbid IQ, minor physical anomalies, and olfactory identification ability), and nine were suggestive (urbanicity, ethnic minority in high ethnic density area, first generation immigrants, North-African immigrants in Europe, winter/spring season of birth in Northern hemisphere, childhood social withdrawal, childhood trauma, Toxoplasma gondii IgG, and non-right handedness). When only prospective studies were considered, the evidence was convincing for ultra-high-risk state and suggestive for urbanicity only. In summary, this umbrella review found several factors to be associated with psychotic disorders with different levels of evidence. These risk or protective factors represent a starting point for further etiopathological research and for the improvement of the prediction of psychosis.
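The grading logic described above can be expressed as a simple decision rule. The thresholds below follow one common operationalization of Ioannidis-style umbrella-review criteria; they are a simplified sketch, and the exact cutoffs and criteria used in this review may differ:

```python
def grade_association(n_cases: int, p_random: float, largest_sig: bool,
                      i2: float, pi_excludes_null: bool,
                      small_study_bias: bool, excess_significance: bool) -> str:
    """Grade the credibility of a meta-analytic association.

    n_cases:          total number of cases across studies
    p_random:         random-effects p value
    largest_sig:      largest study's 95% CI excludes the null
    i2:               heterogeneity I^2 as a proportion (0.5 = 50%)
    pi_excludes_null: 95% prediction interval excludes the null
    """
    if p_random >= 0.05:
        return "non-significant"
    if (n_cases > 1000 and p_random < 1e-6 and i2 < 0.5 and pi_excludes_null
            and not small_study_bias and not excess_significance):
        return "convincing"
    if n_cases > 1000 and p_random < 1e-6 and largest_sig:
        return "highly suggestive"
    if n_cases > 1000 and p_random < 1e-3:
        return "suggestive"
    return "weak"

# A large, homogeneous, bias-free association clears the top tier:
print(grade_association(2500, 1e-9, True, 0.3, True, False, False))
```

The point of the tiered rule is that a tiny p value alone is never sufficient: heterogeneity, prediction intervals, and bias tests can each demote an association, which is why only two of 170 factors reached the convincing tier here.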