Research on bias in peer review examines scholarly communication and funding processes to assess the epistemic and social legitimacy of the mechanisms by which knowledge communities vet and self‐regulate their work. Despite vocal concerns, a closer look at the empirical and methodological limitations of research on bias raises questions about the existence and extent of many hypothesized forms of bias. In addition, the notion of bias is predicated on an implicit ideal that, once articulated, raises questions about the normative implications of research on bias in peer review. This review provides a brief description of the function, history, and scope of peer review; articulates and critiques the conception of bias unifying research on bias in peer review; characterizes and examines the empirical, methodological, and normative claims of bias in peer review research; and assesses possible alternatives to the status quo. We close by identifying ways to expand conceptions and studies of bias to contend with the complexity of social interactions among actors involved directly and indirectly in peer review.
Previous research has found that funding disparities are driven by applications’ final impact scores and that only a portion of the black/white funding gap can be explained by bibliometrics and topic choice. Using National Institutes of Health R01 applications for council years 2014–2016, we examine assigned reviewers’ preliminary overall impact and criterion scores to evaluate whether racial disparities in impact scores can be explained by application and applicant characteristics. We hypothesize that differences in commensuration—the process of combining criterion scores into overall impact scores—disadvantage black applicants. Using multilevel models and matching on key variables including career stage, gender, and area of science, we find little evidence for racial disparities emerging in the process of combining preliminary criterion scores into preliminary overall impact scores. Instead, preliminary criterion scores fully account for racial disparities—yet do not explain all of the variability—in preliminary overall impact scores.
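To make the commensuration question concrete, the sketch below shows how one might use a multilevel model to test whether criterion scores account for a race gap in preliminary overall impact scores. The synthetic data, variable names, and model specification are illustrative assumptions only; they are not the authors' NIH data or analysis code.

```python
# Hypothetical sketch (synthetic data, illustrative variable names; not the
# authors' NIH data or analysis code): do criterion scores account for a race
# gap in preliminary overall impact scores in a multilevel model?
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
race_black = rng.integers(0, 2, n)                 # 1 = Black applicant (synthetic)
# Synthetic criterion scores with a small race gap (lower = better in NIH scoring)
significance = rng.normal(5, 1, n) + 0.3 * race_black
approach = rng.normal(5, 1, n) + 0.3 * race_black
innovation = rng.normal(5, 1, n)
df = pd.DataFrame({
    "study_section": rng.integers(0, 100, n),      # review panel (random intercept)
    "race_black": race_black,
    "significance": significance,
    "approach": approach,
    "innovation": innovation,
})
# Overall impact driven by the criterion scores, with no additional race effect
# entering at the commensuration step
df["overall_impact"] = (0.5 * df["approach"] + 0.3 * df["significance"]
                        + 0.2 * df["innovation"] + rng.normal(0, 0.5, n))

# Model 1: race alone, i.e. the raw gap in overall impact scores
m1 = smf.mixedlm("overall_impact ~ race_black", df,
                 groups=df["study_section"]).fit()
# Model 2: race plus criterion scores; does the race coefficient shrink to ~0?
m2 = smf.mixedlm("overall_impact ~ race_black + significance + approach + innovation",
                 df, groups=df["study_section"]).fit()
print(m1.params["race_black"], m2.params["race_black"])
```

If the race coefficient shrinks toward zero once criterion scores are included, the disparity is carried by the criterion scores themselves rather than by how they are combined, which is the pattern the study reports.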
Publishers must invest, and manage risk
An empirically sensitive formulation of the norms of transformative criticism must recognize that even public and shared standards of evaluation can be implemented in ways that unintentionally perpetuate and reproduce forms of social bias that are epistemically detrimental. Helen Longino's theory can explain and redress such social bias by treating peer evaluations as hypotheses based on data and by requiring a kind of perspectival diversity that bears, not on the content of the community's knowledge claims, but on the beliefs and norms of the culture of the knowledge community itself. To illustrate how socializing cognition can bias evaluations, we focus on peer-review practices, with some discussion of peer review in philosophy. Data include responses to surveys by editors from general philosophy journals, as well as analyses of reviews and editorial decisions for the 2007 Cognitive Science Society Conference.
Psychometrically oriented researchers construe low interrater reliability measures for expert peer reviewers as damning for the practice of peer review. I argue that this perspective overlooks different forms of normatively appropriate disagreement among reviewers. Of special interest are Kuhnian questions about the extent to which variance in reviewer ratings can be accounted for by normatively appropriate disagreements about how to interpret and apply evaluative criteria within disciplines during times of normal science. Until these empirical-cum-philosophical analyses are done, it will remain unclear to what extent low interrater reliability measures represent reasonable disagreement rather than arbitrary differences between reviewers.
Considerable attention has focused on studying reviewer agreement via inter-rater reliability (IRR) as a way to assess the quality of the peer review process. Inspired by a recent study that reported an IRR of zero in the mock peer review of top-quality grant proposals, we use real data from a complete range of submissions to the National Institutes of Health and to the American Institute of Biological Sciences to bring awareness to two important issues with using IRR for assessing peer review quality. First, we demonstrate that estimating local IRR from subsets of restricted-quality proposals will likely result in zero estimates under many scenarios. In both data sets, we find that zero local IRR estimates are more likely when subsets of top-quality proposals rather than bottom-quality proposals are considered. However, zero estimates from range-restricted data should not be interpreted as indicating arbitrariness in peer review. On the contrary, despite the different scoring scales used by the two agencies, when complete ranges of proposals are considered, IRR estimates are above 0.6, which indicates good reviewer agreement. Furthermore, we demonstrate that, with a small number of reviewers per proposal, zero estimates of IRR are possible even when the true value is not zero.
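The range-restriction point can be illustrated with a small simulation. The sketch below uses hypothetical data and a standard one-way ICC estimator, not the agencies' data or the authors' code: it generates reviewer scores with good true reliability and then shows how the estimate collapses toward zero when only a top-scoring subset is considered.

```python
# Illustrative simulation (hypothetical data; not the agencies' data or the
# authors' code): range restriction drives IRR estimates toward zero even when
# full-range reliability is good.
import numpy as np

rng = np.random.default_rng(1)

def icc_oneway(scores):
    """One-way random-effects ICC(1) for a proposals x reviewers score matrix."""
    n, k = scores.shape
    row_means = scores.mean(axis=1)
    msb = k * ((row_means - scores.mean()) ** 2).sum() / (n - 1)      # between-proposal MS
    msw = ((scores - row_means[:, None]) ** 2).sum() / (n * (k - 1))  # within-proposal MS
    return (msb - msw) / (msb + (k - 1) * msw)

n_proposals, n_reviewers = 2000, 3
true_quality = rng.normal(0, 1, n_proposals)
# Reviewer score = true quality + noise; true ICC = 1 / (1 + 0.7**2) ~= 0.67
scores = true_quality[:, None] + rng.normal(0, 0.7, (n_proposals, n_reviewers))

full_icc = icc_oneway(scores)
# "Local" IRR on a range-restricted subset: the best-scoring ~10% of proposals
cutoff = np.quantile(scores.mean(axis=1), 0.90)
restricted_icc = icc_oneway(scores[scores.mean(axis=1) >= cutoff])
print(f"full range ICC ~ {full_icc:.2f}; top-decile ICC ~ {restricted_icc:.2f}")
```

With only three reviewers per proposal the restricted estimate is also noisy, so individual subsets can easily return exactly zero or even negative values, consistent with the point that a zero local IRR need not indicate arbitrary review.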
Additional support (eg, in study design, statistics, or grant writing) for non-PhD physicians or scientists might benefit this group. The association between time allocated to research and award attainment suggests that identifying ways to guarantee protected research time could enhance funding outcomes.

To increase gender equity in the selection process, the CSDA request for applications and review guideline documents were revised in September 2016 to clearly articulate attributes held by successful applicants and evaluation criteria that used objective, non-gendered language. These materials were revised to minimise use of words that are thought to be implicitly associated with traditionally masculine traits.7 For example, the phrase "leadership potential" was changed to "promise to make significant contributions", "importance" to "influence", "innovation" to "originality", and "creativity" to "inventiveness". Magua and colleagues8 have confirmed these types of gendered associations of certain words in peer review.

We also attempted to use the application to encourage institutions to consider gender equity in applicant salaries. Department chairs were asked to provide the applicant's salary quartile range relative to those at the same faculty rank in the department; this question was not used in the application review. Applicants' salary quartiles showed a gender gap between women and men who entered the competition (appendix) that was consistent with gender gaps in physician-scientist compensation in previous reports.2 Anecdotally, one female applicant reported that the salary question raised awareness about her low compensation, which prompted a sizable salary adjustment. In June 2017, the difference in salaries by gender was shared with the department chairs who contributed the data. The gender salary gap was narrower in 2018, the second year that this information was collected.