Another social science looks at itself

Experimental economists have joined the reproducibility discussion by replicating selected published experiments from two top-tier journals in economics. Camerer et al. found that two-thirds of the 18 studies examined yielded replicable estimates of effect size and direction. This proportion is somewhat lower than unaffiliated experts were willing to bet in an associated prediction market, but roughly in line with expectations from sample sizes and P values. Science, this issue p. 1433
Here we provide further details on the replications, the estimation of standardized effect sizes and complementary replicability indicators, the implementation of the prediction markets and surveys, the comparison of prediction market beliefs, survey beliefs, and replication outcomes, the comparison of reproducibility indicators to experimental economics and the psychological sciences, and additional results and data for the individual studies and markets. The code used for the estimation of replication power, standardized effect sizes, all complementary replication indicators, and all results is posted at OSF (https://osf.io/pfdyw/).

Replications

Inclusion criteria

We replicated 21 experimental studies in the social sciences published between 2010 and 2015 in Nature and Science. We included all studies that fulfilled our inclusion criteria for: (i) the journal and time period, (ii) the type of experiment, (iii) the subjects included in the experiment, (iv) the equipment and materials needed to implement the experiment, and (v) the results reported in the experiment. We did not exclude studies that had already been subject to a replication, as excluding them could have affected the representativeness of the included studies. We define and discuss the five inclusion criteria below.

Journal and time period: We included experimental studies published in Nature and Science between 2010 and 2015. The reason for focusing on these two journals is that they are typically considered the two most prestigious general science journals. Articles published in these journals are considered exciting, innovative, and important, which is also reflected in their high impact factors.

[Table footnotes: * Number of observations (number of individuals in parentheses). † Replicated: significant effect (P < 0.05) in the same direction as in the original study. ‡ Statistical power to detect 50% of the original effect size r. § Relative standardized effect size.]
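The power criterion referenced in the table footnotes (power to detect a stated fraction of the original effect size r) can be sketched as follows. This is a minimal, hypothetical illustration using the Fisher z approximation for correlation effect sizes, not the authors' actual code, which is posted at OSF.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def replication_power(r_orig, n, fraction=0.5, alpha=0.05):
    """Approximate power of a two-sided test to detect `fraction` of the
    original correlation effect size r_orig with n observations, using
    the Fisher z transform (standard error 1/sqrt(n - 3))."""
    r_target = fraction * r_orig
    z = math.atanh(r_target)           # Fisher z of the target effect
    se = 1.0 / math.sqrt(n - 3)
    z_crit = 1.959963984540054         # Phi^-1(1 - alpha/2) for alpha = 0.05
    # Power = probability of landing in either rejection region.
    return norm_cdf(z / se - z_crit) + norm_cdf(-z / se - z_crit)
```

For example, with an original effect of r = 0.4 and n = 200, the power to detect half the original effect is roughly 0.8, which matches the intuition that the replications were sized to be well powered even for shrunken effects.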
[Table footnotes: * Belief about the probability of replicating in stage 1 (90% power to detect 75% of the original effect size). † Predicted added probability of replicating in stage 2 (90% power to detect 50% of the original effect size) compared to stage 1.]

[Table footnotes: * Mean number of tokens (points) invested per transaction. † Mean number of shares bought or sold per transaction.]
Summary

Data analysis workflows in many scientific domains have become increasingly complex and flexible. To assess the impact of this flexibility on functional magnetic resonance imaging (fMRI) results, the same dataset was independently analyzed by 70 teams, testing nine ex-ante hypotheses. The flexibility of analytic approaches is exemplified by the fact that no two teams chose identical workflows to analyze the data. This flexibility resulted in sizeable variation in hypothesis test results, even for teams whose statistical maps were highly correlated at intermediate stages of their analysis pipeline. Variation in reported results was related to several aspects of analysis methodology. Importantly, meta-analytic approaches that aggregated information across teams yielded significant consensus in activated regions. Furthermore, prediction markets of researchers in the field revealed an overestimation of the likelihood of significant findings, even by researchers with direct knowledge of the dataset. Our findings show that analytic flexibility can have substantial effects on scientific conclusions, and identify factors related to this variability in fMRI analysis. The results emphasize the importance of validating and sharing complex analysis workflows, and demonstrate the need for performing and reporting multiple analyses of the same data. Potential approaches to mitigate issues related to analytical variability are discussed.
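The meta-analytic aggregation across teams can be illustrated by combining per-team z statistics for a given hypothesis. The sketch below uses Stouffer's method with equal weights as one simple, hypothetical example; it is not the specific image-based meta-analysis the study implemented.

```python
import math


def stouffer_combine(z_scores):
    """Combine independent per-team z statistics for one hypothesis
    (or voxel) into a single consensus z using Stouffer's method
    with equal weights: z_combined = sum(z_i) / sqrt(k)."""
    k = len(z_scores)
    return sum(z_scores) / math.sqrt(k)
```

With four teams each reporting a modest z = 1.0, the combined statistic is 2.0, showing how weak but consistent individual signals can reach significance in aggregate even when no single team's result does.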
The self-concept maintenance theory holds that many people will cheat in order to maximize self-profit, but only to the extent that they can do so while maintaining a positive self-concept. Mazar, Amir, and Ariely (2008, Experiment 1) gave participants an opportunity and incentive to cheat on a problem-solving task. Prior to that task, participants either recalled the Ten Commandments (a moral reminder) or recalled 10 books they had read in high school (a neutral task). Results were consistent with the self-concept maintenance theory. When given the opportunity to cheat, participants given the moral-reminder priming task reported solving 1.45 fewer matrices than did those given a neutral prime (Cohen's d = 0.48); moral reminders reduced cheating. Mazar et al.'s article is among the most cited in deception research, but their Experiment 1 has not been replicated directly. This Registered Replication Report describes the aggregated result of 25 direct replications (total N = 5,786), all of which followed the same preregistered protocol. In the primary meta-analysis (19 replications, total n = 4,674), participants who were given an opportunity
Srull and Wyer (1979) demonstrated that exposing participants to more hostility-related stimuli caused them subsequently to interpret ambiguous behaviors as more hostile. In their Experiment 1, participants descrambled sets of words to form sentences. In one condition, 80% of the descrambled sentences described hostile behaviors, and in another condition, 20% described hostile behaviors. Following the descrambling task, all participants read a vignette about a man named Donald who behaved in an ambiguously hostile manner and then rated him on a set of personality traits. Next, participants rated the hostility of various ambiguously hostile behaviors (all ratings on scales from 0 to 10). Participants who descrambled mostly hostile sentences rated Donald and the ambiguous behaviors as approximately 3 scale points more hostile than did those who descrambled mostly neutral sentences. This Registered Replication Report describes the results of 26 independent replications (N = 7,373 in the total sample; k = 22 labs and N = 5,610 in the
There is an ongoing debate about the replicability of neuroimaging research. It has been suggested that one of the main reasons for the high rate of false positive results is the many degrees of freedom researchers have during data analysis. In the Neuroimaging Analysis Replication and Prediction Study (NARPS), we aim to provide the first scientific evidence on the variability of results across analysis teams in neuroscience. We collected fMRI data from 108 participants during two versions of the mixed gambles task, which is often used to study decision-making under risk. For each participant, the dataset includes an anatomical (T1-weighted) scan as well as fMRI and behavioral data from four runs of the task. The dataset is shared through OpenNeuro and is formatted according to the Brain Imaging Data Structure (BIDS) standard. Data pre-processed with fMRIPrep and quality control reports are also publicly shared. This dataset can be used to study decision-making under risk and to test the replicability and interpretability of previous results in the field.