The arXiv is the most popular preprint repository in the world. Since its inception in 1991, the arXiv has allowed researchers to freely share publication-ready articles prior to formal peer review. The growth and popularity of the arXiv emerged as a result of new technologies that made document creation and dissemination easy, and of cultural practices in which collaboration and data sharing were dominant. The arXiv occupies a unique place in the history of research communication and of the Web itself; however, it has arguably changed very little since its creation. Here we look at the strengths and weaknesses of the arXiv in an effort to identify possible improvements based on new technologies not previously available. On this basis, we argue that a modern arXiv might in fact not look at all like the arXiv of today.

Disclaimer: this article was originally written and posted on Authorea, a collaborative online platform for technical and data-driven documents. Authorea is being developed to respond to some of the concerns with current methodology raised in this very piece, and as such is suggested as a possible future alternative to existing preprint servers.
We analyze the online response to the preprint publication of a cohort of 4,606 scientific articles submitted to the preprint database arXiv.org between October 2010 and May 2011. We study three forms of responses to these preprints: downloads on the arXiv.org site, mentions on the social media site Twitter, and early citations in the scholarly record. We perform two analyses. First, we analyze the delay and time span of article downloads and Twitter mentions following submission, to understand the temporal configuration of these reactions and whether one precedes or follows the other. Second, we run regression and correlation tests to investigate the relationship between Twitter mentions, arXiv downloads, and article citations. We find that Twitter mentions and arXiv downloads of scholarly articles follow two distinct temporal patterns of activity, with Twitter mentions having shorter delays and narrower time spans than arXiv downloads. We also find that the volume of Twitter mentions is statistically correlated with arXiv downloads and early citations just months after the publication of a preprint, with a possible bias that favors highly mentioned articles.
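The correlation analysis described above can be illustrated with a minimal sketch. The per-article counts below are hypothetical toy values, not the study's data, and this pure-Python Pearson correlation stands in for whatever statistical tooling the authors actually used:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-article counts (illustrative only, not from the study):
tweets    = [0, 1, 2, 5, 10, 3, 0, 8]
downloads = [12, 30, 45, 80, 150, 60, 20, 110]

r = pearson_r(tweets, downloads)
```

A value of `r` near 1 would indicate the kind of positive association between Twitter mentions and downloads that the study reports; the study additionally controls for temporal patterns and uses regression tests, which this sketch does not reproduce.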
For users to trust and interpret the data in scientific digital libraries, they must be able to assess the integrity of those data. Criteria for data integrity vary by context, by scientific problem, by individual, and a variety of other factors. This paper compares technical approaches to data integrity with scientific practices, as a case study in the Center for Embedded Networked Sensing (CENS) in the use of wireless, in-situ sensing for the collection of large scientific data sets. The goal of this research is to identify functional requirements for digital libraries of scientific data that will serve to bridge the gap between current technical approaches to data integrity and existing scientific practices.
The success of eScience research depends not only upon effective collaboration between scientists and technologists but also upon the active involvement of data archivists. Archivists rarely receive scientific data until findings are published, by which time important information about their origins, context, and provenance may be lost. Research reported here addresses the life cycle of data from collaborative ecological research with embedded networked sensing technologies. A better understanding of these processes will enable archivists to participate in earlier stages of the life cycle and to improve curation of these types of scientific data. Evidence from our interview study and field research yields a nine-stage life cycle. Among the findings are the cumulative effect of decisions made at each stage of the life cycle; the balance of decision-making between scientific and technology research partners; and the loss of certain types of data that may be essential to later interpretation.
Authorship and citation practices evolve with time and differ by academic discipline. As such, indicators of research productivity based on citation records are naturally subject to historical and disciplinary effects. We observe these effects on a corpus of astronomer career data constructed from a database of refereed publications. We employ a simple mechanism to measure research output using author and reference counts available in bibliographic databases to develop a citation-based indicator of research productivity. The total research impact (tori) quantifies, for an individual, the total amount of scholarly work that others have devoted to his/her work, measured in the volume of research papers. A derived measure, the research impact quotient (riq), is an age-independent measure of an individual's research ability. We demonstrate that these measures are substantially less vulnerable to temporal debasement and cross-disciplinary bias than the most popular current measures. The proposed measures of research impact, tori and riq, have been implemented in the Smithsonian/NASA Astrophysics Data System.
We analyze data sharing practices of astronomers over the past fifteen years. An analysis of URL links embedded in papers published by the American Astronomical Society reveals that the total number of links included in the literature rose dramatically from 1997 until 2005, when it leveled off at around 1500 per year. The analysis also shows that the availability of linked material decays with time: in 2011, 44% of links published a decade earlier, in 2001, were broken. A rough analysis of link types reveals that links to data hosted on astronomers' personal websites become unreachable much faster than links to datasets on curated institutional sites. To gauge astronomers' current data sharing practices and preferences further, we performed in-depth interviews with 12 scientists and online surveys with 173 scientists, all at a large astrophysical research institute in the United States: the Harvard-Smithsonian Center for Astrophysics, in Cambridge, MA. Both the in-depth interviews and the online survey indicate that, in principle, there is no philosophical objection to data sharing among astronomers at this institution. Key reasons why more data are not presently shared more efficiently in astronomy include: the difficulty of sharing large data sets; over-reliance on non-robust, non-reproducible mechanisms for sharing data (e.g., emailing it); unfamiliarity with options that make data sharing easier (faster) and/or more robust; and, lastly, a sense that other researchers would not want the data to be shared. We conclude with a short discussion of a new effort to implement an easy-to-use, robust system for data sharing in astronomy, at theastrodata.org, and we analyze the uptake of that system to date.
This article presents a study comparing detected structural communities in a coauthorship network to the socioacademic characteristics of the scholars who compose the network. The coauthorship network was created from the bibliographic record of a multi-institution, interdisciplinary research group focused on the study of sensor networks and wireless communication. Four different community detection algorithms were employed to assign a structural community to each scholar in the network: leading eigenvector, walktrap, edge betweenness, and spinglass. Socioacademic characteristics were gathered from the scholars and include information such as academic department, academic affiliation, country of origin, and academic position. A Pearson's χ² test, with a Monte Carlo-simulated p-value, revealed that structural communities best represent groupings of individuals working in the same academic department and at the same institution. A generalization of this result suggests that, even in interdisciplinary, multi-institutional research groups, coauthorship is primarily driven by departmental and institutional affiliation.
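The χ² test with a Monte Carlo-simulated p-value described above can be sketched in pure Python. The scholar labels below are hypothetical (a toy case of perfect association between community and department); the study itself used established statistical tooling, which this sketch does not reproduce:

```python
import random
from collections import Counter

def chi2_stat(a_labels, b_labels):
    """Pearson chi-square statistic for the contingency table of two label lists."""
    n = len(a_labels)
    obs = Counter(zip(a_labels, b_labels))
    ca, cb = Counter(a_labels), Counter(b_labels)
    stat = 0.0
    for a in ca:
        for b in cb:
            expected = ca[a] * cb[b] / n  # expected count under independence
            stat += (obs.get((a, b), 0) - expected) ** 2 / expected
    return stat

def monte_carlo_p(a_labels, b_labels, n_sim=2000, seed=0):
    """Simulated p-value: shuffle one margin and count statistics >= observed."""
    rng = random.Random(seed)
    observed = chi2_stat(a_labels, b_labels)
    b = list(b_labels)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(b)
        if chi2_stat(a_labels, b) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)

# Hypothetical data: structural community vs. academic department for 20 scholars,
# with community membership perfectly aligned with department.
community  = ["c1"] * 10 + ["c2"] * 10
department = ["CS"] * 10 + ["EE"] * 10
p = monte_carlo_p(community, department)
```

A small `p` here indicates that the observed community/department alignment is very unlikely under independence, which mirrors the study's finding that structural communities track departmental and institutional affiliation.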