2019
DOI: 10.29173/istl30

Cleaning Collections Data Using OpenRefine

Abstract: Collection maintenance, including weeding, is a key component of my position as an academic science librarian. In an ideal world we receive perfect data that are clean and ready to use. But unfortunately, that is not always the case. In large deselection projects you might receive holdings and circulation records in separate files which, once combined, may contain many undesired duplicated line items. I will demonstrate how you can effectively and quickly use the facet row feature in OpenRefine to deduplicate …
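
As a rough illustration of the problem the abstract describes, the sketch below combines a holdings file with a circulation file and then drops duplicated line items. It is not OpenRefine's facet feature and not the author's procedure, only a plain-Python analogue of the same idea. The sample data, the column names (`barcode`, `title`, `call_number`, `checkouts`), and the helper functions are all hypothetical.

```python
import csv
import io

# Hypothetical sample exports; the column names are assumptions for illustration.
HOLDINGS_CSV = """barcode,title,call_number
1001,Organic Chemistry,QD251
1002,Cell Biology,QH581
1002,Cell Biology,QH581
1003,Linear Algebra,QA184
"""

CIRCULATION_CSV = """barcode,checkouts
1001,4
1002,0
1003,12
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per line item."""
    return list(csv.DictReader(io.StringIO(text)))


def combine(holdings, circulation, key="barcode"):
    """Attach the checkout count to each holdings row by matching on `key`."""
    checkouts = {row[key]: row["checkouts"] for row in circulation}
    return [{**row, "checkouts": checkouts.get(row[key], "")} for row in holdings]


def deduplicate(rows, key):
    """Keep only the first row seen for each value of `key`."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


combined = combine(read_rows(HOLDINGS_CSV), read_rows(CIRCULATION_CSV))
cleaned = deduplicate(combined, "barcode")

for row in cleaned:
    print(row["barcode"], row["title"], row["checkouts"])
```

Keeping the first occurrence of each key is one possible rule; a real deselection project might instead need to compare the duplicated rows before deciding which one to keep.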

Cited by 4 publications (3 citation statements)
References 1 publication (1 reference statement)
“…It is well suited to disambiguation after specimen labels are transcribed, but before they are imported into a collection management system. There are several manuals available on how to use OpenRefine, but Hill (2016) and Sterner (2019) were specifically written for people working on collections.…”
Section: Relevant Informatics Resources (mentioning)
confidence: 99%
“…• OpenRefine: a data cleaning, reconciliation and batch upload tool with a graphical user interface [9];…”
Section: Introduction (mentioning)
confidence: 99%
“…• OpenRefine: Previously Google Refine, this is "a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data." [4].…”
mentioning
confidence: 99%