Test Collection Based Evaluation of Information Retrieval Systems

Sanderson, Mark

doi:10.1561/1500000009

Cited by 333 publications (206 citation statements)

References 205 publications

Citation statement types: Supporting, Mentioning (199), Contrasting, Unclassified


“…Such aggregates are learned using learning to rank methods. Online learning to rank methods learn from user interactions such as clicks [4,6,10,12]. Dueling Bandit Gradient Descent [16,DBGD] uses interleaved comparison methods [1,3,6,7,10] to infer preferences and then learns by following a gradient that is meant to lead to an optimal ranker.…”

Section: Introduction (mentioning; confidence: 99%)

Probabilistic Multileave Gradient Descent. Oosterhuis, Schuth, Rijke (2016). Lecture Notes in Computer Science.
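The DBGD loop described in the quoted passage can be illustrated with a minimal Python sketch. It assumes a linear ranker over hypothetical two-dimensional feature vectors, uses team-draft interleaving as the comparison method, and simulates users with a hypothetical "click every relevant document" model; the function names and the delta/alpha values are illustrative choices, not taken from the cited work.

```python
import math
import random

# Hypothetical setup: each document is a feature vector; the ranker is linear.
def rank(weights, docs):
    """Return document ids ordered by descending linear score w . x."""
    return sorted(docs, key=lambda d: -sum(w * x for w, x in zip(weights, docs[d])))

def team_draft_interleave(ranking_a, ranking_b, length, rng):
    """Team-draft interleaving: the team with fewer picks (coin flip on ties)
    adds its highest-ranked document not yet shown. Assumes both rankings
    cover the same set of documents."""
    shown, team = [], {}
    picks = {"a": 0, "b": 0}
    while len(shown) < length:
        if picks["a"] < picks["b"] or (picks["a"] == picks["b"] and rng.random() < 0.5):
            side, source = "a", ranking_a
        else:
            side, source = "b", ranking_b
        doc = next((d for d in source if d not in team), None)
        if doc is None:  # every document has been shown
            break
        shown.append(doc)
        team[doc] = side
        picks[side] += 1
    return shown, team

def candidate_wins(team, clicked):
    """+1 if the candidate ranker (team 'b') earned more clicks, -1 if fewer, 0 on a tie."""
    a = sum(1 for d in clicked if team.get(d) == "a")
    b = sum(1 for d in clicked if team.get(d) == "b")
    return (b > a) - (b < a)

def dbgd_step(weights, docs, click_fn, rng, delta=1.0, alpha=0.1, length=10):
    """One DBGD iteration: perturb, interleave, observe clicks, step towards the winner."""
    u = [rng.gauss(0.0, 1.0) for _ in weights]
    norm = math.sqrt(sum(v * v for v in u)) or 1.0
    u = [v / norm for v in u]  # random unit direction
    candidate = [w + delta * v for w, v in zip(weights, u)]
    shown, team = team_draft_interleave(rank(weights, docs), rank(candidate, docs), length, rng)
    if candidate_wins(team, click_fn(shown)) > 0:
        return [w + alpha * v for w, v in zip(weights, u)]
    return weights

docs = {"d0": [1.0, 0.0], "d1": [0.0, 1.0], "d2": [1.0, 0.0], "d3": [0.0, 1.0]}
relevant = {"d0", "d2"}
rng = random.Random(0)
w = [0.0, 1.0]  # starts by ranking the non-relevant documents first
for _ in range(200):
    w = dbgd_step(w, docs, lambda shown: [d for d in shown if d in relevant], rng)
print(rank(w, docs))
```

The ranker only moves when the interleaved comparison says the perturbed candidate won, so the weights follow an estimated gradient without ever observing explicit relevance labels.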


Abstract. Online learning to rank methods aim to optimize ranking models based on user interactions. The dueling bandit gradient descent (DBGD) algorithm is able to effectively optimize linear ranking models solely from user interactions. We propose an extension of DBGD, called probabilistic multileave gradient descent (P-MGD) that builds on probabilistic multileave, a recently proposed highly sensitive and unbiased online evaluation method. We demonstrate that P-MGD significantly outperforms state-of-the-art online learning to rank methods in terms of online performance, without sacrificing offline performance and at greater learning speed.
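The multileaving step that P-MGD builds on can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: each ranker defines a softmax over its ranks, P(d) proportional to 1/rank(d)^tau, with tau = 3 assumed; rankers take turns in randomly ordered rounds (a scheduling assumption); the probabilistic credit-assignment step is omitted, so the set of winning candidates is simply passed in; and the "mean winner" update, which moves the ranker towards the average direction of all winning candidates, is assumed here as the gradient step.

```python
import random

def rank_softmax(ranking, remaining, tau=3.0):
    """P(d) proportional to 1 / rank(d)**tau (1-based ranks in this ranker's list),
    renormalised over documents not yet placed. tau = 3 is an assumed default."""
    weights = {d: 1.0 / (i + 1) ** tau for i, d in enumerate(ranking) if d in remaining}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}

def probabilistic_multileave(rankings, length, rng):
    """Build one result list by sampling documents from the rankers' softmaxes.
    Assumes every ranker ranks the same document set."""
    remaining = set(rankings[0])
    multileaved = []
    while len(multileaved) < length and remaining:
        order = list(range(len(rankings)))
        rng.shuffle(order)  # assumption: each ranker contributes once per round
        for j in order:
            if len(multileaved) >= length or not remaining:
                break
            probs = rank_softmax(rankings[j], remaining)
            candidates, ps = zip(*probs.items())
            doc = rng.choices(candidates, weights=ps, k=1)[0]
            multileaved.append(doc)
            remaining.discard(doc)  # a placed document leaves every distribution
    return multileaved

def mean_winner_update(weights, directions, winners, alpha=0.01):
    """Step towards the mean of the perturbation directions of the winning candidates."""
    if not winners:
        return list(weights)
    mean = [sum(directions[j][k] for j in winners) / len(winners) for k in range(len(weights))]
    return [w + alpha * m for w, m in zip(weights, mean)]

rng = random.Random(42)
rankings = [["a", "b", "c", "d"], ["d", "c", "b", "a"], ["b", "a", "d", "c"]]
print(probabilistic_multileave(rankings, 4, rng))
```

Comparing many candidates in a single multileaved list, rather than one candidate per interleaved list, is what lets this family of methods take a step after each query while drawing on several perturbation directions at once.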



“…While it would in theory be possible to provide the ground truth for the relevance of each document to the test queries in step 1, this would in practice require infeasible amounts of human input. In practice, the human input for the relevance judgements is provided in step 6, where relevance judgements are only done on documents returned by at least one algorithm, usually involving a common technique such as pooling to further reduce the number of relevance judgements to be made [17]. For applications that in practice involve the processing and analysis of large amounts of data, running benchmarks of the algorithms on representative amounts of data has advantages.…”

Section: Challenges In Benchmarking On Big Data (mentioning; confidence: 99%)

Cloud-Based Evaluation Framework for Big Data. Hanbury, Müller, Menze (2013). The Future Internet.
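Pooling, as mentioned in the quoted passage, can be sketched in Python under simple assumptions: depth-k pooling (the union of each run's top k documents for one topic), binary judgements from a hypothetical assessor function, and the common convention that documents outside the pool count as non-relevant. The run names, documents and pool depth are made up for illustration.

```python
def depth_k_pool(runs, k):
    """Union of the top-k documents of every submitted run for one topic."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:k])
    return pool

def judge_pool(pool, assessor):
    """Ask the (hypothetical) assessor for a 0/1 judgement on pooled documents only."""
    return {doc: assessor(doc) for doc in sorted(pool)}

def precision_at(ranking, qrels, k):
    """P@k; documents outside the pool were never judged and count as non-relevant."""
    return sum(qrels.get(doc, 0) for doc in ranking[:k]) / k

runs = {
    "sysA": ["d1", "d2", "d3", "d4"],
    "sysB": ["d3", "d5", "d1", "d6"],
}
pool = depth_k_pool(runs, k=2)  # 4 judgements instead of 6 distinct retrieved docs
qrels = judge_pool(pool, assessor=lambda doc: 1 if doc in {"d1", "d5"} else 0)
for name, ranking in runs.items():
    print(name, precision_at(ranking, qrels, 4))  # sysA 0.25, sysB 0.5
```

The saving grows with the number of runs and the depth of each ranking, at the cost of treating unpooled documents (here `d4` and `d6`) as non-relevant.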


Abstract. The VISCERAL project is building a cloud-based evaluation framework for evaluating machine learning and information retrieval algorithms on large amounts of data. Instead of downloading data and running evaluations locally, the data will be centrally available on the cloud and algorithms to be evaluated will be programmed in computing instances on the cloud, effectively bringing the algorithms to the data. This approach allows evaluations to be performed on Terabytes of data without needing to consider the logistics of moving the data or storing the data on local infrastructure. After discussing the challenges of benchmarking on big data, the design of the VISCERAL system is presented, concentrating on the components for coordinating the participants in the benchmark and managing the ground truth creation. The first two benchmarks run on the VISCERAL framework will be on segmentation and retrieval of 3D medical images.


“…Ma et al [123] use "live" WWW. Sanderson [150] surveys the most general issues of Cranfield style-based evaluation. Shen et al [156] use TREC collections.…”

Section: Concluding Remarks and Suggestions (mentioning; confidence: 99%)

Contextual Search: A Computational Framework. Melucci (2012). Foundations and Trends in Information Retrieval.
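Cranfield-style evaluation, as referred to in the quoted passage, scores a system's ranked output for each topic against fixed relevance judgements and averages over topics. As one common example of such a measure, a minimal Python sketch of average precision (AP) and its mean over topics (MAP) follows; the topic ids, documents and judgements are invented, and scoring a topic the run did not answer as zero is an assumption of this sketch.

```python
def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each relevant document retrieved;
    relevant documents never retrieved contribute zero."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(run, qrels):
    """Average AP over every topic in the qrels; a topic the run skipped scores 0."""
    scores = [average_precision(run.get(topic, []), rel) for topic, rel in qrels.items()]
    return sum(scores) / len(scores)

qrels = {"t1": {"a", "c"}, "t2": {"x"}}
run = {"t1": ["a", "b", "c", "d"], "t2": ["y", "z"]}
print(round(mean_average_precision(run, qrels), 4))  # 0.4167
```

For topic `t1` the relevant documents sit at ranks 1 and 3, giving AP = (1/1 + 2/3) / 2 = 5/6; topic `t2` retrieves nothing relevant, so MAP = (5/6 + 0) / 2 = 5/12.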

