The paper describes the ideas and assumptions underlying the development of a new method for the evaluation and testing of interactive information retrieval (IR) systems, and reports on the initial tests of the proposed method. The method is designed to collect different types of empirical data, i.e. cognitive data as well as traditional systems performance data. The method is based on the novel concept of a 'simulated work task situation' or scenario and the involvement of real end users. The method is also based on a mixture of simulated and real information needs, and involves a group of test persons as well as assessments made by individual panel members. The relevance assessments are made with reference to the concepts of topical as well as situational relevance. The method takes into account the dynamic nature of information needs which are assumed to develop over time for the same user, a variability which is presumed to be strongly connected to the processes of relevance assessment.
This article introduces the concept of relevance as viewed and applied in the context of IR evaluation, by presenting an overview of the multidimensional and dynamic nature of the concept. The literature on relevance reveals how the relevance concept, especially in regard to the multidimensionality of relevance, is many-faceted, and does not just refer to the various relevance criteria users may apply in the process of judging relevance of retrieved information objects. From our point of view, the multidimensionality of relevance explains why some will argue that no consensus has been reached on the relevance concept. Thus, the objective of this article is to present an overview of the many different views and ways in which the concept of relevance is used, leading to a consistent and compatible understanding of the concept. In addition, special attention is paid to the type of situational relevance. Many researchers perceive situational relevance as the most realistic type of user relevance, and therefore situational relevance is discussed with reference to its potential dynamic nature, and as a requirement for interactive information retrieval (IIR) evaluation.
This paper presents a set of basic components which constitutes the experimental setting intended for the evaluation of interactive information retrieval (IIR) systems, the aim of which is to facilitate evaluation of IIR systems in a way which is as close as possible to realistic IR processes. The experimental setting consists of three components: (1) the involvement of potential users as test persons; (2) the application of dynamic and individual information needs; and (3) the use of multidimensional and dynamic relevance judgements. Hidden under the information need component is the essential central sub-component, the simulated work task situation, the tool that triggers the (simulated) dynamic information needs. This paper also reports on the empirical findings of the meta-evaluation of the application of this sub-component, the purpose of which is to discover whether the application of simulated work task situations to future evaluation of IIR systems can be recommended. Investigations are carried out to determine whether any search behavioural differences exist between test persons' treatment of their own real information needs versus simulated information needs. The hypothesis is that if no difference exists one can correctly substitute real information needs with simulated information needs through the application of simulated work task situations. The empirical results of the meta-evaluation provide positive evidence for the application of simulated work task situations to the evaluation of IIR systems. The results also indicate that tailoring work task situations to the group of test persons is important in motivating them. Furthermore, the results of the evaluation show that different versions of semantic openness of the simulated situations make no difference to the test persons' search treatment.
The present two-part article introduces matrix comparison as a formal means for evaluation purposes in informetric studies such as cocitation analysis. In the first part, the motivation behind introducing matrix comparison to informetric studies, as well as two important issues influencing such comparisons, matrix generation and the composition of proximity measures, are introduced and discussed. In this second part, the authors introduce and thoroughly demonstrate two related matrix comparison techniques: the Mantel test and Procrustes analysis. These techniques can compare and evaluate the degree of monotonicity between different proximity measures or their ordination results. Common to these techniques is the application of permutation procedures to test hypotheses about matrix resemblances. The choice of technique is related to the validation at hand. In the case of the Mantel test, the degree of resemblance between two measures forecasts their potentially different effect upon ordination and clustering results. In principle, two proximity measures with a very strong resemblance most likely produce identical results; thus, the choice of measure between the two becomes less important. Alternatively, or as a supplement, Procrustes analysis compares the actual ordination results without investigating the underlying proximity measures, by matching two configurations of the same objects in a multidimensional space. An advantage of the Procrustes analysis, though, is the graphical solution provided by the superimposition plot and the resulting decomposition of variance components. Accordingly, the Procrustes analysis provides not only a measure of general fit between configurations, but also values for individual objects, enabling more elaborate validations. As such, the Mantel test and Procrustes analysis can be used as statistical validation tools in informetric studies and thus help in choosing suitable proximity measures.
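The permutation idea behind the Mantel test can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of Pearson's r on the upper triangles, the one-sided p-value, and the toy matrices are all assumptions made for illustration. Replacing the values with their ranks before correlating would give a rank-based variant closer to the notion of monotonicity.

```python
import random
from itertools import combinations
from math import sqrt


def upper_triangle(matrix):
    """Return the off-diagonal upper-triangle entries of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def mantel_test(a, b, permutations=999, seed=0):
    """Correlate two symmetric proximity matrices and estimate a
    one-sided p-value by jointly permuting rows and columns of `a`."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatability
    n = len(a)
    b_flat = upper_triangle(b)
    observed = pearson(upper_triangle(a), b_flat)
    order = list(range(n))
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(order)
        permuted = [[a[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(upper_triangle(permuted), b_flat) >= observed:
            at_least_as_large += 1
    # The observed arrangement counts as one of the permutations.
    p_value = (at_least_as_large + 1) / (permutations + 1)
    return observed, p_value


# Hypothetical 4x4 dissimilarity matrix, compared with itself.
toy = [
    [0, 1, 2, 3],
    [1, 0, 4, 5],
    [2, 4, 0, 6],
    [3, 5, 6, 0],
]
r_same, p_same = mantel_test(toy, toy, permutations=99)
```

Comparing a matrix with itself gives a correlation of 1 by construction; in practice the two inputs would be proximity matrices produced by different measures over the same objects.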
The present two-part article introduces matrix comparison as a formal means of evaluation in informetric studies such as cocitation analysis. In this first part, the motivation behind introducing matrix comparison to informetric studies, as well as two important issues influencing such comparisons, are introduced and discussed. The motivation is spurred by the recent debate on choice of proximity measures and their potential influence upon clustering and ordination results. The two important issues discussed here are matrix generation and the composition of proximity measures. The approach to matrix generation is demonstrated for the same data set, i.e., how data is represented and transformed in a matrix evidently determines the behavior of proximity measures. Two different matrix generation approaches will, in all probability, lead to different proximity rankings of objects, which further lead to different ordination and clustering results for the same set of objects. Further, a resemblance in the composition of formulas indicates whether two proximity measures may produce similar ordination and clustering results. However, as shown in the case of the angular correlation and cosine measures, a small deviation in otherwise similar formulas can lead to different rankings depending on the contour of the data matrix transformed. Eventually, the behavior of proximity measures, that is, whether they produce similar rankings of objects, is more or less data-specific. Consequently, the authors recommend the use of empirical matrix comparison techniques for individual studies to investigate the degree of resemblance between proximity measures or their ordination results. In part two of the article, the authors introduce and demonstrate two related statistical matrix comparison techniques: the Mantel test and Procrustes analysis. These techniques can compare and evaluate the degree of monotonicity between different proximity measures or their ordination results. As such, the Mantel test and Procrustes analysis can be used as statistical validation tools in informetric studies and thus help in choosing suitable proximity measures.
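The point that a small deviation between otherwise similar formulas can change rankings can be sketched with a toy example. The sketch assumes that the correlation measure is Pearson's r, which equals the cosine computed after subtracting each vector's mean; the profiles and names below are invented for illustration and are not data from the article.

```python
from math import sqrt


def cosine(x, y):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))


def pearson(x, y):
    """Pearson's r, written as the cosine of the mean-centred vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])


def rank_by(measure, target, profiles):
    """Profile names ordered from most to least similar to `target`."""
    return sorted(profiles, key=lambda name: measure(target, profiles[name]),
                  reverse=True)


# Hypothetical profiles: "b" has the same shape as the target but a much
# higher level; "c" has a similar level but a slightly different shape.
target = [1, 2, 3]
profiles = {"b": [11, 12, 13], "c": [1, 2, 2.5]}

cosine_ranking = rank_by(cosine, target, profiles)
pearson_ranking = rank_by(pearson, target, profiles)
```

Here the cosine places "c" first, while Pearson's r places "b" first, because mean-centring removes the level difference that the cosine is sensitive to. Whether such reversals occur in a real study depends on the data matrix, which is why an empirical comparison is suggested.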
The paper introduces bibliometrics to the research area of knowledge organization, more precisely in relation to the construction and maintenance of thesauri. As such, the paper reviews related work that has inspired the assembly of a semi-automatic, bibliometric-based approach to construction and maintenance. Similarly, the paper discusses the methodical considerations behind the approach. Eventually, the semi-automatic approach is used to verify the applicability of bibliometric methods as a supplement to construction and maintenance of thesauri. In the context of knowledge organization, the paper outlines two fundamental approaches to knowledge organization, that is, the manual intellectual approach and the automatic algorithmic approach. Bibliometric methods belong to the automatic algorithmic approach, though bibliometrics do have special characteristics that are substantially different from other methods within this approach.
This paper introduces the concepts of the relative relevance (RR) measure and a new performance indicator of the positional strength of the retrieved and ranked documents. The former is seen as a measure of associative performance computed by the application of the Jaccard formula. The latter is named the Ranked Half-Life (RHL) indicator and denotes the degree to which relevant documents are located at the top of a ranked retrieval result. The measures are proposed to be applied in addition to the traditional performance parameters such as precision and/or recall in connection with evaluation of interactive IR systems. The RR measure describes the degree of agreement between the types of relevance applied in evaluation of information retrieval (IR) systems in a non-binary assessment context. It is shown that the measure has potential to bridge the gap between subjective and objective relevance, as it makes it possible to understand and interpret the relation between these two main classes of relevance used in interactive IR experiments. The relevance concepts are defined, and the application of the measures is demonstrated by interrelating three types of relevance assessments: algorithmic, intellectual topical, and situational assessments. Further, the paper shows that for a given set of queries at given precision levels the RHL indicator adds to the understanding of comparisons of IR performance.
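The two ideas can be sketched roughly as follows. The Jaccard formula itself is standard, but applying it to sets of documents judged relevant under two assessment types (for example after thresholding graded judgements) is an assumption here. The half-life function is a deliberately simplified stand-in, not the paper's RHL formula: it returns the first rank at which half of the total relevance value has accumulated. All names and data are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets are scored as no agreement
    return len(a & b) / len(union)


def half_life_rank(relevance_by_rank):
    """Simplified half-life: first rank (1-based) at which the cumulated
    relevance reaches half of the total. Returns None if nothing is relevant."""
    total = sum(relevance_by_rank)
    if total == 0:
        return None
    cumulated = 0.0
    for rank, value in enumerate(relevance_by_rank, start=1):
        cumulated += value
        if cumulated >= total / 2:
            return rank


# Hypothetical document sets judged relevant under two assessment types.
algorithmic = {"d1", "d2", "d3", "d4"}
situational = {"d2", "d4", "d5"}
agreement = jaccard(algorithmic, situational)

# Hypothetical graded relevance values listed in ranked output order.
top_heavy = half_life_rank([1, 0.5, 0, 1, 0.5])
bottom_heavy = half_life_rank([0, 0, 1, 0.5, 1])
```

A lower half-life rank indicates that relevance is concentrated nearer the top of the ranking, which is the property the RHL indicator is described as capturing.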
We report a naturalistic interactive information retrieval (IIR) study of 18 ordinary users aged 20-25 who carry out everyday-life information seeking (ELIS) on the Internet with respect to the three types of information needs identified by Ingwersen (1986): the verificative information need (VIN), the conscious topical information need (CIN), and the muddled topical information need (MIN). The searches took place in the private homes of the users in order to ensure as realistic searching as possible. Ingwersen (1996) associates a given search behaviour with each of the three types of information needs, which are analytically deduced, but not yet empirically tested. Thus the objective of the study is to investigate whether empirical data does, or does not, conform to the predictions derived from the three types of information needs. The main conclusion is that the analytically deduced information search behaviour characteristics by Ingwersen are positively corroborated for this group of test participants who search the Internet as part of ELIS.