Performance evaluation in content-based image retrieval: overview and proposals
Introduction
Early reports of the performance of content-based image retrieval (CBIR) systems were often limited to printing the results of one or more example queries (e.g. Flickner et al., 1995). Such a presentation is easily tailored to give a positive impression, since developers can select queries which give good results. Hence it is neither an objective performance measure nor a means of comparing different systems. Researchers have subsequently developed a variety of CBIR performance measures, which are discussed in Section 4. Narasimhalu et al. (1997) group multimedia retrieval systems for evaluation purposes and provide some guidelines for the construction of evaluation measures. MIR (1996) gives a further survey of performance measures. However, few standard methods exist that are used by a large number of researchers. Many of the measures used in CBIR (such as precision, recall and their graphical representation) have long been used in information retrieval (IR). Several other standard IR tools, e.g. relevance feedback, have recently been imported into CBIR. To avoid reinventing existing techniques, it seems logical to review systematically the evaluation methods used in IR and their suitability for CBIR.
In the 1950s, IR researchers were already discussing performance evaluation, and the first concrete steps were taken with the development of the SMART system in 1961 (Salton, 1971b). Other important steps towards common performance measures were made with the Cranfield test (Cleverdon et al., 1966). Finally, the TREC series started in 1992, combining many efforts to provide common performance tests. The TREC project (see TRE, 1999; Vorhees and Harmann, 1998) provides a focus for these activities and is the worldwide standard in IR. Nevertheless, much research remains to be done on the evaluation of interactive systems and the inclusion of the user into the query process. Such novelties are included in TREC regularly, e.g. the interactive track in 1994. Salton (1992) gives an overview of IR system evaluation.
Textual information retrieval
Although performance evaluation in IR started in the 1950s, here we focus on newer results and especially on TREC and its achievements in the IR community.
Basic problems in CBIR performance evaluation
The current status of performance evaluation in CBIR lags far behind that in IR. Many different groups work with several sets of specialized images. There is neither a common image collection, nor a common way to obtain relevance judgments, nor a common evaluation scheme.
User comparison
User comparison is an interactive method. Users judge the success of a query immediately after issuing it. It is hard to collect a large number of such user comparisons because they are time-consuming.
Before-after comparison. This is the easiest test method. Users are given two or more different results and are asked to choose the one which is preferred or found to be most relevant to the query. This method needs a base system or, at least, another system for comparison.
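A before-after comparison reduces to counting which result each user preferred. The sketch below shows one possible way to tally such judgments. The system labels, the votes and the helper `preference_shares` are all hypothetical and chosen only for illustration; they are not taken from any study described here.

```python
from collections import Counter
from typing import Dict, Iterable


def preference_shares(votes: Iterable[str]) -> Dict[str, float]:
    """Return the fraction of users who preferred each system's result."""
    counts = Counter(votes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no votes recorded")
    return {system: n / total for system, n in counts.items()}


# Hypothetical votes: each entry names the system whose result one user preferred.
votes = ["baseline", "new", "new", "new", "baseline", "new", "new", "new"]
shares = preference_shares(votes)
print(shares)
```

A tally of this kind says only which system was preferred more often, which is why the method needs a base system or another system to compare against.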
Single-valued measures
Rank of the best match. Berman
Proposals
In the preceding sections a large number of different evaluation techniques have been described. It is apparent that many of them are equivalent or contain the same information. Clearly it would be beneficial to the CBIR community if only standardized names and definitions were used for performance measures. Since scaling or the use of partial graphs impedes interpretation, these techniques should only be used for emphasis, in conjunction with a complete graph.
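To illustrate what a complete graph contains, the following minimal sketch computes every point of a precision-recall graph from a ranked result list. The function name `precision_recall_points`, the image identifiers and the relevance set are hypothetical. Precision and recall follow their standard IR definitions; this is not the set of measures proposed in this article.

```python
from typing import Hashable, List, Sequence, Set, Tuple


def precision_recall_points(
    ranked: Sequence[Hashable], relevant: Set[Hashable]
) -> List[Tuple[float, float]]:
    """Return (recall, precision) after each retrieved image, over the full ranking."""
    if not relevant:
        raise ValueError("at least one relevant image is required")
    points = []
    hits = 0
    for k, image_id in enumerate(ranked, start=1):
        if image_id in relevant:
            hits += 1
        # recall = relevant retrieved / all relevant; precision = relevant retrieved / k
        points.append((hits / len(relevant), hits / k))
    return points


# Hypothetical query result (ranked image ids) and hypothetical ground truth.
ranked = ["img3", "img7", "img1", "img9", "img4"]
relevant = {"img3", "img1", "img8"}

for recall, precision in precision_recall_points(ranked, relevant):
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this invented example recall never reaches 1 because `img8` is not retrieved. A graph showing only the first one or two points would hide that.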
We propose to use only image
Conclusions
This article gives an overview of existing performance evaluation measures in CBIR. The need for standardized evaluation measures is clear, since several measures are slight variations of the same definition, which makes it very hard to compare the performance of systems objectively. To overcome this problem, a set of standard performance measures and a standard image database are needed. We have proposed such a set of measures, similar to those used in TREC. A frequently updated shared image
Acknowledgements
This work is supported by the Swiss National Foundation for Scientific Research (grant no. 2000-052426.97).
References (41)
- Aksoy, S., Haralick, R.M., 1999. Graph theoretic clustering for image grouping and retrieval. In: Proc. 1999 IEEE Conf....
- ANN, 1999. Annotated groundtruth database, Department of Computer Science and Engineering, University of Washington,...
- Belongie, S., Carson, C., Greenspan, H., Malik, J., 1998. Color- and texture-based image segmentation using EM and its...
- Berman, A.P., Shapiro, L.G., 1999. Efficient content-based retrieval: Experimental results. In: IEEE Workshop on...
- All users of information retrieval systems are not created equal: an exploration into individual differences. Information Processing and Management (1989)
- Cleverdon, C.W., Mills, L., Keen, M., 1966. Factors determining the performance of indexing systems, Technical report,...
- Comaniciu, D., Meer, P., Xu, K., Tyler, D., 1999. Retrieval performance improvement through low rank corrections. In:...
- COR, 1999. Corel clipart and photos,...
- Cox, I.J., Miller, M.L., Omohundro, S.M., Yianilos, P.N., 1996. Target testing and the PicHunter Bayesian multimedia...
- Dy, J.G., Brodley, C.E., Kak, A., Shyu, C.-R., Broderick, L.S., 1999. The customized-queries approach to CBIR using EM....
- Query by image and video content: The QBIC system. IEEE Computer