An effective indexing method for image retrieval in PDF files [Un método eficaz de indexación para la recuperación de imágenes en archivos en formato pdf]

  1. Mata Vázquez, Jacinto
  2. Crespo Azcárate, Mariano
  3. Maña López, Manuel Jesús
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2010

Issue: 45

Pages: 21-30

Type: Article


Abstract

One of the areas currently attracting the most interest among researchers and users of Information Retrieval systems is the retrieval of documents containing images relevant to an information need. Here the main objective is not to retrieve the documents relevant to the user's information need, but to retrieve the images relevant to that need. Document collections today come in a variety of formats (HTML, XML, PDF, etc.). In this paper we present an efficient method for indexing a collection of PDF documents that improves the retrieval of the images they contain. Our experiments show that this method achieves better results than indexing the full text.
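The abstract does not describe how the proposed method decides what to index, so the following is only an illustrative sketch of why indexing granularity matters for image retrieval, not the authors' method. It assumes, hypothetically, that the text of each PDF and the caption of each image have already been extracted, and contrasts two inverted indexes: one that maps every term of a document's full text to all images in that document, and one that maps terms only to the image whose own caption contains them. The names `tokenize`, `build_full_text_index`, `build_caption_index`, `search` and the `docs` structure are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_full_text_index(docs):
    """Full-text baseline: each term of a document points to ALL of its images."""
    index = defaultdict(set)
    for doc in docs:
        image_ids = {img["id"] for img in doc["images"]}
        for term in set(tokenize(doc["text"])):
            index[term] |= image_ids
    return index


def build_caption_index(docs):
    """Hypothetical alternative: each term points only to images whose caption contains it."""
    index = defaultdict(set)
    for doc in docs:
        for img in doc["images"]:
            for term in set(tokenize(img["caption"])):
                index[term].add(img["id"])
    return index


def search(index, query):
    """Return the image ids that match every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical toy collection: one PDF already converted to text, with two figures.
docs = [
    {
        "text": "Chest radiograph findings. Figure 1: chest radiograph of pneumonia. "
                "Figure 2: knee MRI.",
        "images": [
            {"id": "d1-fig1", "caption": "Chest radiograph of pneumonia"},
            {"id": "d1-fig2", "caption": "Knee MRI"},
        ],
    }
]

full_index = build_full_text_index(docs)
caption_index = build_caption_index(docs)
print(sorted(search(full_index, "knee mri")))     # both figures of the document
print(sorted(search(caption_index, "knee mri")))  # only the knee figure
```

With full-text indexing, a query for "knee mri" returns every image of the matching document, while the caption-level index returns only `d1-fig2`. Any real method would also need PDF text and image extraction, ranking, and a choice of which surrounding text to associate with each image, none of which this sketch attempts.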
