Técnicas de recuperación de información aplicadas a la construcción de tesauros (Information retrieval techniques applied to the development of a thesaurus)

Blanca GIL URDICIAIN; Rodrigo SÁNCHEZ JIMÉNEZ

Keywords:

Thesaurus development, Clustering, Vector space model, Generalized vector space model, Latent semantic indexing model

Abstract

This article proposes applying a set of Information Retrieval techniques to the development of a thesaurus. The techniques were applied to three tasks: selecting the terminology, categorizing terms by creating clusters, and establishing semantic relationships between terms through semantic similarity. The result was a Foreign Trade Thesaurus of 7,790 terms. From these results, we conclude that the techniques significantly simplify the task of obtaining the terminology and can improve the quality of the final thesaurus. In addition, they make it possible to analyze the conditions of the collection for which the thesaurus is used, and they provide extra information that would be hard to obtain manually.
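To illustrate the idea of relating terms through semantic similarity in a vector space model, the following minimal sketch represents each term as a vector of its frequencies across documents and ranks candidate related terms by cosine similarity. It is not the authors' implementation: the toy documents, the raw-frequency weighting, the function names (`term_vectors`, `cosine`, `related_terms`) and the 0.6 threshold are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical toy collection; the article's actual corpus is not reproduced here.
docs = [
    "export tariff customs duty",
    "import tariff customs declaration",
    "export credit insurance",
    "import quota customs",
]


def term_vectors(documents):
    """Represent each term as a vector of raw frequencies over the documents."""
    counts = [Counter(doc.split()) for doc in documents]
    vocabulary = sorted({term for c in counts for term in c})
    return {term: [c[term] for c in counts] for term in vocabulary}


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def related_terms(term, vectors, threshold=0.6):
    """Rank other terms by similarity to `term`, keeping those above an assumed threshold."""
    scores = [
        (other, cosine(vectors[term], vec))
        for other, vec in vectors.items()
        if other != term
    ]
    kept = [(other, score) for other, score in scores if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


vectors = term_vectors(docs)
for candidate, score in related_terms("tariff", vectors):
    print(f"tariff ~ {candidate}: {score:.2f}")
```

In a workflow of this kind, the ranked pairs would only be candidates for related-term links, to be reviewed by the thesaurus editor before being accepted.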
