Information retrieval techniques applied to the development of a thesaurus
Keywords:
Thesaurus development, Clustering, Vector space model, Generalized vector space model, Latent semantic indexing model
Abstract
This article proposes applying a set of Information Retrieval techniques to the development of a thesaurus. The techniques were applied to three tasks: selecting the terminology, categorizing terms by clustering them, and establishing semantic relationships between terms on the basis of semantic similarity. The result was a Foreign Trade Thesaurus of 7,790 terms. These results show that the techniques significantly simplified the task of obtaining the terminology and that they can improve the quality of the final thesaurus. The techniques also made it possible to analyze the characteristics of the document collection for which the thesaurus is intended, and they provide additional information that would be hard to obtain manually.
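The abstract does not describe how the similarity step was implemented. As an illustration only, the following Python sketch shows one common way to realize the vector space model for this purpose: each term is represented by its row in a tf-idf weighted term-document matrix, and the cosine similarity between rows ranks candidate related terms for an editor to review. The tf-idf weighting, the whitespace tokenization, the function names (`term_vectors`, `cosine`, `most_similar`) and the toy foreign-trade documents are all assumptions made for this example, not the authors' method.

```python
import math
from collections import Counter

# Hypothetical toy collection; a real thesaurus project would use its own corpus.
DOCS = [
    "export tariff customs duty",
    "import tariff customs quota",
    "export credit insurance",
    "import quota licence",
]


def term_vectors(docs):
    """Return {term: tf-idf weights over the documents}, i.e. the rows of a
    term-document matrix. Tokenization is naive whitespace splitting (assumption)."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    n_docs = len(counts)
    vocab = sorted({term for c in counts for term in c})
    # Document frequency: number of documents containing each term.
    df = {term: sum(1 for c in counts if term in c) for term in vocab}
    vectors = {}
    for term in vocab:
        idf = math.log(n_docs / df[term])  # terms found in every document get weight 0
        vectors[term] = [c[term] * idf for c in counts]
    return vectors


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is a zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(term, vectors, k=3):
    """Rank the other terms by cosine similarity to `term` (ties broken alphabetically)."""
    target = vectors[term]
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != term]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


if __name__ == "__main__":
    # "customs" ranks first: it occurs in exactly the same documents as "tariff".
    for other, score in most_similar("tariff", term_vectors(DOCS)):
        print(f"{other}\t{score:.3f}")
```

In this sketch, terms that co-occur in the same documents end up with parallel vectors and therefore high similarity scores; such pairs are only candidates for related-term links and would still need review by the thesaurus editor. The latent semantic indexing model named in the keywords would typically apply a truncated singular value decomposition to the same matrix before computing similarities, which this sketch does not do.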
License
Copyright (c) 2022 Transinformação
This work is licensed under a Creative Commons Attribution 4.0 International License.