Extracting the meaning of a word: an artificial intelligence approach

Aerty Pinto dos Santos; Eduardo Almeida Santos Oliveira; Juliana Pinheiro Campos Pirovani; Elias de Oliveira

Authors

Aerty Pinto dos Santos, Universidade Federal do Espírito Santo, Departamento de Arquivologia, Programa de Pós-Graduação em Informática. https://orcid.org/0009-0009-0496-0272
Eduardo Almeida Santos Oliveira, Universidade Federal do Espírito Santo, Departamento de Arquivologia, Programa de Pós-Graduação em Informática. https://orcid.org/0009-0002-6669-9750
Juliana Pinheiro Campos Pirovani, Universidade Federal do Espírito Santo, Curso de Ciência da Computação, Departamento de Computação. https://orcid.org/0000-0002-3727-4158
Elias de Oliveira, Universidade Federal do Espírito Santo, Departamento de Arquivologia, Programa de Pós-Graduação em Informática. https://orcid.org/0000-0003-2066-7980

Keywords:

Contextual meaning, Machine learning, Natural language processing, Semantic classification, Text analysis

Abstract

This article presents a strategy for extracting the meaning of words in different contexts, using the classification algorithms kNN, WiSARD, and 1NN combined with a robust language model. The main objective is to investigate how the term “archive” is used in journalistic articles and how this usage reflects the value placed on the work of archivists. To this end, texts published in the newspaper “A Tribuna” between 2003 and 2017 were analyzed. The method automatically classifies sentences containing the term “archive” into eleven categories, each representing a different interpretation of the term, using a classifier trained to identify semantic patterns in the sentences. The study analyzes textual data extracted from the digital collection of a periodical, without the direct participation of human subjects. The results indicate that combining the language model with the neural network significantly improves classification performance, surpassing traditional methods in metrics such as precision and recall. The analysis also showed that journalists use the term “archive” in a wide range of contexts, revealing multiple meanings and highlighting the importance of archivists in organizing and documenting records. The proposed approach shows potential for application in other domains, contributing to the automation of semantic inference and the classification of large volumes of textual data.
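
The nearest-neighbor step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for the language model, the function names are hypothetical, and the two category labels (`institution`, `digital_file`) are invented for illustration, since the abstract does not name the eleven categories.

```python
import math
from collections import Counter


def embed(sentence):
    """Toy stand-in for a language-model embedding: word-count vector."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(sentence, labeled, k=1):
    """Return the majority label among the k most similar labeled sentences.

    With k=1 this reduces to the 1NN classifier.
    """
    query = embed(sentence)
    ranked = sorted(
        labeled,
        key=lambda pair: cosine(query, embed(pair[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training sentences and labels (assumptions, not from the study).
TRAIN = [
    ("the archive building downtown was renovated", "institution"),
    ("the public archive opens its building to visitors", "institution"),
    ("she saved the archive file on her computer", "digital_file"),
    ("download the compressed archive file from our site", "digital_file"),
]

label = knn_classify("the city will renovate the archive building", TRAIN, k=1)
print(label)
```

In a setting closer to the one described, `embed` would presumably return a dense vector produced by the language model, and the training set would hold sentences labeled with the eleven categories.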
