
Digitization and creation of the FolkAI corpus of Basarabian folklore texts from the 19th and 20th centuries

Programme: State Programs and Grants
Code: 25.80012.0807.50SE
Execution period: 2025 – 2026
Institutions: Moldova State University, Vladimir Andrunachievici Institute of Mathematics and Computer Science
Project Leader: Petic Mircea
Participants: Cojocaru Svetlana, Malahov Ludmila, Caftanatov Olesea
Type: National framework for research and innovation projects "Stimulating excellence in research"

Summary

The project aims to develop artificial intelligence (AI) tools and models for the recognition, digitization, and transliteration of Basarabian folk texts from the 19th and 20th centuries, which will contribute to preserving our cultural identity. At least 5,000 pages will be processed, creating the first Basarabian Folklore Corpus, consisting of over 100,000 tokens from a variety of literary works. In addition, AI technologies will be used to generate illustrative images for fairy tales, proverbs, and sayings, making it easier to link words to their meanings. The project includes a detailed analysis of the stylistic and structural features of folk texts, using AI tools to automatically recognize and classify their basic elements. Large language models (LLMs) will also be employed for advanced processing applicable to a range of analyses. The FolkAI corpus will be diachronic-parallel, including texts from different historical periods, thus enabling in-depth analysis of the evolution of the Basarabian folk language. This initiative makes a vital contribution to the preservation and promotion of cultural heritage, facilitating public access to these digital resources and supporting the objectives of the Digital Transformation Strategy of the Republic of Moldova (2023-2030) and the Digital Europe Programme (2021-2027). Furthermore, the resulting materials will be useful in education, contributing to the preservation and valorization of Basarabian cultural traditions.