CSJM v.33, n.2 (98), 2025

Automation of PostOCR error correction in the digitization of historical texts

Authors: Bumbu Tudor, Burţeva Liudmila, Cojocaru Svetlana, Colesnicov Alexandru, Malahov Ludmila
Keywords: historical fonts, OCR, PostOCR.

Abstract

Processing texts from distant historical periods, especially handwritten texts in low-resource languages, presents significant challenges. Although modern methods can achieve a fairly good character recognition rate after laborious machine learning procedures, ensuring the correctness of the resulting editable text remains an open problem. This paper presents an approach that helps automate PostOCR proofreading by presenting the digitized text in historical fonts similar to those of the original document.

Tudor Bumbu1,2, Lyudmila Burtseva1,3, Svetlana Cojocaru1,4, Alexandru Colesnicov1,5, Ludmila Malahov1,6

1 Moldova State University, "V. Andrunachievici" Institute of Mathematics and Computer Science, Chisinau, Republic of Moldova

2 ORCID: https://orcid.org/0000-0001-5311-4464
3 ORCID: https://orcid.org/0000-0002-9064-2538
4 ORCID: https://orcid.org/0009-0003-1025-5306
5 ORCID: https://orcid.org/0000-0002-4383-3753
6 ORCID: https://orcid.org/0000-0001-9846-0299

DOI

https://doi.org/10.56415/csjm.v33.12
