Institutul de Matematică şi Informatică "Vladimir Andrunachievici"

Published in: CSJM v.33, n.2 (98), 2025

Automation of PostOCR error correction in the digitization of historical texts

Authors: Bumbu Tudor, Burţeva Liudmila, Cojocaru Svetlana, Colesnicov Alexandru, Malahov Ludmila
Keywords: historical fonts, OCR, PostOCR.

Abstract

Processing texts from distant historical periods, especially handwritten ones in low-resource languages, presents significant challenges. Although modern methods can achieve a fairly good character recognition rate after laborious machine learning procedures, the correctness of the resulting editable text remains an open problem. This paper presents an approach that helps automate PostOCR proofreading by rendering the digitized text in historical fonts similar to those of the original document.

Tudor Bumbu¹,², Lyudmila Burtseva¹,³, Svetlana Cojocaru¹,⁴, Alexandru Colesnicov¹,⁵, Ludmila Malahov¹,⁶

¹ Moldova State University, "V. Andrunachievici" Institute of Mathematics and Computer Science, Chisinau, Republic of Moldova

² ORCID: https://orcid.org/0000-0001-5311-4464
³ ORCID: https://orcid.org/0000-0002-9064-2538
⁴ ORCID: https://orcid.org/0009-0003-1025-5306
⁵ ORCID: https://orcid.org/0000-0002-4383-3753
⁶ ORCID: https://orcid.org/0000-0001-9846-0299

DOI

https://doi.org/10.56415/csjm.v33.12
