Institutul de Matematică şi Informatică "Vladimir Andrunachievici"

Published in: CSJM v.33, n.2 (98), 2025

Feature-Level Decomposition of Text Complexity: Cross-Domain Empirical Evidence

Authors: Parahonco Alexandru, Liudmila Parahonco
Keywords: Text Complexity, Large Language Models, Feature Decomposition, Spearman Correlation, Prompting Strategy, Domain Dependency.

Abstract

This study presents a feature-level analysis of text complexity using large language models (LLMs) in a two-phase design. Phase I operationalized six core features (lexical diversity, density, syntactic complexity, coherence, named entities, and readability), achieving Spearman correlations of 0.55–0.60 across domains. Phase II employed indirect prompting to surface additional qualitative dimensions (e.g., inferential load, rhetorical structure), yielding a mean correlation of 0.42 and revealing that the six features account for approximately 40% of complexity variance. Domain dependencies were limited to named entities and lexical diversity. We propose a hybrid model combining normalization, root-based synergies, and newly quantified metrics with domain-tuned formulae for improved prediction.

This work was supported by the program HORIZON-MSCA-2021-SE-01, as part of the "Elevating Higher Education public policies: an empowering SPRIngboard" (HESPRI) project.
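The abstract reports its results as Spearman correlations between feature scores and complexity judgments. As a rough illustration of what that statistic measures, the sketch below computes a Spearman rank correlation in plain Python. The function names and the toy numbers are hypothetical and are not taken from the paper; the authors' actual pipeline, features, and data are not shown on this page.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data (NOT from the paper): one feature score per text
# and a complexity rating for the same texts.
lexical_diversity = [0.41, 0.52, 0.48, 0.67, 0.59, 0.73]
complexity_rating = [2, 3, 4, 5, 4, 6]

if __name__ == "__main__":
    print(round(spearman(lexical_diversity, complexity_rating), 3))
```

Because the statistic depends only on rank order, it is insensitive to the scale of each feature, which is one reason it suits comparing heterogeneous features such as readability scores and entity counts.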

Alexandr Parahonco
ORCID: https://orcid.org/0009-0007-3486-5597
Vladimir Andrunachievici Institute of Mathematics and Computer Science, Moldova State University, Moldova;
Faculty of Computer Science, Alexandru Ioan Cuza University of Iași, 700506, Romania

Liudmila Parahonco
ORCID: https://orcid.org/0000-0002-7010-3107
Faculty of Letters, Alecu Russo State University of Bălți, MD-3100, Moldova

DOI

https://doi.org/10.56415/csjm.v33.13
