Authors: Alexandru Parahonco, Liudmila Parahonco
Keywords: Text Complexity, Large Language Models, Feature Decomposition, Spearman Correlation, Prompting Strategy, Domain Dependency.
Abstract
This study presents a feature-level analysis of text complexity using large language models (LLMs) in a two-phase design. Phase I operationalized six core features -- lexical diversity, density, syntactic complexity, coherence, named entities, and readability -- achieving Spearman correlations of 0.55–0.60 across domains. Phase II employed indirect prompting to surface additional qualitative dimensions (e.g., inferential load, rhetorical structure), yielding a mean correlation of 0.42 and revealing that the six features account for roughly 40% of complexity variance. Domain dependencies were limited to named entities and lexical diversity. We propose a hybrid model combining normalization, root-based synergies, and newly quantified metrics with domain-tuned formulae for improved prediction. This work was supported by the HORIZON-MSCA-2021-SE-01 programme, as part of the "Elevating Higher Education public policies: an empowering SPRIngboard" (HESPRI) project.
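Both phases evaluate agreement between LLM-derived complexity scores and reference ratings via Spearman's rank correlation. A minimal pure-Python sketch of that metric is given below; the scores and ratings are illustrative placeholders, not the paper's data, and the function names are hypothetical.

```python
def ranks(values):
    """Assign 1-based ranks, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied run
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Illustrative only: composite feature scores vs. gold complexity ratings.
feature_scores = [0.31, 0.54, 0.47, 0.72, 0.66, 0.39, 0.81, 0.58]
gold_ratings = [1, 3, 2, 5, 4, 2, 5, 3]
print(f"Spearman rho = {spearman(feature_scores, gold_ratings):.2f}")
```

Rank-based correlation is a natural fit here because complexity ratings are ordinal: only the ordering of texts matters, not the scale of the scores.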
Alexandr Parahonco
ORCID: https://orcid.org/0009-0007-3486-5597
Vladimir Andrunachievici Institute of Mathematics and Computer Science, Moldova
State University, Moldova;
Faculty of Computer Science, Alexandru Ioan Cuza University of Iași, 700506,
Romania
E-mail:
Liudmila Parahonco
ORCID: https://orcid.org/0000-0002-7010-3107
Faculty of Letters, Alecu Russo State University of Bălți, MD-3100, Moldova
E-mail:
DOI
https://doi.org/10.56415/csjm.v33.13