IMCS/Publications/CSJM/Issues/CSJM v.20, n.3 (60), 2012/

Toward the Soundness of Sense Structure Definitions in Thesaurus-Dictionaries. Parsing Problems and Solutions*

Authors: N. Curteanu, A. Moruz
Keywords: dictionary entry parsing; parsing method of SCD configurations; recursive lexicographic segments; recursive calls of sense markers; Enumeration Closing Condition; soundness of sense structure definitions.

Abstract

In this paper we point out some difficult problems of thesaurus-dictionary entry parsing, relying on the parsing technology of SCD (Segmentation-Cohesion-Dependency) configurations, which has been successfully applied to six of the largest thesauri: Romanian (2), French, German (2), and Russian. Challenging problems: (a) intricate and/or recursive structures of the lexicographic segments met in the entries of certain thesauri; (b) cyclic (recursive) calls of some sense marker classes on marker sequences; (c) establishing the hypergraph-driven dependencies between all the atomic and non-atomic sense definitions. The classical approach to solving these parsing problems is hard, mainly because of the depth-first search of sense definitions and markers, the substantial complexity of the entries, and the dynamic construction of the sense tree embodied within these parsers. SCD-based parsing solutions: (a) the SCD parsing method is a procedural tool, completely free of formal grammars, which handles the recursive structure of the lexicographic segments through procedural, non-recursive calls performed on the SCD parsing configurations of the entry structure; (b) to deal with cyclic (recursive) calls between secondary sense markers and the sense enumeration markers, we proposed the Enumeration Closing Condition, sometimes coupled with New_Paragraphs typographic markers transformed into numeral sense enumeration; (c) these problems, their lexicographic modeling, and their parsing solutions are addressed both to dictionary parser programmers, who can experiment with the SCD-based parsing method, and to lexicographers and thesaurus designers, for tailoring balanced lexical-semantic granularities and sounder sense tree definitions of dictionary entries.

Neculai Curteanu
Institute of Computer Science,
Romanian Academy, Iaşi Branch
Str. Gh. Asachi, Nr. 3,
700483 Iaşi, România

Alex Moruz
Institute of Computer Science,
Romanian Academy, Iaşi Branch,
Faculty of Computer Science,
"Al. I. Cuza" University of Iaşi,


