Institutul de Matematică şi Informatică "Vladimir Andrunachievici"


CSJM v.14, n.1 (40), 2006

From Word Alignment to Word Senses, via Multilingual Wordnets

Authors: Dan Tufis

Abstract

Most successful commercial applications in language processing (text and/or speech) dispense with any explicit concern for semantics, the usual motivation being the high computational cost of dealing with semantics in large volumes of data. With recent advances in corpus linguistics and statistics-based methods in NLP, revealing useful semantic features of linguistic data is becoming ever cheaper, and the accuracy of this process is steadily improving. Lately, there seems to be a growing acceptance of the idea that multilingual lexical ontologies might be the key to aligning different views on the atomic semantic units used to characterize the general meaning of diverse, multilingual documents. Depending on the granularity at which semantic distinctions are necessary, basic semantic processing (such as word sense disambiguation) can reach very high accuracy at relatively low computational cost. The paper substantiates this statement by presenting a statistics-based system for word alignment and word sense disambiguation in parallel corpora. We describe a word alignment platform that provides the text pre-processing (tokenization, POS-tagging, lemmatization, chunking, sentence and word alignment) required for accurate word sense disambiguation.

Dan Tufis
Institute for Artificial Intelligence,
13, "13 Septembrie", 050711, Bucharest 5, Romania


From Word Alignment to Word Senses, via Multilingual Wordnets
Linguistic Resources and Technologies for Romanian Language
Local and Global Parsing with Functional (F)X-bar Theory and SCD Linguistic Strategy (I.) Part I. FX-bar Schemes and Theory. Local and Global FX-bar Projections
The ascertainment of the inflexion models for Romanian
Intonational Structures in Romanian Yes-No Questions
Integrity and correctness checking of a lexical database
