Vladimir Andrunachievici Institute of Mathematics and Computer Science

Computer Science Journal of Moldova (CSJM), v.20, n.2 (59), 2012

Finding Translation Examples for Under-Resourced Language Pairs or for Narrow Domains; the Case for Machine Translation

Authors: Dan Tufiş
Keywords: alignment, comparable corpora, document crawling, machine learning, multilingual corpora, parallel corpora, statistical machine translation.

Abstract

Cyberspace is populated with valuable information sources, expressed in about 1500 different languages and dialects. Yet, for the vast majority of Web surfers, this wealth of information is practically inaccessible or meaningless. Recent advances in cross-lingual information retrieval, multilingual summarization, cross-lingual question answering and machine translation promise to narrow the linguistic gaps and lower the communication barriers between humans and/or software agents. Most of these language technologies are based on statistical machine learning techniques, which require large volumes of cross-lingual data. The most adequate type of cross-lingual data is represented by parallel corpora, i.e., collections of reciprocal translations. However, it is not easy to find enough parallel data for every language pair that might be of interest. When the required parallel data refer to specialized (narrow) domains, the scarcity of data becomes even more acute. Intelligent techniques for extracting information from comparable corpora provide one possible answer to this lack of translation data.

Research Institute for Artificial Intelligence
Romanian Academy
13, "13 Septembrie", 050711, Bucharest 5, Romania
E-mail:
