ERIC Number: ED413746
Record Type: Non-Journal
Publication Date: 1995
Pages: 12
Abstractor: N/A
Reference Count: N/A
MULTEXT-EAST: Multilingual Text Tools and Corpora for Central and Eastern European Languages.
Erjavec, Tomaz; Ide, Nancy; Petkevic, Vladimir; Veronis, Jean
MULTEXT is a European Union project to identify and develop language resources, language-related software, and standards to make the resources maximally usable. MULTEXT-EAST is a spinoff project to develop significant resources for six Central and Eastern European (CEE) languages (Bulgarian, Czech, Estonian, Hungarian, Romanian, Slovenian) and to adapt existing tools and standards to them. MULTEXT has developed a corpus encoding standard (CES), and MULTEXT-EAST is applying it to texts in the six languages. This work has led to a major revision of the CES, particularly to accommodate additional character sets. MULTEXT-EAST is building an annotated multilingual corpus composed of materials comparable to MULTEXT's, including: (1) at least 100,000 words of fiction and newspaper text in each of the CEE languages; (2) parallel translations of the same fictional text; and (3) a small corpus of spoken texts in each language. MULTEXT-EAST has adapted and extended MULTEXT language-dependent materials (lexicons, morphological rules, etc.) for its six languages. Guidelines for linguistic software development are also in progress. A list of participating organizations is appended. (MSE)
Publication Type: Reports - Descriptive; Speeches/Meeting Papers
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Identifiers - Location: Europe; European Union