This paper describes the acquisition, preparation and properties of a corpus extracted from the official documents of the United Nations (UN). The corpus is available in all six official languages of the UN and contains around 300 million words per language. The authors describe the methods used for crawling, document formatting, and sentence alignment. The corpus also includes a common test set for machine translation. The article presents the results of a French-Chinese machine translation experiment performed on this corpus.
HUMANTERM (UEM2012-09) is a research project funded by the Universidad Europea de Madrid. It aims to compile a multilingual glossary of humanitarian aid terminology on a wiki platform that supports collaborative work.