ELRA resources – 15 new corpora (written) & 7 updated corpora

We are happy to announce that a new set of 15 Written Corpora is now available in our catalogue.

Arabic-English, Arabic-French, Chinese-English and Chinese-French Written Parallel Corpora:
This set of 15 written corpora was produced by ELDA within PEA TRAD, a project supported by the French Ministry of Defence (DGA). Available resources are listed below (click on the links for further details).

  1. ELRA-W0098 TRAD Arabic-French Newspaper Parallel corpus – Test set 1 – ISLRN: 922-732-502-473-8 This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are articles collected in 2012 from the Arabic version of Le Monde Diplomatique. For more information, see: http://catalog.elra.info/product_info.php?products_id=1278
  2. ELRA-W0099 TRAD Arabic-English Newspaper Parallel corpus – Test set 1 – ISLRN: 764-187-795-074-0 This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are articles collected in 2012 from the Arabic version of Le Monde Diplomatique. For more information, see: http://catalog.elra.info/product_info.php?products_id=1279
  3. ELRA-W0100 TRAD Arabic-French Newspaper Parallel corpus – Test set 2 – ISLRN: 722-323-886-920-3 This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in French. The source texts are articles collected in May 2013 from the Arabic version of Le Monde Diplomatique. For more information, see: http://catalog.elra.info/product_info.php?products_id=1280
  4. ELRA-W0101 TRAD Arabic-French Parallel corpus of transcribed Broadcast News Speech – ISLRN: 862-201-329-808-4 This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are transcriptions of broadcast news in Arabic recorded on France 24. For more information, see: http://catalog.elra.info/product_info.php?products_id=1281
  5. ELRA-W0102 TRAD Arabic-English Parallel corpus of transcribed Broadcast News Speech – ISLRN: 812-050-111-234-9 This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are transcriptions of broadcast news in Arabic recorded on France 24. For more information, see: http://catalog.elra.info/product_info.php?products_id=1282
  6. ELRA-W0103 TRAD Arabic-French Web domain (blogs) Parallel corpus – ISLRN: 138-395-895-757-7 This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are blog articles from 2008 to 2013. For more information, see: http://catalog.elra.info/product_info.php?products_id=1283
  7. ELRA-W0104 TRAD Arabic-English Web domain (blogs) Parallel corpus – ISLRN: 762-161-069-435-5 This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are blog articles from 2008 to 2013. For more information, see: http://catalog.elra.info/product_info.php?products_id=1284
  8. ELRA-W0105 TRAD Arabic-French Mailing lists Parallel corpus – Test set – ISLRN: 895-850-015-188-4 This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are emails collected from Wikiar-I, a mailing list for discussions about the Arabic Wikipedia. Emails are dated from 2010 to 2012. For more information, see: http://catalog.elra.info/product_info.php?products_id=1285
  9. ELRA-W0106 TRAD Arabic-English Mailing lists Parallel corpus – Test set – ISLRN: 858-529-510-480-2 This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are emails collected from Wikiar-I, a mailing list for discussions about the Arabic Wikipedia. Emails are dated from 2010 to 2012. For more information, see: http://catalog.elra.info/product_info.php?products_id=1286
  10. ELRA-W0107 TRAD Arabic-French Mailing lists Parallel corpus – Development set – ISLRN: 333-026-450-858-0 This is a parallel corpus of 10,000 words in Arabic and a reference translation in French. The source texts are emails collected from Wikiar-I, a mailing list for discussions about the Arabic Wikipedia. The collected emails are dated from 2004 to 2007. For more information, see: http://catalog.elra.info/product_info.php?products_id=1287
  11. ELRA-W0108 TRAD Arabic-English Mailing lists Parallel corpus – Development set – ISLRN: 213-044-240-074-6 This is a parallel corpus of 10,000 words in Arabic and a reference translation in English. The source texts are emails collected from Wikiar-I, a mailing list for discussions about the Arabic Wikipedia. The collected emails are dated from 2004 to 2007. For more information, see: http://catalog.elra.info/product_info.php?products_id=1288
  12. ELRA-W0109 TRAD Chinese-French Web domain (blogs) Parallel corpus – ISLRN: 464-017-697-777-3 This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in French. The source texts are blog articles on various subjects such as the economy, environment, society and technology. The articles date from June 2013. For more information, see: http://catalog.elra.info/product_info.php?products_id=1289
  13. ELRA-W0110 TRAD Chinese-English Web domain (blogs) Parallel corpus – ISLRN: 982-341-079-331-4 This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in English. The source texts are blog articles on various subjects such as the economy, environment, society and technology. The articles date from June 2013. For more information, see: http://catalog.elra.info/product_info.php?products_id=1290
  14. ELRA-W0111 TRAD Chinese-French News Articles Parallel corpus – ISLRN: 153-566-144-442-2 This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in French. The source texts are newspaper articles from the Chinese version of Voice of America. Articles are dated from 2011 and 2012. For more information, see: http://catalog.elra.info/product_info.php?products_id=1291
  15. ELRA-W0112 TRAD Chinese-English News Articles Parallel corpus – ISLRN: 626-096-751-907-7 This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in English. The source texts are newspaper articles from the Chinese version of Voice of America. Articles are dated from 2011 and 2012. For more information, see: http://catalog.elra.info/product_info.php?products_id=1292

    ___________________________________________

 

Previous releases from the same project covered the Pashto language and are listed below:

For more information on the catalogue, please contact Valérie Mapelli (mapelli@elda.org). If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.