We are happy to announce that a new set of 15 Written Corpora is now available in our catalogue.
Arabic-English, Arabic-French, Chinese-English and Chinese-French Written Parallel Corpora:
This set of 15 written corpora was produced by ELDA within PEA TRAD, a project supported by the French Ministry of Defence (DGA). Available resources are listed below (click on the links for further details).
- ELRA-W0098 TRAD Arabic-French Newspaper Parallel corpus – Test set 1 – ISLRN: 922-732-502-473-8 This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are articles collected in 2012 from the Arabic version of Le Monde Diplomatique. For more information, see: http://catalog.elra.info/product_info.php?products_id=1278
- ELRA-W0099 TRAD Arabic-English Newspaper Parallel corpus – Test set 1 – ISLRN: 764-187-795-074-0 This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are articles collected in 2012 from the Arabic version of Le Monde Diplomatique. For more information, see: http://catalog.elra.info/product_info.php?products_id=1279
- ELRA-W0100 TRAD Arabic-French Newspaper Parallel corpus – Test set 2 – ISLRN: 722-323-886-920-3 This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in French. The source texts are articles collected in May 2013 from the Arabic version of Le Monde Diplomatique. For more information, see: http://catalog.elra.info/product_info.php?products_id=1280
- ELRA-W0101 TRAD Arabic-French Parallel corpus of transcribed Broadcast News Speech – ISLRN: 862-201-329-808-4 This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are transcriptions of broadcast news in Arabic recorded on France 24. For more information, see: http://catalog.elra.info/product_info.php?products_id=1281
- ELRA-W0102 TRAD Arabic-English Parallel corpus of transcribed Broadcast News Speech – ISLRN: 812-050-111-234-9 This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are transcriptions of broadcast news in Arabic recorded on France 24. For more information, see: http://catalog.elra.info/product_info.php?products_id=1282
- ELRA-W0103 TRAD Arabic-French Web domain (blogs) Parallel corpus – ISLRN: 138-395-895-757-7 This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are blog articles from 2008 to 2013. For more information, see: http://catalog.elra.info/product_info.php?products_id=1283
- ELRA-W0104 TRAD Arabic-English Web domain (blogs) Parallel corpus – ISLRN: 762-161-069-435-5 This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are blog articles from 2008 to 2013. For more information, see: http://catalog.elra.info/product_info.php?products_id=1284
- ELRA-W0105 TRAD Arabic-French Mailing lists Parallel corpus – Test set – ISLRN: 895-850-015-188-4 This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are emails collected from Wikiar-I, a mailing list for discussions about the Arabic Wikipedia. Emails are dated from 2010 to 2012. For more information, see: http://catalog.elra.info/product_info.php?products_id=1285
- ELRA-W0106 TRAD Arabic-English Mailing lists Parallel corpus – Test set – ISLRN: 858-529-510-480-2 This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are emails collected from Wikiar-I, a mailing list for discussions about the Arabic Wikipedia. Emails are dated from 2010 to 2012. For more information, see: http://catalog.elra.info/product_info.php?products_id=1286
- ELRA-W0107 TRAD Arabic-French Mailing lists Parallel corpus – Development set – ISLRN: 333-026-450-858-0 This is a parallel corpus of 10,000 words in Arabic and a reference translation in French. The source texts are emails collected from Wikiar-I, a mailing list for discussions about the Arabic Wikipedia. The collected emails are dated from 2004 to 2007. For more information, see: http://catalog.elra.info/product_info.php?products_id=1287
- ELRA-W0108 TRAD Arabic-English Mailing lists Parallel corpus – Development set – ISLRN: 213-044-240-074-6 This is a parallel corpus of 10,000 words in Arabic and a reference translation in English. The source texts are emails collected from Wikiar-I, a mailing list for discussions about the Arabic Wikipedia. The collected emails are dated from 2004 to 2007. For more information, see: http://catalog.elra.info/product_info.php?products_id=1288
- ELRA-W0109 TRAD Chinese-French Web domain (blogs) Parallel corpus – ISLRN: 464-017-697-777-3 This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in French. The source texts are blog articles dealing with various subjects such as economy, environment, society, technologies, etc. Articles are dated from June 2013. For more information, see: http://catalog.elra.info/product_info.php?products_id=1289
- ELRA-W0110 TRAD Chinese-English Web domain (blogs) Parallel corpus – ISLRN: 982-341-079-331-4 This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in English. The source texts are blog articles dealing with various subjects such as economy, environment, society, technologies, etc. Articles are dated from June 2013. For more information, see: http://catalog.elra.info/product_info.php?products_id=1290
- ELRA-W0111 TRAD Chinese-French News Articles Parallel corpus – ISLRN: 153-566-144-442-2 This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in French. The source texts are newspaper articles from the Chinese version of Voice of America. Articles are dated from 2011 and 2012. For more information, see: http://catalog.elra.info/product_info.php?products_id=1291
- ELRA-W0112 TRAD Chinese-English News Articles Parallel corpus – ISLRN: 626-096-751-907-7 This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in English. The source texts are newspaper articles from the Chinese version of Voice of America. Articles are dated from 2011 and 2012. For more information, see: http://catalog.elra.info/product_info.php?products_id=1292
___________________________________________
Previous releases from the same project cover the Pashto language and are listed below:
- ELRA-S0381 TRAD Pashto Broadcast News Speech Corpus – ISLRN: 918-508-885-913-7 For more information, see: http://catalog.elra.info/product_info.php?products_id=1265
- ELRA-W0092 TRAD Pashto Monolingual text Corpus – ISLRN: 394-903-293-388-0 For more information, see: http://catalog.elra.info/product_info.php?products_id=1266
- ELRA-W0093 TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech – Training data – ISLRN: 802-643-297-429-4 For more information, see: http://catalog.elra.info/product_info.php?products_id=1267
- ELRA-W0094 TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech – Test data – ISLRN: 547-897-479-723-3 For more information, see: http://catalog.elra.info/product_info.php?products_id=1268
- ELRA-W0095 TRAD Pashto-English Parallel corpus of transcribed Broadcast News Speech – Test data – ISLRN: 006-102-605-738-4 For more information, see: http://catalog.elra.info/product_info.php?products_id=1269
- ELRA-W0096 TRAD Pashto-French News Articles Parallel corpus – ISLRN: 649-628-149-051-7 For more information, see: http://catalog.elra.info/product_info.php?products_id=1270
- ELRA-W0097 TRAD Pashto-English News Articles Parallel corpus – ISLRN: 612-936-517-010-2 For more information, see: http://catalog.elra.info/product_info.php?products_id=1271
For more information on the catalogue, please contact Valérie Mapelli (mapelli@elda.org). If you would like to enquire about having your resources distributed by ELRA, please do not hesitate to contact us.
- Visit our On-line Catalogue: http://catalog.elra.info
- Visit the Universal Catalogue: http://universal.elra.info
- Archives of ELRA Language Resources Catalogue Updates: http://www.elra.info/en/catalogues/language-resources-announcements/