We are happy to announce the release of the Sarcasm SIGN corpus: a parallel corpus of sarcastic tweets and their non-sarcastic interpretations, as created by human experts (3,000 tweets annotated by their authors with the hashtag #sarcasm, 5 human translations per tweet). The corpus was created as part of the paper: Sarcasm SIGN: Interpreting Sarcasm with Sentiment Based Monolingual Machine Translation, Lotem Peled and Roi Reichart, ACL 2017 (https://arxiv.org/pdf/1704.06836.pdf). The corpus and the project details can be found at: https://github.com/lotemp/SarcasmSIGN
Category: Corpora
ELRA resources – 15 new corpora (written) & 7 updated corpora
We are happy to announce that a new set of 15 Written Corpora is now available in our catalogue. Arabic-English, Arabic-French, Chinese-English and Chinese-French Written Parallel Corpora: this set of 15 written corpora was produced by ELDA within PEA TRAD, a project supported by the French Ministry of Defence (DGA). Available resources are listed below (click on the links for further details). ELRA-W0098 TRAD Arabic-French Newspaper Parallel corpus – Test set 1 – ISLRN: 922-732-502-473-8. This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The…
[Resource] Corpus ‘Australia 2015/2016’
The corpus ‘Australia 2015/2016’ includes all articles from major Australian newspapers published from August 2015 to July 2016 that include the key term ‘Australia’ or ‘Australian(s)’ in the title. Altogether, the corpus contains over 7 million tokens in almost 13,000 articles from 18 newspapers. The corpus thus reflects one year of print media coverage of topics directly relevant to Australia. Download Australia2015/2016 Corpus here. Download word frequencies from this corpus here.
[CfP] 9th annual international conference on corpus linguistics, Paris May 31-June 2, 2017
The Spanish Association for Corpus Linguistics (AELINCO) is holding the 9th annual international conference on corpus linguistics in Paris, May 31-June 2, 2017: https://cilc2017.sciencesconf.org/. As part of AELINCO’s on-going programme of research activities and annual conferences, the broad aim of the CILC conferences is to provide language researchers with an opportunity to present and communicate their work from a variety of corpus analysis perspectives, that is to say, any research which attempts to account for attested language phenomena on the basis of empirical textual data. For CILC17, it has been decided that…
[Volunteers needed] Simile annotation, 19th and 20th century prose
Dear all, We are currently looking for volunteers to annotate similes and literal comparisons taken from a corpus of 19th and 20th century prose poems. To participate, please visit our online platform: http://dissimilitudes.lip6.fr:8181. Thank you in advance, and do not hesitate to share the link with interested colleagues and students. Regards, Suzanne Mpouli, PhD Student, Laboratory of Computer Science Paris VI, Université Pierre et Marie Curie