Sarcasm SIGN: Sarcasm interpretation corpus

We are happy to announce the release of the Sarcasm SIGN corpus: a parallel corpus of sarcastic tweets and their non-sarcastic interpretations, written by human experts. It contains 3000 tweets that their authors marked with the hashtag #sarcasm, each with 5 human interpretations. The corpus was created as part of the paper: Sarcasm SIGN: Interpreting Sarcasm with Sentiment Based Monolingual Machine Translation, Lotem Peled and Roi Reichart, ACL 2017 (https://arxiv.org/pdf/1704.06836.pdf).

The corpus and the project details can be found at: https://github.com/lotemp/SarcasmSIGN

The Sarcasm SIGN dataset is a parallel corpus of sarcastic tweets and their non-sarcastic interpretations, written by human experts. It was created as part of our paper, Sarcasm SIGN: Interpreting Sarcasm with Sentiment Based Monolingual Machine Translation, which will be presented at ACL 2017. The repository contains two folders: “corpus”, which contains the data files and the instructions given to our human experts; and “preprocess”, which contains code for preprocessing the data and preparing it for an MT system (see the ReadMe in the preprocess folder).
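
As a rough illustration of what "preparing it for an MT system" can involve, the sketch below flattens a tweet with several interpretations into aligned source/target sentence pairs, the usual input shape for monolingual MT training. This is not the code from the preprocess folder: the function name to_parallel_pairs, the in-memory dictionary layout, and the toy tweet are all assumptions made for illustration, and the actual file format of the corpus may differ.

```python
from typing import Dict, List, Tuple


def to_parallel_pairs(corpus: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Flatten {sarcastic tweet: [interpretations]} into aligned source/target lists.

    Hypothetical helper: each tweet is repeated once per interpretation so that
    line i of the source list aligns with line i of the target list.
    """
    sources: List[str] = []
    targets: List[str] = []
    for tweet, interpretations in corpus.items():
        for interpretation in interpretations:
            sources.append(tweet.strip())
            targets.append(interpretation.strip())
    return sources, targets


# Invented toy example, not taken from the Sarcasm SIGN corpus.
toy_corpus = {
    "Great, another Monday. #sarcasm": [
        "I am not happy that it is Monday again.",
        "I dislike Mondays.",
    ],
}

sources, targets = to_parallel_pairs(toy_corpus)
for src, tgt in zip(sources, targets):
    print(f"{src}\t{tgt}")
```

With 3000 tweets and 5 interpretations each, a layout like this would yield 15000 aligned pairs; whether to keep all of them or select one interpretation per tweet is a design choice left to the user.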