[Traineeship] European Commission Joint Research Centre (JRC) – 1 traineeship in Entity and Sentential-level Sentiment Analysis in the News

The Text and Data Mining Unit of the European Commission’s Joint Research Centre (JRC) is looking to fill one traineeship position in the field of:

Entity and Sentential-level Sentiment Analysis in the News

If you are interested, please follow the instructions provided at http://recruitment.jrc.ec.europa.eu/?type=TR&site=IPR (Code: 2018-IPR-I-000-9856 – ISPRA).

JRC Text and Data Mining Unit: https://ec.europa.eu/jrc/en/text-mining-and-analysis

JRC-EMM products: http://emm.newsbrief.eu/overview.html

JRC-EMM Publications: http://optima.jrc.it/Resources/JRC-EMM_Publications.pdf

 

DESCRIPTION OF THE FORESEEN ACTIVITY:

The JRC’s Europe Media Monitor (EMM) team carries out research and development in the field of highly multilingual text mining (Language Technology; Computational Linguistics) for the purposes of media monitoring. EMM gathers an average of 300,000 online news articles per day in over 70 languages and analyses them to help its large international user community understand and use this enormous amount of media information. EMM is publicly accessible and widely used. The EMM team has produced over 200 international peer-reviewed publications. The team also produces and distributes a number of highly multilingual Language Technology resources.

The Text and Data Mining Unit (I3) of the European Commission’s Joint Research Centre (JRC) in Ispra, Italy, is looking for a trainee to support the JRC’s Europe Media Monitor (EMM) team in its effort to improve its Named Entity Recognition and Classification (NERC) tools, especially for multi-word entities such as organisation and event names. EMM gathers and analyses reports from traditional and social media in dozens of languages by clustering related news items; categorising them; extracting information such as entities (persons, organisations, locations), events (who did what to whom, where and when) and quotations by and about people; identifying sentiment; and linking related news clusters over time and across languages. The methods used are mostly hybrid: machine learning tools are used to gather evidence and to learn vocabulary and rules, but the results are usually controlled and optimised through human intervention. EMM is used by European Institutions, by national authorities in EU Member States, by international organisations and by the public. The public EMM applications NewsBrief, NewsExplorer and MedISys are freely accessible. EMM is part of the JRC’s Competence Centre on Text Mining and Analysis.

To date, the EMM team has implemented several approaches to multilingual sentiment analysis for different text types (newspaper articles, microblogs, social media posts) and application scenarios (document level, short texts, entity-centric). The successful trainee will help to combine the current approaches and resources, extend them where necessary to perform multilingual entity-level and sentence-level sentiment analysis, and evaluate the resulting system. The trainee is also expected to contribute to writing a scientific publication on the work carried out.

REQUIRED QUALIFICATIONS:

Essential:

  • a degree (or an almost completed degree) in computational linguistics, computer science or related areas (applications from students currently preparing a thesis for a university degree are eligible; the thesis should match the subject of the project call);
  • Java programming skills;
  • good working knowledge of English (B2 level).

Advantage:

  • Experience in methods and resources for sentiment analysis and emotion detection;
  • Knowledge of further foreign languages;
  • Good knowledge of Language Technology-related tools and methods;
  • Proven ability to work independently and as part of a team.