Publications

Papers

Social Sentiment Indices powered by X-Scores

Abstract:

Social Sentiment Indices powered by X-Scores (SSIX) seeks to address the challenge of extracting relevant and valuable economic signals in a cross-lingual fashion from the vast variety of increasingly influential social media services, such as Twitter, Google+, Facebook, StockTwits and LinkedIn, in conjunction with the most reliable and authoritative newswires, online newspapers, financial news networks, trade publications and blogs. A statistical framework of qualitative and quantitative parameters called X-Scores will power SSIX. This framework will interpret economically significant sentiment signals that are disseminated in the social ecosystem. Using X-Scores, SSIX will create commercially viable and exploitable social sentiment indices, regardless of language, locale and data format. SSIX and X-Scores will support research and investment decision making for European SMEs, enabling end users to analyse and leverage real-time social media sentiment data in their domain, creating innovative products and services to support revenue growth with a focus on increased alpha generation for investment portfolios.

Download

Venue:

The Second International Conference on Big Data, Small Data, Linked Data and Open Data

ALLDATA 2016

February 21 - 25, 2016 - Lisbon, Portugal

https://www.iaria.org/conferences2016/AwardsALLDATA16.html

Award:

Best Paper Award

In or Out? Real-Time Monitoring of BREXIT sentiment on Twitter

Abstract:

The SSIX (Social Sentiment analysis financial IndeXes) project is a European Innovation Project sponsored by the European Commission under the Horizon 2020 framework. SSIX aims to provide European SMEs with a collection of easy-to-interpret tools to analyse and understand social media sentiment for any given topic regardless of locale or language. The United Kingdom’s recent referendum on European Union membership, i.e. staying in (“Bremain”) or leaving the EU (“Brexit”), was selected as the initial real-world test case for validating the SSIX methodology and platform. In this paper, we briefly describe the SSIX architecture, analyse the platform’s X-Scores metrics and their application to Brexit, and report our initial experimental results and lessons learned.

Venue: 

SEMANTiCS 2016 (poster & demo track)

http://alt.qcri.org/semeval2017/index.php

A Twitter Sentiment Gold Standard for the Brexit Referendum

Abstract:

In this paper, we present a sentiment-annotated Twitter gold standard for the Brexit referendum. The data set consists of 2,000 Twitter messages (“tweets”) annotated with information about the sentiment expressed, the strength of the sentiment, and context dependence. This is a valuable resource for social media-based opinion mining in the context of political events.

Download

Venue: 

SEMANTiCS 2016 (poster & demo track)

http://ceur-ws.org/Vol-1695/

http://alt.qcri.org/semeval2017/index.php

 

Semantic Relation Classification: Task Formalisation and Refinement (shortened version)

Abstract:

The identification of semantic relations between terms within texts is a fundamental task in Natural Language Processing which can support applications requiring a lightweight semantic interpretation model. Currently, semantic relation classification concentrates on relations which are evaluated over open-domain data. This work provides a critique of the set of abstract relations used for semantic relation classification with regard to their ability to express relationships between terms found in domain-specific corpora. Based on this analysis, this work proposes an alternative semantic relation model based on reusing and extending the set of abstract relations present in the DOLCE ontology. The resulting set of relations is well grounded, captures a wide range of relations and could thus be used as a foundation for automatic classification of semantic relations.

Download

Venue:

CogALex workshop @ COLING 2016

http://coling2016.anlp.jp/#cfp

 

Fine-Grained Sentiment Analysis on Financial Microblogs and News

Abstract:

This paper discusses the “Fine-Grained Sentiment Analysis on Financial Microblogs and News” task as part of SemEval-2017, specifically under the “Detecting sentiment, humour, and truth” theme. This task contains two tracks, where the first one concerns Microblog messages and the second one covers News Statements and Headlines. The main goal behind both tracks was to predict the sentiment score for each of the mentioned companies/stocks. The sentiment scores for each text instance adopted floating point values in the range of -1 (very negative/bearish) to 1 (very positive/bullish), with 0 designating neutral sentiment. This task attracted a total of 32 participants, with 25 participating in Track 1 and 29 in Track 2.

Download

Venue:

SemEval-2017 Shared task proposal

http://alt.qcri.org/semeval2017/index.php

 

A Case Study of Machine Translation in Financial Sentiment Analysis

Abstract:

The European research project Social Sentiment Indices powered by X-Scores (SSIX) intends to allow Small and Medium-sized Enterprises (SMEs) to take advantage of social media sentiment data for the finance domain. The project aims to overcome language barriers and realize a financial sentiment platform capable of scoring textual data in different languages. Our approach to achieving this goal takes maximum advantage of human translation while keeping costs low by incorporating machine translation. In the long run, we intend to provide a tool that helps SMEs to expand into new markets by analyzing multilingual social content. In this paper, we investigate how sentiment is preserved after machine translation. We built a sentiment gold standard corpus in English annotated by native financial experts, and then translated the gold standard corpus into a target corpus (German) using one human translator and three machine translation engines (Microsoft, Google, and Google Neural Network), which are integrated in Geofluent to allow pre-/post-processing. We then conducted two experiments. The first evaluated the overall translation quality using the BLEU algorithm. The second investigated which machine translation engines produce translations that best preserve sentiment. Results suggest that sentiment transfer can be successful through machine translation when using Google and Google Neural Network in Geofluent. This is a crucial step towards achieving a multilingual sentiment platform in the domain of finance. Next, we plan to integrate language-specific processing rules to further enhance the performance of machine translation.

Download

Venue:

MT Summit XVI

http://aamt.info/app-def/S-102/mtsummit/2017/

Composite Semantic Relation Classification

Abstract:

Different semantic interpretation tasks such as text entailment and question answering require the classification of semantic relations between terms or entities within text. However, in most cases it is not possible to assign a direct semantic relation between entities/terms. This paper proposes an approach for composite semantic relation classification, extending the traditional semantic relation classification task. Different from existing approaches, which use machine learning models built over lexical and distributional word vector features, the proposed model uses the combination of a large commonsense knowledge base of binary relations, a distributional navigational algorithm and sequence classification to provide a solution for the composite semantic relation classification problem.

Download

Venue:

Natural Language and Data Bases (NLDB) 2017

http://nldb2017.conferences.hec.ulg.ac.be/

Multilingual Semantic Relatedness using lightweight machine translation

Abstract:

Distributional semantic models are strongly dependent on the size and the quality of the reference corpora, which embed the commonsense knowledge necessary to build comprehensive models. While high-quality texts containing large-scale commonsense information are present in English, such as Wikipedia, other languages may lack sufficient textual support to build distributional models. This paper proposes using the combination of a lightweight (sloppy) machine translation model and an English Distributional Semantic Model (DSM) to provide higher quality word vectors for languages other than English. Results show that the lightweight MT model introduces significant improvements when compared to language-specific distributional models. Additionally, the lightweight MT outperforms more complex MT methods for the task of word-pair translation.

Download

Venue:

IEEE ICSC 2018

https://semanticcomputing.wixsite.com/icsc2018

SemR-11: A Multi-Lingual Gold-Standard for Semantic Similarity and Relatedness for Eleven Languages

Abstract:

This work describes SemR-11, a multi-lingual dataset for evaluating semantic similarity and relatedness for 11 languages (German, French, Russian, Italian, Dutch, Chinese, Portuguese, Swedish, Spanish, Arabic and Persian). Semantic similarity and relatedness gold standards were initially used to support the evaluation of semantic distance measures in the context of linguistic and knowledge resources and distributional semantic models. SemR-11 builds upon the English gold standards of Miller & Charles (MC), Rubenstein & Goodenough (RG), WordSimilarity 353 (WS-353), and Simlex-999, providing a canonical translation for them. The final dataset consists of 15,917 word pairs and can be used to support the construction and evaluation of semantic similarity/relatedness and distributional semantic models. As a case study, the SemR-11 test collections were used to investigate how different distributional semantic models, built from corpora in different languages and of different sizes, perform on semantic similarity and relatedness tasks.

Download

Venue:

LREC 2018

http://lrec-conf.org/lrec2018/lrec2018.htm

Truth or Lie? Automatically fact checking news

Abstract:

In the current scenario of ever-growing data consumption speed and quantity, factors like news source decentralization, citizen journalism and democratization of media make the task of manually checking and correcting disinformation across the internet impractical or infeasible. There is therefore an imperative need for a fast and reliable way to account for the veracity of what is produced and spread as information: automatic fact-checking. In this work we present the problem of fact-checking in the era of big data and post-truth. Some existing approaches for this task are presented and their main features discussed and compared. In conclusion, a new approach inspired by the best components of the existing ones is presented.

Download

Venue:

The Web Conference 2018

https://www2018.thewebconf.org

Word Tagging with Foundational Ontology Classes: Extending the WordNet-DOLCE Mapping to Verbs

Abstract:

Semantic annotation is fundamental for dealing with large-scale lexical information, mapping the information to an enumerable set of categories over which rules and algorithms can be applied, and foundational ontology classes can be used as a formal set of categories for such tasks. A previous alignment between WordNet noun synsets and DOLCE provided a starting point for ontology-based annotation, but in NLP tasks verbs are also of substantial importance. This work presents an extension to the WordNet-DOLCE noun mapping, aligning verbs according to their links to nouns denoting perdurants, transferring to the verb the DOLCE class assigned to the noun that best represents that verb’s occurrence. To evaluate the usefulness of this resource, we implemented a foundational ontology-based semantic annotation framework that assigns a high-level foundational category to each word or phrase in a text, and compared it to a similar annotation tool, obtaining an increase of 9.05% in accuracy.

Download

Venue:

20th International Conference on Knowledge Engineering and Knowledge Management (EKAW), Bologna, Italy, 2016

http://ekaw2016.cs.unibo.it/

The SSIX Corpus: A Trilingual Gold Standard Corpus for Sentiment Analysis in Financial Microblogs

Abstract:

This paper introduces the trilingual SSIX corpus for sentiment analysis. This corpus addresses the need to provide annotated data for supervised learning methods. It focuses on stock-market related messages extracted from a financial microblog platform. It includes 2,886 messages with opinion targets. These messages are provided with polarity annotation set on a continuous scale by three or four experts in each language. The annotation information identifies the targets and includes the text spans that substantiate the scores. The annotation process includes manual annotation verified and consolidated by experts. The creation of the annotated corpus took into account principled sampling strategies as well as the inter-annotator agreement before consolidation in order to maximize data quality.

Download

Venue:

LREC 2018

http://lrec-conf.org/lrec2018/lrec2018.htm

WWW’18 Open Challenge: Financial Opinion Mining and Question Answering

Abstract:

The growing maturity of Natural Language Processing (NLP) techniques and resources is dramatically changing the landscape of many application domains which are dependent on the analysis of unstructured data at scale. The finance domain, with its reliance on the interpretation of multiple unstructured and structured data sources and its demand for fast and comprehensive decision making, is already emerging as a primary ground for the experimentation of NLP, Web Mining and Information Retrieval (IR) techniques for the automatic analysis of financial news and opinions online. This challenge focuses on advancing the state-of-the-art of aspect-based sentiment analysis and opinion-based Question Answering for the financial domain.

Download

 

Venue:

WWW 2018 Conference

https://www2018.thewebconf.org/

Predicting sentiments in financial microblog messages in English: The SSIX financial classifier

Abstract:

The objective of this paper is to report on the building of a Sentiment Analysis (SA) system dedicated to financial microblog messages in three languages: English, Spanish, and German. The purpose of our work is to build a financial regression predictive model that predicts the sentiment of stock investors in microblog platforms such as StockTwits and Twitter. Our contribution shows that it is possible to conduct such tasks in order to provide fine-grained SA of financial microblogs. We extracted financial entities with relevant contexts and assigned scores on a continuous scale by adopting a deep learning method. Results show a 0.85 F1-Score on a two-class basis and a 0.62 cosine similarity score on a [-1;1] scale for English. In Spanish, we achieved 0.81 F1-Score and 0.45 cosine similarity, while in German F1-Score and cosine similarity were 0.84 and 0.64 respectively.
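The two figures reported above, a two-class F1-Score and a cosine similarity on a [-1;1] scale, can be illustrated with a minimal sketch. The abstract does not give the exact metric definitions, so the following is an assumption: cosine similarity is taken between the vector of gold scores and the vector of predicted scores, and the two-class F1 is computed for the positive class after thresholding scores at 0. The function names and any input values are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(gold, pred):
    """Cosine of the angle between the gold and predicted score vectors."""
    dot = sum(g * p for g, p in zip(gold, pred))
    norm_gold = math.sqrt(sum(g * g for g in gold))
    norm_pred = math.sqrt(sum(p * p for p in pred))
    if norm_gold == 0 or norm_pred == 0:
        return 0.0  # undefined for an all-zero vector; 0.0 is an assumed convention
    return dot / (norm_gold * norm_pred)


def two_class_f1(gold, pred):
    """F1 of the positive class, treating score > 0 as positive (assumed threshold)."""
    pairs = list(zip(gold, pred))
    true_pos = sum(1 for g, p in pairs if g > 0 and p > 0)
    false_pos = sum(1 for g, p in pairs if g <= 0 and p > 0)
    false_neg = sum(1 for g, p in pairs if g > 0 and p <= 0)
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

Calling `cosine_similarity(gold_scores, predicted_scores)` and `two_class_f1(gold_scores, predicted_scores)` on two equal-length lists of scores in [-1, 1] yields one value of each kind; a perfect prediction gives 1.0 for both.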

Download:

Paper accepted, download not available yet.

Venue:

TALN 2018

https://project.inria.fr/coriataln2018/

Special Reports

BIG DATA MEETS POLITICS

Abstract:

Politicians across Europe often look suspiciously at the “big data” revolution as a trend imported from the US, which encroaches on their privacy. But others are also surfing the wave and see a multitude of areas where big data analytics can support decision-making – and sometimes also help politicians win an election.

Download