Dealing with Big Data and its challenges

The SSIX consortium has been invested in the Big Data paradigm since the very beginning of the project. Collecting, processing, analyzing, storing and extracting value from large amounts of data coming from varied sources around the web, with a near-real-time approach: what does this actually mean?

There are multiple aspects to take into account when developing a so-called “Big Data” architecture.
First of all, it is necessary to understand what “Big Data” really means: everyone is talking about it, but few truly understand the meaning and the difficulties that come with this new scenario.
As Wikipedia states, “Big Data is a broad term for data sets so large or complex that traditional data processing applications are inadequate”. This means that Big Data is not only about the size of databases: it brings many challenges that cannot be addressed with traditional technologies. These challenges are usually described with Gartner’s definition of the three Vs:

  • Volume: the quantity of generated (or collected) data grows quickly, requiring the adoption of adequate storage technologies and resources;
  • Velocity: the data arrive at high speed, so the procedures that process the incoming stream must be able to analyze and manipulate the information quickly, relying on parallel computing techniques and distributed systems;
  • Variety: the nature of the data processed and stored can be highly diverse, thus requiring the ability to deal with different data formats and structures. It is also fundamental to systematically identify the best analyses that can be performed on the available information, since data extraction can have a strong impact on resources and computing power.

Open source technologies come to the rescue with a set of useful new languages and tools, especially those under the Apache Software Foundation umbrella: Spark, Kafka, ZooKeeper and Cassandra are just some of the technologies used inside our architecture. Since they are not traditional solutions, these technologies require different approaches and involve paradigms such as parallel computing, distributed architectures, non-relational databases and scalable environments.
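As a rough illustration of how some of these pieces fit together, the following minimal sketch shows a Spark Structured Streaming job that reads JSON messages from a Kafka topic and parses them into columns. The topic name, broker address and message schema are assumptions made for the example, not the actual SSIX configuration.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StructField, StringType, TimestampType

    # Start a Spark session; on a cluster this work is distributed across workers.
    spark = SparkSession.builder.appName("ssix-streaming-sketch").getOrCreate()

    # Assumed schema of the incoming JSON messages (illustrative only).
    schema = StructType([
        StructField("id", StringType()),
        StructField("text", StringType()),
        StructField("created_at", TimestampType()),
    ])

    # Subscribe to a Kafka topic; "localhost:9092" and "raw-documents" are placeholders.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")
           .option("subscribe", "raw-documents")
           .load())

    # Kafka delivers the payload as bytes: cast it to a string and parse the JSON.
    parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), schema).alias("doc"))
              .select("doc.*"))

    # Write the parsed records to the console for this sketch; a production job
    # would instead use a durable sink such as a Cassandra connector.
    query = parsed.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()

In a real deployment the console sink would be replaced by a persistent store such as Cassandra, and the job would run on a cluster so that the parsing work is spread over many machines.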

3rdPLACE has recently finished developing the procedures dedicated to data ingestion and analysis, aimed at collecting, cleaning, analyzing and storing real-time data from Twitter and News & Blog sources.
The integration of these components within 3rdPLACE’s Big Data platform enables data provisioning for the SSIX architecture.
In the coming months, additional data sources will be identified and connected to SSIX.
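To give an idea of what the collection side of such a pipeline can look like, here is a minimal Python sketch that publishes already-collected documents (tweets or news items) onto a Kafka topic for downstream processing. The topic name, broker address and document fields are purely illustrative, not the actual 3rdPLACE implementation.

    import json
    from kafka import KafkaProducer  # kafka-python client

    # Producer that serializes Python dicts to JSON before sending them to Kafka.
    # The broker address is a placeholder for this sketch.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda doc: json.dumps(doc).encode("utf-8"),
    )

    def publish(document, topic="raw-documents"):
        """Send one raw document (e.g. a tweet or a news article) to Kafka."""
        producer.send(topic, value=document)

    # Example usage with a made-up document.
    publish({"source": "twitter", "text": "Markets rally after earnings reports."})
    producer.flush()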

The commitment of 3rdPLACE to Big Data technologies applied to the financial sector is confirmed by a recent agreement with Borsa Italiana and the London Stock Exchange, which have just launched the Elite Connect project: a new business social network for investors and public companies, powered by 3rdPLACE’s expertise and its innovative Big Data platform, FinScience.


This blog post was written by SSIX partners at 3rdPLACE.

For more information on SSIX, visit our website ssix-project.eu.

For the latest updates, like us on Facebook, follow us on Twitter and join us on LinkedIn.