The SSIX Project Consortium recognises that data ethics and privacy issues need serious attention. To address these matters specifically, the Consortium has formed a Data and Business Ethics Board (DBEB), consisting of the project partners and independent ethics advisers. The board has produced a high-level ethics framework with the objective of eliminating such issues as far as possible while keeping any adverse impact on the SSIX platform's operation to a minimum.
The board has recommended that, along with a Consent Manager on the website, this privacy page be added to explain what data is being collected and what is being done with it. By providing this transparency, the project aims to show the public that it follows strong ethical principles and to address any concerns they may have.
What is the project's goal?
The SSIX project's aim is to provide European SMEs with a collection of easy-to-use tools to analyse and gauge the sentiment of social network users on any given topic, giving them valuable business intelligence which can be added to their decision-making process.
What data sources does the project use?
While SSIX is primarily focused on social networks like Twitter, the project will also use professional news feeds and blog content sources.
What does the project do with the collected data?
We strive to maintain the highest level of anonymity for all individual users in our work, keeping only data which is essential to the project's objectives. Once the collected data has been filtered to remove spam and irrelevant content, aggregated sentiment metrics will be produced by the SSIX NLP pipeline. We will destroy all personal data once it is no longer needed for the project's purposes.
Is the data which the project collects public?
Yes. The SSIX platform will only collect data from publicly available sources. This means that, in principle, all relevant authorisation and consent have been provided by the owners of the data. The project will only access social network content through the official APIs. Users on these networks have given consent to the network to share their data with third parties. Additionally, social networks like Twitter allow users to make their accounts private.
However, the Data and Business Ethics Board recognises that the interpretation of privacy laws varies across the EU and that social network data which is public might be considered private even if the user has given consent to the social network to share their data. This legal grey area is a concern, but it is not practical for the project to obtain a double opt-in from social network users: this would require users to voluntarily opt in to SSIX data collection, or SSIX to contact every single Twitter user requesting consent to use their data. No similar analytics service performs this double opt-in.
To address this issue, SSIX has provided a Consent Manager on the project website, which allows the public to request an opt-out from SSIX's data collection. The opt-out is blind, as the user will have no way of knowing whether their data has been collected. The user will need to submit certain details enabling the system to identify and remove collected content from that user. If a participant voluntarily gives access to their social network account ID number, either via the SSIX Consent Manager apps or by email, they are sending only their account ID to the SSIX DBEB. From the date of receipt, we will destroy all request communications. For a participant's content to be removed from SSIX activities, the account ID is added to a static blacklist table, and all incoming content from accounts matching this blacklist will be automatically discarded.
The Consent Manager can be found at https://data.ssix-project.eu/privacy
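The blacklist step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the names `OPTED_OUT_ACCOUNT_IDS`, `IncomingItem` and `discard_opted_out`, the example IDs and the record layout are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical stand-in for the static blacklist table of opted-out account IDs.
OPTED_OUT_ACCOUNT_IDS = frozenset({"1001", "2002"})


@dataclass(frozen=True)
class IncomingItem:
    """Hypothetical shape of one collected item; field names are assumptions."""
    account_id: str
    text: str


def discard_opted_out(items: Iterable[IncomingItem],
                      blacklist: frozenset = OPTED_OUT_ACCOUNT_IDS) -> Iterator[IncomingItem]:
    """Yield only items whose account ID is not on the blacklist."""
    for item in items:
        if item.account_id in blacklist:
            continue  # content from an opted-out account is dropped here
        yield item


incoming = [
    IncomingItem("1001", "post from an opted-out account"),
    IncomingItem("3003", "post from another account"),
]
kept = list(discard_opted_out(incoming))
```

In this sketch the check runs on every incoming item before anything else is done with it, so only the account ID is needed to honour an opt-out request.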
How is data stored in the SSIX Project?
We have implemented a Data Management Plan (DMP). The DMP describes the data management life-cycle for all the data sets that will be collected, processed or generated by the project. The DMP is not a fixed document and will evolve during the lifespan of the project.
What data will be shared by the SSIX Project?
All data released by the project will need the approval of the SSIX Data and Business Ethics Board (DBEB). We have no plans to share gathered data outside the SSIX project team. One exception might be the EU Open Research Data Pilot, if the board agrees to it.
Transmission of content to third-party services
In the NLP pipeline, the SSIX platform may make use of third-party systems during analysis. SSIX does not have the resources to cover every language with native classifiers. To analyse content in an unsupported language, the SSIX platform will therefore use machine translation to convert it into a language which is supported by a native classifier. Currently, SSIX is using project partner Lionbridge's GeoFluent API http://www.lionbridge.com/geofluent/ to perform these translation tasks. During the process, only tweet message content is sent out by SSIX.
Additionally, the SSIX platform may make use of project partner Redlink's API suite http://dev.redlink.io/api/ to perform further analysis.
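As a rough sketch of the translation step, the routing could look like the code below. Everything named here is an assumption for illustration: the set of natively supported languages, the pivot language, the record fields, and the `translate` callable, which is a placeholder and not the GeoFluent API.

```python
from typing import Callable, Dict

# Assumed set of languages with native classifiers; the real list is not stated.
NATIVE_LANGUAGES = frozenset({"en", "de"})
PIVOT_LANGUAGE = "en"  # assumed translation target


def prepare_for_classifier(record: Dict[str, str],
                           translate: Callable[[str, str, str], str]) -> Dict[str, str]:
    """Return the text and language to hand to a native classifier.

    `translate(text, source_lang, target_lang)` is a placeholder for whatever
    machine-translation client is used.
    """
    lang = record["lang"]
    text = record["text"]
    if lang in NATIVE_LANGUAGES:
        return {"text": text, "lang": lang}
    # Only the message text is passed to the translator; other fields stay behind.
    translated = translate(text, lang, PIVOT_LANGUAGE)
    return {"text": translated, "lang": PIVOT_LANGUAGE}


# Usage with a fake translator that records what it was sent.
sent = []


def fake_translate(text: str, source: str, target: str) -> str:
    sent.append(text)
    return f"[{source}->{target}] {text}"


record = {"account_id": "3003", "lang": "ga", "text": "Dia duit"}
result = prepare_for_classifier(record, fake_translate)
```

The point of the sketch is that the account ID in `record` never reaches the translator: the external call receives the message text and the language codes only.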
Summary of the data ethics issues identified within the scope of the SSIX platform activities:
- Privacy & Security: Collection and storage of data.
  - SSIX will collect publicly available data from social networks, blogs and news providers.
  - Where data are not publicly available or if specific authorisation is needed, relevant Informed Consent will be obtained before any collection, use or processing of relevant data.
  - SSIX provides social network users with a blind opt-out from data collection via the project website.
  - Collected data will be stored securely as outlined in the project's Data Management Plan.
  - The project has agreed to the Open Research Data Pilot (ORDP). However, the project will not share any collected data or analytics with any third party without the approval of the DBEB.
  - All collected data will be destroyed once it is no longer needed for the project's purposes.
- Analytics: Creating aggregated insight from collected data.
  - Certain identifiable user data needs to be stored for filtering and categorisation. The following details will not be used: name, address, age, sex, photos, date of birth, metadata. Data quality is of fundamental importance to the project outcomes, so strong filtering rules are needed due to the prevalence of bots, fake data, spammers, etc.
  - Data such as a user ID is required to remove spam messages. The user profile description will be needed to classify some users into a domain category, such as a Twitter user who declares themselves to be an Analyst or Trader.
  - General data may be kept to categorise users by location. Location detail would be no finer than regional level.
- Business Ethics: How the aggregated metrics will be used.
  - The end analytics from the platform are aggregated and do not target any one user.
  - Group data: only high-level domain grouping from professional fields.
  - The project has no plans to identify individuals, to build up a profile of an individual user or a group of individuals, or to aggregate a user's data across multiple social networks.
  - The project does not intend to discriminate on any of the current sensitive grounds.
Where can I get more information?
You can contact the Data and Business Ethics Board at email@example.com