Anonymization
Introduction
Automatic Data Anonymisation, based on Natural Language Processing technologies, is the process of removing or replacing sensitive information in textual sources to protect individuals from exposure. It also makes it possible to analyse data while complying with the GDPR. Advances in Deep Learning have broadened the potential uses of this technology and improved its results.
Modules
Language Detection
Automatic text language detection
Data Detection
Automatic detection of pieces of text (entities) containing sensitive information
Data Classification
Automatic classification of detected entities into categories such as PERSON, LOCATION and so on
Anonymization
Obfuscation of sensitive entities by replacing them with placeholders, which can be symbols (“XXX”), the name of the sensitive data category, or words similar to the original
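The replacement step described above can be illustrated with a minimal Python sketch. It assumes the detection and classification modules have already produced character spans with a category. The names Entity, SURROGATES, placeholder and anonymize are hypothetical and chosen only for this illustration; they are not the API of any particular toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int      # offset of the first character of the entity
    end: int        # offset one past the last character
    category: str   # e.g. "PERSON" or "LOCATION"


# Hypothetical surrogate values, one per category (illustration only).
SURROGATES = {"PERSON": "Jane Doe", "LOCATION": "Springfield"}


def placeholder(entity: Entity, strategy: str) -> str:
    """Return the replacement text for one entity."""
    if strategy == "mask":
        return "XXX"
    if strategy == "category":
        return f"[{entity.category}]"
    if strategy == "surrogate":
        # Fall back to the category label when no surrogate is known.
        return SURROGATES.get(entity.category, f"[{entity.category}]")
    raise ValueError(f"unknown strategy: {strategy}")


def anonymize(text: str, entities: list[Entity], strategy: str = "category") -> str:
    """Replace every entity span in text with a placeholder."""
    # Work from right to left so earlier offsets remain valid
    # even when a placeholder is longer or shorter than the original.
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        text = text[:ent.start] + placeholder(ent, strategy) + text[ent.end:]
    return text


# Assumed output of the detection and classification steps.
text = "Maria Lopez was treated in Bilbao."
entities = [Entity(0, 11, "PERSON"), Entity(27, 33, "LOCATION")]

print(anonymize(text, entities, "mask"))       # XXX was treated in XXX.
print(anonymize(text, entities, "category"))   # [PERSON] was treated in [LOCATION].
print(anonymize(text, entities, "surrogate"))  # Jane Doe was treated in Springfield.
```

Replacing spans from the end of the text backwards is a simple way to avoid recomputing offsets after each substitution. A real system would also need to handle overlapping spans and choose surrogates consistently across a document, which this sketch does not attempt.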
Markets
Medicine
To develop technological solutions and conduct research in the field of medicine, it is essential to be able to share information that contains especially sensitive personal data
Legal
To analyze, detect and replace sensitive data in legal documents, such as court rulings, contributing to open data and transparency
Public Administration
To promote the sharing of de-identified data without traceable personal details, in compliance with the GDPR
Use Cases
MAPA
A European project
Development of a toolkit for effective and reliable anonymisation of texts in the medical, legal, and administrative fields in 24 languages. As a result, it will make it more feasible to share de-identified data without traceable personal details, in compliance with the GDPR.