LETEO: Scalable anonymization of big data and its application to learning analytics :: SILO. Sistema nacional de repositorios digitales. Uruguay

Technical report, published

LETEO: Scalable anonymization of big data and its application to learning analytics

Giménez, Eduardo - Etcheverry, Lorena - Olmedo, Federico - Buil Aranda, Carlos - Toro, Matías - Pastorini, Marcos

Abstract:

Created in 2007, Plan Ceibal is an inclusion and equal-opportunity plan that aims to support Uruguayan educational policies with technology. Over the years, and in the course of its work, Ceibal has accumulated a significant amount of data on the use of technology in education, which it needs to manage the plan and fulfill its legally assigned tasks. However, this data cannot be studied without first addressing the problem of de-identifying the users of the Plan. To exploit this data, Ceibal has deployed an instance of the Hortonworks Data Platform (HDP), an open source platform for the storage and parallel processing of massive data (big data). HDP offers a wide range of functional components, from large file storage (HDFS) to distributed programming of machine learning algorithms (Apache Spark / MLlib). However, as of today there are no open source solutions for the de-identification of personal data that are integrated into the Hortonworks ecosystem. On the one hand, existing de-identification tools have not been designed to scale easily to large volumes of data, and they do not offer easy integration mechanisms with HDFS. This forces users to export the data out of the platform that stores it in order to anonymize it, with the consequent risk of exposing confidential information. On the other hand, the few solutions integrated into the Hortonworks ecosystem are proprietary, and the cost of their licenses is very significant. The objective of this project is to promote the use of the enormous amount of educational and technological data that Ceibal holds by removing one of the greatest obstacles to doing so, namely the preservation of privacy and the protection of the personal data of the Plan's beneficiaries. To this end, the project seeks to build anonymization tools that extend the HDP platform. In particular, it seeks to develop open source modules that integrate into the platform, implement a set of anonymization techniques and algorithms programmed in a distributed manner using Apache Spark, and can be applied to data sets stored in HDFS files.

Bibliographic Details
Publication date:	2021
Subjects:	Anonymization; Big data; Learning analytics
Language:	Spanish
Institution:	Universidad de la República
Repository:	COLIBRI
Link(s):	https://hdl.handle.net/20.500.12008/29755
Access level:	Open access
License:	Creative Commons Attribution - NonCommercial - NoDerivatives license (CC BY-NC-ND 4.0)

Similar results

Big Data for All: Privacy and User Control in the Age of Analytics
Author(s): Tene, Omer
Publication date: 2013

Federated learning for data analytics in education
Author(s): Fachola, Christian
Publication date: 2023

Regulating 'Big Data Education' in Europe: Lessons Learned from the US
Author(s): Har Carmel, Yoni
Publication date: 2016

How can Plan Ceibal Land into the Age of Big Data?
Author(s): Bailón, Martina
Publication date: 2015

A Data Protection Framework for Learning Analytics
Author(s): Cormack, Andrew Nicholas
Publication date: 2016

Strategies for Data and Learning Analytics Informed National Education Policies: the Case of Uruguay
Author(s): Cobo, Cristóbal
Publication date: 2017

Analítica sobre Big Data
Author(s): Rodríguez Saredo, Juan Francisco
Publication date: 2018

Information consolidation architecture for health insurance using Big Data
Author(s): Zerega-Prado, José
Publication date: 2022

Catálogo de arquitecturas de software y tácticas arquitectónicas para contextos de big data
Author(s): Russo Ibañez, Juan Pablo
Publication date: 2019

Proyectos de big data en organizaciones actuales
Author(s): Saldías Mironenko, Juan Ignacio
Publication date: 2021

Large-scale internet user behavior analysis of a nationwide K-12 education network based on DNS queries
Author(s): Arriola, Alexis
Publication date: 2020

Mapeo sistemático y evaluación de arquitecturas de software para contextos de big data
Author(s): Russo Ibañez, Juan Pablo
Publication date: 2018

Learning analytics for the global south
Author(s): Gasevic, Dragan
Publication date: 2018

Code of practice for learning analytics
Author(s): Sclater, Niall
Publication date: 2015

Modelo de madurez para gobernanza de Big Data
Author(s): Armellino Russi, Pablo Martín
Publication date: 2019

Sensor data analysis and sensor management for crop monitoring.
Author(s): Sosa, Raquel
Publication date: 2017

Modelo para la evaluación de buenas prácticas de gobernanza de Big Data
Author(s): Carlos Airaudo, Ruben Darío
Publication date: 2019

Legal, Risk and Ethical Aspects of Analytics in Higher Education
Author(s): Kay, David
Publication date: 2012

Un mapeo de big data electoral en Uruguay: la oferta hacia los partidos
Author(s): Ferreira Toledo, Gonzalo
Publication date: 2022

One model to find them all deep learning for multivariate time-series anomaly detection in mobile network data
Author(s): García González, Gastón
Publication date: 2023

The end of mediations? The rejection of deference in the contemporary public sphere
Author(s): Kaufmann, Laurence
Publication date: 2019

Detección de anomalías en sistemas de telecomunicaciones mediante métodos de aprendizaje continuo
Author(s): Gómez, Gabriel
Publication date: 2020

Spirituality and Personality within the framework of The Big Five
Author(s): Lemos, Viviana
Publication date: 2018

Water-quality data imputation with a high percentage of missing values : A machine learning approach
Author(s): Rodríguez Núñez, Rafael
Publication date: 2021

A short analysis of BigColor for image colorization.
Author(s): García, Rosana
Publication date: 2024

Assessment of data augmentation techniques with synthetic images in uncommon datasets cases
Author(s): Repetto Ferrero, Andrés Mauricio
Publication date: 2023

Singularities for analytic continuations of holonomy germs of riccati foliations
Author(s): Álvarez, Sebastien
Publication date: 2016

Overcoming data scarcity in earth science.
Author(s): Gorgoglione, Angela
Publication date: 2020

Extracción y procesamiento de datos para modelado de trayectorias académicas en cursos universitarios
Author(s): Heredia, Matías
Publication date: 2019

Educational data science: Monitoring learning technologies in primary schools
Author(s): da Silva, Natalia
Publication date: 2022

Machine learning in healthcare toward early risk prediction: A case study of liver transplantation
Author(s): Chatterjee, Parag
Publication date: 2020

HADE : herramienta de análisis de datos educativos
Author(s): Ferrero, Tomás
Publication date: 2018

Estudio de la viabilidad de la aplicación de técnicas de aprendizaje automático a la educación
Author(s): Añón, Alejandro
Publication date: 2021

Analytics en el sector financiero uruguayo : de los datos al conocimiento.
Author(s): Romanelli, Valentín
Publication date: 2020

Assessment of yield gaps using field-level data in Uruguay. [Abstract].
Author(s): TSENG, M.C.
Publication date: 2020

Prueba de concepto del "framework" de "OpenMined" para modelos de "Machine Learning"
Author(s): Ampuero Velando, Pablo
Publication date: 2021

Análisis e implementación de técnicas de “Batch Reinforcement Learning” pasivo para aplicación sobre casos reales
Author(s): Derderian Dostourian, Mariana
Publication date: 2021

Implementación de un algoritmo de anonimización para la plataforma de datos masivos de Plan Ceibal
Author(s): Serra Oddo, Bruno
Publication date: 2020

NOVADOC
Author(s): Conde Vitureira, Gabriela Elizabeth
Publication date: 2023

Aprendizaje reforzado para la priorización de casos de prueba en el testing de regresión de los servicios de la API de Bantotal
Author(s): Alvarez Cernicchiaro, Gabriel Luis
Publication date: 2021