Translated poisson mixture model for stratification learning

Haro, Gloria - Randall, Gregory - Sapiro, Guillermo

Resumen:

A framework for the regularized and robust estimation of non-uniform dimensionality and density in high dimensional noisy data is introduced in this work. This leads to learning stratifications, that is, mixture of manifolds representing different characteristics and complexities in the data set. The basic idea relies on modeling the high dimensional sample points as a process of Translated Poisson mixtures, with regularizing restrictions, leading to a model which includes the presence of noise. The Translated Poisson distribution is useful to model a noisy counting process, and it is derived from the noise-induced translation of a regular Poisson distribution. By maximizing the log-likelihood of the process counting the points falling into a local ball, we estimate the local dimension and density. We show that the sequence of all possible local counting in a point cloud formed by samples of a stratification can be modeled by a mixture of different Translated Poisson distributions, thus allowing the presence of mixed dimensionality and densities in the same data set. With this statistical model, the parameters which best describe the data, estimated via expectation maximization, divide the points in different classes according to both dimensionality and density, together with an estimation of these quantities for each class. Theoretical asymptotic results for the model are presented as well. The presentation of the theoretical framework is complemented with artificial and real examples showing the importance of regularized stratification learning in high dimensional data analysis in general and computer vision and image analysis in particular.


Detalles Bibliográficos
2008
Manifold learning
Stratification learning
Clustering
Dimension estimation
Density estimation
Translated Poisson
Mixture models
Inglés
Universidad de la República
COLIBRI
https://hdl.handle.net/20.500.12008/38613
Acceso abierto
Licencia Creative Commons Atribución - No Comercial - Sin Derivadas (CC - By-NC-ND 4.0)
Resumen:
Sumario:A framework for the regularized and robust estimation of non-uniform dimensionality and density in high dimensional noisy data is introduced in this work. This leads to learning stratifications, that is, mixture of manifolds representing different characteristics and complexities in the data set. The basic idea relies on modeling the high dimensional sample points as a process of Translated Poisson mixtures, with regularizing restrictions, leading to a model which includes the presence of noise. The Translated Poisson distribution is useful to model a noisy counting process, and it is derived from the noise-induced translation of a regular Poisson distribution. By maximizing the log-likelihood of the process counting the points falling into a local ball, we estimate the local dimension and density. We show that the sequence of all possible local counting in a point cloud formed by samples of a stratification can be modeled by a mixture of different Translated Poisson distributions, thus allowing the presence of mixed dimensionality and densities in the same data set. With this statistical model, the parameters which best describe the data, estimated via expectation maximization, divide the points in different classes according to both dimensionality and density, together with an estimation of these quantities for each class. Theoretical asymptotic results for the model are presented as well. The presentation of the theoretical framework is complemented with artificial and real examples showing the importance of regularized stratification learning in high dimensional data analysis in general and computer vision and image analysis in particular.