Abstract: Toward interpretable polyphonic sound event detection with attention maps based on local prototypes :: SILO, National System of Digital Repositories, Uruguay

Conference paper, published

Toward interpretable polyphonic sound event detection with attention maps based on local prototypes

Zinemanas, Pablo - Rocamora, Martín - Fonseca, Eduardo - Font, Frederic - Serra, Xavier

Abstract:

Understanding the reasons behind the predictions of deep neural networks is a pressing concern, as it can be critical in several application scenarios. In this work, we present a novel interpretable model for polyphonic sound event detection. It tackles one of the limitations of our previous work, i.e. the difficulty of properly dealing with a multi-label setting. The proposed architecture incorporates a prototype layer and an attention mechanism. The network learns a set of local prototypes in the latent space, each representing a patch in the input representation. In addition, it learns attention maps for positioning the local prototypes and reconstructing the latent space. The predictions are then based solely on the attention maps. Thus, the explanations provided are the attention maps and the corresponding local prototypes. Moreover, the prototypes can be reconstructed in the audio domain for inspection. The results obtained in urban sound event detection are comparable to those of two opaque baselines, with fewer parameters and the added benefit of interpretability.
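The abstract describes the mechanism only at a high level. The sketch below is a minimal, generic illustration of prototype-based attention in plain Python, not the authors' implementation. It assumes a latent grid `latent[t][f]` of D-dimensional patch vectors, attention maps computed as a softmax over prototypes of negative squared distances, a latent reconstruction as the attention-weighted sum of prototypes, and frame-level predictions obtained by max-pooling each map over frequency followed by a linear layer and a per-class sigmoid. All function names, the similarity measure, the pooling, the classifier and the toy numbers are hypothetical choices made for illustration.

```python
import math
from typing import List

Vector = List[float]
Grid = List[List[Vector]]  # latent[t][f] -> D-dimensional vector (one latent patch)


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def attention_maps(latent: Grid, prototypes: List[Vector]) -> List[List[List[float]]]:
    """Return maps[k][t][f]: at each position, a softmax over prototypes of the
    negative squared distance (an assumed similarity, not necessarily the paper's)."""
    n_t, n_f, n_k = len(latent), len(latent[0]), len(prototypes)
    maps = [[[0.0] * n_f for _ in range(n_t)] for _ in range(n_k)]
    for t in range(n_t):
        for f in range(n_f):
            logits = [-squared_distance(latent[t][f], p) for p in prototypes]
            peak = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(v - peak) for v in logits]
            total = sum(exps)
            for k in range(n_k):
                maps[k][t][f] = exps[k] / total
    return maps


def reconstruct_latent(maps, prototypes: List[Vector]) -> Grid:
    """Rebuild each latent vector as the attention-weighted sum of the prototypes."""
    n_k, n_t, n_f = len(maps), len(maps[0]), len(maps[0][0])
    dim = len(prototypes[0])
    return [
        [
            [sum(maps[k][t][f] * prototypes[k][d] for k in range(n_k)) for d in range(dim)]
            for f in range(n_f)
        ]
        for t in range(n_t)
    ]


def frame_predictions(maps, weights, biases) -> List[List[float]]:
    """Frame-level, multi-label class probabilities computed only from the maps:
    max-pool each map over frequency, then apply an (assumed) linear layer and an
    independent sigmoid per class."""
    n_k, n_t = len(maps), len(maps[0])
    probs = []
    for t in range(n_t):
        pooled = [max(maps[k][t]) for k in range(n_k)]
        row = []
        for w_c, b_c in zip(weights, biases):
            z = b_c + sum(w * a for w, a in zip(w_c, pooled))
            row.append(1.0 / (1.0 + math.exp(-z)))
        probs.append(row)
    return probs


# Toy example: 2 prototypes in a 2-D latent space, 2 hypothetical classes.
prototypes = [[0.0, 0.0], [1.0, 1.0]]
weights = [[10.0, 0.0], [0.0, 10.0]]  # class c is driven by prototype c
biases = [-5.0, -5.0]

# 2 frames x 2 frequency patches: frame 0 resembles prototype 0, frame 1 prototype 1.
latent = [[[0.0, 0.0], [0.1, 0.0]], [[1.0, 1.0], [0.9, 1.0]]]
maps = attention_maps(latent, prototypes)
recon = reconstruct_latent(maps, prototypes)
probs = frame_predictions(maps, weights, biases)

# 1 frame whose two patches match different prototypes: both classes fire.
mixed = frame_predictions(attention_maps([[[0.0, 0.0], [1.0, 1.0]]], prototypes), weights, biases)

print([[round(p, 3) for p in row] for row in probs])
print([round(p, 3) for p in mixed[0]])
```

Because each map is pooled separately and each class has its own sigmoid, two events matching different prototypes in the same frame can both be detected, which is one way to obtain the multi-label behavior the abstract mentions. The explanation for a prediction is then simply which prototype maps were active, and where.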

Bibliographic details
Publication date:	2021
Subjects:	Interpretability; Sound event detection; Prototypes
Language:	English
Institution:	Universidad de la República
Repository:	COLIBRI
Link(s):	http://dcase.community/workshop2021/proceedings http://dcase.community/workshop2021/ http://dcase.community/documents/workshop2021/proceedings/DCASE2021Workshop_Zinemanas_22.pdf https://hdl.handle.net/20.500.12008/29961
Access level:	Open access
License:	Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 (CC BY-NC-ND 4.0)