Deep learning for the analysis of network traffic measurements

Marín Freire, Gonzalo Miguel

Supervisor(s): Capdehourat, Germán - Casas, Pedro

Abstract:

The application of machine learning models to the analysis of network traffic measurements has grown considerably in recent years. In the networking domain, shallow models are usually applied, which require a set of expert handcrafted features to prepare the data before training. There are two main problems with this approach: first, it requires expert domain knowledge to select the input features; second, different sets of custom-made input features are generally needed depending on the specific target (e.g., network security, anomaly detection, traffic classification). On the other hand, the power of machine learning models using deep architectures (i.e., deep learning) has not yet been extensively explored for networking. These models have had huge success in various domains, notably in computer vision, natural language processing, machine translation, and more recently in gaming. The main goal of this work is to explore the power of deep learning models to enhance the analysis of network traffic measurements. To this end, the specific problem of detection and classification of network attacks is studied. As a major advantage with respect to the state of the art in the field, different raw-traffic input representations are evaluated, including packet-level and flow-level ones. Different deep learning architectures are explored, including convolutional neural networks and long short-term memory recurrent neural networks as core layers. In addition, three different datasets are crafted from publicly available network traffic captures and used for calibrating the considered input representations, as well as for training and validating the proposed models. The deep learning models are compared to a random forest model, commonly accepted as a highly accurate model for network traffic analysis, using the same raw input representations.
In the malware detection task, detection accuracies of 77.6% and 98.5% were achieved for the packet and flow input representations, respectively. For the malware classification task, an overall accuracy of 76.5% was achieved. In all evaluation tasks, the proposed deep learning models outperform the random forest ones. These initial results suggest that deep learning can be used to enhance malware detection without requiring expert domain knowledge to handcraft input features, opening the door to a broad set of potential applications for deep learning in networking.


Bibliographic Details
2019
Machine learning models
Network traffic measurements
Deep learning architectures
English
Universidad de la República
COLIBRI
https://hdl.handle.net/20.500.12008/21770
Open access
Creative Commons Attribution-NonCommercial-NoDerivatives License (CC BY-NC-ND)