WebRequests
Abstract:
A dataset of labeled web requests assembled from several public datasets: Malicious-URLs, PKDD, and CSIC 2010 (also included). Only the URI of each web request was used to merge the datasets. To build the feature vectors used to train the networks, each URI was tokenized into unigrams following a bag-of-words approach, and each unigram was weighted by term frequency–inverse document frequency (TF–IDF). Each URI is represented by an l1-normalized vector over the 500 most frequent tokens in the entire dataset.
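The feature construction described above can be sketched roughly as follows. This is a minimal illustration, not the dataset authors' code: the tokenizer (lowercase alphanumeric runs), the smoothed IDF formula, the helper names `tokenize`, `build_vocabulary`, and `tfidf_vectors`, and the sample URIs are all assumptions, since the record does not specify them.

```python
import math
import re
from collections import Counter


def tokenize(uri):
    # Assumption: a unigram is a lowercase alphanumeric run.
    # The record does not say how URIs were actually split.
    return re.findall(r"[a-z0-9]+", uri.lower())


def build_vocabulary(uris, size=500):
    # Keep the `size` most frequent tokens across the whole collection.
    counts = Counter(tok for uri in uris for tok in tokenize(uri))
    return [tok for tok, _ in counts.most_common(size)]


def tfidf_vectors(uris, vocabulary):
    n_docs = len(uris)
    vocab_set = set(vocabulary)
    token_lists = [tokenize(uri) for uri in uris]

    # Document frequency: number of URIs containing each vocabulary token.
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens) & vocab_set)

    # Assumption: smoothed IDF, ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocabulary}

    vectors = []
    for tokens in token_lists:
        term_freq = Counter(tokens)
        raw = [term_freq[t] * idf[t] for t in vocabulary]
        total = sum(raw)  # entries are non-negative, so this is the l1 norm
        vectors.append([v / total for v in raw] if total > 0 else raw)
    return vectors


# Hypothetical URIs, for illustration only.
uris = [
    "/index.php?id=1",
    "/index.php?id=1' OR '1'='1",
    "/images/logo.png",
]
vocab = build_vocabulary(uris, size=500)
vectors = tfidf_vectors(uris, vocab)
```

Each row of `vectors` has one entry per vocabulary token, and its entries sum to 1 whenever the URI contains at least one vocabulary token. With a real corpus the vocabulary would be capped at 500 tokens, as the abstract states.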
2020
Agencia Nacional de Investigación e Innovación
Web requests; Attack detection; Natural and Exact Sciences; Computer and Information Sciences
Agencia Nacional de Investigación e Innovación
REDI
https://hdl.handle.net/20.500.12381/475
Open access
Attribution 4.0 International (CC BY)