NTL detection: Overview of classic and DNN-based approaches on a labeled dataset of 311k customers.

Massaferro Saquieres, Pablo - Di Martino, Matías - Fernández, Alicia

Abstract:

Non-technical losses (NTL) constitute a significant problem for developing countries and electric companies. The machine learning community has offered numerous countermeasures to mitigate the problem. Yet one of the main bottlenecks is collecting and accessing labeled data to evaluate and compare the validity of proposed solutions. In collaboration with the Uruguayan power generation and distribution company UTE, we collected data and inspected 311k customers, creating one of the world’s largest fully labeled datasets. In the present paper, we use this massive amount of information in two ways. First, we revisit previous work, comparing and validating earlier findings that were tested on much smaller and less diverse databases. Second, we compare and analyze novel deep neural network algorithms, which have more recently been adopted for preventing NTL. Our main findings are: (i) above 80k training examples, the performance gain from adding more training data is marginal; (ii) if modern classifiers are adopted, handcrafting features from the consumption signal is unnecessary; (iii) complementary customer information and geo-localization are relevant features that complement the consumption signal; and (iv) adversarial attack ideas can be exploited to understand the main patterns that characterize fraudulent activities and typical consumption profiles.


Bibliographic Details
2021
Training
Training data
Companies
Switches
Performance gain
Smart meters
Smart grids
Non-technical losses
Electricity theft
Automatic fraud detection
English
Universidad de la República
COLIBRI
https://hdl.handle.net/20.500.12008/26892
Open access
Creative Commons Attribution-NonCommercial-NoDerivatives License (CC BY-NC-ND 4.0)