NTL detection: Overview of classic and DNN-based approaches on a labeled dataset of 311k customers.
Abstract:
Non-technical losses (NTL) constitute a significant problem for developing countries and electric companies. The machine learning community has offered numerous countermeasures to mitigate the problem. Yet one of the main bottlenecks is collecting and accessing labeled data to evaluate and compare the validity of proposed solutions. In collaboration with the Uruguayan power generation and distribution company UTE, we collected data and inspected 311k customers, creating one of the world’s largest fully labeled datasets. In the present paper, we use this massive amount of information in two ways. First, we revisit previous work, comparing and validating earlier findings that were tested on much smaller and less diverse databases. Second, we compare and analyze novel deep neural network algorithms, which have more recently been adopted for preventing NTL. Our main findings are: (i) above 80k training examples, the performance gain from adding more training data is marginal; (ii) if modern classifiers are adopted, handcrafting features from the consumption signal is unnecessary; (iii) complementary customer information and geo-localization are relevant features that complement the consumption signal; and (iv) adversarial attack ideas can be exploited to understand the main patterns that characterize fraudulent activities and typical consumption profiles.
2021
Training; Training data; Companies; Switches; Performance gain; Smart meters; Smart grids; Non-technical losses; Electricity theft; Automatic fraud detection
English
Universidad de la República
COLIBRI
https://hdl.handle.net/20.500.12008/26892
Open access
Creative Commons Attribution-NonCommercial-NoDerivatives license (CC BY-NC-ND 4.0)