Automatic identification of power quality events using a machine learning approach

A. F. Valencia-Duque; A. M. Álvarez Meza; A. A. Orozco-Gutiérrez

Eléctrica

Received: 25 January 2019

Accepted: 10 June 2019

Abstract: Power Quality (PQ) events are widely studied because they are an essential concern for industry regarding the efficiency and useful life of the elements connected to electrical systems. If the disturbances related to PQ events are classified (identified) quickly and with reliable accuracy, the associated costs and losses can be reduced. In this paper, we present a machine learning-based approach to identify PQ events. Our proposal comprises the following stages. First, we employ a feature representation space based on time and frequency parameters. Second, we include a supervised relevance analysis technique, called Relieff, to highlight the discriminant capability of the considered features. Third, we evaluate the success of classifying PQ events with different classifiers by adding different levels of noise under a cross-validation scheme. For concrete testing, a synthetic database based on the IEEE 1159 standard is generated, comprising 3000 signals and ten classes (300 samples per class). Remarkably, the results show suitable classification performance with straightforward classifiers, e.g., quadratic and k-NN, in comparison with state-of-the-art methodologies.

Keywords: machine learning, power quality, relevance analysis, time-domain features, frequency-domain features.

Resumen: Power quality (PQ) events are currently studied because of their importance for industry in terms of the efficiency and useful life of the elements connected to electrical systems. If the disturbances related to PQ events are classified (identified) quickly and with reliable accuracy, the costs and losses generated would be reduced. In this work, we present a machine learning-based approach for the automatic identification of PQ events. Our proposal comprises the following stages: we employ a feature representation space based on time and frequency parameters. In addition, we use a supervised relevance analysis technique, called Relieff, to highlight the discriminant capability of the considered features. Then, we evaluate the success of PQ event classification with different classifiers by adding different noise levels under a 10-fold cross-validation scheme. To this end, a synthetic database based on the IEEE 1159 standard is generated, considering 3000 signals and ten classes (300 samples per class). The results obtained show adequate classification performance with simple classifiers, quadratic and k-NN, in comparison with the most advanced state-of-the-art methodologies.

Palabras clave: machine learning, time-domain features, frequency-domain features, relevance analysis, power quality.

I. Introduction

The analysis of power quality (PQ) signals is of great interest for the efficiency and commercialization of electric power. Such an analysis characterizes the transmission quality, which helps increase both the efficiency and the useful life of the elements connected to the electrical system. Overall, there are several types of disturbances (events) related to PQ, namely, root-mean-square (RMS) voltage variations, harmonics, flickers, notches, and transient effects [1, 2]. The main factors that contribute to the occurrence of PQ disturbances include the starting of large electric motors, the switching of capacitor banks, non-linear loads, the operation of electric arc furnaces, and faults in distribution systems [3]. Among other side effects, disturbances cause heating, malfunction, and a reduction of the useful life of electric devices. In particular, non-sinusoidal currents significantly increase losses in conductors and transformers [5]. Consequently, PQ events cost the electricity sector between 26,000 and 400,000 million dollars yearly in the US [4]. Currently, there are devices devoted to monitoring electrical networks to find faults or artifacts that affect their operation. However, due to the huge amount of recorded data, it is not practical to study it manually.

Therefore, machine learning approaches arise as an alternative to build automatic monitoring systems from PQ data. With such systems, operators could execute immediate or medium-term actions to find the causes of the failures. Although significant information can be extracted from time parameters, such a representation strategy alone is not enough to separate properly among different PQ events. On the other hand, frequency features based on the well-known Fourier transform are used to estimate harmonic patterns. Also, some approaches employ both time and frequency-based characteristics to classify PQ events [23-25].

Though time and frequency properties can be extracted, the computed parameters yield high-dimensional feature spaces, which can cause overfitting in further classification stages, not to mention the high computational burden. Traditionally, state-of-the-art works employ sophisticated classifiers, e.g., support vector machines (SVM) and neural networks (NN), to discriminate PQ events [13-19]. Nonetheless, the use of complex classifiers can lead to overfitting and to a lack of interpretability regarding the most relevant features. Hence, dimensionality reduction must be carried out in order to extract the most important characteristics without losing relevant information [8, 9].

In this paper, we propose a machine learning methodology aimed at the automatic identification of PQ events. For this purpose, we use two characterization spaces: statistical parameters in the time and frequency domains. Also, a feature selection stage based on the supervised Relieff algorithm is used to reduce the representation space. In this way, our approach favors the separability between different types of disturbance in the later classification stages. Here, a synthetic database based on the IEEE 1159 standard is generated, comprising 3000 signals and ten classes (300 samples per class). The results show that, with only the four most relevant features, our methodology reaches an accuracy close to 98% with straightforward classifiers, e.g., quadratic and k-NN.

The rest of the document is organized as follows. In Section II, we explain the main methods of our approach. In Section III, we describe the experimental set-up. In Section IV, the results are shown and discussed. Finally, in Section V, we draw conclusions from the tests performed, noting the benefits and limitations of the proposed methodology.

II. Methods

A. Power Quality Fundamentals

The term Power Quality (PQ) is described in IEEE Standard 1159 as: "The concept of powering and grounding electronic equipment in a manner that is suitable for its operation and compatible with the premise wiring system and other connected equipment" [3]. It is also understood as a measure of the difference between the signal that a load receives and the one that it should ideally receive. These differences are called disturbances. Thus, a voltage or current signal with a perfect sinusoidal waveform has the best power quality, and any variation (either in magnitude or frequency) is defined as a disturbance. Two leading causes generate the disturbances that reduce power quality: i) the connection of loads to the network and ii) problems in the transmission and distribution systems or subsystems [1]. Besides, disturbances can be classified according to the distortion caused on the waveform: amplitude-based distortions (swells, sags, and outages), harmonic distortion, notches, flicker, and transient effects (oscillatory and impulsive) [8].

B. PQ Event Classification

Feature extraction. Let $\{z_n\}_{n=1}^{N}$ be a data set holding N samples related to PQ events, where each $z_n \in ℝ^{T}$ represents a given signal at T time instants and is paired with a label in $\{1, \dots, C\}$ that codes the disturbance, C being the total number of events studied. To estimate time domain-based features, we compute the following statistical parameters, defined in Table I: mean, RMS, standard deviation, kurtosis, skewness, maximum value, and minimum value.
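As an illustration of the time-domain parameters listed above, the following is a minimal NumPy sketch. The exact equations of Table I are not reproduced here, so the population (biased) standard deviation and the non-excess kurtosis are assumptions, and the function name `statistical_features` and the 60 Hz test signal are hypothetical.

```python
import numpy as np

def statistical_features(r):
    """Mean, RMS, standard deviation, kurtosis, skewness, max, and min of a 1-D vector."""
    r = np.asarray(r, dtype=float)
    mean = r.mean()
    std = r.std()                      # population standard deviation (assumed convention)
    centered = r - mean
    # Standardized third and fourth moments; a constant vector gets 0 by convention.
    skewness = np.mean(centered**3) / std**3 if std > 0 else 0.0
    kurtosis = np.mean(centered**4) / std**4 if std > 0 else 0.0   # non-excess kurtosis
    return {
        "mean": mean,
        "rms": np.sqrt(np.mean(r**2)),
        "std": std,
        "kurtosis": kurtosis,
        "skewness": skewness,
        "max": r.max(),
        "min": r.min(),
    }

# Hypothetical test signal: ten cycles of a unit-amplitude 60 Hz sine at 18 kHz.
t = np.arange(3000) / 18000.0
z = np.sin(2 * np.pi * 60 * t)
features = statistical_features(z)
```

For a pure sine over an integer number of cycles, the RMS is the amplitude divided by the square root of two, which gives a quick sanity check for the implementation.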

Moreover, we apply the well-known Fourier transform over z aiming to extract harmonic patterns. Thus, for each provided signal in the time domain, we obtain a vector s representing its frequency spectrum. We can also define a frequency index vector whose entries are set by the sampling frequency. Next, the following parameters are computed over s as explained in Table I: mean value, RMS value, standard deviation, skewness, kurtosis, maximum value, and THD. Once all the parameters are computed for all provided samples, a feature matrix of size N × Q is built by concatenating the time and frequency-based features, where Q is the number of features extracted.
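The frequency-domain step can be sketched as follows. This is not the authors' Matlab code: it uses NumPy's real FFT, and the THD definition (square root of the summed harmonic powers over the fundamental amplitude, read at the nearest FFT bin) is an assumption, since the equation in Table I is not available here. The names `single_sided_spectrum` and `thd` and the test signal are hypothetical.

```python
import numpy as np

def single_sided_spectrum(z, fs):
    """Amplitude spectrum and frequency axis of a real signal (non-negative frequencies)."""
    z = np.asarray(z, dtype=float)
    n = z.size
    s = np.abs(np.fft.rfft(z)) / n
    s[1:] *= 2.0            # fold the negative frequencies onto the positive ones
    if n % 2 == 0:
        s[-1] /= 2.0        # the Nyquist bin has no mirror image
    f = np.fft.rfftfreq(n, d=1.0 / fs)
    return f, s

def thd(f, s, f0, n_harmonics):
    """Assumed THD: sqrt(sum of squared harmonic amplitudes) / fundamental amplitude."""
    def amplitude_at(freq):
        return s[np.argmin(np.abs(f - freq))]      # nearest FFT bin
    fundamental = amplitude_at(f0)
    harmonics = np.array([amplitude_at(k * f0) for k in range(2, n_harmonics + 1)])
    return float(np.sqrt(np.sum(harmonics**2)) / fundamental)

# Hypothetical signal: 60 Hz fundamental plus 3rd and 5th harmonics.
fs = 18000.0
t = np.arange(3000) / fs
z = (np.sin(2 * np.pi * 60 * t)
     + 0.2 * np.sin(2 * np.pi * 180 * t)
     + 0.1 * np.sin(2 * np.pi * 300 * t))
f, s = single_sided_spectrum(z, fs)
thd_value = thd(f, s, f0=60.0, n_harmonics=7)
```

With 3000 samples at 18 kHz the bin spacing is 6 Hz, so the fundamental and its harmonics fall exactly on FFT bins and no spectral leakage correction is needed in this example.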

TABLE I
FEATURES AND THEIR EQUATIONS

Statistical parameters for time and frequency domain-based features devoted to PQ event classification. $r \in ℝ^{L}$ can be either the time domain signal z or the frequency vector s.

* For the total harmonic distortion (THD), the power P_k must be calculated for the first K’ harmonics.


Relevance analysis. To reveal discriminant features and find a balance between classification complexity and accuracy [20], we compute the contribution of each feature with respect to the label vector. To this end, we calculate the relevance vector $υ \in ℝ^{Q}$, with one entry per feature, based on the Relieff approach as follows (1) [21]:

(1)

where d is a given distance function, e.g., the Euclidean distance; the neighbor set holds the k-nearest neighbors of a given sample according to d; and the class prior is the probability that a sample belongs to the c-th class. Hence, the higher the q-th relevance value, the better the q-th feature for discriminating PQ events [22].
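To make the relevance computation more concrete, the sketch below implements a simplified variant of the ReliefF idea described in [21, 22]: each feature is rewarded when it differs between a sample and its nearest neighbors from other classes (weighted by class priors) and penalized when it differs from its nearest same-class neighbors. It is an illustration under stated assumptions rather than the exact expression in (1): all samples are visited once, features are min-max scaled, and the function name `relieff` and the toy data are hypothetical.

```python
import numpy as np

def relieff(X, y, n_neighbors=1):
    """Simplified ReliefF-style relevance scores (higher means more relevant)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape
    # Scale every feature to [0, 1] so per-feature differences are comparable.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    Xs = (X - X.min(axis=0)) / span
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / n_samples
    relevance = np.zeros(n_features)
    for i in range(n_samples):
        diff = np.abs(Xs - Xs[i])                  # per-feature differences to sample i
        dist = np.sqrt((diff**2).sum(axis=1))      # Euclidean distance d
        own_prior = priors[classes == y[i]][0]
        for c, p in zip(classes, priors):
            members = np.flatnonzero((y == c) & (np.arange(n_samples) != i))
            if members.size == 0:
                continue
            nearest = members[np.argsort(dist[members])[:n_neighbors]]
            mean_diff = diff[nearest].mean(axis=0)
            if c == y[i]:
                relevance -= mean_diff                            # near hits should look alike
            else:
                relevance += p / (1.0 - own_prior) * mean_diff    # near misses should differ
    return relevance / n_samples

# Toy data (hypothetical): feature 0 follows the label, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 50)
informative = y + 0.1 * rng.standard_normal(y.size)
noise = rng.standard_normal(y.size)
scores = relieff(np.column_stack([informative, noise]), y, n_neighbors=1)
```

In this toy case the informative feature should receive a clearly larger score than the noise feature, which is the behavior the relevance ranking relies on.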

III. Experimental Setup

A. Dataset

To test our machine learning approach for identifying PQ events, we generate a synthetic database following the models described in [9-13]. The following classes are studied: 1) Sag, 2) Swell, 3) Outage, 4) Impulsive transient, 5) Notch, 6) Spike, 7) Harmonics, 8) Oscillatory transient, 9) Flicker, and 10) Normal (pure sine wave). For each type of disturbance, 300 signals were generated, which provides a database of 3000 observations. The free parameters of each type of disturbance were chosen randomly for each generated signal within the ranges defined in Table II. We fix a sampling frequency of 18 kHz and a time length of 0.166 seconds, which corresponds to ten cycles. Further, to evaluate the stability of the method, the experiments are performed by adding white Gaussian noise at 10 dB, 20 dB, 30 dB, and 40 dB [14, 16, 18]. Fig. 1 shows some PQ events of the studied dataset.
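A minimal sketch of how one such signal could be generated and corrupted with noise is given below. The sag model (a rectangular amplitude reduction of a unit sine) is a common form in the cited literature but is assumed here, because the equations of Table II are not reproduced; the 60 Hz fundamental is inferred from ten cycles lasting about 0.166 s. The names `sag` and `add_noise` and the parameter values are hypothetical.

```python
import numpy as np

FS = 18_000          # sampling frequency in Hz (from the text)
F0 = 60              # fundamental in Hz (assumed from ten cycles in about 0.166 s)
N_SAMPLES = 3_000    # ten cycles at FS / F0 = 300 samples per cycle

def sag(alpha, t_start, t_end):
    """Unit sine whose amplitude drops by a fraction alpha between t_start and t_end."""
    t = np.arange(N_SAMPLES) / FS
    window = (t >= t_start) & (t < t_end)
    return (1.0 - alpha * window) * np.sin(2 * np.pi * F0 * t)

def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise so that the signal-to-noise ratio is about snr_db."""
    signal_power = np.mean(signal**2)
    noise_power = signal_power / 10 ** (snr_db / 10)
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)

rng = np.random.default_rng(0)
clean = sag(alpha=0.5, t_start=0.05, t_end=0.10)
noisy = add_noise(clean, snr_db=20, rng=rng)

# Empirical check of the noise level actually obtained.
measured_snr_db = 10 * np.log10(np.mean(clean**2) / np.mean((noisy - clean) ** 2))
```

Drawing `alpha`, `t_start`, and `t_end` at random within the ranges of Table II, and repeating this 300 times per class, would follow the procedure described above.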

B. Training and comparison methods

Regarding the feature extraction stage, the fast Fourier transform (FFT) is computed with the fft() function implemented in Matlab. The single-sided Fourier spectrum is obtained for frequencies up to 1 kHz. For the sake of clarity, we index the features extracted in the time domain as follows: THD (1), mean (2), RMS (3), RMS_1 (4), RMS_2 (5), RMS_3 (6), standard deviation (7), kurtosis (8), skewness (9), maximum value (10), and minimum value (11). Features 4, 5, and 6 correspond to the RMS values calculated along the input signal with a window size of 55 ms, which is equivalent to one-third of the total duration of the signal. In the frequency domain, we compute the following parameters: mean (12), RMS (13), standard deviation (14), skewness (15), kurtosis (16), and maximum value (17). Consequently, we build a feature space holding 17 features for 3000 observations from 10 classes, which results in a matrix with N = 3000 and Q = 17.
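The windowed RMS features (4, 5, and 6) can be sketched as follows, assuming three consecutive, non-overlapping windows that each cover one-third of the signal. The function name `windowed_rms` and the test signal are hypothetical.

```python
import numpy as np

def windowed_rms(z, n_windows=3):
    """RMS of consecutive, non-overlapping segments of the signal."""
    z = np.asarray(z, dtype=float)
    segments = np.array_split(z, n_windows)
    return [float(np.sqrt(np.mean(seg**2))) for seg in segments]

# Hypothetical signal: a 60 Hz sine at 18 kHz whose middle third is halved (sag-like).
t = np.arange(3000) / 18000.0
z = np.sin(2 * np.pi * 60 * t)
z[1000:2000] *= 0.5
rms_1, rms_2, rms_3 = windowed_rms(z)
```

Because each window spans 1000 samples (about 55 ms, or 3.33 cycles), the per-window RMS deviates slightly from the full-cycle value, but a disturbance confined to one window still shows up clearly as a drop in the corresponding feature.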

Concerning the relevance analysis stage, the Relieff algorithm is used with the number of nearest neighbors fixed to 1. Moreover, we test our methodology with the following classifiers: linear, k-NN, quadratic, and SVM. A cross-validation scheme with 30 repetitions is used, randomly choosing 70% of the data as the training set in each repetition. The classification accuracy is obtained as the average over the repetitions. For the linear and quadratic classifiers, we set a pseudo-linear covariance matrix. In the case of k-NN, we use a nested cross-validation strategy to find the number of neighbors from the set {1, 3, 5, 7, 9, 11, 13, 15} according to the testing set accuracy. For the SVM, we fix a Gaussian kernel, and we also use a nested cross-validation scheme to find the bandwidth and the regularization parameters from the sets {0.01, 0.1, 1, 10} and {0.01, 0.1, 1, 10, 100, 500, 1000}, respectively.
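The validation protocol for k-NN can be sketched as below: 30 random 70/30 splits, with the number of neighbors picked from the stated grid on an inner split of the training data. This is one possible reading of the nested scheme, not the authors' Matlab implementation; the inner 70/30 split, the helper names, and the toy data are assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Majority vote among the k nearest training samples (squared Euclidean distance)."""
    d2 = ((X_test**2).sum(axis=1)[:, None]
          - 2.0 * X_test @ X_train.T
          + (X_train**2).sum(axis=1)[None, :])
    nearest = np.argsort(d2, axis=1)[:, :k]
    return np.array([np.bincount(row).argmax() for row in y_train[nearest]])

def holdout_indices(n, train_fraction, rng):
    """Random split of range(n) into training and held-out indices."""
    order = rng.permutation(n)
    cut = int(round(train_fraction * n))
    return order[:cut], order[cut:]

def repeated_holdout_accuracy(X, y, k_grid, n_repeats=30, train_fraction=0.7, seed=0):
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_repeats):
        train, test = holdout_indices(len(y), train_fraction, rng)
        # Inner split of the training data selects k without touching the test set.
        inner_fit, inner_val = holdout_indices(len(train), train_fraction, rng)
        fit_idx, val_idx = train[inner_fit], train[inner_val]
        best_k = max(k_grid, key=lambda k: np.mean(
            knn_predict(X[fit_idx], y[fit_idx], X[val_idx], k) == y[val_idx]))
        y_hat = knn_predict(X[train], y[train], X[test], best_k)
        accuracies.append(np.mean(y_hat == y[test]))
    return float(np.mean(accuracies)), float(np.std(accuracies))

# Toy data (hypothetical): three well-separated Gaussian classes, labels 0, 1, 2.
rng = np.random.default_rng(1)
y = np.repeat([0, 1, 2], 50)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = centers[y] + rng.standard_normal((y.size, 2))
mean_acc, std_acc = repeated_holdout_accuracy(X, y, k_grid=[1, 3, 5, 7, 9, 11, 13, 15])
```

Reporting the standard deviation alongside the mean gives a sense of how stable the accuracy is across the random partitions.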

We compare the results obtained with the following state-of-the-art methods: wavelet transform with a self-organizing learning array [15]; discrete wavelet transform with wavelet networks [17]; filtering techniques based on empirical mode decomposition (EMD) and the Hilbert transform with fuzzy logic-based classification [18]; variational mode decomposition (VMD) with SVM [14]; double resolution S-transform with directed acyclic graph SVMs [16]; and fractional Fourier transform-based feature extraction [19].

IV. Results and Discussion

As seen in Fig. 2, the frequency spectrum of the example signals in Fig. 1 shows harmonic patterns that are useful to discriminate between PQ events. By visual inspection, the amplitude of the spectrum within the ranges 0-180 Hz and 240-480 Hz can distinguish some of the disturbances. To quantify the dependencies among the 17 time and frequency-based parameters, Fig. 3 displays the correlation matrix of the input feature space. As seen, the RMS values of the signal in the time domain (features 3, 4, 5, and 6) are closely related to each other and to all the features in the frequency domain. This can be explained because the frequency variations that affect the PQ signals also change the amplitude, since the harmonics introduced increase the RMS value of the signal.

In turn, we calculate the matrix of distances among samples to visually inspect the inter-class overlap. Fig. 4 displays the distance matrix, where samples 1 to 1200 are separated from each other and from the rest of the examples. It is worth noting the proximity between samples 600-900 (class 3) and 1-300 (class 1), which stems from the similarity between these two types of disturbance. In Fig. 5 we present the normalized relevance value (0-1) of each feature according to the Relieff algorithm. The four most relevant features are: RMS_1 in time domain (4), skewness in time domain (9), standard deviation in frequency domain (14), and skewness in frequency domain (15).
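The correlation and distance matrices discussed above can be computed along the following lines. This is a sketch only: it assumes Pearson correlation between feature columns and Euclidean distance between samples (rows sorted by label beforehand), neither of which is stated explicitly in the text, and the function names and toy data are hypothetical.

```python
import numpy as np

def feature_correlation(X):
    """Pearson correlation between the columns (features) of X."""
    return np.corrcoef(X, rowvar=False)

def pairwise_distances(X):
    """Euclidean distance between every pair of rows (samples) of X."""
    sq = (X**2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))     # clip tiny negatives caused by rounding

# Toy data (hypothetical): feature 1 is almost a scaled copy of feature 0.
rng = np.random.default_rng(0)
a = rng.standard_normal(200)
X = np.column_stack([a, 2.0 * a + 0.01 * rng.standard_normal(200),
                     rng.standard_normal(200)])
corr = feature_correlation(X)
dist = pairwise_distances(X)
```

Strongly correlated feature pairs, such as the first two columns here, appear as off-diagonal entries close to one in the correlation matrix, which is how the redundancy among the RMS features would show up.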

TABLE II
PQ Disturbances

PQ disturbances studied with their respective mathematical model and settable parameters as seen in [26].


Fig 1.
Some examples of the PQ events in the studied dataset.

Fig 2.
Frequency spectrum of the sample signals in Fig. 1.

Fig 3.
Correlation matrix between computed features for PQ events identification (see the feature number explanation in the Experimental set-up section).

Fig 4.
Distance matrix among samples sorted according to the PQ labels.

Fig 5.
Normalized relevance values for all considered features in PQ event classification. Blue: time-domain features, Red: frequency domain features.

Further, the classification results can be seen in Fig. 6. Here, we calculate the testing set accuracy by adding the features one by one, ranked according to the relevance values in Fig. 5. As seen, the quadratic and the k-NN classifiers exhibit the highest performance with the 12 and 13 most relevant features, respectively. Notably, a feature space keeping only the four most relevant features achieves a suitable performance (see the squares in Fig. 6). The confusion matrix presented in Fig. 7 emphasizes the discrimination capability of the relevant subset of features. The classification results reveal a slight overlap between classes 1 and 3, which is to be expected because the Sag and Outage disturbances share similar wave shapes: both kinds of PQ events consist of a reduction of the RMS value of the signal for a short time.
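The incremental evaluation behind Fig. 6 can be sketched as follows: features are sorted by decreasing relevance and the accuracy is recomputed each time one more feature is added. To keep the example self-contained, a nearest-centroid classifier stands in for the quadratic and k-NN classifiers used in the paper, and the relevance values, function names, and toy data are hypothetical.

```python
import numpy as np

def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy of a nearest-centroid classifier (stand-in for the paper's classifiers)."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    d2 = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(classes[d2.argmin(axis=1)] == y_test))

def accuracy_vs_feature_count(X, y, relevance, train_idx, test_idx):
    """Accuracy obtained with the 1, 2, ..., Q most relevant features."""
    ranking = np.argsort(relevance)[::-1]          # most relevant feature first
    curve = []
    for m in range(1, len(ranking) + 1):
        cols = ranking[:m]
        curve.append(nearest_centroid_accuracy(
            X[np.ix_(train_idx, cols)], y[train_idx],
            X[np.ix_(test_idx, cols)], y[test_idx]))
    return ranking, curve

# Toy data (hypothetical): only feature 2 separates the two classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.standard_normal((200, 3))
X[:, 2] += 5.0 * y
relevance = np.array([0.01, 0.02, 0.90])           # hypothetical relevance scores
train_idx = np.arange(0, 200, 2)
test_idx = np.arange(1, 200, 2)
ranking, curve = accuracy_vs_feature_count(X, y, relevance, train_idx, test_idx)
```

Plotting `curve` against the number of features gives a curve of the same kind as Fig. 6, from which a trade-off point between accuracy and feature count can be read.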

Fig 6.
Number of relevant features versus classification accuracy. The diamonds identify the best result considering only the accuracy, while the squares highlight the trade-off between number of relevant features and classification accuracy.

Fig 7.
Confusion matrix obtained when performing the classification over the four most relevant features (quadratic classifier results).

Finally, Table III shows the classification results for different noise levels and presents the comparison with the state of the art. Our approach achieves competitive classification results. In fact, in most of the cases, we attain the highest accuracy with the lowest number of features required, even under challenging noise conditions. It is worth mentioning that we use a straightforward classifier compared with those employed by the state-of-the-art methods.

TABLE III.
Method comparison results.

PQ event classification regarding the number of classes, number of features extracted, and level of noise.


V. Conclusions and Future Work

In this paper, we introduce a machine learning approach to support PQ event identification. We compute time and frequency-based parameters to reveal discriminant disturbance patterns. Besides, a Relieff-based approach is used to find a subset of features coding the most relevant properties of the input data. Next, well-known classifiers are applied to estimate the disturbance label. The results demonstrate that computing only the four most relevant features is enough to achieve an accuracy of about 98%. Moreover, our approach is competitive against state-of-the-art methods concerning the required number of features and the classifier complexity (we achieve acceptable performance using a straightforward quadratic classifier).

As future work, the authors plan to test the proposed approach on signals that are affected by more than one disturbance simultaneously.

REFERENCES

[1]. S. S. Surajit Chattopadhyay, Madhuchhanda Mitra, Electric Power Quality, 1st ed., ser. Power Systems. Springer, 2011.

[2]. M. T. Alexander Kusko, Power Quality in Electrical Systems, 1st ed. McGraw-Hill Professional, 2007.

[3]. IEEE recommended practice for monitoring electric power quality, IEEE Std. 1159–2009, 2009.

[4]. A. De Almeida, L. Moreira, and J. Delgado, “Power quality problems and new solutions,” ISR--Department Electr. Comput. Eng. Univ. Coimbra, Polo II, vol. 1, no. 1, pp. 3030–3290, 2003.

[5]. Sharmistha Bhattacharyya and Sjef Cobben (2011). Consequences of Poor Power Quality – An Overview, Power Quality, Mr Andreas Eberhard (Ed.), ISBN: 978-953-307-180-0, InTech, Available from: http://www.intechopen.com/books/power-quality/consequences-of-poor-power-quality-an-overview

[6]. S. Elphick, P. Ciufo, V. Smith, and S. Perera, “Summary of the economic impacts of power quality on consumers,” 2015 Australas. Univ. Power Eng. Conf. Challenges Futur. Grids, AUPEC 2015, pp. 1–6, 2015.

[7]. O. P. Mahela, A. G. Shaik, and N. Gupta, “A critical review of detection and classification of power quality events,” Renew. Sustain. Energy Rev., vol. 41, pp. 495–505, 2015.

[8]. López-Lopera, A. F. “Selección de la mejor base para la caracterización de perturbaciones en señales de calidad de potencia usando transformaciones tiempo/frecuencia”, undergraduate thesis, Universidad Tecnológica de Pereira, 2013.

[9]. Uyar, Murat; Yildirim, Selcuk; Gencoglu, Muhsin Tunay. An effective wavelet-based feature extraction method for classification of power quality disturbance signals. Electric Power Systems Research, 2008, vol. 78, no 10, p. 1747-1755.

[10]. Saxena, D.; Singh, S. N.; Verma, K. S. Analysis of composite power quality events using s-transform. In Innovative Smart Grid Technologies-Asia (ISGT Asia), 2012 IEEE. IEEE, 2012. p. 1-7.

[11]. Ribeiro, Moisés Vidal; Pereira, José Luiz Rezende. Classification of single and multiple disturbances in electric signals. EURASIP Journal on Advances in Signal Processing, 2007, vol. 2007, no 2, p. 15-15.

[12]. Khokhar, S., et al. MATLAB/Simulink based modeling and simulation of power quality disturbances. In Energy Conversion (CENCON), 2014 IEEE Conference on. IEEE, 2014. p. 445-450.

[13]. Kumar, Raj, et al. Recognition of power-quality disturbances using S-transform-based ANN classifier and rule-based decision tree. IEEE Transactions on Industry Applications, 2015, vol. 51, no 2, p. 1249-1258.

[14]. Abdoos, A.A., Mianaei, P.K., Ghadikolaei, M.R.: Combined vmd-svm based feature selection method for classification of power quality events. Applied Soft Computing 38, 637-646 (2016).

[15]. He, H., Starzyk, J.A.: A self-organizing learning array system for power quality classification based on wavelet transform. IEEE Transactions on Power Delivery 21(1), 286-295 (2006).

[16]. Li, J., Teng, Z., Tang, Q., Song, J.: Detection and classification of power quality disturbances using double resolution s-transform and dag-svms. IEEE Transactions on Instrumentation and Measurement 65(10), 2302-2312 (2016).

[17]. Masoum, M., Jamali, S., Ghaffarzadeh, N.: Detection and classification of power quality disturbances using discrete wavelet transform and wavelet networks. IET Science, Measurement & Technology 4(4), 193-205 (2010).

[18]. Shukla, S., Mishra, S., Singh, B.: Power quality event classification under noisy conditions using emd-based de-noising techniques. IEEE Transactions on industrial informatics 10(2), 1044-1054 (2014).

[19]. Singh, U., Singh, S.N.: Application of fractional Fourier transform for classification of power quality disturbances. IET Science, measurement & Technology 11(1), 67-76 (2017).

[20]. L. Liang, F. Liu, M. Li, K. He, G. Xu, Feature selection for machine fault diagnosis using clustering of non-negation matrix factorization, Measurement 94 (2016) 295–305.

[21]. I. Kononenko, Estimating attributes: analysis and extensions of relief, in: European conference on machine learning, Springer, 1994, pp. 171–182.

[22]. M. Robnik-Sikonja, I. Kononenko, Theoretical and empirical analysis of relieff and rrelieff, Machine learning 53 (1-2) (2003) 23–69.

[23]. G. Devadasu and M. Sushama, "A novel multiple fault identification with fast fourier transform analysis," 2016 International Conference on Emerging Trends in Engineering, Technology and Science (ICETETS), Pudukkottai, 2016, pp. 1-5.

[24]. Huang Weili and Du Wei, "Application of dynamic time-frequency analysis for power quality event classification and recognition," 2009 Chinese Control and Decision Conference, Guilin, 2009, pp. 349-352.

[25]. A. R. Abdullah, A. Z. Sha'ameri, A. R. M. Sidek and M. R. Shaari, "Detection and Classification of Power Quality Disturbances Using Time-Frequency Analysis Technique," 2007 5th Student Conference on Research and Development, Selangor, Malaysia, 2007, pp. 1-6.

[26]. R. Kumar, B. Singh and D. T. Shahani, "Symmetrical Components-Based Modified Technique for Power-Quality Disturbances Detection and Classification," in IEEE Transactions on Industry Applications, vol. 52, no. 4, pp. 3443-3450, July-Aug. 2016.

Author notes

Andrés Felipe Valencia-Duque

Received his undergraduate degree in electrical engineering (2018) from the Universidad Tecnológica de Pereira. He is currently an M.Sc. candidate in engineering at the same university. Research interests: machine learning and deep learning.

Andres Marino Alvarez-Meza

Received his undergraduate degree in electronic engineering (2009), his M.Sc. (2011), and his Ph.D. in automatics from the Universidad Nacional de Colombia. Currently, he is a Professor in the Department of Electrical, Electronic and Computation Engineering at the same university. Research interests: machine learning and signal processing.

Álvaro Orozco-Gutierrez

Received his undergraduate degree in electrical engineering (1985) and his M.Sc. degree in engineering (2004) from the Universidad Tecnológica de Pereira, and his Ph.D. in bioengineering (2009) from the Universidad Politécnica de Valencia (Spain). He received his undergraduate degree in law (1996) from Universidad Libre de Colombia. Currently, he is a Professor in the Department of Electrical Engineering at the Universidad Tecnológica de Pereira. Research interests: machine learning and bioengineering.

Reference	#Classes	#Features	Noise	Methodology	Accuracy (%)
[15] 2006	6	11	-	WT + SOLAR(NN)	94.93
[17] 2010	16	8	-	WT + WN	98.18
[18] 2014	9	9	30 dB	EMD + HT + FPA	91.67
[14] 2016	9	17	20 dB	(ST + VMD) + SVM	98.11
[16] 2016	9	9	20 dB	DRST + DAG-SVM	97.8
[19] 2017	15	9	30 dB	FRFT	98.6
Ours 2019	10	4	-	T+F Relieff+Quadratic	98.95
Ours 2019	10	12	-	T+F Relieff+Quadratic	99.43
Ours 2019	10	15	10 dB	T+F Relieff+Quadratic	86.6
Ours 2019	10	17	20 dB	T+F Relieff+Quadratic	96.8
Ours 2019	10	17	30 dB	T+F Relieff+Quadratic	99.15
Ours 2019	10	12	40 dB	T+F Relieff+Quadratic	99.39