New modified Bat algorithm for blind speech enhancement in time domain
Journal of applied research and technology, vol. 21, no. 6, pp. 982-990, 2023
Universidad Nacional Autónoma de México, Instituto de Ciencias Aplicadas y Tecnología
Received: 28 February 2022
Accepted: 23 May 2022
Published: 31 December 2023
Abstract: We address the speech enhancement problem for dual-channel convolutive mixtures by viewing it in a blind source separation setting. One widely used technique for separating mixed signals is adaptive filtering, where the challenge is to identify an unknown finite impulse response. Traditionally, a gradient-based algorithm is applied to adapt the filter coefficients. However, such algorithms often suffer from premature convergence when using large filters and non-stationary inputs, leading to the so-called local minimum problem, which significantly degrades the quality of the enhanced signals. One alternative is to apply population-based metaheuristic algorithms, in which the filter coefficients are adapted iteratively by minimizing a cost function. But even with this metaheuristic-based solution, the local minimum problem still exists for large filters. To avoid local minima and improve the chance of reaching the global solution, we propose in this paper a novel algorithm, called the modified Bat algorithm, which makes the search process more efficient by enhancing its exploration and exploitation capabilities. Several experiments under different noise types are conducted using the proposed modified Bat algorithm in comparison with some popular state-of-the-art algorithms. The enhanced signals obtained by each algorithm at the outputs of the separation process show the good behavior and superiority of the proposed algorithm in terms of system misalignment and segmental signal-to-noise ratio.
Keywords: Speech enhancement, blind source separation, population-based metaheuristic algorithms, system misalignment, segmental signal-to-noise ratio.
1. Introduction
Adaptive noise cancellation (ANC) is an alternative approach used to improve the quality of speech signals corrupted by different noises (Loizou, 2013). Numerous techniques have been suggested to enhance the speech signal using the gradient-based algorithm family (Widrow et al., 1975). The most widely used algorithms from this family are the least mean square (LMS) and normalized least mean square (NLMS) algorithms (Rogers, 1996). However, gradient-based algorithms suffer from the local minimum problem, and the global solution is seldom attained. To avoid local minimum solutions in ANC, many modifications of the NLMS algorithm have been proposed, such as the variable step size NLMS (VSS-NLMS) (Bendoumia & Djendi, 2015) and the wavelet-domain NLMS algorithms (Djendi, 2018).
To overcome this problem, metaheuristic-based algorithms are recommended because of their simple implementation. Furthermore, metaheuristics are well known for their ability to avoid premature convergence, which lowers the chance of falling into local minima (Mahbub et al., 2010). Various metaheuristic algorithms have been used to solve the ANC problem using adaptive infinite impulse response (IIR) filters. Chang and Chen (2010) and Kunche (2016) suggested applying the Bat algorithm (BA), the genetic algorithm (GA), and particle swarm optimization and its variants to ANC.
The aim of this paper is to propose a new, efficient modified Bat algorithm implemented in a blind speech enhancement structure (in this work, we consider only the convolutive mixture of signals (Djendi, 2010)). Note that this paper is an extended version of our work published in Fisli et al. (2019). We extend the previous version with a new theoretical basis and some efficient modifications that increase its performance and therefore the possibility of applying it in other scenarios involving other types of noise. The remainder of this manuscript is organized as follows. In Section 2, we review the mixing process that produces the mixed signals and then present the forward blind source separation (FBSS) structure. In Section 3, the standard BA is reviewed and our modified version is presented. Simulation results are discussed in Section 4. Finally, Section 5 concludes the paper.
2. Problem formulation
2.1. Mixture model
In Figure 1, we show the scheme of the convolutive mixture model (two source signals recorded by two microphones), where the two sources are the speech signal and the punctual noise (Djendi, 2010).
The two outputs of the mixing process, given by Equations (1) and (2), are each the sum of one source convolved with a direct channel path and the other source convolved with a cross-coupling path between the channels. All of these impulse responses are finite impulse responses (FIR).
The complete mixing process can be simplified under the following assumptions:
The original signals are a clean speech signal and a noise signal.
The direct channel paths are considered equivalent to the unit impulse response.
The input signals are assumed to be statistically independent.
Note that the simplified convolutive mixing model is widely used because it is well established in theory and practice.
Figure 2 shows the simplified convolutive model, in which the noisy signal at each channel is one source plus the other source convolved with the corresponding cross-coupling impulse response.
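As a minimal sketch of the simplified model, the snippet below builds the two noisy observations from a speech frame and a noise frame. The names `s`, `b`, `h12`, `h21`, `p1`, and `p2` are illustrative assumptions, as is the convention that `h21` couples the noise into the first channel and `h12` couples the speech into the second. Plain Python lists are used so that the example has no dependencies.

```python
def fir_filter(x, h):
    """Causal FIR filtering: y[n] = sum_k h[k] * x[n - k], truncated to len(x)."""
    return [
        sum(h[k] * x[n - k] for k in range(min(len(h), n + 1)))
        for n in range(len(x))
    ]


def simplified_mixture(s, b, h12, h21):
    """Simplified two-channel convolutive mixture with unit direct paths.

    Assumed convention (hypothetical names, not the paper's notation):
      p1 = s + h21 * b   (speech plus cross-coupled noise)
      p2 = b + h12 * s   (noise plus cross-coupled speech)
    where * denotes convolution.
    """
    noise_leak = fir_filter(b, h21)    # noise reaching microphone 1
    speech_leak = fir_filter(s, h12)   # speech reaching microphone 2
    p1 = [s_n + leak for s_n, leak in zip(s, noise_leak)]
    p2 = [b_n + leak for b_n, leak in zip(b, speech_leak)]
    return p1, p2
```

With both cross-coupling responses set to zero, the two observations reduce to the clean speech and the noise, which matches the unit direct-path assumption.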
2.2. Forward blind source separation structure
In Figure 2, we suppose that we have no prior knowledge of the two input signals or of the two cross-coupling impulse responses. In this situation, the technique that estimates the original signals using only the observations is called blind source separation (BSS). Two structures are used in this technique to retrieve the original signals (Bendoumia & Djendi, 2015).
Forward and backward structures are frequently used in BSS because of their efficiency in applications such as speech enhancement for hearing aids, speech recognition, and teleconferencing systems. In this work, we use the forward blind source separation (FBSS) structure (see Figure 3). Note that FBSS can be used only when every observed signal of the separation process is a simple linear combination of the input signals. Each output of the FBSS structure is obtained by subtracting from one observation the other observation filtered by an adaptive cross filter.
Inserting (1) and (2) into (3) and (4), respectively, gives the outputs in terms of the source signals.
The optimal solution of the FBSS is obtained when each adaptive filter equals the corresponding cross-coupling impulse response, which gives the output equations of the unmixed signals.
From (9) and (10), we obtain estimates of the two input signals at the outputs, with spectral and temporal distortions. Consequently, the use of post-filters at the outputs may be necessary (Djendi et al., 2006).
In this work, we consider only the case in which the two microphones are loosely spaced, which leads to low distortion; the outputs are therefore close to the original sources. Obtaining the estimated source signals amounts to finding an optimal solution for the two adaptive filters, which can be obtained by minimizing, for each channel index, an objective function evaluated over one input frame.
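The forward structure and a frame-based cost can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the names `w12`, `w21`, `u1`, and `u2` are hypothetical, and the cost is assumed to be the mean squared output over one frame.

```python
def fir_filter(x, w):
    """Causal FIR filtering: y[n] = sum_k w[k] * x[n - k], truncated to len(x)."""
    return [
        sum(w[k] * x[n - k] for k in range(min(len(w), n + 1)))
        for n in range(len(x))
    ]


def fbss_outputs(p1, p2, w12, w21):
    """Forward BSS: each output is one observation minus the other one, filtered."""
    y1 = fir_filter(p2, w21)   # estimate of the noise leaking into channel 1
    y2 = fir_filter(p1, w12)   # estimate of the speech leaking into channel 2
    u1 = [a - c for a, c in zip(p1, y1)]
    u2 = [a - c for a, c in zip(p2, y2)]
    return u1, u2


def frame_cost(u):
    """ASSUMED objective: mean squared value of one output frame."""
    return sum(v * v for v in u) / len(u)


# Noise-only frame: channel 1 contains only the cross-coupled noise.
b = [0.3, -1.2, 0.7, 0.1, -0.4, 0.9]
h21 = [0.6, -0.3, 0.1]
p1 = fir_filter(b, h21)
p2 = b
u1_matched, _ = fbss_outputs(p1, p2, [0.0], h21)
u1_zero, _ = fbss_outputs(p1, p2, [0.0], [0.0, 0.0, 0.0])
```

When the adaptive filter equals the cross-coupling response, a noise-only frame is cancelled at the first output, so the cost separates good and bad candidate filters during such periods.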
2.3. Framework for adaptive filtering in FBSS based on metaheuristic
In general, to solve an optimization problem with a metaheuristic algorithm, one needs to evaluate the cost function at each iteration using a set of input data. In FBSS problems, the mixed signals, which are the input signals of the online adaptive filter, are not entirely available; therefore, an efficient way to proceed is to evaluate the cost function using the available frame of the observed signal at each iteration. Moreover, we propose in this paper to use a manual voice activity detection (MVAD) system to control the adjustment of the adaptive filters. This manual adaptation control allows the cost function of one adaptive filter to be evaluated only during noise-only periods, whereas the other filter is updated only during voice activity periods. The general scheme of the proposed dual adaptive filtering by FBSS and metaheuristic algorithm is illustrated in Figure 4.
3. Algorithms review
The Bat algorithm and the modifications made to improve its efficiency are presented in this section.
3.1. Bat algorithm (BA)
The Bat algorithm (BA) is a population-based algorithm (Yang, 2010). Bats can hunt even in complete darkness by using echo returns; this characteristic allows them to differentiate between obstacles and insects, as shown in Figure 5. The echolocation mechanism can be modeled by a set of mathematical equations in which each bat of the swarm represents a potential solution. Each bat moves in the search space with a velocity $V_i$ and a position $X_i$, using a frequency $f_i$, a variable wavelength, and a loudness $A_i$ to search for the prey location. Bats fine-tune the emitted pulse frequency and the pulse emission rate according to their distance from the prey. The optimization process is repeated until the maximum number of iterations is reached; the position and velocity are updated using the following relations:
$$f_i = f_{\min} + (f_{\max} - f_{\min})\,\beta \qquad (12)$$
$$V_i^{n} = V_i^{n-1} + \left(X_i^{n-1} - G_{best}\right) f_i \qquad (13)$$
$$X_i^{n} = X_i^{n-1} + V_i^{n} \qquad (14)$$
where
$f_{\min}$, $f_{\max}$: minimum and maximum frequency; $f_i$: frequency of the $i$-th bat; $V_i^{n}$, $X_i^{n}$: velocity and position of the $i$-th bat at time $n$; $\beta$: a random vector uniformly distributed in [0, 1]; $G_{best}$: the current global best solution.
Moreover, a random walk is generated for each bat to improve the local search:
$$X_{new} = X_{old} + \varepsilon A_i^{n} \qquad (15)$$
where
$\varepsilon$: random value in the range [-1, 1]; $A_i^{n}$: loudness of the $i$-th bat at time $n$.
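Furthermore, the loudness and the rate of pulse emission are updated at every iteration. During the process, the rate of pulse emission increases while the loudness decreases once a bat has found its prey. For simplicity, we use $A_0 = 1$ and $A_{\min} = 0$, where $A_{\min} = 0$ means that a bat has just found its prey and temporarily stops emitting sound:
$$A_i^{n+1} = \alpha A_i^{n} \qquad (16)$$
$$r_i^{n+1} = r_i^{0}\left[1 - \exp(-\gamma n)\right] \qquad (17)$$
where
$\alpha$, $\gamma$: constant values; $A_i^{n}$: loudness of the $i$-th bat at time $n$; $r_i^{0}$: initial pulse emission rate of the $i$-th bat.
3.2. Formulation of the proposed modified Bat algorithm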
The standard Bat algorithm has become very popular for solving real-world problems effectively, except for higher-dimensional problems, where BA suffers from local minima. To overcome this handicap, a modified Bat algorithm (MBA) is introduced to adapt large adaptive filters. The decaying nature of the acoustic filter requires changing the way new solutions are generated by improving the local search. In the proposed MBA, we suggest updating the loudness parameter at each iteration, so that the loudness varies during the optimization process by following a negative exponential function. The loudness is estimated by:
where
θ, μ: constants in the range [0, 1].
Moreover, by examining real acoustic impulse responses, one can easily see the large difference between the first and the last points of the impulse response. In the standard BA, all filter points are processed in the same way, which prevents better exploitation and consequently leads to a poor final solution. Therefore, in the proposed algorithm, we introduce another step that improves the quality of the solution by manipulating the elements of the best global solution individually according to the following equation:
where
ω: random value in the range [-1, 1].
This step is followed by an evaluation of the objective function: the new solution is accepted only if it yields a lower fitness value than the one obtained with the initial Gbest; otherwise, the change is ignored.
The modified Bat algorithm is expressed by the following pseudo-code.
Begin
• Set the problem dimension n, the number of bats, the maximum number of iterations maxit, the search space R, and the minimum and maximum frequency values f_min and f_max
• Randomly generate the positions X_i and velocities V_i (i = 1, 2, …, n) of the bats
• Define the pulse frequency f_i
• Initialize the pulse rates r_i and the loudness A_i
• Evaluate the objective function for each bat, then find the best initial fitness and the best global solution Gbest
While (t < maxit)
• Generate new solutions by adjusting the frequency (Equation 12)
• Update velocities and positions (Equations 13 and 14)
If (rand > r_i) then
• Select a solution among the best solutions randomly
• Generate a local solution around the selected best solution by a local random walk (Equation 15)
End if
• Evaluate the objective function for each bat and update the best fitness and Gbest
If (rand < A_i & f(X_i) < f(Gbest)) then
• Accept the new solution
• Increase r_i using Equation 17 and decrease A_i using the modified Equation 18
End if
• Evaluate the objective function for each bat and update the best fitness and Gbest
For (j = 1:n)
• Generate a new solution by manipulating only the j-th element of Gbest (Equation 19)
• Evaluate the objective function for the new Gbest
• Accept the change in Gbest only if it yields a lower fitness value; otherwise, ignore the change
End For
End while
Return Gbest as the solution
End
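The procedure described in this section can be sketched in Python as shown below. It is a hedged illustration, not the authors' implementation: the function name, the parameter names, and their default values are hypothetical. Two expressions are explicit assumptions, because their exact forms are not reproduced here: the loudness is taken as `theta * exp(-mu * n)`, and the per-element refinement perturbs element `j` of Gbest by `omega * loudness` with `omega` in [-1, 1]. The local random walk is centered on Gbest for simplicity.

```python
import math
import random


def modified_bat(cost, dim, n_bats=20, max_it=100, f_min=0.0, f_max=2.0,
                 bound=1.0, gamma=0.9, theta=1.0, mu=0.05, seed=0):
    """Sketch of a Bat algorithm with variable loudness and per-element Gbest refinement."""
    rng = random.Random(seed)
    x = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_bats)]
    v = [[0.0] * dim for _ in range(n_bats)]
    r0 = [rng.random() for _ in range(n_bats)]   # initial pulse rates
    r = [0.0] * n_bats                           # current pulse rates
    fit = [cost(xi) for xi in x]
    i_best = min(range(n_bats), key=fit.__getitem__)
    gbest, gbest_fit = list(x[i_best]), fit[i_best]

    for n in range(1, max_it + 1):
        # ASSUMPTION: negative-exponential loudness shared by all bats.
        loud = theta * math.exp(-mu * n)
        for i in range(n_bats):
            f_i = f_min + (f_max - f_min) * rng.random()          # frequency
            v[i] = [vj + (xj - gj) * f_i for vj, xj, gj in zip(v[i], x[i], gbest)]
            cand = [xj + vj for xj, vj in zip(x[i], v[i])]
            if rng.random() > r[i]:
                # Local random walk (simplification: always around Gbest).
                cand = [gj + rng.uniform(-1.0, 1.0) * loud for gj in gbest]
            cand = [max(-bound, min(bound, c)) for c in cand]     # stay in range
            c_fit = cost(cand)
            if rng.random() < loud and c_fit < gbest_fit:
                x[i], fit[i] = cand, c_fit
                r[i] = r0[i] * (1.0 - math.exp(-gamma * n))       # pulse rate grows
                gbest, gbest_fit = list(cand), c_fit
        # Per-element refinement of Gbest.
        # ASSUMPTION: element j is perturbed by omega * loud, omega in [-1, 1].
        for j in range(dim):
            trial = list(gbest)
            trial[j] += rng.uniform(-1.0, 1.0) * loud
            t_fit = cost(trial)
            if t_fit < gbest_fit:          # keep the change only if it improves
                gbest, gbest_fit = trial, t_fit
    return gbest, gbest_fit


def sphere(w):
    """Toy cost used only for demonstration (sum of squared coefficients)."""
    return sum(wj * wj for wj in w)
```

Calling `modified_bat(sphere, dim=8, max_it=60)` returns the best coefficient vector found and its cost. In the speech enhancement setting, `cost` would instead evaluate the frame-based objective for a candidate filter.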
4. Analysis of experimental results
In this section, we demonstrate the noise reduction capabilities of the proposed modified Bat algorithm in the context of speech enhancement. We perform extensive experiments under several different noisy observations and compare its performance with well-known metaheuristic algorithms, including the original Bat algorithm (BA) (Yang, 2010), particle swarm optimization (PSO) (Clerc, 2010), and the gray wolf optimizer (GWO) (Okwu & Tartibu, 2021). We used the simplified convolutive mixture model presented in Section 2. The clean speech signal is a sentence pronounced by one male speaker and sampled at 8 kHz. We mixed the clean speech with three different real noises: white Gaussian, car, and USASI noise. The two cross-coupling impulse responses are generated from random sequences weighted by decaying exponential functions (Djendi, 2010; Djendi et al., 2006). In Figure 6, we show a sample of the impulse response with length L = 128 used to produce the mixing signals at both sensors, where the input signals are speech and USASI noise (see Figure 7).
It should be mentioned that we used all the instances described in Section 3 for all tests. Furthermore, the same population size, search space range, and number of iterations are used for all algorithms, so that the algorithms are evaluated and compared under the same settings. Moreover, results are obtained using three adaptive filter lengths, L = 32, 64, and 128, and different input SNRs. Finally, all results are averaged over 20 trial runs. Note that there are many ways to compare algorithm performance; in this work, we use two performance measures:
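The two ingredients of this setup, an exponentially decaying random impulse response and a noise level set by a target input SNR, can be sketched as follows. The function names, the uniform distribution, the decay rate, and the definition of the input SNR on the raw speech and noise frames are all assumptions made for illustration; the paper does not specify them here.

```python
import math
import random


def decaying_impulse_response(length, decay=0.05, seed=0):
    """Random sequence weighted by a decaying exponential envelope.

    ASSUMPTION: uniform samples in [-1, 1] and envelope exp(-decay * n).
    """
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) * math.exp(-decay * n) for n in range(length)]


def power(x):
    """Mean power of a frame."""
    return sum(v * v for v in x) / len(x)


def scale_noise_for_snr(speech, noise, snr_db):
    """Scale the noise so that 10*log10(P_speech / P_noise) equals snr_db."""
    gain = math.sqrt(power(speech) / (power(noise) * 10.0 ** (snr_db / 10.0)))
    return [gain * v for v in noise]
```

For example, `scale_noise_for_snr(speech, noise, -6.0)` returns a noise frame whose power is about four times that of the speech frame.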
- System misalignment (SM) criterion, defined in dB as the ratio of the Euclidean norm of the difference between the real filter vector and the adaptive filter vector to the Euclidean norm of the real filter vector.
- Segmental signal-to-noise ratio (SegSNR), computed from the original and the estimated speech signals using the absolute value operator and averaged over the number of samples needed to obtain the mean value of the output SNR.
In all experiments, we used a manual voice activity detector (MVAD), which means that one filter is updated only in silence periods, whereas the other is updated only in speech periods (Djendi, 2010). The noisy observations are processed segment by segment with an overlap technique, where each segment contains 256 samples; segmentation is performed using a Hamming window with 25% overlap between adjacent frames (Kunche, 2016).
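Both performance measures can be sketched in a few lines. The SM function uses the standard definition, 20 log10 of the normalized Euclidean error. The SegSNR function is a simplified assumption: it averages the per-segment SNR in dB over non-overlapping segments and adds a small `eps` to avoid division by zero, whereas the experiments use windowed, overlapping frames. All names are hypothetical.

```python
import math


def system_misalignment_db(h, w):
    """SM in dB: 20*log10(||h - w|| / ||h||), with Euclidean norms."""
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(h, w)))
    ref = math.sqrt(sum(a * a for a in h))
    if err == 0.0:
        return float("-inf")   # perfect identification
    return 20.0 * math.log10(err / ref)


def segmental_snr_db(clean, estimate, seg_len, eps=1e-12):
    """Mean over segments of 10*log10(signal energy / error energy).

    ASSUMPTION: non-overlapping segments; the input must contain at least
    one full segment of seg_len samples.
    """
    values = []
    for start in range(0, len(clean) - seg_len + 1, seg_len):
        c_seg = clean[start:start + seg_len]
        e_seg = estimate[start:start + seg_len]
        sig = sum(abs(c) ** 2 for c in c_seg)
        err = sum(abs(c - e) ** 2 for c, e in zip(c_seg, e_seg))
        values.append(10.0 * math.log10((sig + eps) / (err + eps)))
    return sum(values) / len(values)
```

A more negative SM means the adaptive filter is closer to the real impulse response, while a higher SegSNR means the estimated speech is closer to the original.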
4.1. System misalignment (SM) evaluation
The experimental results in terms of the SM criterion obtained by the four algorithms, i.e., BA (Bat algorithm), PSO (particle swarm optimization), GWO (gray wolf optimizer), and the proposed MBA, are shown in Figure 8 (we plot the absolute value of each result to better illustrate the comparison). The parameters used to compute the output of each algorithm are summarized in Table 1. The adaptive filter length is variable, i.e., L = 32, 64, and 128. The input SNRs are -6 dB, 0 dB, and 6 dB. The punctual noise is white noise, USASI (United States of America Standards Institute, now ANSI) noise, or car noise. Note that we are only interested in the adaptive filter that produces the first output, since the speech signal is obtained from the first channel. We observe that the proposed MBA performs significantly better than the other algorithms in almost all scenarios; it is only slightly inferior to PSO in the white noise scenario with a small filter (L = 32).
In addition, to investigate the potential of the MBA in terms of convergence speed in the transient regime, Figure 9 reports the temporal evolution of the SM criterion for large adaptive filters, where the clean signal is mixed with white, USASI, and car noise, each at a different input SNR. We can easily see that the proposed MBA needs less time to converge in all scenarios, which means that it converges faster to the optimal solution than the BA, PSO, and GWO algorithms. In other words, the proposed MBA has both the lowest steady-state SM values and the fastest convergence speed, which is a very important characteristic of any adaptive algorithm.
4.2. Segmental signal-to-noise ratio (SegSNR) criterion evaluation
A comparison of the final values of the SegSNR criterion estimated on the denoised signals obtained by each algorithm is shown in Figure 10. The simulation parameters of each algorithm are the same as those given in Table 1. The results indicate that the proposed MBA performs much better than the BA, PSO, and GWO algorithms in all scenarios. We also report in Figure 11 the temporal evolution of the SegSNR criterion obtained at the first output using an adaptive filter of length L = 128. Experiments are conducted using white, USASI, and car noise, each at a different input SNR.
The results in Figure 11 confirm the superiority of the proposed MBA over the BA, PSO, and GWO algorithms in terms of convergence speed in the transient regime as well as performance in the steady-state regime in all experiments.
5. Conclusion
In this work, we focused on dual-channel speech enhancement through adaptive filtering. We suggested using metaheuristic algorithms to adapt the filter coefficients, and we developed a new algorithm, namely the modified Bat algorithm (MBA). The proposed MBA is combined with the FBSS structure to reduce the acoustic noise components in noisy observations.
Experimental results indicate that the proposed algorithm outperforms conventional and state-of-the-art metaheuristic algorithms (PSO, BA, and GWO) in terms of convergence rate, segmental signal-to-noise ratio, and steady-state misalignment. The obtained results lead us to conclude that the proposed algorithm could represent an appealing solution for speech enhancement and acoustic noise reduction applications.
Acknowledgements
This work was supported by the following grants: Laboratoire d’Automatique et Informatique de Guelma (LAIG), Guelma, Algeria and Signal Processing and Image Laboratory (LATSI), Blida, Algeria.
References
Bendoumia, R., & Djendi, M. (2015). Two-channel variable-step-size forward-and-backward adaptive algorithms for acoustic noise reduction and speech enhancement. Signal Processing, 108, 226-244. https://doi.org/10.1016/j.sigpro.2014.08.035
Chang, C. Y., & Chen, D. R. (2010). Active noise cancellation without secondary path identification by using an adaptive genetic algorithm. IEEE Transactions on Instrumentation and Measurement, 59(9), 2315-2327. https://doi.org/10.1109/TIM.2009.2036410
Clerc, M. (2010). Particle Swarm Optimization. Particle Swarm Optimization, 1942-1948. https://doi.org/10.1002/9780470612163
Djendi, M. (2010). Advanced techniques for two-microphone noise reduction in mobile communications (Ph.D. dissertation). University of Rennes, France (in French). https://www.theses.fr/2010REN1S012
Djendi, M. (2018). A new efficient wavelet-based adaptive algorithm for automatic speech quality enhancement. In Proceedings of the Fourth International Conference on Engineering & MIS 2018 (pp. 1-6). https://doi.org/10.1145/3234698.3234752
Djendi, M., Gilloire, A., & Scalart, P. (2006). Noise cancellation using two closely spaced microphones: Experimental study with a specific model and two adaptive algorithms. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 3, III-III. https://doi.org/10.1109/icassp.2006.1660761
Fisli, S., Djendi, M., & Guessoum, A. (2019). A New Dual Modified Bat Algorithm for Design of Adaptive Noise Canceller. 2019 International Conference on Advanced Electrical Engineering, ICAEE2019, 1-6. https://doi.org/10.1109/ICAEE47123.2019.9014816
Kunche, P., & Reddy, K. V. V. S. (2016). Metaheuristic applications to speech enhancement (pp. 7-16). Springer International Publishing. https://doi.org/10.1007/978-3-319-31683-3
Loizou, P. C. (2013). Speech Enhancement: Theory and Practice. CRC Press, Inc. https://doi.org/10.1201/b14529
Mahbub, U., Shahnaz, C., & Fattah, S. A. (2010). An adaptive noise cancellation scheme using particle swarm optimization algorithm. In 2010 International Conference on Communication Control and Computing Technologies (pp. 683-686). IEEE. https://doi.org/10.1109/ICCCCT.2010.5670753
Okwu, M. O., & Tartibu, L. K. (2021). Grey Wolf Optimizer. Studies in Computational Intelligence, 927, 43-52. https://doi.org/10.1007/978-3-030-61111-8_5
Rogers, S. (1996). Adaptive filter theory. In Control Engineering Practice (Vol. 4, Issue 11). Pearson Education India. https://doi.org/10.1016/0967-0661(96)82838-3
Widrow, B., Glover, J. R., McCool, J. M., Kaunitz, J., Williams, C. S., Hearn, R. H., ... & Goodlin, R. C. (1975). Adaptive noise cancelling: Principles and applications. Proceedings of the IEEE, 63(12), 1692-1716. https://doi.org/10.1109/PROC.1975.10036
Yang, X. S. (2010). A new metaheuristic bat-inspired algorithm. In Nature Inspired Cooperative Strategies for Optimization (NICSO 2010) (pp. 65-74). Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-12538-6_6
Notes
The authors received no specific funding for this work.
Author notes
*Corresponding author. E-mail address: s.fisli@yahoo.fr (Sofiane Fisli).
Conflict of interest declaration
The authors declare that they have no conflict of interest.