Research

Home | Demos | Research | Database and Tools | Publications | People | Links

Active Noise Cancellation

Active noise cancellation is an approach based the principle of destrcutive inteference to perform the physical cancellation and control of sound and vibration. Adjoint LMS (see paper below) is a computationally efficient method for multi-input-multi-output systems. Patent no. 5,978,489.

Adjoint LMS: An Efficient Alternative to the Filtered-X LMS and Multiple Error LMS Algorithms

Eric A. Wan
Appears in ICASSP96, vol III, pp. 1842-1845

Introduction

The Filtered-x LMS algorithm is currently the most popular method for adapting a filter when there exists a transfer function in the error path. Such instances arise, for example, in active control of sound and vibration. For multiple-input-multiple-output systems the Multiple Error LMS Algorithm is a generalization of Filtered-x LMS. The derivation of both algorithms rely on several assumptions, including linearity of the adaptive filter and error channel. Furthermore, in the Multiple Error LMS Algorithm the desirable order $N$ computational complexity of LMS is lost, resulting in a prohibitive cost in certain DSP implementations.

In this paper, we describe a new algorithm termed adjoint LMS which provides a simple alternative to the previously mentioned algorithms. In adjoint LMS, the error (rather than the input) is filtered through an adjoint filter of the error channel. Performance regarding convergence and misadjustment are equivalent. However, linearity is not assumed in the derivation of the algorithm. Furthermore, equations for single-input-single-output (SISO) and multiple-input-multiple-output (MIMO) are identical and both remain order $N$.

Algorithm Specifications

An adaptive filter is specified as

$\displaystyle y(k) = \sum_{n=0}^{M1} w_n(k)x(k-n) = {\bf w}^{T}(k){\bf x}(k)$ (1) 

where $k$ is the time index, $y$ is the filter output, $x$ the filter input, and $w_n$ the filter coefficients. The vectors ${\bf w}(k) =[w_0(k), w_1(k), \cdots w_{M1}(k)]^{T}$ and ${\bf x}(k) = [x(k), x(k-1),\cdots x(k-M1)]^{T}$ provide for compact notation. It is also often convenient to write the filter operation by $y(k) = W(q^{-1},k)x(k)$ where $W(q^{-1},k) = \sum_{n=0}^{M1}w_n(k)q^{-1}$ with $q^{-n}$ representing a time delay operator (i.e., $q^{-n}x(k) = x(k-n)$).

\begin{figure}\vspace{-.2in}\epsfxsize = 3.2in\center{\leavevmode\epsfbox {xlms.eps}}\end{figure}
Figure 1: (a) Filtered-x LMS, (b) Adjoint LMS

The standard filtered x-LMS is illustrated in Figure 1a where there exists a physical channel represented by $C(q^{-1},k)$ between the output of the filter and the available desired response. The output error is defined as

$\displaystyle e(k) = d(k) - C(q^{-1},k)y(k)$ (2) 

and the filtered x-LMS algorithm expressed as
$\displaystyle {\bf w}(k+1)$ $\textstyle =$ $\displaystyle {\bf w}(k) + \mu e(k)\tilde{\bf {x}}(k)$ (3) 
$\displaystyle \tilde{x}(k)$ $\textstyle =$ $\displaystyle \hat{C}(q^{-1},k)x(k)$ (4) 

where $\tilde{x}$ corresponds to the inputs filtered through a model $\hat{C}$ of the error channel ($\mu $ controls the learning rate). This algorithm can be derived from the standard LMS algorithm assuming linearity by simply commuting the order of the filter and the channel. Thus the original $x$ input become filtered by the channel (channel model) before entering the filter and the error appears directly at the output of the adaptive filter. Properties of this algorithm are discussed in [6].

We now present an alternative algorithm called adjoint LMS. The equations are

$\displaystyle {\bf w}(k+1)$ $\textstyle =$ $\displaystyle {\bf w}(k) + \mu \tilde{e}(k-M2){\bf x}(k-M2)$ (5) 
$\displaystyle \tilde{e}(k)$ $\textstyle =$ $\displaystyle \hat{C}(q^{+1},k)e(k)$ (6) 

These equations differ from Equations 3 and 4 in that the error rather than the input is now filtered by the channel model as illustrated in Figure 1b ($M2$ is the order of the FIR channel model). Furthermore, the filtering is through the adjoint channel model ($q^{-1}$ is replaced with $q^{+1}$). Graphically, an adjoint system is found for any filter realization by reversing the flow direction and swapping branching points with summing junctions and unit delays with unit advances. This is illustrated in Figure 2 for a FIR tapped delay line. However, the method applies to all filter realizations including IIR and lattice structures. The consequence of the noncausal adjoint filter is that a delay (equal to the channel model delay) must be incorporated into the weight update in Equation 5 to implement an on-line adaptation.

\begin{figure}\vspace{-.15in}\epsfxsize = 3.2in\center{\leavevmode\epsfbox {adjoint.eps}}\end{figure}
Figure 2: FIR model of the channel and corresponding adjoint model. The adjoint system is found by reversing flow direction, swapping summing junctions with branching points and delays with advances.

Adjoint LMS is clearly a simple modification of filtered-x LMS. For SISO systems the computational complexity of adjoint LMS and filtered x-LMS are identical. The real advantage comes when dealing with MIMO systems. In this case the adaptive filters are represented by an $L\times P$ matrix of transfer functions ${\bf W}(q^{-1},k)$ and the channel by a $P \times Q$ transfer function matrix ${\bf C}(q^{-1},k)$. Filtered x-LMS does not generalize directly since matrices do not commute and it makes no sense to filter the input $X$ by ${\bf C}$ since dimensions may not even match. The Multiple Error LMS algorithm, proposed by Elliott et. al. [1,2], solves this by effectively applying filtered x-LMS to all possible SISO paths in the MIMO system, and can be written as:

$\displaystyle {\bf w}_{lp}(k+1)$ $\textstyle =$ $\displaystyle {\bf w}_{lp}(k) + \mu {\bf e}^{T}(k)\tilde{\cal{\bf X}}_{lp}(k)$ (7) 

for $1<l<L$ and $1<p<P$, and there is now a filtered matrix of inputs for each filter ${{\bf w}_{lp}}$ formed as:
$\displaystyle \tilde{\cal{\bf X}}_{lp}^{T}(k) = \left[\begin{array}{cccc}\til......) & \tilde{\bf x}_{lp2}(k) & \cdots & \tilde{\bf x}_{lpQ}(k)\end{array}\right]$ (8) 

with each row in the matrix found by filtering the input through the corresponding secondary path:
$\displaystyle \tilde{x}_{lpq}(k) = C_{pq}(q^{-1},k)x_{l}(k).$ (9) 

The implementation of Multiple Error LMS results in a total of $L\times P \times Q$ filter operations. In the cases of adjoint LMS, however, we encounter no such problem. Equations generalize directly:

$\displaystyle {\bf w}_{lp}(k+1)$ $\textstyle =$ $\displaystyle {\bf w}_{lp}(k) + \mu \tilde{e}_p(k-M2){\bf x}_l(k-M2)$  
$\displaystyle \tilde{{\bf e}}(k)$ $\textstyle =$ $\displaystyle \hat{{\bf C}}(q^{+1},k){\bf e}(k),$ (10) 

Here we note that the output error ${\bf e}$ is dimension $Q$ (number of channel outputs) whereas the error $\tilde{{\bf e}}$ after filtering through the adjoint MIMO channel model is order $P$ (number of primary filter outputs) as desired. The clear advantage of this form is that operations remain order $N$, where $N$ is the total number of filter parameters (compare the weight update matrix operation in Equation 7 to the vector operation in Equation 10). Table 1 gives a comparison of multiplications for some specific parameter values.

Table 1: Comparison of computational complexity. Reference inputs, $L=8$. Adaptive filter outputs, $P=4$, Adaptive filter taps, $M1=6$, Channel outputs, $Q=8$, Channel model taps, $M2=6$.
Multiplications Adjoint LMS
$\tilde{{\bf e}}(k)$ $P \times Q \times M2 = 192$
weight updates $L \times P \times M1 \times 2 = 384 $
total $567$
Multiplications Multiple Error LMS
filtered inputs $L \times P \times Q \times M2 = 1,536$
weight updates $L \times P \times M1 \times (Q +1) = 1,728 $
total $3,264$

\begin{figure}\epsfxsize = 3.2in\center{\leavevmode\epsfbox {lcurves.eps}}\end{figure}
Figure 3: A comparison of instantaneous squared error learning curves. Channel: $C(q^{-1}) = q^{-2}(1 -2q^{-2})$ (a delayed bandpass). Channel Model: $\hat{C}(q^{-1}) = q^{-2}(1 -2.5q^{-2})$. Input: $x(k) = $ white noise Normal(0,1). Adaptive Filter: ${\bf w}$, FIR with 6 taps. Desired Response: $d(k) = W^{*}(q^{-1})C(q^{-1})x(k) + n(k)$, where $W^* =\frac{1 + .5q^{-1}}{1 + .75q^{-1}}$ is the optimal primary filter and $n(k)$ is white noise Normal(0,0.2). Learning rate: $\mu = 0.01$.

Interpretations

In gradient descent algorithms, weights are moved in the direction of the negative gradient $\frac{\partial J}{\partial {\bf w}}$, where $J$ is a cost function. For filtering, $J$ is typically a sum of a sequence of errors $e^2(k)$ and the error gradient can be expanded as a sum of instantaneous error gradients:

\begin{displaymath}\frac{\partial J}{\partial {\bf w}} =\sum_{k=1}^{K}\frac{\partial e^2(k)}{\partial {\bf w}}.\end{displaymath}

Standard LMS, filtered x-LMS, and Multiple Error LMS are all based on iteratively updating the weights in the direction of the instantaneous negative gradient $\frac{\partial e^2(k)}{\partial {\bf w}}$.

While adjoint-LMS is still a stochastic gradient descent algorithm, it is not based on the instantaneous gradient. Instead, consider the chain rule expansion

\begin{displaymath}\frac{\partial J}{\partial {\bf w}} = \sum_{k=1}^{K}\frac{\pa......f w}} =\sum_{k=1}^{K}\frac{\partial e^2(k)}{\partial {\bf w}.}\end{displaymath}

Individual terms in the two sums are not equivalent, only the total sum over all time. Adjoint LMS stochastically updates filter weights based on this new expansion which leads to the more computationally efficient form. $\frac{\partial J}{\partial y(k)}$ is interpreted as the change in the total error over all time due to a change in the filter output at time $k$. This gradient term is precisely the filtered error term $\tilde{e}(k)$ in the equations for adjoint LMS ( $\frac{\partial J}{\partial y(k)} \frac{\partial y(k)}{\partial {\bf w}} = \tilde{e}(k) {\bf x}(k)$ in Equation 5). The derivation of this is detailed in [4,5] in the original context of neural networks. In fact, an additional advantage of the algorithm is that it can be generalized to when both the primary filter and channel are modeled with nonlinear filters. On the other hand, an approach based on Filtered x-LMS does not apply, since in general nonlinear systems do not commute for even the SISO case.

\begin{figure}\epsfxsize = 3.2in\center{\leavevmode\epsfbox {misadj2.eps}}\end{figure}
Figure 4: A comparison of misadjustment (excess MSE / min MSE) versus the learning parameter $\mu $.

Simulations and Conclusions

A SISO simulation illustrating the similar performance of adjoint LMS to filtered x-LMS is shown in Figure 3. Learning curves are remarkably alike even though the individual stochastic weight updates are not identical. Our conclusions are that adjoint-LMS has an equivalent rate of convergence and misadjustment to filtered x-LMS with a substantial computational savings over Multiple Error LMS. The only trade-off, in certain cases, is a slightly tighter restriction for stability on the learning parameter as might be expected due to the delayed weight update [3]. This causes a slight increase in misadjustment for large learning parameters as illustrated in Figure 4. Fortunately, this occurs beyond the desirable operating range for the algorithms.

A second simulation for a MIMO noise cancellation experiment is detailed in Figure 5 and Figure 6. Again the similar performance of the Adjoint LMS to Multiple Error LMS is observed. Note that in this case, Adjoint LMS exhibits the greater range for stability versus learning rate. While a full analysis is yet to be completed, experiments also indicate equivalent performance with regard to eigenvalue spread of the inputs and errors in channel modeling.

\begin{figure}\epsfxsize = 3.2in\center{\leavevmode\epsfbox {lcurvesMIMO.ps}}\end{figure}
Figure 5: A comparison of instantaneous squared error learning curves for a MIMO system with $L=2$ inputs, $P=2$ filter outputs, and $Q=3$ secondary channel outputs. Inputs $x_1(k) = cos(k\pi /5)$, and $x_2(k) = cos(k\pi /8)$. Primary channel ${\bf P}(q^{-1})$ from inputs to secondary channel output locations: $P_{1c}(q^{-1}) = (.9q^{-1})^{c-1}(1-.9*q^{-1})$, $P_{2c}(q^{-1}) = (.8q^{-1})^{c-1}(1-.9*q^{-1})$, for $c=1,2,3$. Desired output (disturbance) at channel output: ${\bf d}(k) = {\bfP}(q^{-1}){\bf x}(k) + {\bf n}(k)$, where $n_l(k)$ is white noise Normal(0,0.01). Secondary channel model: $\hat{C}_{pc}(q^{-1}) =q^{-p-c+1}$, $p=1,2$, $c=1,2,3$. Adaptive Filters ${\bf w}_{lp}$, all FIR order 6. Learning rate: $\mu = 0.01$. Such a system represent a simple propagation of harmonics which arrive at different sensors with increasing distance and attenuation. The secondary channel model corresponds to pure delays associated with different locations of the filter outputs.

\begin{figure}\epsfxsize = 3.2in\center{\leavevmode\epsfbox {mseMIMO.eps}}\end{figure}
Figure 6: A comparison of $MSE = E[e_1^2(k) +e_2^2(k) +e_3^2(k)]$ versus the learning parameter $\mu $. Note in this example, Adjoint LMS exhibits a greater range of stability versus the learning parameter.

Bibliography

1
S. Elliott, I. Stothers, P. Nelson. A Multiple Error LMS Algorithm and Its Application to the Active Control of Sound and Vibration. IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-35, No. 10, October 1987.

2
S. Elliott and P. Nelson. Active Noise Control. IEEE Signal Processing Magazine, October 1993.

3
G. Long, F. Ling, and J. Proakis. The LMS algorithm with delayed coefficient adaptation. IEEE Transactions on Acoustics, Speech, and Signal Processing., vol. 27, no. 9, pages 1397-1405, Sept. 1989.

4
E. Wan. Finite Impulse Response Neural Networks with Applications in Time Series Prediction. Ph.D. dissertation. Stanford University. 1993.

5
E. Wan, and F. Beaufays. Diagrammatic Derivation of Gradient Algorithms for Neural Networks. Neural Computation, Vol. 8, No. 2. 1995.

6
B. Widrow, and S. Stearns. Adaptive Signal Processing. Prentice-Hall, 1985.