Academic Open Internet Journal
Volume 8, 2002





Zaharov V.V., Fausto Casco S.

Universidad Autonoma Metropolitana - Iztapalapa,

Av. Michoacan y Purisima, Colonia Vicentina, C.P.09340,

México D.F., MEXICO,

Tel. (5255)58044637,

Abstract: Method of nonstationary stochastic process prediction using discrete wavelet transform is presented. As it is shown, the application of Discrete Wavelet Transform in maximum entropy prediction method allows decrease the square average prediction error in comparison with conventional methods that use algorithm of Discrete Fourier Transform approximately 3-4 times for presented test sequence. Its advantage was obtained due to information about process structure in the varied bands of frequency that received from coefficients of Discrete Wavelet Transform.

Keywords: discrete wavelet transform, in maximum entropy prediction method, multiresolution analysis.


Within recent years, increasingly Discrete Wavelet Transform (DWT) have been used in many real applications because it allows to present nonstationary process more adequately. The Fourier transform (FT) had wide application as a tool for data processing in many applications for many years. The bases of complex exponents used in Discrete Fourier Transform (DFT) is optimal as to the many criteria in case of stationary process since whereat the coefficients of decomposition are uncorrelated. In case of nonstationary process they lose localization in time domain. The analysis of such signals is possible with the DFT using sliding window through the data [8]. The disadvantage of such approach is the high computational complexity of the decomposition algorithm.

In DWT instead of harmonic orthogonal functions are used the systems of functions generated by shifts and the compression of some generating functions with compact support in frequency and time domains. That allows divide process on “small details” in frequency domain and in the same time to localize them in time domain. This "compact support" allows the wavelet transform (WT) to translate a time-domain function into a representation that is not only localized in frequency (like the FT) but in time as well.

Fundamentally, wavelets are a new type of function, which provide an excellent orthonormal basis for functions of one or more variables. They provide a localized basis, and can represent square-integrable functions, but also constant and, more generally, polynomial functions in a locally finite manner. In 1988 Daubechies' fundamental paper on wavelets appeared [6]. In this paper was fined for the first time a parameterised family of orthonormal systems of functions in with certain important complementary properties:

1. Each system is generated from a scaling function and a wavelet function by rescaling and translations, e.g., , and the wavelet system is an orthonormal basis for .

2. Each element in a given system has compact support in time and frequency domain. The supports of the basis functions becomes very small for large scaling index j.

3. There are fast algorithms for computing the coefficients of wavelet transform

4. The classical discrete Fourier and cosine transforms appear as a special case of the general discrete wavelet transform (DWT).

The aim of the proposed work is applying DWT to the prediction of nonstationary stochastic process of the various nature and comparative analysis of obtained prediction with traditional methods like the maximum entropy method [7,8] in which is used the spectral Fourier coefficients. In traditional methods was obtained the Fourier coefficients with data sliding window and without data sliding window.


The main problem of the wavelets analysis is creating of basis functions for wavelets systems [1,2,9]. The synthesis of bases function is founded on the theory of multiresolution analysis (MA) [10,11]. The sequence of closed subspaces of , is defined as a multiresolution analysis of if are satisfied the following properties:


2. ,

3. ,

4. If

5.Such function (scaling function) with a non-vanishing integral exists, that the collection of his shifts is a orthonormal basis of .

Since a sequence hk exists such that the scaling function satisfies the condition


The equation (1) is refinement equation (or two scale difference equation).

The Fourier transform of the scaling function is

where is a 2p -periodic function.

By integrating both sides of eq.(1), and dividing by the non-vanishing integral of j , we see that , and as a consequence .

The scaling function meets the normalization condition and for its collection [2].

The collection of functions


is the orthonormal basis of the space.

In many applications, we never need the scaling function itself. Instead we may often work directly with the hk. An orthogonal scaling function is a function j such that the set is an orthonormal basis and satisfies by the next property


With such j , the collection of functions is an orthonormal basis of and the collection of functions is an orthonormal basis of .

Using the Poisson summation formula [10], in the frequency domain we have

We consider the subspace as complementary space of in , . If the set consists on functions which spectrums are concentrated in interval , then consists on the function with spectrum in interval and , symbol Å is direct sum sign. In other words, each element of can be written as the sum . The space contains the detail information needed to go from an approximation at resolution j to an approximation at resolution j+1. Consequently, .

A function is a wavelet if the collection of his shifts is a orthonormal basis of and the collection of functions


is orthonormal basis of .

Each element in a given system has compact support in time and frequency. Since the wavelet a sequence gk exists such that

, (5)

where and , ¾ is the sign of conjugation.

The Fourier transform of the wavelet is given by


where is a 2p -periodic function and .

An orthogonal wavelet is a function y such that the collection of functions

is an orthonormal basis of . In this case

. (6)

In the frequency domain eq. (6) is .

The collection of functions is an orthonormal basis of . Since the spaces are mutually orthogonal, the collection of functions is an orthonormal basis of L2(R).

Than, a sufficient condition for a multiresolution analysis to be orthogonal is

. (7)

The same in the frequency domain

, (8)


The decomposition of any given function in space can be presented as


where .

Taking into account that and applying of eq.(9) we write



can be presented as ;

can be presented as


Similarly for .

Multiplicating (10) at and applying eq.(3) , (7) and lemma we get

½ ´

Multiplicating (10) at and applying eq.(6)we get

In general case and we have equation


From eq.(11) the coefficients and can be define as

; j=-1,-2,…,-m

Replacing y we get

; j=-1,-2,…,-m (12)

The eq. (12) is direct discrete fast wavelet transform that consists from 2 convolutions, and is initial coordinates of any function in the space, where

Applying the similar manner by multiplying (11) at we have

Therefore we have inverse fast wavelet transform


One of the powers of wavelet technology is the ability to choose the defining coefficients for a given wavelet system to be best adapted to the given problem. Daubechies developed in her original paper [6] specific families of wavelet systems which had maximal vanishing moments of the function and which were very good for representing polynomial behavior.


As follows from (12) and (13) the forward and inverse wavelet transform can be implemented using a set of upsamplers, downsamplers and recursive two-channel filter banks, that named also pyramidal algorithm because has pyramidal structure. At [1,5] has shown that the tree or pyramid algorithm can be applied to the wavelet transform by using the wavelet coefficients for calculate the filter coefficients of the QMF (Quadrature Mirror Filter) filter pairs. From relation (8) follows that for the filter coefficients have the following properties in frequency domain




The same wavelet coefficients are used in both the low-pass (LP) and high-pass (HP) (actually, band-pass) filters. The low-pass filter coefficients are associated with the scaling function. The output of each low-pass filter is , or approximation components, of the original signal for that level of the tree. The high-pass filter coefficients is associated with the wavelet function (note the alternating sign change) .

The output of each high-pass filter is the , or detail components, of the original signal. The of the previous level are used to generate the new and for the next level of the tree. Decimation by two corresponds to the multiresolution nature (the j parameter) of the scaling and wavelet functions (see Figure 1).


Figure 1. Direct pyramidal structure

The reverse fast wavelet transform essentially performs the operations associated with the forward fast wavelet transform in the opposite direction (Figure 2). The expansion coefficients are combined to reconstruct the original signal. The same , coefficients are used as in the forward transform, but in reversed order. The process works down the branches of the tree combining the approximation and detail signals into approximation signals with higher levels of detail.

Instead of decimation, the signals are interpolated: zero’s are placed between each approximation and detail sample, and the signals are then passed through the low-pass and high-pass filters. The intermediate 0 values are replaced by "estimates" derived from the convolutions. The filters' outputs are then summed to form the approximation coefficients for the next higher level of resolution. The final set of approximation coefficients at the tree's top level in the reverse transform is a reconstruction of the original signal data points.


Figure 2. Inverse pyramidal structure

As follows from eq.(12) and (13) the signal decomposition depends on the impulse responses of the low and high pass filters, that in turn depends on the choosing of scaling and wavelets function. The function and can be interpreted as coefficients of the series for the functions and and have the forms



In signal processing terminology, the various bases of the DWT can be used by choosing the corresponding coefficients of filter banks.


We consider applying DWT for prediction problem. As the basic method will be used the maximum entropy method [7,8] in which instead of spectral Fourier coefficients have been used the coefficients of DWT calculated according to eq.(12).

The first application of DWT for linear prediction system was proposed in [4]. The concept behind this system is to use the DWT to decompose the data into "subspaces" that are easier to model with multiple linear predictors. The paper [12] considers the DWT application for market stock rate analysis some of the nonstationary data.

We will use the classical method of prediction using AR- model for an estimation of a forward linear prediction of element xn, n =1,2...,N and xn = 0 at n < 1, n > N

, (16)

where is estimation of element of xn , ak is a coefficients of prediction filter, N is number of elements of a time series, p is order of AR - model.

If minimize an error of a linear prediction , then a vector of a linear prediction parameters ak which minimizes the average square of an error is the solution of the normal equations

, (17)

where is (p + 1) vector, ,, , ~ is symbol of transpose and conjugate, T is transpose symbol, is Toeplitz (p+1)´ (p+1) correlation matrix of input data, X is (N+p)´ (p+1) matrix of input data with structure as shown below


The eq.(17) is the Yule- Walker equation for autoregressive process and can be solved concerning to parameters ai, i=1,2..., p by various known algorithms, for example by Levinson algorithm [1]. In particularly if the inverse matrix has a form


then the solution of eq.(17) is


The solution (17) allows obtaining the spectral density of input data for the received parameters with maximum entropy [1]


were is estimation of parameter an, , D t is a sampling interval.

For prediction of nonstationary sequences more referable introduce the exponential window that is gone along the input data, creating the least change of the current errors and very strongly reducing of older ones. Thus the equation for r we put as

, (22)

were w is positive real scalar satisfying to a condition 0 < w £ 1. Then expression for a matrix R we can write

, (23)

where is n-th row of matrix X , n = 1,2..., N+p.

We will consider and then compare for criteria of minimum prediction errors 3 basic algorithm of prognostications

Algorithm1. Classical method of prediction without sliding window, that consists from the following steps:

Step 1. Obtaining of current input data element xn, n=1,2,…,N

Step 2. Forming matrix of input data X according of eq.(18)

Step 3. Calculation of a matrix and vector A using eq.(19) and (20).

Step 4. Estimation of a forward linear prediction of elements xn with eq. (16)

Algorithm2. Classical method of prediction with sliding window, that consists from the following steps:

Step 1. Obtaining of current input data element xn, n=1,2,…,N

Step 2. Forming and waiting of (N+p)´ (p + 1) input data matrix X


where , n = 1,2, …, N +p is number of the current element.


Step 3. Calculation of a matrix and vector A using eq.(19) and (20).

Step 4. Estimation of a forward linear prediction of elements xn with eq.(16).

Algorithm 3. Prediction algorithm that uses DWT.

Step 1. Obtaining of current input data element xn, n=1,2,…,N+p

Step 2. Calculating the DWT of the input data in the finite interval and defining the DWT coefficients.

; j=-1,-2,…,-m (26)

Step 3. Carrying out of forward linear prediction of wavelets coefficients for each level of scale.

, (27)

, (28)

where the coefficients of the prediction filter determined similar as in eq.(20).

Step 4. Obtaining the predicted data using inverse DWT with prediction coefficients.

, (29)

where is the initial prediction function.

The variety of orthonormal bases which can be formed by the Discrete Wavelet Transform algorithm, coupled with the infinite number of wavelet and scaling functions which can be created, yields a very flexible analysis tool. Not only can the best wavelet be chosen to analyse a particular signal but the best orthonormal basis as well. But in many applications as shows the practice the Daubechies wavelets are optimum at concentrating the signal power into the lower-scale transform coefficients. As a result, there is less "risk" in the prediction.

To compare the presented algorithm on prediction problem the simulation in PC was carry out. For the evolution of prediction mean least square error e2(n) are used the sequence with 10 single tone components with frequencies since 1 Hz to 25 Hz with interval 2-7Hz with random initial phases on interval 0-2p and white noise with power 3 time less then complete power of tone components. To this process was added the trend component than simulated as a lineal polynomial function.

The prediction was made in 20 forward samples with using 3 algorithm of spectral coefficients obtaining: DWT (1), DFT with windowing (2) and DFT without windowing (3). In the experiment was evaluated relation e2(n)/ x2(n) for each of 20 forward samples. The result was averaged for 200 statistics and is shown in Figure 3.

Figure3. Depending of prediction error from the number of forward samples

As follows from figure3, applying DWT for prediction reduce the e2(n)/ x2(n) relation approximately three times in comparison with DFT with sliding window and four times without sliding window when the number of prediction samples equal 20.


In this paper the fast discrete wavelet transform and its application to prediction of nonstationary stochastic process was presented. As it is shown, the application of DWT in maximum entropy prediction method is allows decrease the square average prediction error in comparison with methods that using DFT with sliding window and DFT without sliding window. In particularly, for the presented test sequence for the number of prediction samples equal 20 the gain is 3-4 times in comparison with DFT prediction methods. Its advantage was obtained due to information about process structure in the varied bands of frequency that received from coefficients of DWT.



  1. Akansu A.N., Smith M.J.T., Subband and Wavelet Transforms: Design and Applications, Kluwer, 1996.
  2. Benedetto J., Wavelets, Mathematics and Applications, Press New York, 1994; 574p.
  3. Cody, Mac A. "The Fast Wavelet Transform." Dr. Dobb's Journal, April, 1992, p.16 - 28.
  4. Cody, Mac A. “Wavelet Transforms as Applied to Financial Market Analysis”, Presented at the conference “Evolving Complexity Challenges to Society, Economy and the Individual”, The University of Texas at Dallas, November 18 & 19, 1994
  5. Coifman R.R. Meyer Y., Wiskerhauser V. Wavelet Analysis and Signal Processing. - Boston. - Jones & Barlet. - 1992.
  6. Daubechies I. Ten Lectures on Wavelets - SIAM,v.61,1992.
  7. Haykin S. Adaptive Filter Theory. - New Jersey. - Prentice Hall. - 1996.
  8. Marple S.L. Jr. ”Digital spectral analysis with application”, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1987
  9. Meyer Y. Wavelets: Algorithms and Applications. - Cambrige University Press. - 1993.
  10. Petukhov A.P. The introduction in theory of wavelets bases.- St.Peterburg,1999.
  11. Strang G., Nguyen T. “Wavelets and filter banks” Wellesley-Cambridge Press 1996, 490p.
  12. Zaharov V.V., Kohanov A.B. “Discrete Wavelet Transform Application for Market Stock Rate Analysis” Issue of International Scientific Conference “Methods & Algorithms Of Applied Mathematics in Engineering, Medicine And Economics”, Novocherkassk, Russia, Feb.2001, P.30-34.
  13. Zaharov V.V., Casco F.S. "Spectral estimation fast algorithm with longterm numerical stability", Electronic Journal "Investigated in Russia", #27å, 2001, p. 282-290,

Technical College - Bourgas,
All rights reserved, © March, 2000