Burg Spectral Estimation

or the Maximum Entropy Spectrum

by K. T. Kilty

 

 

The algorithm I have included in the accompanying file is a relatively straightforward C translation of a Fortran routine in Geophysical Data Analysis, by John Claerbout, McGraw-Hill, 1976. I made a few alterations to conserve storage space, and moved to zero-based indexing. I have compiled the program using the LARGE memory model of Borland Turbo C 2.0 and 3.0. The source code is included for anyone who wishes to make alterations or use this code within another project. Because this program runs under DOS and doesn't use expanded memory, there is a limit on the length of time series that you may analyze with the EXE file that is included: in effect, the total length cannot exceed 8192 data points. This is almost a year-long series at one point per hour, so it isn't too strict a limitation.

 

Claerbout's explanations of his algorithms were never clear to me, so I will attempt an explanation of my own of how this program works.

 

 

The Z transform

 

In order to understand how this routine works, I will need to make some preliminary remarks about how geophysicists and engineers look at signals. A signal is a series of observations made in time (a time series) or a series of measurements made down a well or along the ground surface (a profile). In either case the signal is just an ordered list of numbers (x0, x1, x2, …). Another way to write the signal is as a polynomial in the Z operator, which is also known as the unit delay operator. In this form the signal is X(Z) = x0 + x1*Z + x2*Z^2 + …. The way to think of this polynomial is that the coefficient x2 pertains to a delay of 2 units into the future, and so forth. When a geophysicist examines the series or the coefficients of the polynomial, then he or she is working in the time (space) domain. On the other hand, if the geophysicist evaluates the polynomial at equal intervals on the unit circle (where the magnitude of Z is 1) in the Z (complex) plane, then he or she is working with the Fourier transform in the frequency (wavenumber) domain. I sometimes call this Fourier domain the reciprocal space.
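
To make this concrete, here is a minimal sketch (my own illustration, with my own names, not part of the BURG program) that evaluates the polynomial X(Z) at equally spaced points on the unit circle from zero frequency to the Nyquist rate. The result is nothing more than a slow discrete Fourier transform of the series.

#include <math.h>

/* Illustration only: evaluate X(Z) at n (n >= 2) equally spaced points on
   the upper half of the unit circle, Z = exp(-i*w), w = 0 .. pi.  For a
   series x[0..nx-1] this is simply a (slow) discrete Fourier transform. */
void evaluate_on_unit_circle(const double x[], int nx,
                             double re[], double im[], int n)
{
    int k, t;
    for (k = 0; k < n; k++) {
        double w = M_PI * k / (n - 1);   /* angular frequency, 0 to Nyquist */
        double sr = 0.0, si = 0.0;
        for (t = 0; t < nx; t++) {
            sr += x[t] * cos(w * t);     /* real part of x[t]*Z^t           */
            si -= x[t] * sin(w * t);     /* imaginary part, Z = exp(-i*w)   */
        }
        re[k] = sr;
        im[k] = si;
    }
}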

 

Frequency resolution

 

Ordinarily a person uses an FFT (fast Fourier transform) to calculate spectra of time series. The FFT will take a real series of values and return a complex series, a spectrum with real and imaginary parts. From these one may calculate amplitude spectra, energy spectra, phase spectra, or whatever. The FFT works on a series of evenly spaced observations, and the finest resolution that is available in the spectrum is 1 cycle per length of the time series. For example, a time series of 8192 points spaced one hour apart has a resolution of 1/8192 cycles per hour. The typical situation, especially in geophysics, is that the data result from some experiment, and it is extremely inconvenient to run the experiment over again. So we either accept the resolution available or, if circumstances permit, extrapolate the data in some "reasonable" way to make a longer time series. Of course this sounds absolutely stupid. It sounds like we're making up data. However, suppose that the time series is a short segment, but a representative segment, of a process that is stationary. A stationary process doesn't vary statistically with time, so if we can presume to have a representative sample, we can extrapolate it in some way, consistent with its statistics, that improves resolution.

Claerbout glossed over this idea of extrapolating in some "reasonable" way. This is an issue that has to be settled satisfactorily in each case. If it were an easy thing to extrapolate data, then we could use an FFT on a time series lengthened with fabricated data and improve resolution. Lots of fellows I was in graduate school with had the idea that extrapolation meant merely to lengthen a time series with zeros, which is precisely making up data. I also read in the generally excellent book Numerical Recipes in C by Wm. Press et al. that when using an FFT a person should always pad a series with zeros to make it a power of 2 in length. Once again this is just making up data. Making up data is bound to cause trouble unless it is done with care.

 

Calculating a spectrum from a filter instead of an FFT

 

Suppose X(Z) is a stationary time series. We input it to a special filter which transforms the series into a sequence of random values, or what people call white noise. The spectrum of the output is flat; it has a value of 1 if you wish. In frequency space the input, output, and filter response obey the relationship 1 = X(Z)H(Z), where H is the frequency response of the filter. Thus H(Z) = 1/X(Z), and the filter response is the inverse of the spectrum of the input series. Sharp peaks are easier to represent through a polynomial in the denominator than through one in the numerator, which leads us to speculate that this method of spectral estimation should represent sharp peaks well. The problem now is how to build the linear filter.
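
As an illustration of this idea (a sketch with names of my own choosing, not code taken from the BURG program), suppose we already have a prediction-error filter h[0..m] with h[0] = 1 and a residual error power pm. The power spectrum then comes from evaluating H(Z) on the unit circle and dividing pm by the squared magnitude.

#include <math.h>

/* Sketch: maximum-entropy power spectrum from a prediction-error filter
   h[0..m] (h[0] = 1) and its output error power pm.  freq is in cycles
   per sample, 0 <= freq <= 0.5.  P(f) = pm / |H(exp(-i*2*pi*f))|^2. */
double mem_power(const double h[], int m, double pm, double freq)
{
    double re = 0.0, im = 0.0;
    int k;
    for (k = 0; k <= m; k++) {
        re += h[k] * cos(2.0 * M_PI * freq * k);
        im -= h[k] * sin(2.0 * M_PI * freq * k);
    }
    return pm / (re * re + im * im);
}

Notice how a zero of H(Z) lying near the unit circle makes the denominator small and so produces a sharp peak in the spectrum, which is the sense in which peaks are easier to represent in the denominator.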

This leads to a whole series of considerations. First, the Burg algorithm works not with the spectrum X(Z), but with the power spectrum, which is R(Z) = X(Z)X(1/Z). If we use R(Z) to design the filter, then we must keep in mind that many possible spectra X(Z) can lead to the same R(Z), so there must be some way of obtaining a unique and useful X(Z). Second, a useful filter is one whose coefficients decline rapidly in magnitude, so that we can truncate it and still maintain good accuracy. Any filter that doesn't adhere to this requirement depends on input from the extremely remote past, or extremely remote space, to work well, and we cannot allow that.

The most obvious approach is to use a prediction-error filter. A prediction filter uses a contiguous segment of the time series values to predict future values, and the error estimation compares the predicted values to observations once the observations are available. If designed well, the output of a prediction-error filter is a random series of numbers: its spectrum is white. Moreover, if the filter is built from a representative length of time series, then it produces the same random output on a longer series of data. Therefore the prediction-error filter automatically extrapolates the data to as long a series as we like. So it seems like the answer to the resolution problem.
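
A sketch of what applying a prediction-error filter means in practice (again my own names, not those used in the program): the filter h[0..m], with leading coefficient 1, is convolved with the data, and the output is the series of prediction errors, which should look like white noise if the filter is well designed.

/* Sketch: run a prediction-error filter h[0..m] (h[0] = 1) over the data.
   e[t] is the difference between x[t] and the value predicted from the
   m previous samples; for a well-designed filter e[] is nearly white. */
void prediction_error(const double x[], int nx,
                      const double h[], int m, double e[])
{
    int t, k;
    for (t = m; t < nx; t++) {
        double sum = 0.0;
        for (k = 0; k <= m; k++)
            sum += h[k] * x[t - k];
        e[t] = sum;            /* e[0..m-1] are left undefined in this sketch */
    }
}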

 

 

The Burg Algorithm

 

The specifics of how Burg went about designing his method will be much clearer now that I have covered these fundamentals.

Burg builds a prediction-error filter from the power spectrum as follows. We are looking for a filter H(Z) which, when operating on X(Z), produces a white result. In other words, H(Z) is the inverse of X(Z), so H(Z) has to be a filter that actually has an inverse. Let H(Z) operate on the power spectrum, R(Z): H(Z)R(Z) = H(Z)X(Z)X(1/Z) = X(1/Z). Moreover, we plan to predict only a single value, that being the current series element, x0. X(1/Z) contains this element in combination with those farther into the future, which multiply 1/Z, 1/Z^2, and so forth. Thus we care only about one element of the polynomial X(1/Z), and the polynomial equation is H(Z)R(Z) = x0. Solving for the filter H(Z) becomes a matrix equation, a set of equations in other words, obtained by equating powers of Z in the polynomials.

The matrix equation is:

 

 

                | r0   r1   r2   … |
(h0, h1, h2, …) | r-1  r0   r1   … |  =  (x0, 0, 0, …)
                | r-2  r-1  r0   … |

 

The matrix R is made from auto-covariance elements of the series. A series of real values produces a symmetric matrix. The obvious way to solve for the vector of filter coefficients is to invert the matrix R and multiply by it on the right. However, this approach is not assured of finding a filter that actually has an inverse.
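
The Burg recursion below never forms this matrix explicitly, but the following sketch (my own illustration, not code from the program) shows how its entries would be computed: each r_k is an auto-covariance lag of the series, and for a real series r(-k) = r(k), so a handful of lags fills every diagonal of the matrix.

/* Sketch: auto-covariance lags r[0..m] of a real series x[0..nx-1].
   With r(-k) = r(k), these lags fill the whole symmetric matrix R above. */
void autocovariance(const double x[], int nx, double r[], int m)
{
    int k, t;
    for (k = 0; k <= m; k++) {
        double sum = 0.0;
        for (t = 0; t + k < nx; t++)
            sum += x[t] * x[t + k];
        r[k] = sum / nx;       /* "biased" normalization keeps R well behaved */
    }
}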

Burg reasoned that Levinson recursion provides another way to solve this equation, and it has the advantage of always producing a usable filter, one with an inverse and with coefficients that decrease in magnitude. Moreover, it does the calculation more quickly than other possible methods. Levinson recursion begins with a filter of length 2, (1, h1) for example, and builds the remaining filter one element at a time. Burg's algorithm finds the first element by minimizing the sum

 

E = SUM_t [ (x_t + h1*x_{t-1})^2 + (x_{t-1} + h1*x_t)^2 ]

 

with respect to variations in h1. The first term is the forward prediction error and the second the backward prediction error, so this is an average of forward and backward prediction, and it always leads to an absolute value of h1 that is less than 1. Levinson recursion builds a filter of length three from this one of length two by solving the vector equation

 

(1, h1, h2)=(1, h1, 0)-c(0, h1, 1)

 

Burg therefore turns the problem of finding the next filter element into a least-squares estimation of c, by minimizing

 

E = SUM_t [ (x_t + h1*x_{t-1} - c*(h1*x_{t-1} + x_{t-2}))^2 + (x_{t-2} + h1*x_{t-1} - c*(h1*x_{t-1} + x_t))^2 ]

 

with respect to variations in c. The same sequence of steps carries the filter to length 4, then 5, and so forth.
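
Putting the pieces together, here is a compact sketch of the recursion in C. It is my own restatement of the method, not the code in the accompanying file, and the variable names are mine; it follows the sign convention of the vector equation above, where the new filter is the old filter minus c times the reversed old filter.

#include <stdlib.h>

/* Sketch of the Burg recursion.  x[0..nx-1] is the data and m+1 the desired
   filter length.  On return h[0..m] is the prediction-error filter
   (h[0] = 1) and the return value is the final prediction-error power,
   which together feed a spectrum calculation such as mem_power() above. */
double burg(const double x[], int nx, double h[], int m)
{
    double *f = malloc(nx * sizeof(double));        /* forward errors  */
    double *b = malloc(nx * sizeof(double));        /* backward errors */
    double *tmp = malloc((m + 1) * sizeof(double));
    double power = 0.0;
    int t, j, k;

    for (t = 0; t < nx; t++) {
        f[t] = b[t] = x[t];
        power += x[t] * x[t];
    }
    power /= nx;
    h[0] = 1.0;

    for (j = 1; j <= m; j++) {
        double num = 0.0, den = 0.0, c;

        /* least-squares value of c from the forward and backward errors */
        for (t = j; t < nx; t++) {
            num += f[t] * b[t - 1];
            den += f[t] * f[t] + b[t - 1] * b[t - 1];
        }
        c = 2.0 * num / den;                         /* always |c| < 1 */

        /* (1, h1, ..., hj) = (old filter, 0) - c*(0, old filter reversed) */
        h[j] = 0.0;
        for (k = 0; k <= j; k++)
            tmp[k] = h[k] - c * h[j - k];
        for (k = 0; k <= j; k++)
            h[k] = tmp[k];

        /* update the forward and backward prediction errors in place */
        for (t = nx - 1; t >= j; t--) {
            double ft = f[t];
            f[t] = ft - c * b[t - 1];
            b[t] = b[t - 1] - c * ft;
        }
        power *= 1.0 - c * c;                        /* error power shrinks */
    }
    free(f); free(b); free(tmp);
    return power;
}

Note that when j = 1 the update gives h1 = -c, which is exactly the minimizer of the first sum above, and carrying the loop to larger j is what the length-of-prediction-error-filter argument controls in the Instructions below.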

 

 

Instructions

 

The program BURG takes five arguments to run. In order they are the input file name, the output file name, the number of data points to read from the input file, the total number of frequency components in the output spectrum, and the length of the prediction-error filter. It is run at the command line, for example, with

 

C:>BURG tides.dat tides.spc 2000 2048 128

 

The example data file tides.dat is taken from hourly observations of a tide gauge at Old Woman Bay, Alaska. There are actually more observations in the file than the 2000 we asked for in the command line. The total number of spectral components, 2048 in this case, is chosen to be large enough to resolve features of interest in the data. Finally, the estimator length is a means of compromising between a smooth spectrum with little resolution and a high-resolution spectrum that has much background noise. The fact that we can ask for as many spectral components as we wish is what allows the Burg algorithm to increase resolution. The 2048 components in this example begin at zero frequency and end at the Nyquist rate, which depends on the data spacing. Thus, the frequency increment between spectral components is (Nyquist rate)/(2048 - 1).

Taking the example of the Old Woman Bay data, the lunar semidiurnal tide has a frequency of one cycle per 12.42 hours and the solar semidiurnal tide one cycle per 12.0 hours, a difference of about 0.0028 cycles per hour. Suppose that we wish to have at least 6 spectral components, or 7 intervals, between them; the frequency resolution we require is then about 0.0004 cycles per hour. The raw resolution of the original data is (Nyquist rate)/(2000 - 1). The Nyquist rate is 1 cycle per 2 hours (2 samples per cycle), or 0.5 cycles per hour. Therefore the raw data had a resolution of 0.00025 cycles per hour, which was already sufficient to achieve our goal, and so the value of 2048 for the total length of the spectrum is appropriate.

The length of the estimator, 128 in this case, provides enough latitude in the prediction-error filter to locate 127 sharp spectral peaks. These peaks may be subdued and extremely smooth, however. Specifying a longer estimator will cause peaks to rise in amplitude and become sharper, at the expense of producing spurious peaks that result from noise. Another effect is that the frequency of each spectral peak will shift slightly with changing estimator length. I often start an analysis with a short estimator and run several successive analyses, increasing the estimator by a factor of two or so each time. The progression of results provides me with a clear view of which peaks are real, which result from noise, and the optimum length of estimator.
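
For example, a progression of runs such as the following (the output file names here are arbitrary, chosen only to keep the results separate) shows how the peaks evolve as the estimator doubles:

C:>BURG tides.dat tides064.spc 2000 2048 64
C:>BURG tides.dat tides128.spc 2000 2048 128
C:>BURG tides.dat tides256.spc 2000 2048 256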

The Burg algorithm is much slower to run than an FFT, and the analysis method of making many successive Burg estimates is even more time-consuming. However, the collection of data is the most expensive and time-consuming step of all, and there is no substitute for doing a careful job of analyzing it.