# Kullback–Leibler divergence and the Pareto–Exponential approximation

- G. V. Weinberg

**Received:** 15 January 2016

**Accepted:** 29 April 2016

**Published:** 12 May 2016

## Abstract

Recent radar research interests in the Pareto distribution as a model for X-band maritime surveillance radar clutter returns have resulted in analysis of the asymptotic behaviour of this clutter model. In particular, it is of interest to understand when the Pareto distribution is well approximated by an Exponential distribution. The justification for this is that under the latter clutter model assumption, simpler radar detection schemes can be applied. An information theory approach is introduced to investigate the Pareto–Exponential approximation. By analysing the Kullback–Leibler divergence between the two distributions it is possible to not only assess when the approximation is valid, but to determine, for a given Pareto model, the optimal Exponential approximation.

## Background

The Pareto distribution has become important in maritime surveillance radar signal processing, since it has been validated as an intensity model for X-band clutter returns. In the first such study, Balleri et al. (2007) fitted the Pareto model to data obtained by the Canadian IPIX radar, situated at a test site on Lake Ontario in Grimsby, Canada, at a height of 20 m. This radar operated at a frequency of 8.9–9.4 GHz, with a pulse repetition frequency of 1000 Hz and compressed pulse length of 0.06 μs, resulting in a range resolution of 9 m. It operated in horizontal transmit and receive (HH), vertical transmit and receive (VV) as well as in the cross polarisation case of vertical transmit and horizontal receive (VH). It was reported that the Pareto fit improved on that of the Weibull, K and Log-Normal in all cases examined, especially in the HH polarised case.

A second validation of the Pareto model for sea clutter is given in Farshchian and Posner (2010), who describe and analyse sea clutter returns obtained during a United States Naval Research Laboratory (NRL)-led trial in 1994, located in Kauai, Hawaii. The radar operated at a frequency of 9.5–10 GHz with a pulse repetition frequency of 2000 Hz, 2.5 ns compressed pulse length and 0.375 m range resolution. It operated in both HH and VV polarisations; however, the radar used was not dual polarised and so these were collected separately. The radar was at a height of 23 m above sea level, so that the grazing angle was 0.22° and the radar range was 5.74 km for VV and 6.11 km for HH-polarisation. The data analysed in Farshchian and Posner (2010) focused on the upwind direction, which is generally the most spiky. The wind speed was roughly 9 m/s and the largest wave height was roughly 3 m, so that the sea state was approximately 4. The trial provided conclusive evidence that at a low grazing angle, the Pareto model outperformed the Weibull, Log-Normal and K-Distributions. Additionally, the model was compared to mixtures of Weibull and K, and shown to outperform Weibull mixtures, while having comparable performance to a K-mixture model. Given the latter is a three to four parameter model, the performance of the two parameter Pareto model was determined to be excellent.

A third validation for the Pareto model has been provided by Defence Science and Technology Group (DSTG) in Australia, based upon data from their Ingara radar. Ingara is an experimental fully polarimetric airborne multi-mode X-band imaging radar developed by DSTG (Stacy and Burgess 1994), which was deployed in a Raytheon Beech 1900C aircraft during a number of trials. A trial was conducted in 2004, in the Southern Ocean near Port Lincoln in South Australia (Stacy et al. 2005). The radar operated with a frequency of 10.1 GHz, with a pulse length of 20 μs, pulse repetition frequency of 300 Hz and LFM transmitted bandwidth of 200 MHz. This permitted a range resolution of 0.75 m. Ingara operated in a circular spotlight mode, surveying the same patch of ocean at all azimuth angles (0°–360°), and over the range of grazing angles 10°–45°. Sea states varied from 2 to 5, while wind speeds varied from 6.1 to 13.2 m/s. The data gathered in this trial was analysed in blocks composed of 1024 range compressed samples of roughly 920 pulses over 5° azimuth angle increments. The Pareto fit to the Ingara clutter was first reported in Weinberg (2011a), then further analysed in Rosenberg and Bocquet (2013). The inclusion of receiver thermal noise in the Ingara data, together with a Pareto clutter model, has also been reported in Rosenberg and Bocquet (2015). The conclusion from these investigations was that the Pareto distribution also fitted medium to high grazing angle clutter obtained from an airborne surveillance radar.

These three independent studies confirmed the validity of the Pareto model for X-band maritime surveillance radar clutter, regardless of the radar platform and independent of the grazing angle. Consequently much effort has been invested in the development of non-coherent detection under a Pareto clutter model assumption (Weinberg 2013a, 2015).

The Pareto distribution also fits into the currently accepted framework for clutter models in the complex domain, since it arises as the intensity model of a compound Gaussian distribution with inverse Gamma texture (Weinberg 2011b). As a result of this, coherent radar detection schemes have been analysed extensively, based upon this clutter model assumption (Sangston et al. 2012; Shang and Song 2011; Weinberg 2013b, c).

Although the Pareto model has presented radar researchers with a simpler alternative to the Weibull and K-distributions, there is still merit in applying the original detection schemes designed for target detection in Gaussian clutter, or in Exponentially distributed intensity clutter, since in some cases X-band clutter is reasonably approximated by these processes. The validity of such an approximation has been analysed in Weinberg (2012), who investigated the Exponential approximation of a Pareto distribution with Stein’s Method. It was shown that relative to DSTG’s Ingara radar clutter, in the case of VV-polarisation, the Exponential approximation was valid. This coincided with Pareto fits to the data which resulted in large shape parameters. Stein’s Method was used to construct explicit bounds to quantify this observation.

The current paper is concerned with understanding the validity of the Pareto–Exponential approximation, through an analysis of the Kullback–Leibler divergence. This will be shown to not only provide a simpler estimate of the distributional difference, but also will indicate how an optimal Exponential distribution can be selected for any given Pareto model. Numerical comparisons are used to demonstrate the validity of the approach.

## Pareto and Exponential distributions

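The random variable *X* has a Pareto distribution with shape and scale parameters \(\alpha >0\) and \(\beta >0\) respectively if its probability density function is

\[f_X(t) = \frac{\alpha \beta ^{\alpha }}{(t+\beta )^{\alpha +1}}, \quad t \ge 0.\]

A random variable *Y* with rate parameter \(\lambda >0\) has an Exponential distribution if its density is given by

\[f_Y(t) = \lambda e^{-\lambda t}, \quad t \ge 0.\]

An application of Stein’s Method to *X* yields bounds on the distributional difference (Weinberg 2012), from which it follows that the distribution of *X* limits to that of *Y* as the Pareto shape parameter increases without bound.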
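The limiting behaviour described above can be illustrated numerically. The following is a minimal sketch, assuming the Pareto survival function \(P(X>t) = (\beta /(t+\beta ))^{\alpha }\) on \(t \ge 0\) and, as an illustrative choice that is not taken from the text, tying the scale to the shape through \(\beta = \alpha /\lambda \). The function names and the grid settings are hypothetical.

```python
import math

def pareto_sf(t, alpha, beta):
    """P(X > t) for the assumed Pareto form with support t >= 0."""
    return (beta / (t + beta)) ** alpha

def exp_sf(t, lam):
    """P(Y > t) for an Exponential variable with rate lam."""
    return math.exp(-lam * t)

def max_sf_gap(alpha, lam, t_max=20.0, n=2001):
    """Largest gap between the two survival functions on a uniform grid."""
    beta = alpha / lam  # assumption: scale chosen so that alpha / beta = lam
    grid = [t_max * i / (n - 1) for i in range(n)]
    return max(abs(pareto_sf(t, alpha, beta) - exp_sf(t, lam)) for t in grid)

# The gap should shrink as the Pareto shape parameter grows.
gaps = {alpha: max_sf_gap(alpha, lam=1.0) for alpha in (2, 5, 10, 30, 100)}
for alpha, gap in gaps.items():
    print(f"alpha = {alpha:>3}: max |P(X>t) - P(Y>t)| on grid = {gap:.4f}")
```

With this parametrisation the Pareto survival function is \((1 + \lambda t/\alpha )^{-\alpha }\), which decreases to \(e^{-\lambda t}\) as \(\alpha \) increases, so the printed gaps should fall monotonically.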

The problem with the Stein approach is that the bounds do not suggest a suitable way in which, for a given Pareto model, an appropriate approximating Exponential distribution can be specified. This can be rectified with an application of the Kullback–Leibler divergence as an alternative to analysing distributional approximations.

## Information theory

Information theory is concerned with the study of entropy as a measure of uncertainty. It was introduced into the engineering community by Shannon (1948), and has had a profound effect on the understanding and optimisation of data networks (Arndt 2004). In particular, the Kullback–Leibler divergence, introduced in Kullback and Leibler (1951), has found application in signal processing analysis and statistical model fitting (Van Hulle 2005; Seghouane 2006; Youssef et al. 2016; Wenling and Yingmin 2016).

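For two random variables *X* and *Y* with densities \(f_X\) and \(f_Y\), the information lost when *Y* is used to approximate *X* is defined to be

\[D_{KL}(X || Y) = \int f_X(t) \log \left( \frac{f_X(t)}{f_Y(t)}\right) dt, \qquad (7)\]

where the integral is taken over the support of *X*. This is the Kullback–Leibler divergence, which is the difference between the cross entropy of *X* and *Y*, and the entropy of *X* (Arndt 2004). Since (7) measures the information lost in the approximation of *X* by *Y*, it can be used to assess the convergence of these distributions.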
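The divergence in (7) can be evaluated numerically for a Pareto and Exponential pair. The sketch below is an illustration only: it assumes the Pareto density \(\alpha \beta ^{\alpha }/(t+\beta )^{\alpha +1}\) and the Exponential density \(\lambda e^{-\lambda t}\) on \(t \ge 0\), uses natural logarithms, and requires \(\alpha > 1\) so that the integral is finite. Rather than integrating over an infinite range, it rewrites (7) as an expectation over *X* and applies a midpoint rule to the Pareto quantile function. All function names are hypothetical.

```python
import math

def log_pareto_pdf(t, alpha, beta):
    """Log of the assumed Pareto density on t >= 0."""
    return math.log(alpha) + alpha * math.log(beta) - (alpha + 1) * math.log(t + beta)

def log_exp_pdf(t, lam):
    """Log of the Exponential density with rate lam."""
    return math.log(lam) - lam * t

def pareto_quantile(u, alpha, beta):
    """Inverse of the Pareto distribution function 1 - (beta / (t + beta))**alpha."""
    return beta * ((1.0 - u) ** (-1.0 / alpha) - 1.0)

def kl_pareto_exp_numeric(alpha, beta, lam, n=100_000):
    """Midpoint-rule estimate of D_KL(X || Y) = E[log f_X(X) - log f_Y(X)]."""
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n                      # midpoint of the i-th probability cell
        t = pareto_quantile(u, alpha, beta)    # corresponding point on the t axis
        total += log_pareto_pdf(t, alpha, beta) - log_exp_pdf(t, lam)
    return total / n

# Divergence for one Pareto model against several candidate Exponential rates.
alpha, beta = 5.0, 2.0
for lam in (1.0, 2.0, 4.0):
    print(f"lambda = {lam}: D_KL ~ {kl_pareto_exp_numeric(alpha, beta, lam):.4f}")
```

Working with log-densities avoids underflow in the Exponential tail, and the change of variables \(u = F_X(t)\) maps the heavy Pareto tail onto the unit interval.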

If the distributions of *X* and *Y* coincide then \(D_{KL}(X ||Y) =0\). The converse of this can also be demonstrated to be true. However, it is clear from (7) that the Kullback–Leibler divergence is not symmetric, nor does it satisfy a triangle inequality. Consequently it is not a metric; strictly, it is a divergence rather than a distance. Its value in assessing convergence in distribution follows from the Pinsker–Csiszár Inequality (Pinsker 1964; Csiszár 1967; Kullback 1967). Suppose for the two random variables *X* and *Y* their distribution functions are \(F_X(t)\) and \(F_Y(t)\) respectively, with support the nonnegative real line. Then, with natural logarithms in (7), this inequality states that

\[\sup _{t \ge 0} \left| F_X(t) - F_Y(t)\right| \le \sqrt{\tfrac{1}{2} D_{KL}(X || Y)}. \qquad (8)\]

Hence if the divergence is small, *X* and *Y* are close in distribution.

Also based upon (8), if a sequence of random variables \(X_{n}\) is such that \(\lim\nolimits_{n\rightarrow \infty } D_{KL}(X_{n} || Y) = 0\), for some random variable *Y*, then the limiting distribution of \(X_n\) coincides with the distribution of *Y*, which can be justified with an application of Lebesgue’s Dominated Convergence Theorem.
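The role of the Pinsker–Csiszár Inequality can be checked numerically for the Pareto and Exponential pair. The sketch below rests on several assumptions that are not stated in the text: the Pareto distribution function \(1 - (\beta /(t+\beta ))^{\alpha }\), the bound in the form \(\sup_t |F_X(t) - F_Y(t)| \le \sqrt{D_{KL}(X || Y)/2}\) with natural logarithms, a closed form for the divergence obtained by direct integration (valid for \(\alpha > 1\)), and an Exponential rate chosen so that the two means agree. Names and grid settings are hypothetical.

```python
import math

def pareto_cdf(t, alpha, beta):
    """Assumed Pareto distribution function on t >= 0."""
    return 1.0 - (beta / (t + beta)) ** alpha

def exp_cdf(t, lam):
    """Exponential distribution function with rate lam."""
    return 1.0 - math.exp(-lam * t)

def kl_closed_form(alpha, beta, lam):
    """Assumed closed form of D_KL(Pareto || Exponential), for alpha > 1."""
    return math.log(alpha / (beta * lam)) - 1.0 - 1.0 / alpha + lam * beta / (alpha - 1.0)

def kolmogorov_gap(alpha, beta, lam, n=2001):
    """Grid estimate of sup |F_X - F_Y|, with t measured in units of 1 / lam."""
    grid = (0.01 * i / lam for i in range(n))
    return max(abs(pareto_cdf(t, alpha, beta) - exp_cdf(t, lam)) for t in grid)

beta = 1.0
rows = []
for alpha in (3.0, 10.0, 30.0):
    lam = (alpha - 1.0) / beta  # assumption: match the Exponential mean to the Pareto mean
    gap = kolmogorov_gap(alpha, beta, lam)
    bound = math.sqrt(0.5 * kl_closed_form(alpha, beta, lam))
    rows.append((alpha, gap, bound))
    print(f"alpha = {alpha:>4}: sup gap ~ {gap:.4f}, Pinsker bound = {bound:.4f}")
```

Because the grid maximum can only underestimate the true supremum, each estimated gap should sit below its bound, and both should shrink as the shape parameter grows.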

## Kullback–Leibler divergence

[…] *X*, and the fact that the density of *X* integrates to unity has been applied. […] *X* can be shown to be […]

Using a similar analysis it can be shown that the Stein lower bound, namely \(-\frac{1}{\alpha -1}\), tends to be closer to zero than that obtained by the Kullback–Leibler divergence, as illustrated in the right subplot of Fig. 5.
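The selection of an optimal Exponential approximation can be sketched in closed form. The derivation that follows is an illustration rather than a reproduction of the original equations; it assumes the Pareto density \(f_X(t) = \alpha \beta ^{\alpha }/(t+\beta )^{\alpha +1}\) and the Exponential density \(f_Y(t) = \lambda e^{-\lambda t}\) on \(t \ge 0\), natural logarithms, and \(\alpha > 1\) so that the Pareto mean \(\beta /(\alpha -1)\) exists. Under these assumptions \(\log (1 + X/\beta )\) is Exponentially distributed with rate \(\alpha \), so that \(\mathbb{E}\log (X+\beta ) = \log \beta + 1/\alpha \), and hence

\[D_{KL}(X || Y) = \mathbb{E}\log f_X(X) - \log \lambda + \lambda \mathbb{E}(X) = \log \left( \frac{\alpha }{\beta \lambda }\right) - 1 - \frac{1}{\alpha } + \frac{\lambda \beta }{\alpha -1}.\]

Differentiating with respect to \(\lambda \) gives \(-1/\lambda + \beta /(\alpha -1)\), which vanishes at a unique point, and the second derivative \(1/\lambda ^{2}\) is positive, so that

\[\lambda ^{*} = \frac{\alpha -1}{\beta }, \qquad D_{KL}(X || Y)\Big |_{\lambda = \lambda ^{*}} = \log \left( \frac{\alpha }{\alpha -1}\right) - \frac{1}{\alpha }.\]

In this sketch the minimising rate matches the Exponential mean to the Pareto mean, and the minimum divergence does not depend on \(\beta \). It behaves like \(1/(2\alpha ^{2})\) for large \(\alpha \), and is roughly \(5.7 \times 10^{-4}\) at \(\alpha = 30\).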

## Conclusions

The Kullback–Leibler divergence was used to assess the discrepancy between the Pareto and Exponential distributions, in order to better understand the validity of the Exponential approximation of the Pareto model. It was shown that for any given Pareto model an optimal Exponential approximation exists. This approximation was shown to improve as the Pareto shape parameter increased, for any fixed Pareto scale parameter. This means that for X-band maritime surveillance radar clutter whose Pareto shape parameter exceeds 30, it is acceptable to apply detection schemes based upon an Exponential clutter model assumption.

## Declarations

### Acknowledgements

Thanks are due to the two reviewers for their comments, and in particular, to one who suggested the derivation (5) be included.

### Competing interests

The author declares that he has no competing interests.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## References

- Arndt C (2004) Information measures: information and its description in science and engineering. Springer series on signals and communication technology. Springer, Berlin, Heidelberg. ISBN: 3-540-40855-X
- Balleri A, Nehorai A, Wang J (2007) Maximum likelihood estimation for compound-Gaussian clutter with inverse Gamma texture. IEEE Trans Aerosp Electron Syst 43:775–779
- Barbour AD, Chen LHY (2005) An introduction to Stein’s method. Lecture Notes Series, Institute for Mathematical Sciences, National University of Singapore, vol 4. Singapore University Press, Singapore
- Beaumont GP (1980) Intermediate mathematical statistics. Chapman and Hall, London
- Csiszár I (1967) Information-type measures of difference of probability distributions and indirect observations. Stud Sci Math Hung 2:299–318
- Farshchian M, Posner FL (2010) The Pareto distribution for low grazing angle and high resolution X-band sea clutter. In: IEEE Radar conference, pp 789–793
- Van Hulle MM (2005) Mixture density modeling, Kullback–Leibler divergence, and differential log-likelihood. Signal Process 85:951–963
- Kullback S (1967) Lower bound for discrimination information in terms of variation. IEEE Trans Inf Theory 13:126–127
- Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86
- Pinsker MS (1964) Information and information stability of random variables and processes (trans: Feinstein A). USSR series Problemy Peredaci Informacii Moscow 1960, vol 7. Holden-Day, San Francisco
- Rosenberg L, Bocquet S (2015) Application of the Pareto plus noise distribution to medium grazing angle sea-clutter. IEEE Sel Top Appl Earth Obs Remote Sens 8:255–261
- Rosenberg L, Bocquet S (2013) The Pareto distribution for high grazing angle sea-clutter. In: Proceedings of international geoscience and remote sensing symposium, pp 4201–4212
- Sangston KJ, Gini F, Greco MS (2012) Coherent radar target detection in heavy-tailed compound Gaussian clutter. IEEE Trans Aerosp Electron Syst 48:64–77
- Seghouane A-K (2006) Multivariate regression model selection from small samples using Kullback’s symmetric divergence. Signal Process 86:2074–2084
- Shang X, Song H (2011) Radar detection based on compound-Gaussian model with inverse gamma texture. IET Radar Sonar Navig 5:315–321
- Shannon CE (1948) A mathematical theory of communication. Bell Syst Tech J 27:379–423, 623–656
- Stacy NJS, Burgess MP (1994) Ingara: the Australian airborne imaging radar system. In: Proceedings of the international geoscience and remote sensing symposium, pp 2240–2242
- Stacy N, Crisp D, Goh A, Badger D, Preiss M (2005) Polarimetric analysis of fine resolution X-band sea clutter data. In: Proceedings of the international geoscience and remote sensing symposium, pp 2787–2790
- Weinberg GV (2011a) Assessing Pareto fit to high resolution high grazing angle sea clutter. IET Electron Lett 47:516–517
- Weinberg GV (2011b) Coherent multilook radar detection for targets in Pareto distributed clutter. IET Electron Lett 47:822–824
- Weinberg GV (2012) Validity of whitening-matched filter approximation to the Pareto coherent detector. IET Signal Process 6:546–550
- Weinberg GV (2013a) Constant false alarm rate detectors for Pareto clutter models. IET Radar Sonar Navig 7:153–163
- Weinberg GV (2013b) Assessing detector performance, with application to Pareto coherent multilook radar detection. IET Radar Sonar Navig 7:401–412
- Weinberg GV (2013c) Coherent CFAR detection in compound Gaussian clutter with inverse gamma texture. EURASIP Adv Signal Process 1:105
- Weinberg GV (2015) Examination of classical detection schemes for targets in Pareto distributed clutter: do classical CFAR detectors exist, as in the Gaussian case? Multidimens Syst Signal Process 26:599–617
- Wenling L, Yingmin J (2016) Kullback-Leibler divergence for interacting multiple model estimation with random matrices. IET Signal Process 10(1):12–18. doi:10.1049/iet-spr.2015.0149
- Youssef A, Delpha C, Diallo D (2016) An optimal fault detection threshold for early detection using Kullback–Leibler divergence for unknown distribution data. Signal Process 120:266–279