# Fast entropy-based CABAC rate estimation for mode decision in HEVC

*SpringerPlus*
**volume 5**, Article number: 756 (2016)

## Abstract

High efficiency video coding (HEVC) seeks the best coding tree configuration, the best prediction unit division, and the best prediction mode by evaluating the rate-distortion functional recursively with a “try all and select the best” strategy. Further, HEVC supports only context adaptive binary arithmetic coding (CABAC) as its entropy coder, which has the disadvantage of being highly sequential and having strong data dependencies. A fast rate estimation algorithm for CABAC-based coding is therefore of great practical significance for mode decision in HEVC. The CABAC encoding process has three elementary steps: binarization, context modeling, and binary arithmetic coding. Typical approaches to fast CABAC rate estimation simplify or eliminate the last two steps, but leave the binarization step unchanged. To maximize the reduction of computational complexity, we propose a fast entropy-based CABAC rate estimator that eliminates not only the modeling and coding steps, but also the binarization step. Experimental results demonstrate that the proposed estimator reduces the computational complexity of mode decision in HEVC by 9–23% with negligible PSNR loss and BD-rate increment, and is therefore applicable to practical HEVC encoder implementations.

## Background

High efficiency video coding (HEVC), the newly developed video coding standard, follows the so-called block-based hybrid coding architecture (Sullivan et al. 2012). HEVC aims at providing higher coding efficiency than the prior standards and at making the codec more amenable to parallelization. The reference software HM (https://hevc.hhi.fraunhofer.de/svn/svn-HEVCSoftware) has achieved the expected performance, but at the cost of some computationally intensive coding tools, including the quadtree based coding unit (CU), the large and asymmetric prediction unit (PU), and the residual quadtree based transform unit (TU) (Ohm et al. 2012; Bossen et al. 2012; Kim et al. 2012; Corrêa et al. 2012; Pan et al. 2014).

Mode decision, which controls how a coding tree unit (CTU) is coded with CUs with variable block sizes and prediction modes, is an essential process in HEVC. To achieve the best performance, HEVC seeks the best coding tree configuration, the best PU division and the prediction mode, etc., by evaluating the rate-distortion (R-D) functional where a distortion term is weighted against a rate term using a “try all and select the best” strategy (Pan et al. 2014).

The rate term in the R-D functional represents an estimate of the number of coded bits produced by the entropy coder. Unlike H.264/AVC, HEVC does not support context adaptive variable length coding (CAVLC). It defines only context adaptive binary arithmetic coding (CABAC) as the entropy coder, which involves three elementary steps: binarization, context modeling, and binary arithmetic coding (Marpe et al. 2003). The binarization step maps the non-binary valued syntax elements (SEs), which are represented in the bitstream and describe how the video sequence can be reconstructed at the decoder, to binary symbols (bins). This step lengthens the encoding pipeline, because it typically maps one element to a string of several bins. The modeling stage assigns to each binary symbol a probability model that is updated using the statistics of already coded neighboring symbols. In the arithmetic coding stage, the actual coding engine is driven by the probability model. Based on recursive interval division and selection, the coding procedure generates a sequence of bits representing the SEs.
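To make the binarization step concrete, the following minimal sketch shows truncated unary binarization, one of the simpler schemes used for CABAC syntax elements. The function name and the choice of scheme are illustrative assumptions; real syntax elements use a variety of binarizations.

```python
def truncated_unary(value: int, c_max: int) -> str:
    """Map a non-negative integer to a truncated unary bin string.

    The value is written as `value` ones followed by a terminating zero;
    the zero is omitted when value == c_max, because the decoder can
    infer the end of the string from the maximum.
    """
    if not 0 <= value <= c_max:
        raise ValueError("value must lie in [0, c_max]")
    bins = "1" * value
    if value < c_max:
        bins += "0"
    return bins


if __name__ == "__main__":
    for v in range(5):
        print(v, truncated_unary(v, c_max=4))
```

A single syntax element thus expands into several bins, each of which would then pass through context modeling and arithmetic coding.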

CABAC has the advantage of high coding efficiency. However, these three steps are highly sequential and induce strong data dependencies (Sze and Budagavi 2012). It is therefore difficult to exploit parallelism and pipelining, which makes CABAC a well-known throughput bottleneck in video codec implementations (Sze and Budagavi 2012; Sole et al. 2012). Typical approaches to fast CABAC rate estimation for mode decision simplify or eliminate the modeling and coding steps, but leave the binarization step unchanged. The CABAC rate estimator for H.264/AVC introduced in Hahm and Kyung (2010) simplified the context modeling part and replaced the arithmetic coding calculation with a table lookup scheme using multiple lookup tables. Table entries were indexed by the probability state indexes, which were integers between 0 and 62, and there was a one-to-one correspondence between the entries and a set of predefined representative probability values of the least probable symbol (LPS). By simplifying the modeling and coding steps, the rate estimator yielded about a 30% reduction in the computational complexity of the R-D evaluation for H.264/AVC (Hahm and Kyung 2010). The fast CABAC rate estimator in Won et al. (2012) was also developed for H.264/AVC, and simplified the coding step using a lookup table scheme. It used only one table, which depended on two values: an index of the LPS probability, and an indication of whether or not the current bin to be coded equals the most probable symbol (MPS). In Hahm et al. (2009), a rate estimator that approximated the context modeling in CABAC was proposed. It was reported to reduce the computational complexity of the R-D optimization by about 20%.
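The general idea behind such table lookup schemes can be sketched as follows. This is not the table design of any of the cited estimators; it is an illustration that assumes the LPS probabilities follow the geometric progression described for H.264/AVC CABAC in Marpe et al. (2003), and that the cost of a bin is its ideal code length. All names are hypothetical.

```python
import math

NUM_STATES = 63  # probability state indexes 0..62, as described in the text

# Assumption: p_0 = 0.5 and p_s = p_0 * ALPHA**s (Marpe et al. 2003).
ALPHA = (0.01875 / 0.5) ** (1.0 / 63.0)

# Ideal code length in bits of an LPS / MPS bin for each probability state.
LPS_BITS = [-math.log2(0.5 * ALPHA ** s) for s in range(NUM_STATES)]
MPS_BITS = [-math.log2(1.0 - 0.5 * ALPHA ** s) for s in range(NUM_STATES)]


def bin_cost(state: int, is_mps: bool) -> float:
    """Look up the estimated cost of one bin instead of arithmetic-coding it."""
    return MPS_BITS[state] if is_mps else LPS_BITS[state]


def estimate_bits(bins) -> float:
    """Sum the table costs over (state, is_mps) pairs; no state update here."""
    return sum(bin_cost(state, is_mps) for state, is_mps in bins)


if __name__ == "__main__":
    print(estimate_bits([(0, True), (10, True), (10, False)]))
```

Note that the input is still a sequence of bins, so a scheme of this kind still depends on binarization.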

Our objective is to develop a fast CABAC rate estimator for mode decision in HEVC. Based on the assumption that CABAC in HEVC achieves compression close to the entropy of a symbol sequence (Sze and Budagavi 2012), the proposed approach estimates the CABAC rate as a weighted sum of the information generated by the source. All three steps of CABAC, i.e., binarization, context modeling, and binary arithmetic coding, are eliminated. The proposed estimator is therefore computationally more efficient and makes the encoder easier to parallelize.

The remainder of the paper is organized as follows. In the “Entropy-based CABAC rate estimation” section, we first present an overview of the rate estimation for R-D optimization in HEVC. Then, the correlation between the entropy and the CABAC rate of CUs is evaluated, and the entropy-based CABAC rate estimator is proposed. Experimental results are presented in the “Experimental results” section. Finally, the paper is concluded in the “Conclusion” section.

## Entropy-based CABAC rate estimation

### Rate estimation for rate-distortion optimization in HEVC

The HEVC design follows the classic block-based hybrid video coding approach. A picture is partitioned into a sequence of CTUs, which are analogous to macroblocks in previous standards. A CTU may contain only one CU, or may be split into four equal size CUs. In a recursive manner, a CU of size \(2N \times 2N\) (\(N = 8, 16, 32\)) can be further split into four smaller units of equal size. The block partition structure of a CTU is quadtree-like. A CU, which is a leaf node of the quadtree, specifies a region sharing the same prediction mode, i.e., intra or inter. The CU consists of a luma coding block (CB), the corresponding chroma CBs, and the related syntax elements. Further, a CU can be split into one, two, or four PUs, and a PU defines a region sharing the same prediction information. For intra coded CUs, two possible PU splitting types are supported. For inter coded CUs, eight splitting types are defined. For skipped CUs, only one PU splitting type (i.e., the same size as the CU) is allowed (Kim et al. 2012). After prediction and compensation, a nested quadtree partitions the CU residual into transform units, each of which defines a region sharing the same transformation.

How to determine the coding tree configuration, determine the PU division, and select prediction modes for all CUs in a CTU is a critical problem for improving the coding efficiency. HEVC treats this problem as the so-called rate-distortion optimization (Sullivan and Wiegand 1998), that is,

\[ \min \{ D \} \quad \text{subject to} \quad R \le R_c, \tag{1} \]

where *R* and *D* represent the rate and distortion for a CU, respectively, and \(R_c\) is the rate constraint. The constrained optimization task in (1) is solved using Lagrangian optimization, where the distortion term is weighted against the rate term (Sullivan and Wiegand 1998),

\[ \min_{M} \{ J \}, \qquad J = D + \lambda R, \tag{2} \]

where *M* denotes the set of all possible coding parameters (i.e., the settings of CU size, PU division, prediction mode, etc.), *J* is the Lagrangian rate-distortion functional, and \(\lambda\) is the Lagrangian multiplier, which is usually determined by experiments or from the quantization parameter (QP) (Sullivan and Wiegand 1998). Taking the CU depth decision as an example, the R-D cost of \(CU_i\) (the CU at depth *i*) encoded in the un-split manner is compared with that in the split manner. The coding tree configuration can thus be determined by recursively judging whether a CU of each size should be split or not (Xiong et al. 2014).
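The Lagrangian selection and the recursive split decision described above can be sketched as follows. This is a simplified illustration, not HM's implementation: the function names and the `unsplit_cost` callback are hypothetical, and the bits spent on signaling the split itself are ignored.

```python
def rd_cost(distortion: float, rate: float, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate


def best_mode(candidates, lam: float):
    """Pick the (name, distortion, rate) candidate with the smallest J."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))


def decide_split(x, y, size, min_size, unsplit_cost):
    """Recursively compare the un-split cost of a CU with its split cost.

    unsplit_cost(x, y, size) is assumed to return the best R-D cost J of
    coding the block as a single CU. Returns (cost, tree), where tree is
    either a leaf (x, y, size) or a list of four sub-trees.
    """
    j_unsplit = unsplit_cost(x, y, size)
    if size <= min_size:
        return j_unsplit, (x, y, size)
    half = size // 2
    j_split = 0.0
    children = []
    for dy in (0, half):
        for dx in (0, half):
            j, sub = decide_split(x + dx, y + dy, half, min_size, unsplit_cost)
            j_split += j
            children.append(sub)
    if j_split < j_unsplit:
        return j_split, children
    return j_unsplit, (x, y, size)


if __name__ == "__main__":
    modes = [("intra", 100.0, 10.0), ("inter", 80.0, 30.0)]
    print(best_mode(modes, lam=0.5))
    toy = {64: 100.0, 32: 20.0, 16: 10.0, 8: 5.0}
    print(decide_split(0, 0, 64, 8, lambda x, y, s: toy[s]))
```

Because every candidate at every depth requires its own rate term, the rate estimator is invoked many times per CTU, which is what motivates a fast estimator.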

The rate term *R* in (2) may significantly affect the optimization process. Let *r* denote the ratio between \(\lambda R\) and *J* of an encoded CU, i.e., \(r = \frac{{\lambda R}}{J}\). Generally, a larger *r* value implies that more bits are coded to represent the SEs of the CU, and that a higher computational burden is imposed on the CABAC rate estimator. We computed the average *r* value for several test sequences and report the results in Table 1. The average ratio ranges from 8.2 to 37.5% (generally, the smaller the CU size, the larger the ratio). Before actual entropy coding, the optimization process in (2) must be performed for all candidate CUs, PUs, and TUs to obtain the optimal coding settings for a CTU. The computational burden of estimating the rate term is therefore very high, and it is necessary to develop a fast CABAC rate estimator with adequate accuracy for the rate-distortion optimization process.

### The correlation between the entropy and CABAC rate

Suppose there is a source containing a discrete set of independent messages \(z_k\) with probabilities \(\Pr (k)\). The entropy, which measures the average information generated by all messages in the source, is defined as

\[ H = - \sum_{k = 1}^{L} \Pr (k) \log_2 \Pr (k), \tag{3} \]

where *L* denotes the number of possible different messages. According to Shannon’s noiseless coding theorem (Jain 1988), it is possible to code without distortion a source of entropy *H* bits using an average of \(H + \varepsilon\) bits per message, where \(\varepsilon\) is an arbitrarily small quantity.

We regard the SEs of a CU as a source. Using the entropy *H* of the SEs that represent the information of the CU, we can estimate the lower bound on the number of bits required to encode the output of the CU, that is,

\[ R_{\min } = - \sum_{k = 1}^{L} N\left( z_k \right) \log_2 \Pr (k), \tag{4} \]

where \({N\left( {z_k } \right) }\) is the occurrence frequency of the elements with value \(z_k\).
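A minimal sketch of this lower bound is given below. It assumes, for illustration only, that \(\Pr (k)\) is estimated by the relative frequency of each value within the CU itself; the source of the probabilities is not specified in the text, and the function name is hypothetical.

```python
import math
from collections import Counter


def entropy_lower_bound(symbols) -> float:
    """Lower bound in bits: -sum_k N(z_k) * log2(Pr(k)).

    Assumption: Pr(k) is the relative frequency of value z_k among the
    given syntax element values.
    """
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(n * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    print(entropy_lower_bound([0, 0, 1, 1]))     # two equiprobable values
    print(entropy_lower_bound([0, 0, 0, 1, 2]))  # skewed distribution
```

Only a histogram of the SE values is needed, so no bin strings or context states have to be produced.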

Let \(x_i\) and \(y_i\) denote the estimated lower bound using (4) and the actual output of the CABAC encoder in HM 13.0 (Kim et al. 2013) for \(CU_i\), respectively. We represent them as paired data \((x_i, y_i)\). The sample set \(\{(x_i, y_i)\}\) obtained from the first 25 frames of the test sequence *BasketballDrill* is depicted as points in Fig. 1. We noticed that the estimated lower bound and the number of bits actually output for a CU may differ. However, the experiments also suggested a high correlation between the variables *x* and *y*. To confirm this hypothesis, the correlation between them is quantitatively measured by the correlation coefficient below:

\[ \rho = \frac{\sum_i \left( x_i - \bar{x} \right) \left( y_i - \bar{y} \right)}{\sqrt{\sum_i \left( x_i - \bar{x} \right)^2 \sum_i \left( y_i - \bar{y} \right)^2}}, \tag{5} \]

where \(\bar{x}\) and \(\bar{y}\) are the means of *x* and *y*, respectively. The results for several test sequences are depicted in Table 2.
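The sample correlation coefficient between the paired values can be computed as in the sketch below. The function name is hypothetical and the data in the usage example are made up for illustration.

```python
import math


def correlation(xs, ys) -> float:
    """Sample (Pearson) correlation coefficient of paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    lower_bounds = [120.0, 310.0, 95.0, 480.0]  # hypothetical x_i
    actual_bits = [150.0, 365.0, 130.0, 560.0]  # hypothetical y_i
    print(correlation(lower_bounds, actual_bits))
```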

### The proposed CABAC rate estimator and implementation

In CABAC, binarization maps a given SE to a unique binary string, and different elements yield bin strings with different numbers of “0” and “1” bins. For the regular coding mode, a particular context model is chosen, and one of the two binary values (0 or 1) is identified as the MPS, while the other is identified as the LPS (Sole et al. 2012). Further, an LPS shrinks the interval range more, and is more likely to generate an output bit, than an MPS. Considering the effect of these two steps, \(R_{\min }\) in (4) is not directly adopted as the estimate of the number of coded bits for a CU. Instead, we introduce a vector \(\mathbf {w}\) containing the weights corresponding to different SE values to take into account the effect of binarization and context modeling. The estimator is formulated as the linear regression model below

where \(\mathbf {w}\) and \(\mathbf {u}\) are *L*-dimensional column vectors, and

and

The formulation uses \(\mathbf {u}\) as the input regressor vector. It remains to determine a suitable parameter vector \(\mathbf {w}\) that can predict the CABAC rate with acceptable accuracy. We regard the estimator in (6) as a time varying system whose parameters change over time. Since there is a high correlation between the lower bound and the number of bits actually coded by the CABAC encoder, it is reasonable to assume that the parameters vary slowly. We treat the parameter vector as the state of a linear discrete-time dynamic system.
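One form of the estimator that is consistent with the description above is written out here as an assumption, not necessarily in the authors' exact notation. The symbol \(\hat{R}\) for the estimated rate is introduced for illustration:

\[
\hat{R} = \mathbf{w}^{T} \mathbf{u}, \qquad
\mathbf{w} = \left[ w_1, w_2, \ldots, w_L \right]^{T}, \qquad
\mathbf{u} = \left[ - N(z_1) \log_2 \Pr (1), \ldots, - N(z_L) \log_2 \Pr (L) \right]^{T}.
\]

Under this form, setting every weight to 1 reduces the estimate to the lower bound \(R_{\min }\), and weights different from 1 account for the effect of binarization and context modeling on each SE value.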

We embed the parameter update within the actual CABAC coding process, which is activated to code all the SEs into the bitstream after the optimal structure of the CTU and the best mode of the current CU have been determined. The relation between the observed measurement and state vector is:

where \(R_k\) is the number of actual coded bits for \(CU_k\). \(e_1\) and \(e_2\) represent the process and measurement noise, respectively. They are assumed to be white and with normal probability distributions, that is:

The process noise covariance matrix \(\mathbf {Q}\) is assumed to be diagonal with very small diagonal entries. This means that state parameters are independent from each other and the variance of a state is small. The variance \(\sigma _e^2\) in (12) changes with each measurement, and is defined by

The Kalman filter (Catlin 1988), which estimates the state by using a form of feedback control, is employed in our system. The time update (prediction) equations are:

where \({{\hat{\mathbf{w}}}}_{k + 1}^ -\) denotes the a priori state estimate, and \({\mathbf{{P}}_{k + 1}^ - }\) is the a priori state error covariance matrix for time update. During the measurement update, the Kalman gain is computed as:

Then, the a posteriori state estimate is calculated as:

And the a posteriori error covariance is:

where \(\mathbf {I}\) denotes the identity matrix.
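For reference, the standard Kalman recursions for a slowly varying (random walk) state observed through a scalar measurement can be written in the notation of this section as follows. This is a textbook form given under that modeling assumption, and the grouping and indexing may differ from the authors' equations. The assumed state and measurement model is

\[
\mathbf{w}_{k + 1} = \mathbf{w}_k + e_1, \qquad
R_k = \mathbf{u}_k^{T} \mathbf{w}_k + e_2, \qquad
e_1 \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}), \qquad
e_2 \sim \mathcal{N}(0, \sigma _e^2),
\]

and the corresponding time update and measurement update are

\[
\begin{aligned}
\hat{\mathbf{w}}_{k + 1}^{-} &= \hat{\mathbf{w}}_{k}, \qquad \mathbf{P}_{k + 1}^{-} = \mathbf{P}_{k} + \mathbf{Q}, \\
\mathbf{K}_{k + 1} &= \mathbf{P}_{k + 1}^{-} \mathbf{u}_{k + 1} \left( \mathbf{u}_{k + 1}^{T} \mathbf{P}_{k + 1}^{-} \mathbf{u}_{k + 1} + \sigma _e^2 \right)^{-1}, \\
\hat{\mathbf{w}}_{k + 1} &= \hat{\mathbf{w}}_{k + 1}^{-} + \mathbf{K}_{k + 1} \left( R_{k + 1} - \mathbf{u}_{k + 1}^{T} \hat{\mathbf{w}}_{k + 1}^{-} \right), \\
\mathbf{P}_{k + 1} &= \left( \mathbf{I} - \mathbf{K}_{k + 1} \mathbf{u}_{k + 1}^{T} \right) \mathbf{P}_{k + 1}^{-}.
\end{aligned}
\]

Because the measurement is a scalar, the term in parentheses in the gain is a scalar as well, so no matrix inversion is required.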

Finally, we make some comments about the proposed CABAC rate estimator. First, for a CU, the weight vector is updated only once, whereas the estimator in (6) is evaluated dozens of times to determine the prediction mode and PU partition. Although the updates in (14)–(17) are somewhat computationally complicated, they therefore do not significantly degrade the computational efficiency of the estimator. Second, HEVC has many different SEs. For an SE represented as an on/off flag (e.g., merge_flag, cu_split_flag), we simply added one bit to the estimated rate for each flag, while Eq. (6) was applied to the other SEs in our experiments.
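The split between a cheap per-candidate estimate and a once-per-CU weight update can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the textbook Kalman update for a random walk state with a scalar measurement, takes the process noise as `q` times the identity, and receives the measurement variance `sigma_e2` as a plain parameter. All function and parameter names are hypothetical.

```python
def predict_rate(w, u):
    """Estimated rate as the weighted sum w^T u (cheap, called per candidate)."""
    return sum(wi * ui for wi, ui in zip(w, u))


def kalman_update(w, P, u, actual_bits, q, sigma_e2):
    """One Kalman step after the CU has actually been CABAC-coded.

    w: current weights, P: error covariance (list of lists),
    u: regressor of the coded CU, actual_bits: bits produced by CABAC,
    q: process noise variance (assumed Q = q * I),
    sigma_e2: measurement noise variance (assumed to be supplied).
    """
    n = len(w)
    # Time update: random walk state, so only the covariance grows.
    P_prior = [[P[i][j] + (q if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    # Kalman gain for a scalar measurement (no matrix inversion needed).
    Pu = [sum(P_prior[i][j] * u[j] for j in range(n)) for i in range(n)]
    s = sum(u[i] * Pu[i] for i in range(n)) + sigma_e2
    K = [value / s for value in Pu]
    # Measurement update of the state with the prediction error.
    innovation = actual_bits - predict_rate(w, u)
    w_new = [w[i] + K[i] * innovation for i in range(n)]
    # Measurement update of the covariance: (I - K u^T) P_prior.
    uP = [sum(u[i] * P_prior[i][j] for i in range(n)) for j in range(n)]
    P_new = [[P_prior[i][j] - K[i] * uP[j] for j in range(n)]
             for i in range(n)]
    return w_new, P_new


if __name__ == "__main__":
    true_w = [1.5, 0.8]  # hypothetical weights generating the observations
    w = [1.0, 1.0]       # start from the plain entropy bound
    P = [[10.0, 0.0], [0.0, 10.0]]
    for k in range(50):
        u = [float(k % 5 + 1), float((3 * k) % 7 + 1)]
        bits = predict_rate(true_w, u)
        w, P = kalman_update(w, P, u, bits, q=1e-6, sigma_e2=1.0)
    print(w)
```

Only `predict_rate` sits in the inner loop of mode decision; `kalman_update` runs once per coded CU.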

## Experimental results

To evaluate the performance of the fast entropy-based CABAC rate estimator, the proposed algorithm was implemented in the HEVC reference software HM 13.0 (Kim et al. 2013) under the HEVC common test conditions (Bossen 2012), which are summarized in Table 3. We encoded three class B sequences with a spatial resolution of \(1920 \times 1080\) and three class E sequences with a resolution of \(1280 \times 720\). The test sequences and their sequence numbers are listed in Table 4. Simulations were run on a personal computer with an Intel Core i5-4430 CPU and 4 GB RAM. The operating system was Microsoft Windows 7 64-bit Enterprise edition.

### Time saving

We recorded the time consumed by mode decision for all CUs (that is, the time consumed by *xCompressCU*(), which is the CU analysis module in the HM software), and accumulated it. The computational complexity reduction was calculated as follows (Xiong et al. 2014):

\[ \Delta T = \frac{T_{ref} - T_{test}}{T_{ref}} \times 100\% \]

where \(T_{ref}\) denotes the accumulated time consumption of the original HM 13.0 encoder, and \(T_{test}\) denotes the time consumed when the fast CABAC rate estimators were adopted for R-D cost evaluation.

For comparison, the algorithm in Won et al. (2012), which was developed for fast rate estimation in H.264/AVC mode decision, was also implemented in HM. The mean values of \(\Delta T\) for the tested sequences in Table 4 are summarized in Fig. 2. The proposed algorithm yields a complexity reduction \(\Delta T\) of 9.2 to 22.3%, and 14.5% on average. The results also indicate that the proposed estimator is computationally more efficient than the previous algorithm, and that the improvement is larger when the QP is small. The reason is that the QP value affects the overall encoding time: a smaller QP implies higher residual energy, so a higher computational burden is imposed on the CABAC rate estimator. Under these circumstances, our scheme has the advantage of eliminating all three steps of CABAC coding, while the method in Won et al. (2012) leaves the binarization step unchanged.

### Compression efficiency

We compared the coding performance of the proposed algorithm with that of the reference software in terms of Bjøntegaard delta rate (BD-Rate) and Bjøntegaard delta PSNR (BD-PSNR) (Bjøntegaard 2001). The experimental results are summarized in Table 5. The proposed estimator slightly increases the BD-rate. For the luminance component, the increment ranges from 1.81 to 2.43%, and is about 2.21% on average. The increment for the chrominance components (Cb and Cr) ranges from 0.19 to 2.29%, and is 1.19% on average. However, the complexity reduction and the improved parallelizability are worth the expense of the rate increment. The results also show that the BD-PSNR loss is minor. In addition, we evaluated the performance of the CABAC rate estimator by the estimation error using the criterion below

where \(R_t\) is the rate estimated by our estimator, and \(R_r\) is that of the HM software. We calculated the mean value of \(\Delta R\) over all CUs, and report the result in Table 5. The coding performance of the proposed algorithm was also compared with that of the estimator in Won et al. (2012) in terms of \(\Delta T\), BD-Rate, and BD-PSNR, and the results are reported in Table 6.

## Conclusion

A fast entropy-based CABAC rate estimation algorithm, which is applicable to rate-distortion optimization in HEVC, is proposed in this paper. The syntax elements of a CU are regarded as a source containing a discrete set of independent messages. The implicit relation between the entropy of the source and the number of coded bits (i.e., the output of the CABAC encoder) for the CU is investigated. Exploiting the correlation between these two values, the proposed approach estimates the CABAC rate as a weighted sum of the information generated by the source. The weight vector, which compensates for the effect of the binarization and context modeling steps, is treated as the state of a linear discrete-time dynamic system. The weights are adaptively updated by Kalman filtering within the actual CABAC encoding process, which is activated to encode all the SEs into the bitstream after the best mode of the current CU has been determined.

## References

Bjøntegaard G (2001) Calculation of average PSNR differences between RD-curves. In: VCEG document VCEG-M33, ITU-T SG16/Q6, Austin. ITU-T, pp 1–5

Bossen F, Bross B, Sühring K, Flynn D (2012) HEVC complexity and implementation analysis. IEEE Trans Circuits Syst Video Technol 22(12):1685–1696

Bossen F (2012) Common test conditions and software reference configurations. In: JCT-VC document JCTVC-J1100, 10th meeting, Stockholm, 11–20 Jul. 2012. JCT-VC, pp 1–3

Catlin DE (1988) Estimation, control, and the discrete Kalman filter. Springer, New York

Corrêa G, Assunção P, Agostini L, Cruz LAS (2012) Performance and computational complexity assessment of high-efficiency video encoders. IEEE Trans Circuits Syst Video Technol 22(12):1899–1909

Hahm J, Kyung C-M (2010) Efficient CABAC rate estimation for H.264/AVC mode decision. IEEE Trans Circuits Syst Video Technol 20(2):310–316

Hahm J, Kim J, Kyung C-M (2009) A fast CABAC rate estimator for H.264/AVC mode decision. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing, Taipei, 19–24 April 2009. IEEE Computer Society, pp 929–932

HM Software. https://hevc.hhi.fraunhofer.de/svn/svn-HEVCSoftware

Jain AK (1988) Fundamentals of digital image processing. Prentice Hall, Englewood Cliffs

Kim I-K, Min J, Lee T, Han W-J, Park JH (2012) Block partitioning structure in the HEVC standard. IEEE Trans Circuits Syst Video Technol 22(12):1697–1706

Kim I-K, McCann K, Sugimoto K, Bross B, Han W-J, Sullivan G (2013) High efficiency video coding (HEVC) test model 13 (HM 13) encoder description. In: JCT-VC document JCTVC-O1002, 15th meeting, Geneva, Oct. 23–Nov. 1 2013. JCT-VC, pp 1–39

Marpe D, Schwarz H, Wiegand T (2003) Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard. IEEE Trans Circuits Syst Video Technol 13(7):620–636

Ohm J-R, Sullivan GJ, Schwarz H, Tan TK, Wiegand T (2012) Comparison of the coding efficiency of video coding standards—including high efficiency video coding (HEVC). IEEE Trans Circuits Syst Video Technol 22(12):1669–1684

Pan Z, Kwong S, Sun M-T, Lei J (2014) Early merge mode decision based on motion estimation and hierarchical depth correlation for HEVC. IEEE Trans Broadcast 60(2):405–412

Sole J, Joshi R, Nguyen N, Ji T, Karczewicz M, Clare G, Henry F, Dueñas A (2012) Transform coefficient coding in HEVC. IEEE Trans Circuits Syst Video Technol 22(12):1765–1777

Sullivan GJ, Ohm J-R, Han W-J, Wiegand T (2012) Overview of the high efficiency video coding (HEVC) standard. IEEE Trans Circuits Syst Video Technol 22(12):1649–1668

Sullivan GJ, Wiegand T (1998) Rate-distortion optimization for video compression. IEEE Signal Process Mag 15(6):74–90

Sze V, Budagavi M (2012) High throughput CABAC entropy coding in HEVC. IEEE Trans Circuits Syst Video Technol 22(12):1778–1791

Won K, Yang J, Jeon B (2012) Fast CABAC rate estimation for H.264/AVC mode decision. Electron Lett 48(19):1201–1202

Xiong J, Li H, Wu Q, Meng F (2014) A fast HEVC inter CU selection method based on pyramid motion divergence. IEEE Trans Multimed 16(2):559–564

## Authors’ contributions

WG carried out the CABAC Rate Estimation studies, edited the paper and drafted the manuscript. All authors read and approved the final manuscript.

### Acknowledgements

This work was supported in part by the Natural Science Foundation of Zhejiang Province (No. LY14F020001), Natural Science Foundation of China (No. 61379075), National Key Technology Support Program (No. 2014BAK14B00), and Zhejiang Province Science and Technology Plan Project (Grant No. 2014C33070). The authors would like to thank the anonymous reviewers for their helpful comments.

### Competing interests

The authors declare that they have no competing interests.

## Rights and permissions

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## About this article

### Cite this article

Chen, WG., Wang, X. Fast entropy-based CABAC rate estimation for mode decision in HEVC.
*SpringerPlus* **5**, 756 (2016). https://doi.org/10.1186/s40064-016-2377-0

DOI: https://doi.org/10.1186/s40064-016-2377-0