High capacity data hiding scheme based on (7, 4) Hamming code

To embed a large amount of data while minimizing the sum of the costs of all changed pixels, a novel high capacity data hiding scheme based on the (7, 4) Hamming code is realized by a family of algorithms. First, n (n = 1, 2, 3) cover pixels are assigned to one set according to the payload. Then, the 128 binary strings of length seven are divided into eight sets according to their syndromes, so that strings sharing the same syndrome fall into the same set. Finally, a binary string in the set determined by the data to be embedded is chosen to modify some of the least significant bits of the n cover pixels. The experimental results demonstrate that, at high embedding payloads, the image quality of the proposed method is superior to that of the related schemes.

The embedding capacity was then increased to $(k + 1)/2^k$ bpp. Later, Chang and Chou (2008) proposed a scheme based on the idea of classification. Binary strings were assigned to eight sets, and a binary string of length $2^k - 1$ in a specific set was selected to embed $k$ bits. This presented a new idea for applying the Hamming code to data hiding. However, the embedding capacity was not improved compared with the previous two schemes; it is equal to the embedding capacity of the matrix encoding scheme (Crandall 1998).
The marked-image quality of the aforementioned schemes is good when the embedding payload is low (no more than $k/(2^k - 1)$ or $(k + 1)/2^k$ bpp), but it degrades severely as the embedding payload increases. To address this problem, a new data hiding scheme based on the (7, 4) Hamming code is proposed in this paper. The marked-image quality of the proposed scheme is superior to that of the related works in Crandall (1998), Zhang et al. (2007) and Chang and Chou (2008) under a high embedding payload.

The Hamming code
An error correcting code can not only detect that errors have occurred but also locate the error positions. The Hamming code is a linear error correcting code that can detect and correct single-bit errors. The $(n, n - k)$ Hamming code uses $n$ cover bits to transmit $n - k$ message bits; the other $k$ bits, used for error correction, are called parity check bits, where $n = 2^k - 1$ over the binary field. Let $S = \{C_1, C_2, \ldots, C_M\}$ be a set of code words. The number of elements of $S$, denoted $|S|$, is called the cardinality of the code. For any two code words $x = (x_1, x_2, \ldots, x_n) \in S$ and $y = (y_1, y_2, \ldots, y_n) \in S$, the Hamming distance is defined by $d_H(x, y) = |\{i \mid x_i \neq y_i\}|$. The minimum distance of the code $S$ is defined as $d_{\min} = \min\{d_H(x, y) \mid x, y \in S, x \neq y\}$. The covering radius of the code $S$ is the smallest $r$ such that any binary string $u = (u_1, u_2, \ldots, u_n)$ differs from at least one code word $x = (x_1, x_2, \ldots, x_n) \in S$ in at most $r$ positions. The minimum distance $d_{\min}$ measures the error-correcting capability, and the covering radius $r$ measures the maximum distortion that occurs when a binary string is replaced by a suitable code word. Therefore, a large minimum distance $d_{\min}$ is preferable for error correction, whereas a small covering radius $r$ is preferable for steganography. The (7, 4) Hamming code is a binary code of length $n = 7$, with cardinality $|S| = 16$, minimum distance $d_{\min} = 3$, and covering radius $r = 1$.
The (7, 4) Hamming code is now taken as an example to demonstrate how a Hamming code corrects a single-bit error. Suppose that the message bits are $m = (1010)$. First, the code generator matrix $G$ is used to form the $n$-bit code word $C = m \times G = (1010010)$.
Next, the code word $C$ is transmitted to a receiver via a noisy communication channel. Suppose that the received code word is $C' = (1011010)$. Then the parity check matrix $H$ is used to compute the syndrome vector $z = (z_1, z_2, z_3)$ for error checking, which gives $z = C' \times H^T = (011)$. The vector $z^T = (011)^T$ is identical to the fourth column of the parity check matrix $H$. Thus, an error is detected at the fourth position of $C'$, and $C'$ is corrected to $C = C' \oplus e_4 = (1010010)$, where $\oplus$ is the exclusive-or operation and the error pattern $e_i$ is a unit vector of length $n$ with a "1" at the $i$-th position. If the syndrome vector is $z = (000)$, the receiver can conclude that no single-bit error has occurred.
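The worked example above can be replayed in a few lines of Python. This is a minimal sketch: the matrices `G` and `H` below are an assumption (a common systematic form of the (7, 4) Hamming code), chosen only because they agree with the message, the received word and the fourth-column syndrome quoted in the text. The function names are illustrative.

```python
# Assumed systematic matrices; chosen to agree with the worked example,
# not taken from the source.
G = [
    [1, 0, 0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1, 1, 0],
    [0, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 1, 1],
]
H = [
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 1],
]


def encode(m):
    """Return the code word m x G over GF(2)."""
    return [sum(m[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]


def syndrome(word):
    """Return word x H^T over GF(2)."""
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in H]


def correct(word):
    """Flip the bit whose column of H equals the syndrome, if the syndrome is nonzero."""
    z = syndrome(word)
    if not any(z):
        return list(word)
    columns = [[row[j] for row in H] for j in range(7)]
    fixed = list(word)
    fixed[columns.index(z)] ^= 1
    return fixed


codeword = encode([1, 0, 1, 0])      # message m = (1010)
received = [1, 0, 1, 1, 0, 1, 0]     # C' with one flipped bit
z = syndrome(received)
repaired = correct(received)
```

With these assumed matrices, `z` matches the fourth column of `H` and `repaired` equals `codeword`.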

"Matrix Encoding"
In the matrix encoding scheme, a string of $k$ bits $s = (s_1, s_2, \ldots, s_k)$ is embedded into a group of $n$ cover pixels by adding one to or subtracting one from at most one cover pixel, where $n = 2^k - 1$. First, the syndrome vector $z = (z_1, z_2, \ldots, z_k)$ is calculated by $z = (c \times H^T) \oplus s$, where $c = (\mathrm{LSB}(p_1), \mathrm{LSB}(p_2), \ldots, \mathrm{LSB}(p_n))$ and $\mathrm{LSB}(p_i)$ denotes the least significant bit of the $i$-th pixel $p_i$. $H$ is the parity check matrix of the $(n, n - k)$ Hamming code, $T$ is the transpose operation, and $\oplus$ is the exclusive-or operation. Next, if the computed syndrome vector $z$ is $(0, 0, \ldots, 0)$, the group of $n$ marked bits $R$ is set equal to $c$; otherwise, the $i$-th column of $H$ that equals the transposed syndrome vector $z^T$ is found, and $R$ is calculated by $R = e_i \oplus c$, where $e_i$ is a unit vector of length $n$ with a "1" at the $i$-th position. At the receiving side, the receiver extracts the original binary string $s$ from the received group $R$ by $s = R \times H^T$.
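The embedding and extraction steps of matrix encoding can be sketched as follows for $k = 3$, $n = 7$. The parity check matrix `H` is an assumed systematic form (the source does not print it), the pixels are assumed to be 8-bit values, and choosing +1 except at 255 is one arbitrary way to flip the least significant bit.

```python
# Assumed parity check matrix of the (7, 4) Hamming code (not from the source).
H = [
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 1],
]


def syndrome(bits):
    """Return bits x H^T over GF(2)."""
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]


def embed(pixels, secret):
    """Embed 3 secret bits into 7 pixels, changing at most one pixel by one."""
    c = [p & 1 for p in pixels]                        # LSB vector c
    z = [a ^ b for a, b in zip(syndrome(c), secret)]   # z = (c x H^T) xor s
    marked = list(pixels)
    if any(z):
        columns = [[row[j] for row in H] for j in range(7)]
        i = columns.index(z)                           # column of H equal to z^T
        # Adding or subtracting one always flips the LSB; stay inside 0..255.
        marked[i] += 1 if marked[i] < 255 else -1
    return marked


def extract(marked):
    """Recover the secret bits as s = R x H^T."""
    return syndrome([p & 1 for p in marked])
```

For any group of seven 8-bit pixels and any three secret bits, `extract(embed(pixels, secret))` returns `secret`, and the marked group differs from the cover group in at most one pixel.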

"Hamming+1"
In Eqs. (1) and (2), $H$ is the parity check matrix of the $(n, n - k)$ Hamming code and $T$ is the transpose operation. This means that the first $k$ secret bits of $s$ are embedded into the first $n$ bits of $p$ by using matrix encoding, and the last secret bit of $s$ is embedded by using a function of the cover pixels $p$. The embedding rules proposed by Zhang et al. (2007) are as follows. If Eq. (1) does not hold, $p_{n+1}$ is kept unchanged, and one cover pixel $p_i$ ($1 \le i \le n$) is increased or decreased by one so that Eqs. (1) and (2) hold simultaneously. If Eq. (1) holds and Eq. (2) does not, the first $n$ pixels are kept unchanged and the last cover pixel $p_{n+1}$ is randomly increased or decreased by one.

"Nearest Code"
In the nearest covering code scheme (Chang and Chou 2008), all possible combinations of seven bits are classified into eight sets $G_0, G_1, \ldots, G_7$. There are 16 elements in every set, and the strings in one set share the same syndrome with respect to $H$, where $H$ is the parity check matrix of the (7, 4) Hamming code and $T$ is the transpose operation. A covering code $G_s^v$ with the smallest Hamming distance to $P = (\mathrm{LSB}(p_1), \mathrm{LSB}(p_2), \ldots, \mathrm{LSB}(p_7))$ is selected in $G_s$ according to the secret bits $s = (s_1, s_2, s_3)$, where the subscript of $G_s$ is the decimal value of $s = (s_1, s_2, s_3)$. Then, the cover pixels are modified according to $G_s^v$. At the receiving side, a legal receiver can extract the original secret bits $s$ from the received group of 7 pixels $R$ by $s = R \times H^T$.
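The classification and nearest-string selection described above can be illustrated with a short sketch. The matrix `H` is an assumed systematic parity check matrix, and reading $(s_1, s_2, s_3)$ as a decimal index with $s_1$ as the most significant bit is also an assumption; neither detail is given in the text.

```python
from itertools import product

# Assumed parity check matrix of the (7, 4) Hamming code (not from the source).
H = [
    [1, 1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0, 1],
]


def syndrome(bits):
    """Return bits x H^T over GF(2)."""
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]


def index_of(three_bits):
    """Decimal value of a 3-bit tuple, first bit most significant (assumed)."""
    return three_bits[0] * 4 + three_bits[1] * 2 + three_bits[2]


# Classify all 128 seven-bit strings into eight sets by syndrome.
sets = {u: [] for u in range(8)}
for bits in product((0, 1), repeat=7):
    sets[index_of(syndrome(bits))].append(bits)


def nearest(cover_bits, secret):
    """Pick the string in G_s with the smallest Hamming distance to cover_bits."""
    candidates = sets[index_of(secret)]
    return min(candidates, key=lambda g: sum(a != b for a, b in zip(g, cover_bits)))
```

Because the covering radius of the (7, 4) Hamming code is 1, the selected string differs from the cover bits in at most one position.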

The proposed scheme
In the proposed scheme, a secret binary string of length three is mapped to an error pattern of the (7, 4) Hamming code and can then be embedded into a group of cover pixels. The number of cover pixels in a group varies with the embedding payload.

The preparations
I is the cover image sized $H \times W$, and marked_I is the marked-image carrying the embedded data. The parity check matrix of the (7, 4) Hamming code is used throughout. A string of binary bits $(b_1 b_2 \ldots b_7)$ is the cover of a string of three secret bits, and the marked-string of $(b_1 b_2 \ldots b_7)$ is the string obtained after embedding. $p_i$ is the $i$-th pixel in the cover image, and $p_i'$ is the $i$-th pixel in the marked-image. $p_i^j$ represents the $j$-th least significant bit of pixel $p_i$. ER, i.e. the embedding rate, is calculated as follows.
$N_n$ is the number of groups in which $n$ ($n = 1, 2, 3$) cover pixels are used to embed a three-bit string, and $N_n$ satisfies Formula (4). The first equation of Formula (4) indicates that the number of bits to be embedded equals the number of bits the cover image can carry at a particular embedding rate. The second condition in Formula (4) requires that the number of cover pixels needed is no more than the number of pixels the cover image provides.
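One reading of the two conditions just described, written out with the symbols defined in this section, is given below. It is a reconstruction from the prose (each group carries three bits, and a group counted by $N_n$ occupies $n$ pixels), not a copy of the paper's Formula (4).

```latex
\begin{aligned}
3\,(N_1 + N_2 + N_3) &= ER \times H \times W, \\
N_1 + 2N_2 + 3N_3 &\le H \times W .
\end{aligned}
```

For instance, with $H \times W = 3 \times 3$ and $ER = 2$ bpp, the choice $N_1 = N_2 = 3$, $N_3 = 0$ gives $3 \cdot 6 = 18 = 2 \cdot 9$ bits and uses $3 + 6 = 9$ pixels, so both conditions hold.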
To modify the cover pixels as little as possible, Formula (4) is processed to obtain Formula (5) based on the following considerations. The top-priority option, grouping three cover pixels together to embed a three-bit string, satisfies Formula (4) when $0 \le ER \le 1$. When $1 < ER \le 1.5$, grouping two cover pixels to embed a binary string satisfies Formula (4), but some pixels in the cover image would remain unused. Instead, we embed some secret binary strings into groups of three cover pixels and the others into groups of two cover pixels. This causes less modification to the cover image than using only groups of two cover pixels, because fewer bits per pixel are occupied on average. Likewise, when $1.5 < ER \le 3$, we embed some binary strings into groups of two cover pixels and the others into groups of one cover pixel. Therefore, the adaptive $N_n$ is calculated by Formula (5), which helps to minimize the sum of the costs of all changed pixels.
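The allocation rule described above can be sketched by solving the two conditions (three bits per group, all pixels used once the rate exceeds 1 bpp) in each payload range. The closed forms below are derived from the prose rather than copied from Formula (5); the function name is hypothetical, and the counts are returned as exact fractions because the rounding rule is not stated in the text.

```python
from fractions import Fraction


def group_counts(er, height, width):
    """Return (N1, N2, N3) for an embedding rate er (bpp) and an image of height x width pixels.

    Derived from the prose: each group carries 3 bits, and N_n groups use n pixels each.
    """
    er = Fraction(er)
    if not 0 <= er <= 3:
        raise ValueError("embedding rate must lie in [0, 3] bpp")
    pixels = height * width
    groups = er * pixels / 3            # total number of 3-bit strings
    if er <= 1:
        # Three-pixel groups suffice.
        return 0, 0, groups
    if er <= Fraction(3, 2):
        # Mix of two- and three-pixel groups that uses every pixel.
        return 0, 3 * groups - pixels, pixels - 2 * groups
    # Mix of one- and two-pixel groups that uses every pixel.
    return 2 * groups - pixels, pixels - groups, 0
```

For a 3 × 3 image at 2 bpp this yields three one-pixel groups and three two-pixel groups, which together carry 18 bits in 9 pixels.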

The data embedding phase
All binary strings of length seven are classified into eight sets $G_0, G_1, \ldots, G_7$. There are 16 elements in every set. The specific embedding algorithms are as follows.

Example: data extracting
Suppose the receiver receives the marked-image sized H × W = 3 × 3 shown in Fig. 2 and knows that the embedding rate (ER) is 2 bpp.

Experiment results
To evaluate the performance of the proposed scheme, we simulate "Matrix Encoding" (Crandall 1998), "Hamming+1" (Zhang et al. 2007), "Nearest Code" (Chang and Chou 2008) and the proposed scheme in MATLAB R2014a. Standard grayscale test images of size 512 × 512 are used in the simulations, as shown in Fig. 3.

Preprocessing
To make the comparison objective and fair, the embedding capacity of "Matrix Encoding" (Crandall 1998), "Hamming+1" (Zhang et al. 2007) and "Nearest Code" (Chang and Chou 2008) is also enhanced by extending the least significant bit to general LSBs. The extension of "Matrix Encoding" is as follows. Every 3-bit string is embedded by the matrix encoding method into $G_i$ ($1 \le i \le 7$), which is composed of the $i$-th least significant bits of 7 pixels. Thus, the embedding capacity of the extended "Matrix Encoding" method becomes 3 bpp. The "Hamming+1" scheme is extended as follows. Every 4-bit string is embedded by the "Hamming+1" method into $G_i$ ($1 \le i \le 4$), which is composed of the $(2i - 1)$-th and $2i$-th least significant bits of 8 pixels. Thus, the embedding capacity of the extended "Hamming+1" scheme becomes 2 bpp.
Similarly, the extension of "Nearest Code" is as follows. Every 3-bit string is embedded by the "Nearest Code" method into $G_i$ ($1 \le i \le 7$), which is composed of the $i$-th least significant bits of 7 pixels, so the embedding capacity of the extended "Nearest Code" method is 3 bpp.
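The capacity figures quoted for the extended schemes follow from simple counting, which the sketch below makes explicit. The helper names are hypothetical; `bit_plane` only shows how the $i$-th least significant bits of a pixel group could be read out.

```python
from fractions import Fraction


def bit_plane(pixels, i):
    """Return the i-th least significant bit of each pixel (i = 1 is the LSB)."""
    return [(p >> (i - 1)) & 1 for p in pixels]


def capacity_bpp(bits_per_string, strings_per_block, pixels_per_block):
    """Embedded bits per pixel when every block of pixels carries several strings."""
    return Fraction(bits_per_string * strings_per_block, pixels_per_block)


# Extended "Matrix Encoding" / "Nearest Code": 3 bits in each of 7 bit planes of 7 pixels.
matrix_encoding_capacity = capacity_bpp(3, 7, 7)
# Extended "Hamming+1": 4 bits in each of 4 two-plane groups of 8 pixels.
hamming_plus_1_capacity = capacity_bpp(4, 4, 8)
```

The two results are 3 bpp and 2 bpp, matching the capacities stated in the text.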

Fig. 3 The nine test images: Lena, Baboon, Man, Tiffany, Peppers, Boat, Jet, Sailboat and Splash
For a fair comparison with the related works, the same method used to obtain Formula (5) is applied here to make the extended "Matrix Encoding", "Hamming+1" and "Nearest Code" schemes adaptive to the payload.
In the resulting formulas for the extended "Matrix Encoding", the extended "Hamming+1" and the extended "Nearest Code", $N_i$ represents the number of groups of data bits embedded in $G_i$.

Image quality
PSNR (peak signal-to-noise ratio) is widely used to measure the quality of marked-images by calculating the difference between the marked-image and the cover image. It is defined as follows.
The above equations show that the smaller the difference between the marked-image and the cover image, the greater the PSNR value. In general, if the PSNR value of a marked-image is greater than 30 dB, the distortion of the marked-image is hard for human eyes to detect.
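A minimal sketch of the PSNR computation is given below. It assumes the standard definition for 8-bit grayscale images, $\mathrm{PSNR} = 10 \log_{10}(255^2 / \mathrm{MSE})$ with MSE the mean squared pixel difference; images are represented as plain nested lists, and the function name is illustrative.

```python
import math


def psnr(cover, marked, peak=255):
    """PSNR in dB between two equally sized images given as lists of pixel rows."""
    squared = [
        (a - b) ** 2
        for cover_row, marked_row in zip(cover, marked)
        for a, b in zip(cover_row, marked_row)
    ]
    mse = sum(squared) / len(squared)
    # Identical images have zero error, so the ratio is unbounded.
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```

Under this definition, a mean squared error of 0.5 corresponds to roughly 51.14 dB.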
Tables 1, 2, 3 and 4 show the PSNR values of marked-images generated by the different methods at several payloads, i.e. ER = 1 bpp, ER = 1.5 bpp, ER = 2 bpp and ER = 3 bpp. The data in the tables are the mean values of ten independent experiments, and the data bits embedded into the images are generated randomly. From the tables, the PSNR values of the proposed scheme are higher than those of the related works at each of these payloads, which indicates that the marked-image quality of the proposed scheme is superior under the same payload. The PSNR-ER comparison results for Lena and Baboon are shown in Figs. 4 and 5. According to the figures, the PSNR values of the proposed scheme are slightly lower than those of the extended "Matrix Encoding", "Hamming+1" and "Nearest Code" schemes when the embedding rate is relatively small, but as the embedding rate increases they become significantly higher than those of the other methods. Note that the curves of the extended "Matrix Encoding" and the extended "Nearest Code" schemes coincide, because both methods embed three bits by modifying at most one bit. Thus, only the results of the extended "Matrix Encoding" scheme are shown in the following experiments.
Proposed scheme (table row, PSNR in dB): 51.14 51.14 51.14 51.15 51.14 51.14 51.15 51.14 51.14
Taking Lena and Baboon as examples, the marked-images of the extended "Matrix Encoding" scheme, the extended "Hamming+1" scheme and the proposed scheme under different payloads are shown in Figs. 6 and 7. There is no distinct difference between the marked-images when the embedding rate is 3/7 bpp. When the embedding rate reaches 2 bpp, spots are easily seen on the marked-image of the extended "Hamming+1" scheme, whereas hardly any spot can be seen on the marked-image of the proposed scheme. The same observation holds between the proposed scheme and the extended "Matrix Encoding" scheme when ER = 3 bpp.

Security analysis
Security is a significant problem for data hiding. Many steganalysis methods use statistical tools to analyze the pixel value distribution of a suspicious image in order to detect secret message delivery. From this point of view, we compare the pixel histograms of the test cover images and the marked-images to measure the security of the data hiding methods. Taking the smooth image Lena and the complex image Baboon as examples, the pixel histograms generated by the extended "Matrix Encoding" scheme, the extended "Hamming+1" scheme and the proposed scheme at high payloads are shown in Fig. 8. In Fig. 8, the pixel histogram of the marked-image generated by the proposed scheme is closer to the pixel histogram of the original image than those of the extended "Matrix Encoding" and extended "Hamming+1" schemes. This indicates that the security performance of the proposed scheme is better than that of the extended "Matrix Encoding" and extended "Hamming+1" schemes.
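One simple way to put a number on how close two pixel histograms are is sketched below. The sum of absolute bin differences is a hypothetical measure chosen for illustration; the text compares the histograms visually and does not name a distance. Images are plain nested lists of 8-bit values.

```python
def histogram(image, levels=256):
    """Count how often each gray level occurs in an image given as rows of pixels."""
    counts = [0] * levels
    for row in image:
        for pixel in row:
            counts[pixel] += 1
    return counts


def histogram_distance(cover, marked):
    """Sum of absolute bin differences (illustrative measure, not from the source)."""
    return sum(abs(a - b) for a, b in zip(histogram(cover), histogram(marked)))
```

A smaller value means the marked-image preserves the gray-level distribution of the cover image more closely; identical histograms give zero.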

Conclusions
Based on the (7, 4) Hamming code, a novel high capacity data hiding scheme is proposed. Cover pixels are matched adaptively to embed data according to the embedding payload. Compared with the related works, the image quality under high payload is improved significantly while the visual quality under low payload is maintained. Because the scheme relies on pixel matching, a seed can also be used to match pixels, which improves security. Moreover, the method is not limited to grayscale images; it can also be applied to color images, compressed images, audio, video and other digital media.

Fig. 8 The pixel histogram analysis comparison of Lena and Baboon. a "Matrix Encoding" of Lena with ER = 2 bpp. b "Hamming+1" of Lena with ER = 2 bpp. c "Matrix Encoding" of Lena with ER = 3 bpp. d "Matrix Encoding" of Baboon with ER = 2 bpp. e "Hamming+1" of Baboon with ER = 2 bpp. f "Matrix Encoding" of Baboon with ER = 3 bpp