In this section, we describe a conceptual image trading framework for an untrusted cloud environment that satisfies all the requirements mentioned in “Introduction”. The proposed framework enables secure online trading, and allows the images to be securely stored on the cloud servers after being visually protected and to be retrieved in their protected state. The following description is based on the scheme illustrated in Figure 5.
Original images owned by an image publisher are first encoded and visually protected by means of scrambling in the DCT domain (1a). At the same time, thumbnails are generated by resizing the original images to whatever sizes are required for viewing on a display device (1b). The protected images are then uploaded and stored on a cloud repository server. In this manner, the true visual content of the original images cannot be accessed by the server provider. Thumbnails can be stored on the same server and are publicly accessible through the website. A potential image buyer browses the thumbnail library and chooses images of interest, which also serve as queries (2). When a query image is submitted, the thumbnail is matched with the protected images by comparing the moment invariants of the thumbnail with those of the DC-image generated from each protected image (3). After this matching process, the server returns the matched image, which can then be downloaded by or sent to the potential buyer (4). However, the matched image remains visually protected unless a key is granted by the image publisher after payment or other authorization (5). Using an authentic key, the buyer is able to decode and descramble the data, yielding the true traded image (6).
Scrambling process
The main purpose of image scrambling is to provide visual protection so that the true content is perceptually meaningless or degraded. Therefore, the images are secure against ill-intentioned parties who may have access to the server, such as a hosting provider or hackers. Depending on the degree of scrambling required, visual protection can be achieved by applying existing scrambling techniques that work in the DCT domain, such as those proposed in Kiya and Ito (2008) and Khan et al. (2010a, b).
A simplified diagram of JPEG-based image scrambling for visual protection is shown in Figure 6, in which a block-based permutation is applied to the quantized DCT coefficients. Descrambling is simply the reverse process, provided that the same key used in the scrambling process is available.
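The key-driven permutation and its inverse can be sketched as follows. This is a minimal illustration, not the scheme of the cited works: the function names, the integer `key`, and the use of NumPy's seeded generator are assumptions, and that generator is not cryptographically secure. The text does not state whether the DC coefficients are moved; the hypothetical `keep_dc` flag leaves them in place by default so that the DC image of the protected data stays usable for matching.

```python
import numpy as np

BLOCK = 8  # JPEG block size


def split_blocks(coeffs):
    """Reshape an (H, W) plane of quantized coefficients into (n_blocks, 8, 8)."""
    h, w = coeffs.shape
    assert h % BLOCK == 0 and w % BLOCK == 0, "dimensions must be multiples of 8"
    grid = coeffs.reshape(h // BLOCK, BLOCK, w // BLOCK, BLOCK).swapaxes(1, 2)
    return grid.reshape(-1, BLOCK, BLOCK)


def merge_blocks(blocks, shape):
    """Inverse of split_blocks: rebuild the (H, W) plane in raster block order."""
    h, w = shape
    grid = blocks.reshape(h // BLOCK, w // BLOCK, BLOCK, BLOCK).swapaxes(1, 2)
    return grid.reshape(h, w)


def _permute(coeffs, key, inverse, keep_dc):
    blocks = split_blocks(coeffs)
    # The key seeds the generator, so the same key reproduces the same permutation.
    perm = np.random.default_rng(key).permutation(len(blocks))
    out = np.empty_like(blocks)
    if inverse:
        out[perm] = blocks      # put each block back where it came from
    else:
        out[:] = blocks[perm]   # block i of the output is block perm[i] of the input
    if keep_dc:
        out[:, 0, 0] = blocks[:, 0, 0]  # assumption: DC stays at its position
    return merge_blocks(out, coeffs.shape)


def scramble(coeffs, key, keep_dc=True):
    """Permute 8x8 coefficient blocks according to the key."""
    return _permute(coeffs, key, inverse=False, keep_dc=keep_dc)


def descramble(coeffs, key, keep_dc=True):
    """Undo scramble(); only the same key restores the original order."""
    return _permute(coeffs, key, inverse=True, keep_dc=keep_dc)
```

Calling `descramble(scramble(q, key), key)` on a coefficient plane `q` returns `q` unchanged, whereas a different key produces a different permutation and leaves the data scrambled.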
DC image generation and thumbnails
It is known that the DC coefficient of each 8×8 array of DCT coefficients is proportional to the average value of the 64 pixels within the corresponding block. Hence, it contains very rich visual information. An image constructed from DC components is a reduced-size version that is visually similar to the original. Therefore, the DC image itself is a rich feature descriptor that can be exploited for matching purposes.
The process of generating a DC-image from DCT coefficients is illustrated in Figure 7. Initially, an image is partitioned into non-overlapping 8×8 blocks (each referred to as a tile or a block), and a forward DCT is applied to each block. The DC coefficient of each block represents the local average intensity and holds most of the block energy. DC coefficients from all of the blocks are then arranged according to the order of the original blocks, resulting in a reduced-size image (\(\frac{1}{64}\) of the original image) referred to as a DC-image.
In relation to the JPEG standard, it is worth noting that the DC coefficients can be directly extracted from the JPEG bitstream without the need for full JPEG decoding (Arnia et al. 2009), and the DC-image can be generated accordingly.
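As a brief check of this relationship, assume the forward DCT normalization used in baseline JPEG. The symbols \(f(x,y)\) for the block samples and \(F(u,v)\) for the coefficients are introduced here for illustration only. Setting \(u=v=0\) gives:
$$\begin{aligned} F(0,0) = \frac{1}{4}\cdot \frac{1}{\sqrt{2}}\cdot \frac{1}{\sqrt{2}} \sum _{x=0}^{7}\sum _{y=0}^{7} f(x,y) = \frac{1}{8}\sum _{x=0}^{7}\sum _{y=0}^{7} f(x,y) = 8\,\bar{f} \end{aligned}$$
where \(\bar{f}\) is the mean of the 64 samples. Under this normalization, the DC coefficient is the block average scaled by a constant factor of 8.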
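The construction above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions: it works on an already decoded grayscale array, ignores the JPEG level shift and quantization, uses an orthonormal 8×8 DCT (for which the DC coefficient equals the block sum divided by 8), and drops incomplete border blocks. The name `dc_image` is hypothetical.

```python
import numpy as np


def dc_image(img, block=8):
    """Return the DC image of a 2-D grayscale array.

    For an orthonormal 2-D DCT-II on a block x block tile, the DC coefficient
    is sum(tile) / block, i.e. block * mean(tile), so no full DCT is needed.
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    h, w = h - h % block, w - w % block  # assumption: drop incomplete border blocks
    tiles = img[:h, :w].reshape(h // block, block, w // block, block)
    # Sum over the within-block row and column axes, one value per block.
    return tiles.sum(axis=(1, 3)) / block
```

Each output pixel corresponds to one block, so a 512×512 input yields a 64×64 DC image, in line with the \(\frac{1}{64}\) size reduction described above.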
Thumbnails for preview or browsing purposes, by contrast, can be produced by downscaling the original images to the sizes best suited to the dimensions of the display devices.
Image matching
In this section, an image matching technique and its corresponding matching distance are described. We exploit the seven Hu moments (Ming-Kuei 1962) for matching purposes. The moments of an image, with pixel intensities \(I(x,y)\) and of size \(M\times N\), are defined by:
$$\begin{aligned} m_{pq} = \sum _{y=0}^{M-1} \sum _{x=0}^{N-1}x^py^qI(x,y) \end{aligned}$$
(4)
Rather than Eq. (4), the central moments:
$$\begin{aligned} \mu _{pq} = \sum _{y=0}^{M-1} \sum _{x=0}^{N-1}(x-\bar{x})^p(y-\bar{y})^qI(x,y) \end{aligned}$$
(5)
with
$$\begin{aligned} \bar{x}=\frac{m_{10}}{m_{00}}, \quad \bar{y}=\frac{m_{01}}{m_{00}} \end{aligned}$$
are often used, which are invariant to translation. Furthermore, normalized central moments are defined by:
$$\begin{aligned} \eta _{pq} = \frac{\mu _{pq}}{\mu ^\gamma _{00}} \end{aligned}$$
(6)
with
$$\begin{aligned} \gamma = \frac{p+q}{2} + 1,\quad p+q=2,3,\dots \end{aligned}$$
and these are also invariant to changes in scale. Algebraic combinations of these moments can provide more attractive features. The most popular are those offered by Hu, which are independent of various transformations. Hu’s original moment invariants (Ming-Kuei 1962; Huang and Leng 2010) are given by:
$$\begin{aligned}M_{1} &= \mu _{20} + \mu _{02}\\
M_{2} &= \left( \mu _{20} - \mu _{02}\right) ^2 + 4\mu ^2_{11}\\
M_{3} &= \left( \mu _{30} - 3\mu _{12}\right) ^2 + \left( 3\mu _{21} - \mu _{03}\right) ^2\\
M_{4} &= (\mu _{30} + \mu _{12})^2 + (\mu _{21} + \mu _{03})^2\\
M_{5} &= (\mu _{30} - 3\mu _{12})(\mu _{30} + \mu _{12})\left[ (\mu _{30} + \mu _{12})^2 - 3(\mu _{21} + \mu _{03})^2\right] \\
&\quad + (3\mu _{21} - \mu _{03})(\mu _{21} + \mu _{03})\left[ 3(\mu _{30} + \mu _{12})^2 - (\mu _{21} + \mu _{03})^2\right] \\
M_{6} &= (\mu _{20} - \mu _{02})\left[ (\mu _{30} + \mu _{12})^2 - (\mu _{21} + \mu _{03})^2\right] \\
&\quad + 4\mu _{11}(\mu _{30} + \mu _{12})(\mu _{21} + \mu _{03}) \\
M_{7} &= (3\mu _{21} - \mu _{03})(\mu _{30} + \mu _{12})\left[ (\mu _{30} + \mu _{12})^2 - 3(\mu _{21} + \mu _{03})^2\right] \\
&\quad - (\mu _{30} - 3\mu _{12})(\mu _{21} + \mu _{03})\left[ 3(\mu _{30} + \mu _{12})^2 - (\mu _{21} + \mu _{03})^2\right] \end{aligned}$$
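The moment definitions above translate directly into array operations. The sketch below is illustrative rather than the authors' implementation: the name `hu_moments` and the `normalize` flag are assumptions. By default it substitutes the normalized central moments \(\eta _{pq}\) for \(\mu _{pq}\) in the seven expressions, on the assumption that a thumbnail and a DC image generally differ in size; passing `normalize=False` evaluates the expressions with \(\mu _{pq}\) directly.

```python
import numpy as np


def hu_moments(img, normalize=True):
    """Return the seven Hu invariants of a 2-D intensity array."""
    img = np.asarray(img, dtype=np.float64)
    y, x = np.mgrid[:img.shape[0], :img.shape[1]]  # row index y, column index x
    m00 = img.sum()
    xb = (x * img).sum() / m00  # centroid, x-bar = m10 / m00
    yb = (y * img).sum() / m00  # centroid, y-bar = m01 / m00

    def mom(p, q):
        mu = ((x - xb) ** p * (y - yb) ** q * img).sum()  # central moment
        if normalize:
            return mu / m00 ** ((p + q) / 2 + 1)  # eta_pq, with mu_00 = m_00
        return mu

    n20, n02, n11 = mom(2, 0), mom(0, 2), mom(1, 1)
    n30, n03, n21, n12 = mom(3, 0), mom(0, 3), mom(2, 1), mom(1, 2)
    a, b = n30 + n12, n21 + n03  # sums that recur in M4 to M7
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
        - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ])
```

On a sampled grid, rotation by multiples of 90° and integer translations leave the result unchanged up to floating-point error. Invariance under resizing holds only approximately, because resampling changes the pixel values themselves.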
Image matching, between thumbnails and visually protected images, involves calculating the moment distance, \(d\), between the thumbnails and the DC component of the visually protected images. We define the distance as:
$$\begin{aligned} d(a,b) = \sum _{j=1}^{7}\left| M_j^a - M_j^b\right| \end{aligned}$$
(7)
where \(a\) and \(b\) denote the thumbnail and the DC image, respectively, and \(M_j\) is the \(j\)-th Hu moment. The matching process proceeds as follows:

1. The moments of the thumbnail (query) image are calculated.
2. DC coefficients from each block of the visually protected JPEG bitstream are extracted to generate the DC image.
3. The moments of the DC images are calculated.
4. The moment distances between the query and the DC images are calculated using Eq. (7). The DC image that yields the minimum value of \(d(a,b)\) is taken as the match.
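The final step of the procedure above can be sketched as a nearest-neighbour search over precomputed moment vectors. This is a minimal sketch under stated assumptions: the names `moment_distance` and `best_match` are hypothetical, each image is represented by a length-7 vector of Hu moments computed beforehand, and the distance is taken as the sum of absolute differences so that it is non-negative.

```python
import numpy as np


def moment_distance(moments_a, moments_b):
    """Sum of absolute differences between two length-7 Hu moment vectors."""
    diff = np.asarray(moments_a, dtype=float) - np.asarray(moments_b, dtype=float)
    return float(np.abs(diff).sum())


def best_match(query_moments, candidate_moments):
    """Return (index, distance) of the candidate closest to the query.

    candidate_moments is a sequence of moment vectors, one per DC image
    generated from the protected images stored on the server.
    """
    distances = [moment_distance(query_moments, m) for m in candidate_moments]
    idx = int(np.argmin(distances))  # smallest distance marks the match
    return idx, distances[idx]
```

Because the search touches only moment vectors, the server never needs to descramble the stored images in order to answer a query.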
Key sharing
Once authorization has been requested, the corresponding scramble key is sent to the buyer by the image publisher. The true image content becomes accessible to the buyer after proper decoding, which includes descrambling with the given key. Various options are available for delivering the scramble key to a buyer. For instance, key delivery could be integrated into the trading system and hosted on the same cloud server, it could run on a separate and independent server, or it could be accomplished by other online means, such as email.