Texture generation
To cover the target model, we set up multiple RGB-Depth cameras around it. These cameras simultaneously capture a set of color images and depth data, which are used to compute a texture mapping between the triangular meshes of the surface and pixels in the color images. From the point cloud of the depth data, we first reconstruct the mesh of the model's surface by the method we proposed in (Lai et al. 2015). We first register the point clouds with the iterative closest point (ICP) algorithm (Besl and Mckay 1992), then denoise the raw data with a bilateral filtering algorithm (Christian et al. 2012), and finally apply the Poisson surface reconstruction algorithm (Kazhdan et al. 2006) to create the mesh of the model's surface. With the set of color images and the reconstructed surface mesh, we reconstruct the texture from the images by using a projective map from each triangular mesh to one of the images, as shown in Fig. 1.
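As an illustration, the following is a minimal sketch of this reconstruction pipeline using the Open3D library; the parameter values are assumptions for illustration, and a statistical-outlier removal stands in for the bilateral filter, so this is not the exact setting of (Lai et al. 2015).

```python
# Minimal sketch of the reconstruction pipeline: ICP registration,
# denoising, and Poisson surface reconstruction, using Open3D.
# Parameter values are illustrative assumptions, and statistical
# outlier removal stands in for the bilateral filter of the text.
import numpy as np
import open3d as o3d

def reconstruct_surface(clouds):
    """clouds: list of o3d.geometry.PointCloud, one per RGB-Depth camera."""
    merged = clouds[0]
    for source in clouds[1:]:
        # Register each cloud onto the merged cloud with point-to-point ICP
        result = o3d.pipelines.registration.registration_icp(
            source, merged, max_correspondence_distance=0.05,
            init=np.eye(4),
            estimation_method=o3d.pipelines.registration
                .TransformationEstimationPointToPoint())
        source.transform(result.transformation)
        merged += source

    # Denoise the merged cloud (stand-in for bilateral filtering)
    merged, _ = merged.remove_statistical_outlier(nb_neighbors=20,
                                                  std_ratio=2.0)
    merged.estimate_normals()

    # Poisson surface reconstruction (Kazhdan et al. 2006)
    mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        merged, depth=9)
    return mesh
```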
Once the color images and depth data are captured by the RGB-Depth cameras, we first register all of them into a global coordinate system and compute each 3D point's position in that system. The back-projection formula is
$$\begin{aligned} V = (d_n - \delta _d) \cdot q_d / \phi _d \end{aligned}$$
(1)
where \(d_n\) is the pixel location in the depth image, \(q_d\) is its depth value, and \(\delta _d\) and \(\phi _d\) are the center and projection parameters of the depth camera. The coordinate of the 3D point in the global coordinate system is then obtained as
$$\begin{aligned} V' = R \cdot V + T \end{aligned}$$
(2)
where R is a rotation matrix and T is a translation vector. The correspondence between 3D points and pixels in the color images is then established as
$$\begin{aligned} p = V' \cdot \phi _c / q_d + \delta _c \end{aligned}$$
(3)
where \(\delta _c\) and \(\phi _c\) are the center and projection parameters of the color camera.
Consequently, a corresponding pixel is determined for each point of the model, and the texture is generated on the surface by projecting the color image content onto the model.
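A minimal numpy sketch of this projection chain, Eqs. (1)-(3), is given below; treating the depth value \(q_d\) as the z-component of V follows the standard pinhole model and is an assumption, since Eq. (1) states only the lateral coordinates.

```python
# Minimal numpy sketch of Eqs. (1)-(3). Appending the depth value q_d
# as the z-component of V follows the standard pinhole model (an
# assumption; Eq. (1) states only the lateral coordinates).
import numpy as np

def depth_pixel_to_color_pixel(d_n, q_d, delta_d, phi_d, R, T,
                               delta_c, phi_c):
    """d_n: 2D pixel location in the depth image; q_d: its depth value;
    delta_*/phi_*: center and projection parameters of the depth and
    color cameras; R, T: rotation and translation into the global frame."""
    xy = (np.asarray(d_n, float) - delta_d) * q_d / phi_d   # Eq. (1)
    V = np.array([xy[0], xy[1], q_d])
    V_prime = R @ V + T                                     # Eq. (2)
    p = V_prime[:2] * phi_c / q_d + delta_c                 # Eq. (3)
    return p                                   # pixel in the color image
```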
Image enhancement
Due to the complex environment captured in the color images, as well as the inaccuracy of the depth data, some irrelevant environment content inevitably appears in the final texture. It is therefore necessary to filter out the irrelevant environment (background) and keep only the content of the human model (foreground) in the color images. Assuming the color image contains n pixels in total, we construct two Gaussian Mixture Models (GMMs) in color space, one for the foreground and the other for the background. Each is a full-covariance Gaussian mixture with T components (typically \(T = 5\)). A vector \(t=\{t_1,\ldots ,t_n\},t_i\in \{1,\ldots ,T\}\), assigns a unique GMM component to each pixel in the color image. For the GMM components, we define \(a = 1\) for foreground and \(a = 0\) for background. The energy function for segmentation is then
$$\begin{aligned} E = E_d + E_s \end{aligned}$$
(4)
where \(E_d\) is the data penalty of pixels assigned to the foreground or background; it depends on the GMM component variables t and is defined as
$$\begin{aligned} E_d= & {} \sum D(a_n, t_n, p_n)\nonumber \\= & {} \sum \left( -\log P(p_n \mid a_n, t_n) - \log \delta (a_n, t_n) \right) \nonumber \\= & {} \sum \left( -\log \delta ( a_n, t_n) + \frac{1}{2} \log \det \Sigma ( a_n, t_n)\right. \nonumber \\&\left. + \frac{1}{2} \left[ p_n - \mu (a_n, t_n) \right] ^{T} \Sigma (a_n, t_n)^{-1} \left[ p_n - \mu (a_n,t_n) \right] \right) \end{aligned}$$
(5)
where \(P(\cdot )\) is a Gaussian probability distribution, \(\delta\) denotes the mixture weighting coefficients, \(\mu\) the means, and \(\Sigma (a_n,t_n)\) the covariances of the Gaussian components for the foreground and background distributions.
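The per-pixel penalty of Eq. (5) transcribes directly into code; the following sketch assumes the mixture weight, mean, and covariance of the assigned component are given.

```python
# Direct transcription of the per-pixel data penalty in Eq. (5).
import numpy as np

def data_penalty(p_n, weight, mu, Sigma):
    """p_n: RGB pixel value (3,); weight, mu, Sigma: mixture weight,
    mean and covariance of the GMM component (a_n, t_n) assigned to
    the pixel. Returns D(a_n, t_n, p_n)."""
    diff = p_n - mu
    maha = diff @ np.linalg.inv(Sigma) @ diff     # Mahalanobis distance
    return (-np.log(weight)
            + 0.5 * np.log(np.linalg.det(Sigma))
            + 0.5 * maha)
```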
The second term \(E_s\) is the smoothness energy of discontinuities between adjacent pixels assigned to foreground and background respectively, computed from the Euclidean distance in color space,
$$\begin{aligned} E_s = \gamma \sum _{(m,n)\in C}|a_n \ne a_m| \exp \left( -\beta \left\| p_m - p_n \right\| ^{2} \right) \end{aligned}$$
(6)
We minimize the energy iteratively until convergence. First we set the whole image as foreground \((a = 1)\), and initialize the foreground and background GMMs from the sets \(a = 1\) and \(a = 0\) respectively. We then perform the iterative minimization. Step 1: assign GMM components to pixels, which is straightforward and done by simple enumeration of the \(t_n\) values for each pixel n. Step 2: re-estimate the GMM parameters as follows. For a given GMM component t in the foreground model, the subset of pixels \(F(t) = \{p_n : t_n = t , a_n = 1\}\) is defined. The mean \(\mu (a,t)\) and covariance \(\Sigma (a,t)\) are estimated in standard fashion as the sample mean and covariance of the pixel values in F(t), and the weights are \(\delta (a,t) = |F(t)| / \sum _t|F(t)|\), where |F(t)| denotes the size of F(t). Step 3 is a global optimization, using minimum cut exactly as in (Boykov and Jolly 2001). We repeat from step 1 until convergence; a sketch of the re-estimation of step 2 follows.
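```python
# Sketch of step 2: re-estimate each GMM component from the pixels
# currently assigned to it. The (N,3) / (N,) array layout is an
# illustrative assumption.
import numpy as np

def estimate_gmms(pixels, a, t, n_components=5):
    """pixels: (N,3) RGB values; a: (N,) foreground/background labels;
    t: (N,) GMM component indices. Returns {(a, t): parameters}."""
    params = {}
    for a_val in (0, 1):
        total = np.count_nonzero(a == a_val)
        for comp in range(n_components):
            F = pixels[(a == a_val) & (t == comp)]     # subset F(t)
            if len(F) < 2:
                continue                               # too few samples
            params[(a_val, comp)] = {
                'weight': len(F) / total,              # delta(a, t)
                'mu': F.mean(axis=0),                  # sample mean
                'Sigma': np.cov(F, rowvar=False),      # sample covariance
            }
    return params
```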
As a result, the background, i.e. the irrelevant environment content, is removed from the color images (Fig. 2).
Seamless optimization
The filtered color images from these cameras, each covering part of the model, must be stitched together to create an integrated texture on the model's surface. Due to multi-view misregistration and inaccuracy of the depth data, the initial texture contains undesirable artifacts, such as misaligned seams at the boundaries between different color images, illumination variation between different parts of the texture, and blank regions without any texture content. These artifacts are small but noticeable, and degrade the visual quality significantly, as shown in Fig. 3.
To eliminate these artifacts, we treat texture projection as an image-stitching optimization. Each triangular mesh of the model's surface is projected onto the subset of the color images \(\{I_1,\ldots ,I_N\}\) in which it is visible. We seek the best color image, with high contrast, low anisotropy, and high resolution, as its texture content, and then generate a seamless, smooth texture between adjacent triangular meshes. This naturally leads to a Markov Random Field problem.
Markov Random Field model
Searching for a texture label x for each triangular mesh, which identifies its best associated color image, is a hybrid combinatorial-continuous optimization. We formulate it as the following Markov Random Field model,
$$\begin{aligned} \min _{x_1,\ldots ,x_N}\sum _{i=1}^N D_i(x_i) + \lambda \sum _{\{i,j\}\in M} V_{ij}(x_i,x_j) \end{aligned}$$
(7)
In the first term,
$$\begin{aligned} D_i(x_i) = - \delta _i \int _{\Phi (p)} \left\| \Delta \Phi _i (p) \right\| ^2 dp \end{aligned}$$
(8)
it calculates the data cost of each triangular mesh by projecting it onto each color image, where the parameter \(\delta\) is the ratio of the triangle's perimeter to its area, and \(\Phi (\cdot )\) is the projection operator from a triangular mesh to a color image I. We thus obtain the best texture label for each triangular mesh, and project the corresponding color image, in which the mesh is visible, onto it.
In the second term,
$$\begin{aligned} V_{ij} (x_i,x_j) = \int _{E_{ij}} \left\| \Phi _i(p) - \Phi _j(p) \right\| ^2dp \end{aligned}$$
(9)
where \(E_{ij}\) denotes the edge shared by two adjacent triangular meshes, so this term only measures the smoothness cost of texture discontinuities between adjacent triangular meshes, and favors image contents whose details are coherent at the shared edge. If both adjacent meshes have the same texture label, the term equals zero. The optimization of this term is therefore concentrated on adjacent triangular meshes with different texture labels.
In this Markov Random Field model, each node denotes a triangular mesh with a set of candidate texture labels, and the solution of the optimization essentially assigns the best texture label, namely a color image, to each triangular mesh of the surface.
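For concreteness, the labeling energy of Eq. (7) can be evaluated as in the following sketch, where data_cost and smooth_cost stand for Eqs. (8) and (9) and are assumed to be given.

```python
# Sketch of the labeling energy in Eq. (7). data_cost and smooth_cost
# stand for Eqs. (8) and (9) and are assumed to be given.
def labeling_energy(x, triangles, adjacency, data_cost, smooth_cost,
                    lam=1.0):
    """x: dict triangle -> texture label; adjacency: pairs {i, j} of
    adjacent triangles (the set M); lam: the weight lambda."""
    E = sum(data_cost(i, x[i]) for i in triangles)
    E += lam * sum(smooth_cost(i, j, x[i], x[j]) for i, j in adjacency)
    return E
```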
Optimization with adaptive iterative factor
To solve the optimization, we use a graph-cut-based method, named \(\alpha\)-expansion, which brings the min-cut/max-flow algorithms of graph theory into combinatorial optimization. The minimal cut corresponds to the global minimum of the energy function, assigning the best label to each triangular mesh. Many other standard algorithms, such as iterated conditional modes (ICM) and simulated annealing (SA), apply small moves which change only one node's label at a time. In contrast, the \(\alpha\)-expansion algorithm may change the labels of a large number of nodes simultaneously, aiming for a lower energy. With such large moves, it effectively escapes local minima and rapidly converges to the global minimum.
The \(\alpha\)-expansion algorithm is summarized as follows, with a sketch of the outer loop given below. Given an arbitrary initial labeling f and a label \(\alpha\), a weighted edge is formed between neighboring nodes, and the goal is to find a new labeling \(f'\) which minimizes the energy over all nodes' labels through one \(\alpha\)-expansion move. During the \(\alpha\)-expansion move, only nodes whose label is not \(\alpha\) may change; nodes already labeled \(\alpha\) keep their value. If the energy of the new labeling \(f'\) is less than the original, another \(\alpha\)-expansion move is performed; otherwise, the current labeling \(f'\), with the minimal energy, is returned as the final solution of the energy function, assigning the best label to each triangular mesh.
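```python
# Sketch of the alpha-expansion outer loop. The inner move expand(),
# which solves one binary min-cut problem (e.g. with a graph-cut
# library), is assumed to be given.
def alpha_expansion(labels, all_labels, energy, expand):
    """labels: initial labeling f; energy(f): Eq. (7);
    expand(f, alpha): the best single alpha-expansion move from f."""
    improved = True
    while improved:
        improved = False
        for alpha in all_labels:
            candidate = expand(labels, alpha)   # nodes may switch to alpha
            if energy(candidate) < energy(labels):
                labels = candidate              # accept the lower-energy move
                improved = True
    return labels                   # labeling f' with minimal energy
```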
However, the \(\alpha\)-expansion algorithm cannot eliminate the misaligned seams at the boundaries between different color images, caused by multi-view misregistration and inaccuracy of the depth data. To minimize these artifacts, we introduce an adaptive iterative factor, a translation coordinate of the color image, and expand the texture label x into the form \(x=(I_i,t)\), where \(I_i\in \{I_1,\ldots ,I_N\}\). During the \(\alpha\)-expansion move, we can then shift the pixels in the image space adaptively and iteratively, to match the color content precisely at the edge shared by adjacent triangular meshes. As a result, the misaligned seams are largely removed. The second term then becomes
$$\begin{aligned} V_{ij} (x_i,x_j) = \int _{E_{ij}} \left\| \Phi _i(p)\omega (t_i) - \Phi _j(p)\omega (t_j) \right\| ^2dp \end{aligned}$$
(10)
$$\begin{aligned} \omega (t)= \hat{\omega } (t - \Delta T) + e(t) \end{aligned}$$
(11)
where \(\omega (t)\) is the iterative update function of the translation coordinate, and e(t) is the unit coordinate vector.
In the seam optimization, we treat each color image as an individual image space. After each triangular mesh of the model's surface is projected onto a color image, the location of the mesh in that image is obtained. Then, with the adaptive iterative factor, a 2D translation vector, the location of the mesh is shifted along the possible directions. Finally, we obtain the best location by comparing the color content of two adjacent triangular meshes at their shared edge.

At the initialization of the optimization, every triangular mesh is assigned a set of possible labels, meaning the mesh can be projected onto those color images, and the adaptive factor is set to zero. We then evaluate the energy function iteratively to find the optimal labels, and shift the texture image gradually, within a threshold of 32 pixel units, to match the color content at the shared edge of adjacent meshes. The misaligned seams caused by the inaccuracy of the depth data and multi-view misregistration are thereby eliminated.

In each iteration, each node is assigned the current best label by energy optimization. Furthermore, to confirm that the assigned label x is indeed the best label for node i, we fix the neighboring nodes at their assigned best labels and compute the energy sum \(\sum V_{ij}(x_i^*,x_j)\) over all possible labels of node i, as in the sketch below. If another label \(x'\) achieves a lower energy, we abandon the assigned label x and choose \(x'\) as the best label for node i. As this iterative strategy is applied to the energy optimization, the result of the current iteration is fed into the next iteration as candidate labels, until a best labeling \(f'\) achieves the global minimum of the energy over all nodes' labels. Finally, the optimized texture without misaligned seams is obtained, as Fig. 4 shows.
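```python
# Sketch of the per-node verification: with the neighbors fixed at
# their assigned labels, re-evaluate all candidate labels of node i
# and keep the one with minimal smoothness energy. candidates and
# smooth_cost are assumed to be given; candidate translations are
# bounded by the 32-pixel threshold of the text.
def verify_label(i, x, candidates, neighbors, smooth_cost):
    """x: current labeling; candidates[i]: candidate (image, translation)
    labels of node i; neighbors[i]: adjacent triangular meshes."""
    best = min(candidates[i],
               key=lambda x_i: sum(smooth_cost(i, j, x_i, x[j])
                                   for j in neighbors[i]))
    x[i] = best                 # adopt x' if it beats the assigned label
    return x
```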
Color blending
Although the misaligned seams are eliminated effectively by the MRF optimization, one artifact still remains: illumination variation between different images, caused by the different gains of the RGB-Depth cameras at their respective positions, as well as by surface reflections. This small but noticeable artifact significantly degrades the texture and the visual effect.
To resolve this artifact, we treat the color images as sources of color gradients rather than sources of color, and construct a composite vector field. We then compute the fused color of the multiple images whose gradient best matches this vector field, so that the texture across different parts of the surface becomes coherent in illumination.
A color image consists of three RGB channels, so color blending is performed separately in each channel. Given a single color channel, we denote the image domain by \(D(D \subset \mathfrak {R}^2)\), the known boundary colors by g, the composite vector field by \(\vec {V}\), and the unknown blended color function by f; the pixels on the boundary b between adjacent color images are kept fixed as constraints for the blending. The objective function is
$$\begin{aligned} \min _f \iint _{D} \left| \nabla f - \vec {V} \right| ^2,\quad f|_{\partial D} = g|_{\partial D} \end{aligned}$$
(12)
where the gradient operator is \(\nabla = [\frac{\partial }{\partial x}, \frac{\partial }{\partial y}]\).
The minimization of Eq. (12) can be transformed into the corresponding Poisson equation with a Dirichlet boundary condition,
$$\begin{aligned} \triangle f = div\vec {V}, \quad f|_{\partial D} = g|_{\partial D} \end{aligned}$$
(13)
where the Laplace operator \(\triangle = \frac{\partial ^2}{\partial x^2} + \frac{\partial ^2}{\partial y^2}\), and the divergence operator of the vector field \(\vec {V} = (u,v)\) is \(div\vec {V} = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial y}\).
In the discrete image domain, the pixels are distributed on a grid with values I(u, v). The gradient constraints of the vector field \(\vec {V}\) then specify two linear equations per pixel, each involving two variables,
$$\begin{aligned} I(u+1,v) - I(u,v)&= I_x (u,v)\nonumber \\ I(u,v+1) - I(u,v)&= I_y (u,v) \end{aligned}$$
(14)
which leads to a discrete Poisson equation. In addition, the fixed pixels on the boundaries between adjacent images are added to the linear system as constraints, so the resulting linear system of the Poisson equation is over-constrained. We use the multigrid method of (Fattal et al. 2002) to solve the equation and obtain the colors whose gradients best match the vector field. As a result, each pixel in the color images is assigned the best value, and the colors of the different images become coherent, without illumination variation (Fig. 5).
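A minimal sketch of the discrete solve for one channel follows; a direct sparse solver stands in for the multigrid method of (Fattal et al. 2002), and the 5-point stencil together with the assumption that the image border is included among the fixed pixels are illustrative choices.

```python
# Sketch of the discrete Poisson solve for one color channel. A direct
# sparse solver stands in for the multigrid method of (Fattal et al.
# 2002). Assumes the fixed-pixel mask includes the image border, so the
# 5-point stencil never leaves the grid.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def blend_channel(div_V, fixed_mask, fixed_values):
    """div_V: (H,W) divergence of the composite vector field;
    fixed_mask: (H,W) bool, True at boundary pixels; fixed_values:
    (H,W) colors held fixed there."""
    H, W = div_V.shape
    idx = np.arange(H * W).reshape(H, W)
    A = sp.lil_matrix((H * W, H * W))
    b = np.zeros(H * W)
    for y in range(H):
        for x in range(W):
            k = idx[y, x]
            if fixed_mask[y, x]:
                A[k, k] = 1.0                  # Dirichlet constraint
                b[k] = fixed_values[y, x]
                continue
            A[k, k] = -4.0                     # 5-point Laplacian stencil
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                A[k, idx[y + dy, x + dx]] = 1.0
            b[k] = div_V[y, x]
    return spsolve(A.tocsr(), b).reshape(H, W)
```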
Blank region repair
After the illumination variations are handled effectively by color blending, the last remaining artifact is blank regions, which are caused by inaccuracy of the depth data and multi-view registration. Since the meshes in these areas have no correspondence with the color images, some blank regions without any texture content remain on the model's surface, destroying the integrity of the texture.
To repair these blank regions, we fill them with color content that blends smoothly with the neighboring regions, generating an integrated, coherent texture on the model's surface. We use the CGAL library to parameterize the 3D textured surface (Lévy et al. 2002; Desbrun et al. 2002) and project it onto a 2D plane. We then use the K-nearest-neighbor algorithm to search for the K neighboring points with color content, and create the texture image for the blank regions from these colors, as shown in Fig. 6.
Finally, we project this texture image back onto the model's surface to fill the blank regions, so that the entire surface of the model is textured. The process is as follows.
(1) Search for the K colored neighboring points of the blank point in the projected plane. While the number of colored neighboring points is less than K, extend the search to the neighbors of those colored points until the number is at least K.

(2) Compute the distance from each of the K colored points to the blank point, and denote the maximum among them by \(D_{max}\). Given another colored point at distance D, compare D with \(D_{max}\): if D is larger than \(D_{max}\), discard the new point; otherwise, replace the point at \(D_{max}\) with the new point, and update the maximum distance among the new K colored points.

(3) Repeat steps 1 and 2 until the K colored points no longer change.
A texture image for the blank regions is thus generated and back-projected onto the model's surface; a minimal sketch of the neighbor search and blending is given below. Consequently, all blank regions on the model's surface become textured with color content, and an integrated, coherent texture is generated on the surface of the model, as Fig. 7 shows.
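```python
# Sketch of the neighbor search and blending for one blank point on the
# parameterized plane. The KD-tree query stands in for the iterative
# neighbor expansion of step 1, and the inverse-distance weighting is
# an illustrative choice (the text does not specify the exact blend).
import numpy as np
from scipy.spatial import cKDTree

def fill_blank_point(blank_uv, colored_uv, colored_rgb, K=8):
    """blank_uv: (2,) blank point; colored_uv: (M,2) colored points;
    colored_rgb: (M,3) their colors. K = 8 is an assumed value."""
    tree = cKDTree(colored_uv)
    dist, idx = tree.query(blank_uv, k=K)      # K nearest colored points
    w = 1.0 / np.maximum(dist, 1e-12)          # inverse-distance weights
    return (w[:, None] * colored_rgb[idx]).sum(axis=0) / w.sum()
```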