Iris classification based on sparse representations using on-line dictionary learning for large-scale de-duplication applications

De-duplication of biometrics does not scale when the number of people to be enrolled into the biometric system runs into billions and a unique identity must be created for every person. In this paper, we propose an iris classification based on sparse representation of log-Gabor wavelet features using on-line dictionary learning (ODL) for large-scale de-duplication applications. Three different iris classes based on iris fiber structures, namely, stream, flower and jewel-shaker, are used for faster retrieval of identities. Also, an iris adjudication process is illustrated by comparing the matched iris-pair images side-by-side to make the decision on the identification score using color coding. Iris classification and adjudication are included in the iris de-duplication architecture to speed up the identification process and to reduce the identification errors. The efficacy of the proposed classification approach is demonstrated on the standard iris database, UPOL.


Introduction
Various government sectors around the world provide welfare services such as NREGS (national rural employment guarantee system), TPDS (targeted public distribution system), old age pensions and health insurance for the benefit of the people. Creating a unique identity (UID) number for every person removes the requirement of producing multiple documentary proofs to avail these services. De-duplication of biometrics plays a key role in providing a unique identity for a person. De-duplication means the elimination of duplicate enrollments of the same person using the biometric data. As the number of people enrolled into the biometric system runs into billions, the time complexity of the de-duplication process increases while creating a unique identity for every individual. There is a need for a biometrics-based de-duplication architecture that scales to large databases. Among all the biometrics, fingerprints (based on minutia features) and iris give more accurate results in uniquely identifying people. A biometric recognition system still makes a few errors in the identification process. In order to reduce these errors, fingerprint experts look for possible fingerprint matches and enhance the fingerprints to compare the minutia features manually using the fingerprint adjudication process. Fingerprint adjudication means the comparison of two fingerprints side-by-side to analyze the matched minutia features. Even though the iris biometric is more accurate than fingerprints, there is a need for an iris adjudication process to reduce the identification errors.
The complex iris texture provides the uniqueness of iris images. Daugman proposed an iris recognition system using Gabor filters and iris codes (Daugman 1993, 2001, 2003, 2004). Wildes (1997) implemented a gradient-based iris segmentation using Laplacian pyramid construction. A few researchers have already explored iris classification techniques based on a hierarchical visual codebook (Sun et al. 2014), block-wise texture analysis (Ross and Sunder 2010) and color information (Zhang et al. 2012). There are no approaches for classification of iris images based on pre-defined iris classes in the existing work. In this paper, we propose an iris classification based on sparse representation of log-Gabor wavelet features using on-line dictionary learning (ODL). Three different iris classes based on iris fiber structures, namely, stream, flower and jewel-shaker, are used for faster retrieval of identities in large-scale de-duplication applications. Also, an iris adjudication process is proposed by comparing the matched iris-pair images side-by-side to make the decision on the identification score using color coding. The iris classification and adjudication framework is used to speed up the identification process and to reduce the identification errors in the iris de-duplication architecture. The rest of the paper is organized as follows: Section 2 gives the details of sparse representation and on-line dictionary learning. Section 3 gives the motivation for the proposed iris classification approach by illustrating the complexity involved in de-duplication of large-scale iris databases. In Section 4, the proposed iris classification and adjudication framework is presented. Experimental results of the proposed classification and adjudication framework are given in Section 5. Conclusions are given in Section 6.

Sparse representation and on-line dictionary learning (ODL)
Sparse representation has received a lot of attention from researchers in signal and image processing. Sparse coding involves the representation of an image as a linear combination of some atoms in a dictionary (Ramirez et al. 2010). Several algorithms like on-line dictionary learning (ODL) (Mairal et al. 2009), K-SVD (Aharon et al. 2006) and the method of optimal directions (MOD) (Engan et al. 1999) have been developed to process training data. Sparse representation is used to match the input query image with the appropriate class. Etemad and Chellappa (1998) proposed a feature extraction method for classification using wavelet packets. In (Sprechmann and Sapiro 2010), a method is presented for learning several dictionaries simultaneously. Recently, similar algorithms for simultaneous sparse signal representation have also been proposed (Huang and Aviyente 2006; Rodriguez and Sapiro 2008).
The on-line dictionary learning algorithm alternates between sparse coding and dictionary update steps. Several efficient pursuit algorithms have been proposed in the literature for sparse coding (Engan et al. 1999; Mallat and Zhang 1993). The simplest one is the $\ell_1$-lasso algorithm (Lee et al. 2007). The main advantage of the ODL algorithm is its computational speed, as it uses the $\ell_1$-lasso algorithm for sparse representation.
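The alternation between sparse coding and dictionary update can be sketched as follows. This is a minimal illustration in the spirit of Mairal et al. (2009), not the implementation used in this paper: the sparse coding step uses a simple iterative soft-thresholding solver in place of a dedicated lasso solver, and the function names, the regularization weight `lam`, and the iteration counts are assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Element-wise soft thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sparse_code(D, x, lam, n_iter=100):
    """Approximately minimise 0.5*||x - D a||^2 + lam*||a||_1 (assumed ISTA solver)."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a - step * (D.T @ (D @ a - x)), lam * step)
    return a

def online_dictionary_learning(X, n_atoms, lam=0.1, n_epochs=5, seed=0):
    """Learn a dictionary from the columns of X, one sample at a time."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    # Initialise the atoms with randomly chosen, unit-normalised samples.
    D = X[:, rng.choice(n, size=n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0)
    A = np.zeros((n_atoms, n_atoms))  # accumulates a a^T
    B = np.zeros((m, n_atoms))        # accumulates x a^T
    for _ in range(n_epochs):
        for t in rng.permutation(n):
            x = X[:, t]
            a = sparse_code(D, x, lam)       # sparse coding step
            A += np.outer(a, a)
            B += np.outer(x, a)
            for j in range(n_atoms):         # dictionary update step
                if A[j, j] > 1e-12:
                    u = (B[:, j] - D @ A[:, j]) / A[j, j] + D[:, j]
                    D[:, j] = u / max(np.linalg.norm(u), 1.0)
    return D
```

Each incoming sample only updates the two accumulator matrices and one pass over the atoms, which is why the on-line variant avoids revisiting the whole training set at every step.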

De-duplication architecture
De-duplication means the elimination of duplicate enrollments of the same person using the biometric data. During the de-duplication process, the biometrics of a person are matched against the biometrics of all other persons to ensure that the same person is not enrolled more than once.

Motivation behind this work
The state government of Andhra Pradesh (Government of Andhra Pradesh, civil supplies department 2015) in India undertakes the responsibility of identifying the eligible households/beneficiaries and issuing a ration card which enables them to avail the prescribed quantity of food grains and/or other commodities. The de-duplication was carried out for the ration cards using the iris codes of 52 million people to reduce the misuse of government subsidy. Over 6.26 quadrillion (6,262,668,889,152,840) iris matches were performed in a de-centralized manner to remove duplicate enrollments in 61 days with high-end blade server equipment, which is not a scalable solution. This is the motivation for the proposed classification approach, which reduces the search time drastically and provides a scalable de-duplication solution.
The proposed de-duplication architecture is shown in Figure 1. In the processing stage, an iris image is segmented and normalized. Then iris templates are extracted using log-Gabor wavelets. The de-duplication engine, or iris matcher, improves the speed of de-duplication by adding multiple blade servers. All the enrolled iris templates are loaded into each blade server and the iris templates are compared in a "1 : all" manner in N blade servers simultaneously. For example, if there are N query iris templates to be processed, then each query iris template goes to a blade server for de-duplication. If there are more than N query images, the remaining iris templates wait in a queue until one of the blade servers is free. Increasing the number of blade servers is not an optimal solution, especially for large-scale iris databases. There should be another layer for iris classification to reduce the search space in the de-duplication engine. So, we propose an iris classification based on sparse representation of log-Gabor wavelet features using on-line dictionary learning (ODL). Also, an iris adjudication process is done by comparing the matched iris-pair images side-by-side to know the confidence level of the matching score based on color coding.
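To make the scale concrete, the short sketch below works through the arithmetic implied by the figures above. The match count and the 61 days are taken from the text. The class sizes are the UPOL counts reported later in this paper (192 stream, 102 flower, 90 jewel-shaker), and assuming that similar proportions would hold in a large database is purely illustrative. The function names are hypothetical.

```python
TOTAL_MATCHES = 6_262_668_889_152_840  # iris matches quoted in the text
DAYS = 61                              # duration quoted in the text

def matches_per_second(total_matches: int, days: int) -> float:
    """Average sustained matching rate needed to finish in the given time."""
    return total_matches / (days * 24 * 60 * 60)

def all_pairs(n: int) -> int:
    """Number of distinct pairs among n templates (exhaustive de-duplication)."""
    return n * (n - 1) // 2

def within_class_pairs(class_sizes) -> int:
    """Pairs that remain when templates are only compared inside their class."""
    return sum(all_pairs(n) for n in class_sizes)

def search_space_fraction(class_sizes) -> float:
    """Fraction of the exhaustive search that class-restricted matching keeps."""
    return within_class_pairs(class_sizes) / all_pairs(sum(class_sizes))

# UPOL class sizes reported in this paper; used here only as example proportions.
UPOL_CLASS_SIZES = [192, 102, 90]

if __name__ == "__main__":
    print(f"matches per second: {matches_per_second(TOTAL_MATCHES, DAYS):.3e}")
    print(f"search space kept:  {search_space_fraction(UPOL_CLASS_SIZES):.3f}")
```

Under these assumptions the reported run corresponds to more than a billion matches per second on average, and restricting comparisons to three classes with UPOL-like proportions would keep a little under 40 percent of the pairwise comparisons.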
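The effect of the classification layer on a "1 : all" search can be sketched as below. This is an illustration only: binary iris codes compared with a normalized Hamming distance and the threshold of 0.32 are assumptions, not details given in this paper, and `find_duplicates` is a hypothetical helper.

```python
import numpy as np

def hamming_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of positions where two binary templates disagree."""
    return np.count_nonzero(a != b) / a.size

def find_duplicates(query, query_class, gallery, gallery_classes, threshold=0.32):
    """Compare the query only against enrolled templates of the same class.

    Returns the indices of suspected duplicates and the number of
    templates that actually had to be searched.
    """
    candidates = np.flatnonzero(gallery_classes == query_class)
    hits = []
    for idx in candidates:
        if hamming_distance(query, gallery[idx]) <= threshold:
            hits.append(int(idx))
    return hits, len(candidates)
```

For example, with a gallery of six templates spread evenly over three classes, a query labelled with one class is compared against two templates instead of six.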

Proposed iris classification and adjudication framework
The proposed iris classification approach uses three different classes of iris images (Unitree Foundation, the Rayid model of iris interpretation 2009), namely, stream, flower and jewel-shaker, as illustrated in Figure 2. The iris structure can be determined by the arrangement of white fibers radiating from the pupil. In the stream iris structure, these fibers are arranged in a regular and uniform fashion. The arrangement of fibers is irregular in the flower iris structure. In the jewel iris structure, the fibers have some dots. The shaker iris structure has the characteristics of both the flower and jewel iris structures. The arrangement of fibers is illustrated in Figure 3.
The following are the steps involved in the proposed iris classification and adjudication framework:
Step 1. Iris segmentation and normalization: The pupillary and limbic boundaries (Masek 2003) of an iris image are approximated as circles using three parameters: the radius $r$ and the coordinates of the center of the circle, $x_0$ and $y_0$. The integrodifferential operator (Daugman 1993) used for iris segmentation is:

$$\max_{(r, x_0, y_0)} \left| G_\sigma(r) * \frac{\partial}{\partial r} \oint_{r, x_0, y_0} \frac{I(x, y)}{2\pi r}\, ds \right|$$

where $G_\sigma(r)$ is a smoothing function and $I(x, y)$ is the image of the eye. After applying the operator, the resultant segmented iris image is as shown in Figure 4. The segmented iris image is then converted to a dimensionless polar coordinate system based on the Daugman Rubber Sheet model (Daugman 1993) as shown in Figure 5.
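The normalization part of Step 1 can be sketched as below, assuming the two boundary circles have already been found. This is a minimal illustration of the rubber sheet remapping, not the code used in this paper: nearest-neighbour sampling, the `(x, y, radius)` tuple layout, and the function name are assumptions. The output has 40 radial rows and 360 angular columns, matching the 360 × 40 normalized image mentioned in Step 2.

```python
import numpy as np

def rubber_sheet(image, pupil, iris, radial_res=40, angular_res=360):
    """Unwrap the iris ring into a (radial_res, angular_res) rectangle.

    pupil and iris are (x0, y0, r) circles for the pupillary and limbic
    boundaries. Each output pixel is sampled along the line joining the
    two boundaries at a given angle (nearest-neighbour, an assumption).
    """
    xp, yp, rp = pupil
    xi, yi, ri = iris
    theta = np.linspace(0.0, 2.0 * np.pi, angular_res, endpoint=False)
    r = np.linspace(0.0, 1.0, radial_res)[:, None]  # 0 = pupil edge, 1 = limbus

    # Boundary points for every angle.
    x_in, y_in = xp + rp * np.cos(theta), yp + rp * np.sin(theta)
    x_out, y_out = xi + ri * np.cos(theta), yi + ri * np.sin(theta)

    # Linear interpolation between the two boundaries.
    x = (1.0 - r) * x_in[None, :] + r * x_out[None, :]
    y = (1.0 - r) * y_in[None, :] + r * y_out[None, :]

    cols = np.clip(np.rint(x).astype(int), 0, image.shape[1] - 1)
    rows = np.clip(np.rint(y).astype(int), 0, image.shape[0] - 1)
    return image[rows, cols]
```

Because the radial coordinate is expressed as a fraction of the distance between the two boundaries, the result is independent of pupil dilation and of the absolute iris size in the image.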
Step 2. Feature extraction (Masek 2003): The log-Gabor wavelet feature vector of size 720 × 40 is extracted from the normalized iris image of size 360 × 40. The resultant feature vector is converted to a single column vector by column-major ordering. From each class, some of the iris images are selected, and each is expressed as a linear weighted sum of the feature vectors in a dictionary that covers the three different classes of iris.
Step 3. Iris classification using ODL: An on-line dictionary learning (ODL) algorithm is used to classify the iris data into three different classes to reduce the search space. The weights associated with the feature vectors in the dictionary are evaluated using the ODL algorithm, which is a solution to $\ell_1$ optimization for an over-determined system of equations. The feature vectors which belong to a particular iris class carry significant weights, which are the largest non-zero values.
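A sketch of this feature extraction step is given below. It assumes a one-dimensional log-Gabor filter applied to each row of the normalized image in the frequency domain, with the real and imaginary responses interleaved so that the angular dimension doubles from 360 to 720. The filter parameters `wavelength` and `sigma_on_f`, the interleaving order, and the function names are assumptions made for illustration and are not specified in this paper.

```python
import numpy as np

def log_gabor_features(norm_iris, wavelength=18.0, sigma_on_f=0.5):
    """Filter each row with a 1D log-Gabor filter and return a 720 x 40 array.

    norm_iris has shape (40, 360): 40 radial rows, 360 angular samples.
    """
    rows, cols = norm_iris.shape
    freqs = np.fft.fftfreq(cols)          # cycles per sample
    f0 = 1.0 / wavelength                 # centre frequency
    response = np.zeros(cols)
    pos = freqs > 0                       # log-Gabor has no DC or negative part
    response[pos] = np.exp(
        -(np.log(freqs[pos] / f0) ** 2) / (2.0 * np.log(sigma_on_f) ** 2)
    )
    filtered = np.fft.ifft(np.fft.fft(norm_iris, axis=1) * response, axis=1)

    # Interleave real and imaginary parts along the angular axis.
    features = np.empty((rows, 2 * cols))
    features[:, 0::2] = filtered.real
    features[:, 1::2] = filtered.imag
    return features.T                     # shape (720, 40)

def to_column_vector(features):
    """Stack the columns of the feature array into one column vector."""
    return features.flatten(order="F").reshape(-1, 1)
```

With a 720 × 40 feature array, the column-major flattening yields a single column of 28,800 entries, which is the form in which a sample would enter a dictionary.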
The class $C = [C_1, \dots, C_N]$ consists of training samples collected directly from the image of interest. In the proposed sparsity model, images belonging to the same class are assumed to lie approximately in a low-dimensional subspace. Given $N$ training classes, the $p$-th class has $K_p$ training images $\{y_i^p\}_{i=1,\dots,K_p}$. Let $b$ be an image belonging to the $p$-th class; it is represented as a linear combination of these training samples:

$$b = D_p \Phi_p$$

where $D_p$ is a dictionary of size $m \times K_p$, whose columns are the training samples in the $p$-th class, and $\Phi_p$ is a sparse vector.
The following are the steps involved in the proposed classification method:
1. Dictionary construction: Construct the dictionary for each class of training images using the on-line dictionary learning algorithm (Mairal et al. 2009). The dictionaries $D = [D_1, \dots, D_N]$ are computed using the equation:

$$(\hat{D}_i, \hat{\Phi}_i) = \arg\min_{D_i, \Phi_i} \frac{1}{2} \| C_i - D_i \Phi_i \|_F^2 + \lambda \| \Phi_i \|_1, \quad i = 1, \dots, N$$

2. Classification: For a test image $b_j$, the sparse vector $\Phi_j$ over $D$ is found using the $\ell_1$-lasso algorithm, and the class index is obtained as:

$$\hat{i} = \arg\min_i \| b_j - D\, \delta_i(\Phi_j) \|_2^2 \qquad (3)$$

where $\delta_i$ is a characteristic function that selects the coefficients associated with the $i$-th dictionary. Then $b_j$ is assigned to the class $C_i$ associated with the $i$-th dictionary. This means finding the sparsest dictionary for the given test data using the $\ell_1$-lasso algorithm; the test data is then assigned to the class associated with this sparsest dictionary.
Step 4. Iris adjudication: The matched iris pairs are compared using the adjudication process to illustrate the match-ability of iris images based on the similarity of iris regions marked with three different colors, namely, green, yellow and red. The green, yellow and red colors indicate a good, poor and bad match, respectively. The normalized iris image is divided into different regions, and the confidence level of matching for each region is verified and assigned a color code using the dissimilarity measurement.
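The class decision described above can be sketched as a minimum-residual rule over class-wise coefficient blocks. This is a simplified illustration under stated assumptions: the per-class dictionaries are passed in as ready-made matrices (for instance raw training samples rather than ODL-learned atoms), the lasso problem is solved with a plain iterative soft-thresholding loop, and the names `ista_lasso`, `classify` and the weight `lam` are hypothetical.

```python
import numpy as np

def ista_lasso(D, b, lam=0.01, n_iter=500):
    """Approximately minimise 0.5*||b - D a||^2 + lam*||a||_1."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1 / Lipschitz constant
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - step * (D.T @ (D @ a - b))
        a = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)
    return a

def classify(b, dictionaries, lam=0.01):
    """Assign b to the class whose coefficients reconstruct it best.

    dictionaries is a list of per-class matrices with samples as columns.
    Returns the winning class index and the list of class residuals.
    """
    D = np.hstack(dictionaries)
    D = D / np.linalg.norm(D, axis=0)        # unit-norm atoms
    a = ista_lasso(D, b, lam)

    residuals = []
    start = 0
    for D_i in dictionaries:
        stop = start + D_i.shape[1]
        delta = np.zeros_like(a)             # keep only class-i coefficients
        delta[start:stop] = a[start:stop]
        residuals.append(float(np.linalg.norm(b - D @ delta)))
        start = stop
    return int(np.argmin(residuals)), residuals
```

The masking step plays the role of the characteristic function: it zeroes every coefficient that does not belong to the candidate class before the reconstruction error is measured.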
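The region-wise color coding of Step 4 can be sketched as follows. The grid of 2 × 8 regions, the use of a per-region Hamming-style dissimilarity on binary templates, and the two thresholds are all assumptions made for illustration; this paper does not specify them.

```python
import numpy as np

def adjudicate(code_a, code_b, grid=(2, 8), good=0.30, poor=0.40):
    """Return a grid of color labels describing how well two templates match.

    Each region gets "green" (good), "yellow" (poor) or "red" (bad)
    depending on the fraction of disagreeing entries inside it.
    """
    row_groups = np.array_split(np.arange(code_a.shape[0]), grid[0])
    col_groups = np.array_split(np.arange(code_a.shape[1]), grid[1])
    colors = []
    for r in row_groups:
        row_colors = []
        for c in col_groups:
            block_a = code_a[np.ix_(r, c)]
            block_b = code_b[np.ix_(r, c)]
            d = float(np.mean(block_a != block_b))  # regional dissimilarity
            if d <= good:
                row_colors.append("green")
            elif d <= poor:
                row_colors.append("yellow")
            else:
                row_colors.append("red")
        colors.append(row_colors)
    return colors
```

A genuine pair would be expected to show mostly green regions, while an impostor pair would show mostly red ones, which gives a human adjudicator a quick visual check on the overall matching score.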

Experimental results
To test the proposed classification strategy effectively, three standard iris image databases are used, namely, the CASIA1 database (CASIA-IrisV1, Chinese Academy of Sciences, Institute of Automation iris database 2015), the IITD iris database (Kumar and Passi 2010, 2015), and the UPOL iris database (Dobes and Machala 2004; Dobes et al. 2004, 2006).

Figure 7
Experimental results for all the proposed classification approaches on UPOL iris database.
Class-3 (Jewel-Shaker) 100 100 100. Boldface data represents the best performance.
The CASIA database consists of 108 subjects; three instances of the left iris and four instances of the right iris are collected from each subject. So there is a total of 756 iris images in the database, all of which are gray-scale images with dimensions 320 × 280. For testing, 216 iris images are used, and the remaining iris images are used for training.
The IITD iris database consists of iris data from 224 subjects, with both left and right iris images. For each subject there are 10 instances of each iris image. So there is a total of 2232 iris images in the database, all of which are gray-scale images with dimensions 320 × 280.
The UPOL iris data is collected from 64 subjects, with three samples of the left and right eyes from each subject, resulting in a total of 384 iris images. Each iris image is in 24-bit RGB color space with a high resolution of 768 × 576. The images were captured using an optical device (TOPCON TRC50IA) connected to a Sony DXC-950P 3CCD camera.

SVM-4Class-PCA-Kmeans classification approach
This classification approach uses the support vector machine (SVM) as a classifier. The classes are defined by applying k-means clustering to the iris feature vectors, whose dimensions are reduced to 100 features by taking the first 100 principal components using principal component analysis (PCA). The correlation similarity measure is used for clustering the iris data into four different iris categories. This approach is applied on three standard iris databases, where 2/3 of each database is used for training and the remaining data is used for testing.

ODL-4Class-PCA-Kmeans classification approach
In this classification approach, the sparsity-based on-line dictionary learning (ODL) is used as a classifier. The k-means clustering is applied to define the classes on the iris feature vectors, whose dimensions are reduced to 100 features by taking the first 100 principal components using PCA. The correlation similarity measure is used for clustering the iris data into four different iris categories. This approach is applied on three standard iris databases, where 2/3 of each database is used for training and the remaining data is used for testing.
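The way the four classes are defined in the two k-means-based approaches can be sketched as below. This is an illustrative reconstruction under assumptions: PCA is computed with a singular value decomposition, the correlation measure is taken to be the Pearson correlation between a sample and a cluster center (so the distance is one minus the correlation), and the initialization, iteration count and function names are hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components=100):
    """Project the rows of X onto the first n_components principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def _row_standardize(X):
    """Center each row and scale it to unit norm, so dot products are correlations."""
    Z = X - X.mean(axis=1, keepdims=True)
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

def correlation_kmeans(X, k=4, n_iter=50, seed=0):
    """Cluster the rows of X using the distance 1 - Pearson correlation."""
    rng = np.random.default_rng(seed)
    Z = _row_standardize(X)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Highest correlation with a center means smallest correlation distance.
        labels = np.argmax(Z @ centers.T, axis=1)
        for j in range(k):
            members = Z[labels == j]
            if len(members):
                centers[j] = _row_standardize(members.mean(axis=0, keepdims=True))[0]
    return labels
```

The resulting labels would then serve as the class targets for training either the SVM or the ODL classifier on the training portion of each database.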

SVM-3Class-IrisFibers classification approach
This classification approach uses SVM as a classifier. The classes are defined by manual labeling of three iris categories (Unitree Foundation, the Rayid model of iris interpretation 2009) using the iris fiber structures. This approach is applied on the standard UPOL iris database, where 2/3 of the database is used for training and the remaining data is used for testing.

Figure 8
Classification accuracy for three different dictionary sizes 60, 90 and 120.

ODL-3Class-IrisFibers classification approach
The sparsity-based on-line dictionary learning (ODL) is used as a classifier in this iris classification approach. The proposed iris de-duplication architecture includes this classification to reduce the search space. The classes are defined by manual labeling of three iris categories using the iris fiber structures. This approach is applied on the standard UPOL iris database, where 2/3 of the database is used for training and the remaining data is used for testing.

Experiment-1
In iris classification approaches 1 and 2, the experiments are conducted using the three databases, namely, the CASIA1, IITD and UPOL iris databases, with template sizes of 480 by 20. Four classes are identified using the k-means clustering algorithm with the correlation-based distance metric. Table 1 describes the number of images in each class for the three databases.
The experimental results are shown in Figure 6. It is observed that the ODL-4Class-PCA-Kmeans classification approach gives better classification performance due to the effectiveness of sparsity.

Experiment-2
In iris classification approaches 3 and 4, the experiments are conducted using the UPOL iris database with template sizes of 720 by 40. Three classes are manually identified in these proposed iris classification approaches using the iris patterns stream, flower and jewel-shaker, as shown in Table 2. In this experiment, the other two databases are excluded because their images are not clear enough to identify the iris fiber structures and mark the class labels manually.
The experimental results for the UPOL database are compared using SVM and ODL and shown in Figure 7. It is observed that the classification accuracy is better in the ODL-related classification approaches.

Detailed analysis on the proposed classification approach: ODL-3Class-IrisFibers
In order to evaluate the performance of the proposed classification approach using on-line dictionary learning, the database is split into three sets: training set, testing set and validation set. The sets are formed in such a way that two samples of each iris are allotted to the training and validation sets, and the remaining sample is given to the test set. The training set consists of 224 images, where 112 images are from Class-1 (Stream), 60 images are from Class-2 (Flower) and 52 images are from Class-3 (Jewel-Shaker). The numbers of test images selected from Class-1, Class-2 and Class-3 are 64, 34 and 30, respectively. A set of 32 iris images is assigned to the validation set, where 16 images belong to Class-1, 8 images belong to Class-2 and 8 images belong to Class-3. The experiments were conducted with three different ways of choosing the test set (systematically selecting the first, second or third sample of each iris), and the performance is almost the same in each case. The classification performance is shown for the test data set with dictionary sizes 60, 90 and 120 in Tables 3, 4 and 5, respectively.
In Table 6, the classification accuracy for the validation data set is given. It is observed that 100% classification accuracy is achieved for the dictionary sizes 90 and 120 with a residual error value of 0.05, as shown in Figure 8. The confusion matrices for the test and validation data sets are shown in Table 7.
The adjudication results for genuine iris matches are illustrated in Figure 9, and those for the impostor iris matches are given in Figure 10. The normalized images shown in these figures are taken from the CASIA database for better illustration of the adjudication process.

Conclusion
In this paper, an iris classification is proposed based on sparse representation of log-Gabor wavelet features using on-line dictionary learning (ODL) for large-scale de-duplication applications. Three different iris classes based on iris fiber structures, namely, stream, flower and jewel-shaker, are used for faster retrieval of identities. Also, an iris adjudication process is illustrated by comparing the matched iris-pair images side-by-side to make the decision on the identification score using color coding. The efficacy of the proposed classification approach is demonstrated on the standard iris database, UPOL, where it achieves 100% classification accuracy with dictionary size 90 and residual error 0.05. The proposed iris de-duplication architecture improves the speed of the identification process and reduces the identification errors in large-scale de-duplication applications.