ABSTRACT: Gait recognition is an important biometric technique for video surveillance tasks because it can be used at a distance. In this paper, we present a persistent homology-based method to extract topological features from the body silhouettes of a gait sequence. This methodology has been used before in several papers by the second author for human identification, gender classification, carried object detection and monitoring of human activities at a distance. As in the previous work, we apply persistent homology to extract topological features from the lowest fourth part of the body silhouette, in order to decrease the negative effects of variations unrelated to the gait in the upper body part. The novelty of this paper is the use of metric learning to learn a Mahalanobis distance metric for robust gait recognition, for which we use Linear Discriminant Analysis. This learned metric forces objects from the same class to be closer, while objects from different classes are pulled apart. We evaluate our approach on the CASIA-B dataset and show the effectiveness of the proposed methods compared with other state-of-the-art methods.
Key words: TDA, gait recognition, persistent homology, linear discriminant analysis, metric learning.
RESUMEN: El reconocimiento del andar es una técnica biométrica importante para las tareas de videovigilancia, debido a la ventaja de su uso a grandes distancias. En este artículo, presentamos un método basado en la homología persistente para extraer características topológicas de las siluetas de una secuencia del andar. Esta metodología ha sido utilizada anteriormente en varios artículos por el segundo autor para la identificación de personas por la forma de caminar, clasificación de género, detección de objetos que transporta la persona y el monitoreo de actividades humanas a una distancia determinada. Como en los trabajos anteriores, aplicamos la homología persistente para extraer las características topológicas de la cuarta parte inferior de la silueta del cuerpo humano con el objetivo de disminuir los efectos negativos de las variaciones no relacionadas con el andar en la parte superior del cuerpo. La novedad de este trabajo es la introducción del uso de un aprendizaje de métrica para el reconocimiento robusto del andar, donde se utiliza la técnica Análisis Discriminante lineal(LDA). Esta métrica aprendida obliga a que los objetos de la misma clase estén más cerca, mientras que los objetos de diferentes clases se separan. Evaluamos nuestro enfoque utilizando la base de datos CASIA-B y mostramos la efectividad de los métodos propuestos en comparación con el estado del arte.
Palabras clave: TDA, reconocimiento de la marcha, homología persistente, análisis de discriminante lineal, aprendizaje de métrica.
Original Article
Metric Learning to improve the persistent homology-based gait recognition
Aprendizaje de métrica para mejorar el reconocimiento del andar basado en la homología persistente
Received: 30 November 2017
Accepted: 16 May 2018
Gait is a behavioral biometric with advantages over other biometric techniques: it is available even when the subject is at a distance from the camera, it can be recognized from a relatively low-resolution image sequence Mori et al. (2010), and gait features can be obtained without the subject's cooperation. Because of these advantages, gait recognition is suitable for many applications such as surveillance, forensics, and criminal investigation. Currently, the state of the art achieves good results for persons walking under natural conditions Yu et al. (2006), Lee et al. (2014), Lamar-Leon et al. (2017). However, people rarely walk without carrying a bag, wearing a coat, or doing anything else that changes their natural gait. The most successful approaches to gait recognition use silhouette-based techniques to obtain the features, and the best results have been obtained by methods based on Gait Energy Images (GEI) Yu et al. (2006), Lee et al. (2014), Rida et al. (2016), Wu et al. (2017). GEI methods have been used to eliminate the effects of carrying a bag or wearing a coat. However, these methods are highly correlated with the errors that frequently appear in existing background segmentation algorithms. This implies that GEI methods are influenced by the shape of the silhouette rather than by the relative positions of the body parts while walking.
In this work, the gait is modeled using a persistent homology-based representation (called the topological signature of the gait sequence) Lamar-Leon et al. (2017), since it provides features of the objects that are invariant to deformation. We start the procedure with a sequence of silhouettes obtained from a video. A simplicial complex ∂K(I), which represents the human gait, is then constructed (see Section). Sixteen persistence barcodes are then computed (see Section) by considering the distance to eight fixed planes (2 horizontal, 2 vertical, 2 oblique and 2 depth planes), in order to completely capture the movement in the gait sequence. To decrease the negative effects of variations unrelated to the gait in the upper body part, we select only the lowest fourth part of the body silhouette (legs-silhouette) (see Section).
Some researchers have shown that classification can be further improved by metric learning methods, which are important for many practical applications, such as image retrieval Lee et al. (2008), Mensink et al. (2013), Gao et al. (2014), face verification Lu et al. (2015), Huang et al. (2015), Koestinger et al. (2012), and person identification Chen et al. (2015), Liao et al. (2015), Liao and Li (2015). Similarity metric learning aims to learn an appropriate distance or similarity measure to compare pairs of examples. This provides a natural solution for the verification task. The most popular way to gain robustness to covariates is to incorporate spatial metric learning based on the Mahalanobis distance. However, it is difficult to cover all the variations by spatial metric learning alone, because the topological features are affected by covariates such as clothing and carrying a bag.
In this work we consider Linear Discriminant Analysis (LDA) Hastie and Tibshirani (1996) to separate pairs of the same subject from pairs of different subjects in a data-driven way (see Section). In the context of metric learning, LDA computes a linear projection L that maximizes the amount of between-class variance relative to the amount of within-class variance. Experiments were conducted on the CASIA-B gait database, and the results in Section demonstrate the improvement in gait recognition performance obtained by combining topological features and metric learning.
In recent years, various techniques have been proposed to solve the gait recognition problem. Appearance-based approaches directly use input or silhouette images in a holistic way to extract gait features without model fitting, and hence they generally work well. In particular, silhouette-based representations such as gait energy images (GEI) Man and Bhanu (2006), frequency domain features (FDFs) Makihara et al. (2006), chrono gait images Tao et al. (2007), and Gabor GEIs Wang et al. (2012) are dominant in the gait recognition community because of their simple yet effective properties. Appearance-based approaches, however, often suffer from large intra-subject appearance changes due to covariates such as clothing, carrying status, view and walking speed. Model-based methods attempt to explicitly model the human body or motion by employing static and dynamic body parameters to perform model matching in each frame of a walking sequence. They require a relatively high image resolution to obtain reasonable human model fitting results and incur high computational costs Zhao et al. (2006), Ariyanto and Nixon (2011), Bodor et al. (2009).
Distance metric learning
Euclidean distance is usually used for simplicity; however, this choice can seriously affect performance in classification, clustering and retrieval tasks. Many machine learning methods rely heavily on the selected distance metric, which measures how similar two samples are. Therefore, distance metric learning has attracted great interest. Most distance metric learning algorithms exploit pairwise constraints between training samples to keep samples of the same class close and samples from different classes apart.
Distance metrics are commonly based on multivariate data distributions, as in the case of the Mahalanobis distance. The most classical algorithm is the method of Xing et al. (2003), which formulates distance metric learning as a constrained convex programming problem. Du and Zhang (2014) proposed a Mahalanobis distance-based metric learning method that uses a gradient descent solver with an alternating updating strategy to maximize the inter-class distance while minimizing the intra-class distance. Another important family is that of embedding methods, i.e., methods that transform the data set from the original space into a subspace. The most popular way to gain robustness to covariates is to incorporate spatial metric learning such as Linear Discriminant Analysis (LDA) Hastie and Tibshirani (1996), general tensor discriminant analysis (GTDA) Tao et al. (2006), discriminant analysis with tensor representation (DATER) Xu et al. (2006), and the random subspace method (RSM) Guan et al. (2012). Zhang et al. (2009) proposed a patch alignment framework to unify PCA, LDA, LPP, NPE and so on, and this work plays an important role in better understanding the intrinsic differences between these manifold learning-based dimension reduction algorithms. Gui et al. (2010) added discriminant information to LPP. Gong et al. (2015) proposed a deformed graph Laplacian and signed Laplacian embedding for semi-supervised learning. In contrast to previous metric learning approaches, Gui et al. (2012) proposed a discriminant sparse NPE, and later Yang et al. (2015) gave a collaborative representation based on an L2-norm graph. In order to improve the discrimination power, Zhou et al. (2011) proposed Simultaneous Discriminant Analysis (SDA) to gather the LR and HR images from the same class and simultaneously separate different classes. Makihara et al. introduced a metric on joint intensity to mitigate the large intra-subject differences and leverage the subtle inter-subject differences. They learn such a metric so as to separate pairs of the same subject from pairs of different subjects in a data-driven way.
In this section we introduce the construction of the simplicial complex ∂K(I), which represents the input human gait sequence. We start the procedure with a sequence of silhouettes obtained from a gait sequence. For the sake of a fair comparison, we take the sequences from the background segmentation provided in the CASIA-B dataset.
Figure 1

As in our previous papers Lamar-León et al. (2012), Leon et al. (2013), Lamar-Leon et al. (2014), Lamar-Leon et al. (2016), to obtain the simplicial complex from a gait, we first build a 3D binary image I = (Z^3, B) by stacking k consecutive silhouettes of a subsequence of representation, aligned by their gravity centers (gc), where B ⊂ Z^3 is the foreground and B^c = Z^3 − B is the background. Later, I is used to derive a cubical complex Q(I). The cubical complex is a combinatorial structure constituted by a set of unit cubes with faces parallel to the coordinate planes and vertices in Z^3, together with all their faces. The 0-faces of a cube c are its 8 corners (vertices), its 1-faces are its 12 edges, its 2-faces are its 6 squares and, finally, its 3-face is the cube itself. Then, a cube with vertices V = {(i,j,k), (i+1,j,k), (i,j+1,k), (i,j,k+1), (i+1,j+1,k), (i+1,j,k+1), (i,j+1,k+1), (i+1,j+1,k+1)}, with (i,j,k) ∈ Z^3, is added to Q(I) together with all its faces if and only if V ⊆ B. The simplicial representation ∂K(I) of I is obtained from Q(I) by subdividing each square of Q(I) into 2 triangles together with all their faces (vertices and edges). Finally, the coordinates of the vertices of ∂K(I) are normalized to coordinates (x,y,t), where 0 ≤ x,y ≤ 1 and t is the number of the silhouette within the subsequence of representation.
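The cube-selection rule above (a unit cube enters Q(I) only when all 8 of its vertices are foreground) can be illustrated with a minimal NumPy sketch. The function names `stack_silhouettes` and `unit_cubes` are hypothetical, the silhouettes are assumed to be already aligned binary arrays of equal size, and the subdivision of squares into triangles is not shown.

```python
import numpy as np

def stack_silhouettes(silhouettes):
    """Stack k aligned 2D binary silhouettes into a 3D binary image indexed (x, y, t)."""
    return np.stack([np.asarray(s, dtype=bool) for s in silhouettes], axis=-1)

def unit_cubes(B):
    """Return the corner (i, j, k) of every unit cube whose 8 vertices lie in B."""
    B = np.asarray(B, dtype=bool)
    # The cube anchored at (i, j, k) is kept iff all 8 of its corners are foreground.
    full = (B[:-1, :-1, :-1] & B[1:, :-1, :-1] & B[:-1, 1:, :-1] & B[:-1, :-1, 1:]
            & B[1:, 1:, :-1] & B[1:, :-1, 1:] & B[:-1, 1:, 1:] & B[1:, 1:, 1:])
    return [tuple(int(c) for c in idx) for idx in np.argwhere(full)]

# Two 3x2 all-foreground frames give a 3x2x2 volume containing two unit cubes.
frames = [np.ones((3, 2), dtype=bool), np.ones((3, 2), dtype=bool)]
B = stack_silhouettes(frames)
cubes = unit_cubes(B)
```

Removing a single foreground point removes every cube that has it as a vertex, which is the "if and only if V ⊆ B" condition.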
To decrease the negative effects of variations unrelated to the gait in the upper body part (related, for example, to hand gestures such as talking on a cell phone), we select the lowest fourth part of the body silhouette (legs-silhouette). This selection is supported by the result given in Bashir et al. (2010), which shows that this part of the body provides most of the information necessary for classification.
Figure 2
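As an illustration of the legs-silhouette selection, the following sketch crops the lowest fourth of a binary silhouette. It assumes that image row 0 is the top of the frame and that the fourth is measured relative to the height of the body's bounding box; neither detail is specified in the text, and `legs_silhouette` is a hypothetical name.

```python
import numpy as np

def legs_silhouette(silhouette):
    """Keep the lowest fourth of the silhouette's vertical extent (row 0 = top)."""
    s = np.asarray(silhouette, dtype=bool)
    rows = np.flatnonzero(s.any(axis=1))   # rows that contain foreground pixels
    if rows.size == 0:
        return s[:0]                       # empty silhouette: nothing to keep
    top, bottom = rows[0], rows[-1] + 1    # vertical extent of the body
    height = bottom - top
    start = bottom - max(1, height // 4)   # lowest fourth, at least one row
    return s[start:bottom]

# A body occupying rows 1..8 (height 8) keeps its last two rows.
s = np.zeros((10, 4), dtype=bool)
s[1:9, 1:3] = True
legs = legs_silhouette(s)
```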

Filtration of the Simplicial Complex
The next step in this process is to sort the simplices of ∂K(I) in order to obtain a filtration, which is a partial ordering of the simplices of ∂K(I) dictated by a filter function f: ∂K(I) → R, satisfying that if a simplex σ is a face of another simplex σ′ in ∂K(I), then f(σ) ≤ f(σ′) (i.e., σ appears before or at the same time as σ′ in the ordering).
Figure 3

In this work, we use eight filtrations obtained from eight planes. For each plane π, we define the filter function f_π: ∂K(I) → R, which assigns to each vertex of ∂K(I) its distance to the plane π, and to any other simplex of ∂K(I) the largest distance of its vertices to π. Ordering the simplices of ∂K(I) according to the values of f_π, we obtain the filtration ∂Kπ for ∂K(I) associated to the plane π.
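The filter function associated to a plane can be sketched as follows. The plane is assumed to be given in the form normal · x = offset, simplices are assumed to be tuples of vertex indices, and ties between equal filter values are broken by dimension so that faces precede the simplices that contain them. All function names are illustrative.

```python
import numpy as np

def plane_distance(vertices, normal, offset):
    """Distance from each vertex (one per row) to the plane normal . x = offset."""
    normal = np.asarray(normal, dtype=float)
    points = np.asarray(vertices, dtype=float)
    return np.abs(points @ normal - offset) / np.linalg.norm(normal)

def filter_values(vertices, simplices, normal, offset):
    """A vertex gets its distance to the plane; any other simplex gets the
    largest distance among its vertices."""
    d = plane_distance(vertices, normal, offset)
    return [float(max(d[v] for v in simplex)) for simplex in simplices]

def filtration_order(values, simplices):
    """Indices of the simplices sorted by filter value, faces first on ties."""
    return sorted(range(len(simplices)), key=lambda i: (values[i], len(simplices[i])))

# One triangle with its faces, filtered by distance to the plane x = 0.
vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
simplices = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 1, 2)]
values = filter_values(vertices, simplices, normal=(1, 0, 0), offset=0.0)
order = filtration_order(values, simplices)
```

Simplices that share a filter value enter the filtration together, which corresponds to adding a group of simplices at each step rather than one at a time.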
Observe that a filtration is commonly obtained by adding one simplex at a time (i.e., a total ordering of the simplices is constructed). In contrast, the filtration presented in Lamar-Leon et al. (2016) and in this paper is constructed by adding, at each step, a group of simplices of possibly different cardinality, which makes the method robust to variations in the number of simplices of the simplicial complex and, therefore, robust to noise.
The topological signature of a gait sequence is obtained by computing the persistent homology of each filtration. Persistent homology is an algebraic tool for measuring topological features of shapes and functions. It is built on top of homology, which is a topological invariant that captures the number of connected components (0-cycles), tunnels (1-cycles), cavities (2-cycles) and their higher-dimensional analogues in a shape. Small features in persistent homology are often categorized as noise, while large features describe topological properties of shapes Edelsbrunner and Harer (2010).
Formally, let K be a simplicial complex. A p-chain is a formal sum of p-simplices of K, and the group of p-chains is denoted by C_p(K). Let us define the homomorphism ∂_p: C_p(K) → C_{p−1}(K), called the boundary operator, such that for each p-simplex σ of K, ∂_p(σ) is the sum of its (p−1)-faces. For example, if σ is a triangle, ∂_2(σ) is the sum of its edges. The kernel of ∂_p is called the group of p-cycles in C_p(K), and the image of ∂_{p+1} is called the group of p-boundaries in C_p(K). The p-homology H_p(K) of K is the quotient group of p-cycles relative to p-boundaries. Then, 0-homology classes represent the connected components of K, 1-homology classes its tunnels and 2-homology classes its cavities.
To explain the concept of persistent homology, consider a filtration (i.e., a list of sorted simplices) Kp = (σ1, σ2, ···, σm) for a simplicial complex K obtained from a given filter function fp: K → R. Suppose that the simplices of the filtration are added in order (i.e., exactly one simplex is added each time). If σi completes a q-cycle (q is the dimension of σi) when σi is added to Ki−1 = (σ1, ···, σi−1), then a q-homology class γ is born at time fp(σi); otherwise, a (q−1)-homology class dies at time fp(σi). The difference between the birth and death times of a homology class is called its persistence, which quantifies the significance of a topological attribute. If γ never dies, we set its persistence to infinity. For each q-homology class that is born at time fp(σi) and dies at time fp(σj), we draw a segment with endpoints fp(σi) and fp(σj); the collection of these segments is the q-persistence barcode of the filtration.
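The birth and death rule above can be illustrated with the standard column-reduction algorithm for persistent homology over Z/2 coefficients. This is a generic textbook sketch, not the implementation used in the experiments; it assumes that simplices are tuples of vertex indices listed so that every face appears before the simplices containing it, and `persistence_barcode` is a hypothetical name.

```python
from itertools import combinations

def persistence_barcode(simplices, values):
    """Return the intervals (dimension, birth, death) of a filtration over Z/2.

    simplices[i] is a tuple of vertex indices and values[i] its filter value;
    every face of a simplex must appear earlier in the list.
    """
    index = {tuple(sorted(s)): i for i, s in enumerate(simplices)}
    reduced = []       # reduced boundary columns, stored as sets of row indices
    pivot_owner = {}   # pivot row -> index of the column that owns that pivot
    paired = {}        # index of the birth simplex -> index of the death simplex
    for j, s in enumerate(simplices):
        s = tuple(sorted(s))
        # Boundary of s: its codimension-1 faces (empty for a vertex).
        col = {index[f] for f in combinations(s, len(s) - 1)} if len(s) > 1 else set()
        # Add earlier columns (mod 2) until the pivot is new or the column is zero.
        while col and max(col) in pivot_owner:
            col ^= reduced[pivot_owner[max(col)]]
        reduced.append(col)
        if col:
            pivot_owner[max(col)] = j
            paired[max(col)] = j   # s kills the class born at simplex max(col)
    bars = []
    for i, s in enumerate(simplices):
        if not reduced[i]:         # zero column: s completed a cycle, a class is born
            death = values[paired[i]] if i in paired else float("inf")
            bars.append((len(s) - 1, values[i], death))
    return bars

# Three vertices, then three edges closing a loop, then the triangle filling it.
simplices = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2), (0, 1, 2)]
values = [0, 0, 0, 1, 2, 3, 4]
bars = persistence_barcode(simplices, values)
```

In this example one connected component never dies, two components die when the first two edges arrive, and the tunnel born with the third edge dies when the triangle is added.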
Now, the topological signature is computed from the persistence barcodes obtained for ∂Kπ for each plane π.
Observe that, for a fixed reference plane π, the length of each interval in the persistence barcode obtained for ∂Kπ is: a) less than or equal to 1 if π is a horizontal or vertical plane, and b) less than or equal to √2 if π is an oblique plane. To compute the topological signature, for each plane π, the 0-persistence barcode (i.e., the lifetime of connected components) and the 1-persistence barcode (i.e., the lifetime of tunnels) of the filtration ∂Kπ are explored according to a uniform sampling. More precisely, given a positive integer n (n = 24 in our experimental results, obtained by cross validation), we compute the integer h, which represents the width of the "window" we use to analyze the persistence barcode, where k is the largest distance of a vertex in ∂K(I) to the given plane π.
For example, let us suppose a scenario in which m j-homology classes are born in the interval from s·h to (s+1)·h and persist or die at the end of the interval from (s+1)·h to (s+2)·h, and no other j-homology class is born, persists or dies in these intervals. Then, we put 0 in entries 2s and 2s + 3, and m in entries 2s + 1 and 2s + 2. On the other hand, let us suppose that m j-homology classes are born and die in the interval from s·h to (s+1)·h, that another m are born and die in the interval from (s+1)·h to (s+2)·h, and that no other j-homology class is born, persists or dies in these intervals. Then, we put 0 in entries 2s and 2s + 2, and m in entries 2s + 1 and 2s + 3. Observe that only by considering (a) and (b) separately can we distinguish both scenarios. In this way, for a fixed plane π, we obtain two 2n-dimensional vectors for ∂Kπ, one for the 0-persistence barcode and the other for the 1-persistence barcode associated to the filtration ∂Kπ. Since we have eight planes, {π1, ···, π8}, and two vectors per plane, we have a total of sixteen 2n-dimensional vectors, which form the topological signature of a gait sequence.
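The sampling of a barcode into a 2n-dimensional vector can be sketched as below. The entry-by-entry rule is not reproduced in the text above, so this sketch infers it from the two example scenarios and should be read as an assumption: the window width is taken as h = k / n, entry 2s counts the classes born before s·h that persist or die after s·h, and entry 2s + 1 counts the classes born in the window from s·h (inclusive) to (s+1)·h (exclusive). The function name `barcode_vector` is hypothetical.

```python
def barcode_vector(bars, k, n=24):
    """Sample a list of (birth, death) intervals into a 2n-dimensional vector.

    ASSUMPTIONS inferred from the worked scenarios, not stated explicitly:
    h = k / n; entry 2s counts intervals born before s*h and ending after s*h;
    entry 2s + 1 counts intervals born in [s*h, (s+1)*h).
    """
    h = k / n
    vec = [0] * (2 * n)
    for s in range(n):
        lo, hi = s * h, (s + 1) * h
        for birth, death in bars:
            if birth < lo < death:
                vec[2 * s] += 1        # already alive when the window opens
            if lo <= birth < hi:
                vec[2 * s + 1] += 1    # born inside the window
    return vec

# First scenario with s = 1, m = 2: born in the second window, dying at the
# end of the third one (k = 4 and n = 4 give a window width h = 1).
scenario_1 = barcode_vector([(1.5, 3.0), (1.5, 3.0)], k=4, n=4)
# Second scenario: m = 2 classes born and dying inside each of the two windows.
scenario_2 = barcode_vector([(1.2, 1.8)] * 2 + [(2.2, 2.8)] * 2, k=4, n=4)
```

Under these assumptions the two scenarios produce different vectors, in agreement with the entries listed in the text. Concatenating the vectors of the 0- and 1-persistence barcodes over the eight planes would give the sixteen vectors of the signature.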
Selecting an appropriate distance metric is critical to many learning algorithms, such as k-means, nearest neighbor searches, and others. However, the choice of such a measure is highly problem-specific and ultimately dictates the success or failure of the learning algorithm. Distance metric learning has been proposed for both unsupervised and supervised problems. In this section, we first introduce the general idea of Mahalanobis metric learning and then give an overview of the approach used in this study.
Mahalanobis distance learning is a prominent and widely used approach for improving classification results by exploiting the structure of the data. Given n data points x_i ∈ R^m, the goal is to estimate a matrix M such that:
d_M(x_i, x_j) = (x_i − x_j)^T M (x_i − x_j)
describes a pseudo-metric. In fact, this is assured if M is positive semi-definite. If M = Σ^{−1} (i.e., the inverse of the sample covariance matrix), then d_M is referred to as the Mahalanobis distance. Thus, given a pair of samples (x_i, x_j), we break down the original multi-class problem into a two-class problem in two steps. First, we transform the samples from the data space to the label difference space X = {x_ij = x_i − x_j}, which is inherently given by the metric definition. Moreover, X is invariant to the actual locality of the samples in the feature space. Second, the original class labels are discarded and the samples are arranged using pairwise equality and inequality constraints, from which we obtain the class of same pairs S and the class of different pairs D:
S = {(x_i, x_j) | y(x_i) = y(x_j)}
D = {(x_i, x_j) | y(x_i) ≠ y(x_j)}
In our particular case the pair (xi,xj) consists of the topological descriptor associated to each sample.
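A minimal sketch of the quantities defined above is given next: the distance d_M for a given positive semi-definite matrix M, and the split of all sample pairs into the sets S and D according to their labels. The function names are illustrative, and pairs are represented by sample indices rather than by the samples themselves.

```python
import numpy as np
from itertools import combinations

def mahalanobis_sq(xi, xj, M):
    """d_M(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j) for a positive semi-definite M."""
    diff = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(diff @ M @ diff)

def pairwise_constraints(labels):
    """Split all index pairs into S (same label) and D (different label)."""
    S, D = [], []
    for i, j in combinations(range(len(labels)), 2):
        (S if labels[i] == labels[j] else D).append((i, j))
    return S, D

# With M equal to the identity, d_M reduces to the squared Euclidean distance.
d = mahalanobis_sq([0.0, 0.0], [3.0, 4.0], np.eye(2))
S, D = pairwise_constraints(["a", "a", "b"])
```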
Linear Discriminant Analysis: Several ways have been proposed to estimate Mahalanobis distance metrics for computing distances in k-NN classification. LDA has been used to discover informative linear transformations of the input space, which can be seen as inducing a Mahalanobis distance metric in the original space.
Let x_i ∈ R^m be a sample and c its corresponding class label. Then, the goal of Linear Discriminant Analysis (LDA) Hastie and Tibshirani (1996) is to compute a classification function g(x) = L^T x such that the Fisher criterion

L_opt = arg max_L |L^T S_b L| / |L^T S_w L|,

where S_w and S_b are the within-class and between-class scatter matrices, is optimized.
In other words, LDA projects the high-dimensional samples into a low-dimensional subspace using a linear mapping, searching for an optimized projection matrix such that the projected samples have maximum inter-class distance and minimum intra-class distance in the low-dimensional subspace. It operates in a supervised setting and uses the class labels of the inputs to derive informative linear projections. In the context of metric learning, the linear transformation L is chosen to maximize the ratio of between-class to within-class variance, subject to the constraint that L defines a projection matrix. The traditional LDA algorithm is still attractive compared to several recently developed metric learning methods Liao et al. (2014).
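The following NumPy sketch illustrates how an LDA projection L can be obtained from the scatter matrices and then used for nearest-neighbor classification. It is a generic illustration, not the implementation used in the experiments: the small ridge term `reg` added to S_w is an assumption made to keep the matrix invertible, and the function names are hypothetical. The induced Mahalanobis matrix is M = L L^T, since (x_i − x_j)^T L L^T (x_i − x_j) is the squared Euclidean distance between the projected samples.

```python
import numpy as np

def lda_projection(X, y, n_components, reg=1e-6):
    """Columns of the returned L span the directions maximizing the Fisher criterion."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    mean, m = X.mean(axis=0), X.shape[1]
    Sw, Sb = np.zeros((m, m)), np.zeros((m, m))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)                    # within-class scatter
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)   # between-class scatter
    Sw += reg * np.eye(m)   # assumed ridge term so that Sw is invertible
    # Leading eigenvectors of Sw^{-1} Sb solve the generalized problem Sb v = lambda Sw v.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)[:n_components]
    return evecs[:, order].real

def nearest_neighbor(L, X_train, y_train, x):
    """1-NN in the projected space, i.e. under the metric M = L L^T."""
    Z = np.asarray(X_train, dtype=float) @ L
    z = np.asarray(x, dtype=float) @ L
    return y_train[int(np.argmin(((Z - z) ** 2).sum(axis=1)))]

# Two classes separated along the first axis, with large nuisance variation
# along the second axis.
X = np.array([[0, 0], [0, 10], [0, -10], [1, 0], [1, 10], [1, -10]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
L = lda_projection(X, y, n_components=1)
label = nearest_neighbor(L, X, y, [0.9, -7.0])
```

In this toy example the learned direction is aligned with the first axis, so the second coordinate, which carries no class information, no longer influences the distance.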
In this section we show the results of two experiments using the CASIA-B dataset. The CASIA-B dataset contains 124 persons, with 10 samples for each of the 11 different angles from which a person is recorded. For each angle there are six samples of persons walking under natural conditions (CASIA-Bnm), two samples of persons carrying some sort of bag (CASIA-Bbg) and two samples of persons wearing a coat (CASIA-Bcl). CASIA-B provides image sequences with background segmentation for each person. In the first experiment we used four sequences per person from CASIA-Bnm for training, and the other two sequences per person from CASIA-Bnm, together with those from CASIA-Bbg and CASIA-Bcl, for testing. Our results for the side view (90 degrees) are reported in Table 1, where the experiment was repeated 5 times using 200 PCA components, which provided the best results. Our previous method, which uses the cosine distance and the angle distance, was also evaluated, always using the lowest fourth part of the body silhouette.

In the second experiment, we considered a mixture of normal, carrying-bag and wearing-coat sequences, since this models a more realistic situation in which persons do not collaborate while the samples are being taken. We took six sequences per person for training: four sequences from CASIA-Bnm, one sequence from CASIA-Bbg and one sequence from CASIA-Bcl; the rest were used for testing. Using this training data we generated 123 topological signatures, one for each person in the database, which gave us 246 sequences for testing: 123 persons times 2 sequences per person. This experiment was also repeated 5 times using 200 PCA components. Table 2 shows the resulting accuracy. As can be seen in the tables, replacing the cosine and angle distances with a learned metric generally achieves better results.

In this paper we have presented an algorithm for gait recognition, a technique of special interest for video surveillance tasks. We have used persistent homology to model the gait, as in our previous approaches. The algorithm presented here differs from our previous work in the final step (classification), where we introduce the use of metric learning to learn a Mahalanobis distance metric for robust gait recognition. This learned metric forces objects from the same class to be closer, while objects from different classes are pulled apart. We conducted experiments on the CASIA-B database and showed the effectiveness of the proposed methods compared with other state-of-the-art methods. In addition, the topological features were tested here using only the lowest fourth part of the body silhouette. As a result, the effects of variations unrelated to the gait in the upper body part, which are very frequent in real scenarios, decrease considerably. This confirms that most of the information in the gait lies in the motion of the legs, and that learning the similarity from data improves the results.
*Corresponding author: kaprekar.aguirre@gmail.com




