Supervised linear classification of Gaussian spatio-temporal data

Marta Karaliutė; Kęstutis Dučinskas

Articles

This work is licensed under a Creative Commons Attribution 4.0 International License.

Received: 01 September 2021

Published: 15 December 2021

DOI: https://doi.org/10.15388/LMR.2021.25214

Abstract: In this article we focus on the problem of supervised classification of a spatio-temporal Gaussian random field observation into one of two classes specified by different mean parameters. The main distinctive feature of the proposed approach is that the class label is allowed to depend on the spatial location as well as on the time moment. It is assumed that the spatio-temporal covariance structure factors into a purely spatial component and a purely temporal component following an AR(p) model. In numerical illustrations with simulated data, we study the influence of the spatial and temporal covariance parameters on the derived error rates for several models of prior probabilities.

Keywords: separable covariance function, AR(p) model, Bayes discriminant function.

Summary: We consider linear discriminant analysis of Gaussian spatio-temporal data using separable covariance models and class labels that vary in space and time. An AR(p) model is used for the temporal covariance and an exponential model for the spatial covariance. The performance of the classification procedure is studied and evaluated using simulated Gaussian random field data.

Keywords: separable covariance function, AR(p) model, Bayes discriminant function.

Introduction

Spatial supervised classification is the problem of labeling observations based on feature information and on information about spatial adjacency relationships with the training sample. This problem has been studied by numerous authors (see e.g. Atkinson and Lewis [1]). However, these studies are usually based on the assumption of conditional independence of feature observations. A comprehensive overview of methods for statistical classification and discrimination of Gaussian spatial data is provided by Berrett and Calder [3]. A novel approach to the classification of a Gaussian random field (GRF) observation that avoids the assumption of conditional independence was developed by Dučinskas and Dreižienė [5]. However, statistical discriminant analysis of spatio-temporal data has rarely been considered. Šaltytė-Benth and Dučinskas [8] considered classification of spatio-temporal data modeled by a GRF in the particular case when the feature observation at the focal location is uncorrelated with the training sample.

In the present paper, avoiding independence restrictions, we focus on the classification of data modeled by random fields with separable spatio-temporal covariance structures specified by geostatistical spatial margins and discrete temporal margins (see e.g. [4]). Separability of the covariances is assumed in order to reduce the complexity caused by interdependencies between features.

The main distinctive feature of the proposed approach is that the class label is allowed to depend on the spatial location as well as on the time moment. This substantially widens the area of application of the presented results.

The performance of the classifiers is evaluated by the values of the derived Local Bayes error rates and of the empirical error rates. For numerical illustrations, the exponential isotropic model for the spatial covariance and the AR(p) model for the temporal covariance are considered. This extends the AR(1) case explored in Karaliutė and Dučinskas [7]. The performance of the proposed classification rule is compared for various parameters of the pure spatial and temporal covariances and for various models of prior class probabilities.

This paper is organized as follows: the proposed spatio-temporal data models and conditional distributions are presented in the next section; in Section 2 the conditional Bayes classification rule and its error rates are presented; in Section 3 numerical illustrations and simulations for various separable stationary spatio-temporal covariance and prior probability models are given; finally, the conclusions are in the last section.

1 Spatio-temporal data models and conditional distributions

The main objective of this paper is to classify feature observations of a GRF $\{Z (s; t) : s \in D \subset R^{2}, t \in D_{T} = [0, \infty)\}$, where $s$ and $t$ denote spatial and temporal coordinates, respectively. Let $\{Y (s; t) : s \in D \subset R^{2}, t \in D_{T}\}$ be a random field that represents the class label and takes only the values 0 or 1.

In this study, we assume that for $l = 0, 1$ the model of the observation $Z (s; t)$ conditional on $Y (s; t) = l$ is $Z (s; t) = μ_{l} (s; t) + ε (s; t)$, where $μ_{l} (s; t)$ is a deterministic spatio-temporal drift. The error term is assumed to be generated by the univariate zero-mean GRF $\{ε (s; t) : s \in D \subset R^{2}, t \in D_{T}\}$ with the separable spatio-temporal covariance model $cov (ε (s; t), ε (u; r)) = C (s, u; t, r) = C_{s} (s, u) C_{T} (t, r)$ for all $s, u \in D$ and $t, r \in D_{T}$.

Here $C_{s} (s, u)$ denotes the pure spatial covariance between observations at locations $s$ and $u$, and $C_{T} (t, r)$ denotes the pure temporal covariance between observations at time points $t$ and $r$. Under this assumption, the spatio-temporal covariance structure factors into a purely spatial and a purely temporal component, which allows for computationally efficient estimation and inference.
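The computational benefit of separability can be illustrated with a minimal sketch. The sizes, coordinates and parameter values below are hypothetical and serve only as an example; the sketch assumes that the errors are stored as an $n \times m$ matrix whose columns are stacked by the vec operator, so that the full covariance is a Kronecker product of the temporal and spatial matrices.

```python
import numpy as np

# Hypothetical toy sizes and values, chosen only for illustration.
n, m = 3, 4  # number of locations and number of time points

# Pure spatial covariance: exponential model on three points of a line.
coords = np.array([0.0, 1.0, 2.5])
C_s = np.exp(-np.abs(coords[:, None] - coords[None, :]) / 2.0)

# Pure temporal covariance: AR(1)-type correlation alpha^|t - r|.
alpha = 0.6
times = np.arange(m)
C_T = alpha ** np.abs(times[:, None] - times[None, :])

# With column-stacking vec, the nm x nm covariance is C_T (x) C_s.
C = np.kron(C_T, C_s)


def cov_entry(i, t, j, r):
    """Covariance between eps(s_i; t) and eps(s_j; r), 0-based indices."""
    return C[t * n + i, r * n + j]


# The large inverse follows from the two small inverses.
C_inv_fast = np.kron(np.linalg.inv(C_T), np.linalg.inv(C_s))
```

Only an $n \times n$ and an $m \times m$ matrix have to be inverted here, instead of an $nm \times nm$ one.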

In this study we follow the common practice in environmental and agricultural research, where the data are recorded at regular time intervals (time lags) and at irregularly placed stations (locations) in a compact area (see e.g. [2, 6]).

Let $S_{n} = \{s_{i} \in D; i = 1, \ldots, n\}$ be a set of locations where observations are taken at time $t \in D_{m} = \{1, 2, \ldots, m, m + 1\}$. At every moment of time $t \in D_{m}$ the set $S_{n}$ is split into two classes, $S_{t}^{(0)}$ and $S_{t}^{(1)}$ (i.e., $S_{n} = S_{t}^{(0)} \cup S_{t}^{(1)}$), where $S_{t}^{(l)} = \{s \in S_{n} : Y (s, t) = l\}$, $l = 0, 1$. Denote by $n_{l t}$ the number of locations (out of $n$) that belong to class $l$ at time $t$; thus $n_{l t}$ is the number of points in the set $S_{t}^{(l)}$, and $n = n_{0 t} + n_{1 t}$ for every $t \in D_{m}$. Hence the composition of the two classes may differ from one time moment to another.

The joint training sample $Z$ is a stratified training sample specified by the $n \times m$ matrix $Z = (Z_{1}, \ldots, Z_{m})$, where $Z_{t} = (Z (s_{1}, t), \ldots, Z (s_{n}, t))'$. This data layout is motivated by a model that treats the data as a time series that is multivariate in space. Denote by $z_{t} = (z_{t}^{1}, \ldots, z_{t}^{n})'$ and $y_{t} = (y_{t}^{1}, \ldots, y_{t}^{n})'$ the realized values of $Z_{t}$ and $Y_{t} = (Y (s_{1}, t), \ldots, Y (s_{n}, t))'$, respectively.

In this study we consider the linear parametric drift $μ_{l} (s; t) = β_{l}^{'} x (s)$, where $x_{i} = x (s_{i}) = (x_{1} (s_{i}), \ldots, x_{q} (s_{i}))'$ is the vector of spatial covariates at location $s_{i}$ and $β_{l}$ is a $q$-dimensional vector of parameters, $i = 1, \ldots, n$, $l = 0, 1$. Set $Δ β = β_{1} - β_{0}$.

Denote by $X$ the $n \times 2 q m$ matrix $X = (X_{(1)}, X_{(2)}, . . ., X_{(m)})$ where

X_{(t)} = \begin{pmatrix} x_{1}^{'} (1 - y_{t}^{1}) & x_{1}^{'} y_{t}^{1} \\ x_{2}^{'} (1 - y_{t}^{2}) & x_{2}^{'} y_{t}^{2} \\ \vdots & \vdots \\ x_{n}^{'} (1 - y_{t}^{n}) & x_{n}^{'} y_{t}^{n} \end{pmatrix} .

Then the matrix model for $Z$ conditional on $\{Y_{t} = y_{t}, t = 1, \ldots, m\}$ is $Z = X B + E$, where $B = I_{m} ⨂ β$ with $β = (β_{0}^{'}, β_{1}^{'})'$, and $E = (ε (s_{i}; t) : i = 1, \ldots, n; t = 1, \ldots, m)$ is the $n \times m$ matrix of Gaussian errors. Here $I_{m}$ is the $m \times m$ identity matrix and $β$ is a $2 q \times 1$ vector of parameters. Under the covariance separability assumption, $vec (E) \sim N_{n m} (0, C_{T} ⨂ C_{s})$, with $C_{T} = (c_{T}^{t r} = C_{T} (t, r); t, r = 1, \ldots, m)$ denoting the $m \times m$ matrix of pure temporal covariances and $C_{s} = (c_{s}^{i j} = C_{s} (s_{i}, s_{j}); i, j = 1, \ldots, n)$ denoting the $n \times n$ matrix of pure spatial covariances.
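The matrix model can be sketched numerically as follows. All sizes, covariates, labels and parameter values are hypothetical; the errors are drawn as $E = L_{s} G L_{T}^{'}$ with Cholesky factors $L_{s}$, $L_{T}$ and a matrix $G$ of independent standard normal variables, which is one common way (assumed here, not taken from the paper) to obtain $vec (E)$ with covariance $C_{T} ⨂ C_{s}$.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, q = 5, 3, 2                                  # hypothetical sizes
coords = rng.uniform(0.0, 10.0, size=(n, 2))       # hypothetical locations
x = np.column_stack([np.ones(n), coords[:, 0]])    # rows are x_i'
beta0 = np.array([0.0, 1.0])
beta1 = np.array([1.0, 2.0])
beta = np.concatenate([beta0, beta1])              # (beta_0', beta_1')'
y = rng.integers(0, 2, size=(n, m))                # y[i, t]: label of s_i


def design_block(x, y_t):
    """X_(t): row i is (x_i'(1 - y_t^i), x_i' y_t^i); shape n x 2q."""
    return np.hstack([x * (1 - y_t)[:, None], x * y_t[:, None]])


X = np.hstack([design_block(x, y[:, t]) for t in range(m)])  # n x 2qm
B = np.kron(np.eye(m), beta[:, None])                         # 2qm x m
mean = X @ B       # entry (i, t) equals beta_l' x_i with l = y[i, t]

# Separable Gaussian errors: vec(E) ~ N(0, C_T (x) C_s).
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
C_s = np.exp(-dist / 2.5)
lags = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
C_T = 0.6 ** lags
L_s, L_T = np.linalg.cholesky(C_s), np.linalg.cholesky(C_T)
E = L_s @ rng.standard_normal((n, m)) @ L_T.T

Z = mean + E       # simulated joint training sample, n x m
```

Each column of `mean` selects $β_{0}^{'} x_{i}$ or $β_{1}^{'} x_{i}$ according to the label of location $s_{i}$ at that time moment, which is what the block structure of $X$ encodes.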

In the present paper we deal with the problem of classifying the observations $Z (s_{i}, m + 1)$, $i = 1, \ldots, n$, into one of two classes given the joint training sample $Z$; in other words, based on the training sample we want to predict the label at an observed location at the time moment $t = m + 1$. Set $c_{T}^{m + 1, r} = C_{T} (m + 1, r)$, $r = 1, \ldots, m$, $c_{T}^{m + 1} = (c_{T}^{m + 1, 1}, \ldots, c_{T}^{m + 1, m})'$, and let $e_{i}^{'}$ denote the $i$th row of the identity matrix $I_{n}$.

Under this spatio-temporal data model specification, for $l = 0, 1$ the conditional distribution of $Z (s_{i}, m + 1)$ given $Z = z$ and $Y (s_{i}; m + 1) = l$ is Gaussian, i.e.,

(Z (s_{i}, m + 1) | Z = z; Y (s_{i}; m + 1) = l) \sim N (μ_{l i (z)}^{m + 1}, Σ_{m + 1, i (z)}),

(1)

where $Σ_{m + 1, i (z)} = var (Z (s_{i}, m + 1)) - c_{s}^{i i} (c_{T}^{m + 1})' C_{T}^{- 1} c_{T}^{m + 1} = c_{s}^{i i} ρ_{m + 1}$ and $μ_{l i (z)}^{m + 1} = β_{l}^{'} x_{i} + ((c_{T}^{m + 1})' C_{T}^{- 1} ⨂ e_{i}^{'}) vec (E)$, with $ρ_{m + 1} = c_{T}^{m + 1, m + 1} - (c_{T}^{m + 1})' C_{T}^{- 1} c_{T}^{m + 1}$.
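As a sanity check, the Kronecker form of the conditional moments can be compared with direct Gaussian conditioning on the full $n m$-dimensional vector. The following sketch uses hypothetical sizes and covariance parameters, and a random matrix stands in for the residuals $vec (E)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 5                                    # hypothetical sizes
coords = rng.uniform(0.0, 10.0, size=(n, 2))
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
C_s = 1.5 * np.exp(-dist / 2.5)                # spatial covariance, n x n

# Temporal covariances for times 1, ..., m + 1 (AR(1)-type correlation).
lags = np.abs(np.subtract.outer(np.arange(m + 1), np.arange(m + 1)))
C_T_full = 0.7 ** lags
C_T, c_T, c_TT = C_T_full[:m, :m], C_T_full[:m, m], C_T_full[m, m]

E = rng.standard_normal((n, m))                # stands in for residuals
vec_E = E.flatten(order="F")                   # column-stacking vec
i = 2                                          # focal location (0-based)

# Shortcut implied by separability.
w = np.linalg.solve(C_T, c_T)                  # C_T^{-1} c_T^{m+1}
rho = c_TT - c_T @ w
cond_var = C_s[i, i] * rho
shift = np.kron(w, np.eye(n)[i]) @ vec_E       # added to beta_l' x_i

# Brute-force conditioning with the nm x nm covariance matrix.
C_big = np.kron(C_T, C_s)
c_big = np.kron(c_T, C_s[:, i])
shift_bf = c_big @ np.linalg.solve(C_big, vec_E)
var_bf = C_s[i, i] * c_TT - c_big @ np.linalg.solve(C_big, c_big)
```

In this sketch the shift of the conditional mean reduces to a weighted sum of the residuals at the focal location only, `E[i] @ w`, so the other locations do not enter the conditional mean under separability.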

In this study, we assume that the conditional distribution of the label $Y (s_{i}, m + 1)$, $i = 1, \ldots, n$, given the joint training sample $Z$ depends only on the class label values, i.e., the conditional distribution of $(Y (s_{i}, m + 1) = l | Z = z)$ is identical to the conditional distribution of $(Y (s_{i}, m + 1) = l | \{Y_{t} = y_{t}, t = 1, \ldots, m\})$. Set $P (Y (s_{i}, m + 1) = l | Z = z) = π_{l} (s_{i}, m + 1)$, $l = 0, 1$, and, for simplicity, call them prior class probabilities.

2 Conditional Bayes discriminant function and its error rates

Under the assumption that the classes are completely specified, the conditional Bayes discriminant function (CBDF) minimizing the probability of misclassification is formed by the log-ratio of the conditional likelihoods of the distributions specified in (1), that is

W_{Z} (Z (s_{i}, m + 1)) = \left( Z (s_{i}, m + 1) - \frac{μ_{0 i (z)}^{m + 1} + μ_{1 i (z)}^{m + 1}}{2} \right) Σ_{m + 1, i (z)}^{- 1} (μ_{1 i (z)}^{m + 1} - μ_{0 i (z)}^{m + 1}) + γ_{i} (m + 1),

(2)

where $γ_{i} (m + 1) = \ln (π_{1} (s_{i}, m + 1) / π_{0} (s_{i}, m + 1))$.

We call the probability of misclassification for $W_{Z} (Z (s_{i}, m + 1))$ the Local Bayes error rate and denote it by $P_{i}$. We also denote the squared Mahalanobis distance between the conditional distributions by

Δ_{m + 1, i (z)}^{2} = (μ_{1 i (z)}^{m + 1} - μ_{0 i (z)}^{m + 1})' Σ_{m + 1, i (z)}^{- 1} (μ_{1 i (z)}^{m + 1} - μ_{0 i (z)}^{m + 1}) .

(3)
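For a scalar observation, the discriminant function and the squared Mahalanobis distance can be sketched as below. The function names are hypothetical, $γ$ is taken as the log prior odds $\ln (π_{1} / π_{0})$ with $π_{0} = 1 - π_{1}$, and the decision rule "assign class 1 when $W \geq 0$" is the usual convention, assumed here rather than quoted from the text.

```python
import math


def cbdf(z_new, mu0, mu1, sigma2, pi1):
    """Linear discriminant W for a scalar observation z_new.

    mu0, mu1: conditional means of the two classes (assumed known),
    sigma2:   common conditional variance,
    pi1:      prior probability of class 1 (class 0 has 1 - pi1).
    """
    gamma = math.log(pi1 / (1.0 - pi1))
    return (z_new - 0.5 * (mu0 + mu1)) * (mu1 - mu0) / sigma2 + gamma


def mahalanobis_sq(mu0, mu1, sigma2):
    """Squared Mahalanobis distance between the two conditional laws."""
    return (mu1 - mu0) ** 2 / sigma2


def classify(z_new, mu0, mu1, sigma2, pi1):
    """Assumed decision rule: class 1 if W >= 0, class 0 otherwise."""
    return 1 if cbdf(z_new, mu0, mu1, sigma2, pi1) >= 0 else 0
```

With equal priors the rule reduces to comparing the observation with the midpoint of the two conditional means; for example, `classify(1.5, 0.0, 2.0, 1.0, 0.5)` assigns class 1.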

Specific attention is given to the Gaussian spatio-temporal model with pure spatial covariances belonging to the family of powered-exponential isotropic models and with pure temporal covariance of stationary AR(p) model.

Lemma 1. The Local Bayes error rate is

P_{i} = π_{0} (s_{i}, m + 1) Φ (Q_{0 i}) + π_{1} (s_{i}, m + 1) Φ (Q_{1 i}),

where $Φ (x)$ is the standard normal cumulative distribution function and

Q_{l i} = - \frac{Δ_{m + 1, i (z)}}{2} + (- 1)^{l} \frac{γ_{i} (m + 1)}{Δ_{m + 1, i (z)}}, \quad Δ_{m + 1, i (z)}^{2} = \frac{((Δ β)' x_{i})^{2}}{c_{s}^{i i} σ_{T}^{2}}, \quad l = 0, 1 .

Proof. It is known that the parameters $α_{1}, \ldots, α_{p}$ of the AR(p) model quantify the temporal dependence and that, for $t = 1, 2, \ldots, m + 1$, $c_{T}^{t t} = C_{T} (0) = σ_{T}^{2} + \sum_{j = 1}^{p} α_{j} C_{T} (j)$, where $σ_{T}^{2}$ is the variance of the temporal white noise.

For the AR(p) model, $(c_{T}^{m + 1})' C_{T}^{- 1} = (0, \ldots, 0, α_{p}, \ldots, α_{1})$, and hence $ρ_{m + 1} = σ_{T}^{2}$. Then $μ_{l i (z)}^{m + 1} = β_{l}^{'} x_{i} + ((0, \ldots, 0, α_{p}, \ldots, α_{1}) ⨂ e_{i}^{'}) vec (E)$ and $Σ_{m + 1, i (z)} = c_{s}^{i i} σ_{T}^{2}$.

By using the properties of the multivariate Gaussian distribution and inserting the above expressions into formula (3), we complete the proof of Lemma 1. $\square$
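The two AR(p) facts used in the proof can be checked numerically for $p = 2$. The sketch below uses hypothetical parameter values and builds the autocovariances from the Yule-Walker recursion; it is meant only as an illustration of the identity, not as part of the proof.

```python
import numpy as np


def ar2_autocov(alpha1, alpha2, sigma2, max_lag):
    """Autocovariances C_T(0), ..., C_T(max_lag) of a stationary AR(2).

    Uses the Yule-Walker recursion for the autocorrelations and
    C_T(0) = sigma2 / (1 - alpha1 * rho_1 - alpha2 * rho_2).
    Requires max_lag >= 2.
    """
    rho = np.empty(max_lag + 1)
    rho[0] = 1.0
    rho[1] = alpha1 / (1.0 - alpha2)
    for k in range(2, max_lag + 1):
        rho[k] = alpha1 * rho[k - 1] + alpha2 * rho[k - 2]
    gamma0 = sigma2 / (1.0 - alpha1 * rho[1] - alpha2 * rho[2])
    return gamma0 * rho


# Hypothetical stationary AR(2) parameters and sample length.
alpha1, alpha2, sigma2, m = 0.5, 0.3, 2.0, 6
g = ar2_autocov(alpha1, alpha2, sigma2, m)

t = np.arange(1, m + 1)
C_T = g[np.abs(np.subtract.outer(t, t))]   # m x m temporal covariance
c_T = g[m + 1 - t]                         # C_T(m + 1, r), r = 1..m

w = np.linalg.solve(C_T, c_T)              # ((c_T)' C_T^{-1})'
rho_next = g[0] - c_T @ w                  # rho_{m+1}
```

Here `w` equals $(0, 0, 0, 0, α_{2}, α_{1})$ up to rounding and `rho_next` equals $σ_{T}^{2}$, in agreement with the expressions used in the proof.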

Error estimation is critical to classification because the validity of the resulting classifier model, composed of the classifier and its error estimate, depends on the accuracy of the error estimation procedure. Given a set of sample data, the data can be split into training and test data, with the classifier designed on the training data and its error validated on the test data. In this paper our focus is on using $p$ temporal observations for training, while the observations at the $(m + 1)$th time moment are used for testing.
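The relation between the error rate of Lemma 1 and an empirical error rate can be illustrated with a small Monte Carlo sketch. It assumes that the conditional means, the conditional variance and the prior are known, and it draws fresh test observations from the two conditional Gaussian distributions; all names and numbers are hypothetical, and this is a simplification rather than the simulation design of the paper.

```python
import math

import numpy as np


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def local_bayes_error(delta, pi1):
    """Error rate of Lemma 1 for Mahalanobis distance delta > 0."""
    pi0 = 1.0 - pi1
    gamma = math.log(pi1 / pi0)
    q0 = -delta / 2.0 + gamma / delta
    q1 = -delta / 2.0 - gamma / delta
    return pi0 * norm_cdf(q0) + pi1 * norm_cdf(q1)


def empirical_error(mu0, mu1, sigma2, pi1, n_test, rng):
    """Share of misclassified simulated test observations."""
    labels = rng.random(n_test) < pi1              # True means class 1
    means = np.where(labels, mu1, mu0)
    z = means + math.sqrt(sigma2) * rng.standard_normal(n_test)
    gamma = math.log(pi1 / (1.0 - pi1))
    w = (z - 0.5 * (mu0 + mu1)) * (mu1 - mu0) / sigma2 + gamma
    return float(np.mean((w >= 0) != labels))


rng = np.random.default_rng(2021)
mu0, mu1, sigma2, pi1 = 0.0, 1.5, 1.0, 0.3         # hypothetical values
delta = abs(mu1 - mu0) / math.sqrt(sigma2)
exact = local_bayes_error(delta, pi1)
approx = empirical_error(mu0, mu1, sigma2, pi1, 200_000, rng)
```

For equal priors the formula reduces to $Φ (- Δ / 2)$, and for a large test sample the empirical rate `approx` should be close to `exact`.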

3 Numerical illustrations and simulations

For numerical illustrations of the proposed classifier performance, we considered the Gaussian spatio-temporal model with pure spatial exponential covariances and with the pure temporal covariance of a stationary AR(1) model. Then it is easy to derive that $(c_{T}^{m + 1})' C_{T}^{- 1} = (0, \ldots, 0, α_{1})$.

Fig. 1.
Spatial sampling set $S_{20}$ at moment $t = 1$.

In the study, the exponential isotropic nuggetless spatial covariance is considered, so $C_{s} = σ_{s}^{2} R$, where $R = (r_{i j})$ denotes the spatial correlation matrix with $r_{i j} = r (| s_{i} - s_{j} |) = e^{- | s_{i} - s_{j} | / φ}$. Here $φ$ is the so-called range parameter, which controls the strength of the spatial dependence.
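A minimal sketch of how such a spatial covariance matrix can be assembled from coordinates is given below. The function name and the randomly generated layout of 20 locations are hypothetical; in particular, the layout is not the sampling set of Fig. 1.

```python
import numpy as np


def exponential_cov(coords, sigma2_s, phi):
    """Nuggetless isotropic exponential covariance sigma2_s * exp(-d / phi).

    coords: array of shape (n, 2) with planar coordinates,
    phi:    range parameter (larger phi means slower decay).
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))       # Euclidean distances
    return sigma2_s * np.exp(-dist / phi)


rng = np.random.default_rng(20)
coords = rng.uniform(0.0, 10.0, size=(20, 2))      # hypothetical layout
C_s = exponential_cov(coords, sigma2_s=1.0, phi=2.5)
```

Two locations at distance $φ$ have correlation $e^{- 1} \approx 0.37$, which gives a rough interpretation of the range parameter.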

This choice is based on the smoothness of the sample paths: sample paths of a GRF with the exponential covariance function are not smooth, whereas the squared exponential covariance model yields smooth sample paths.

Two methods for specifying the prior class probabilities are proposed.

The first one is based on the Temporal Weighted Moving Average (TWMA) method:

π_{1 t} (s_{i}, m + 1) = \frac{\sum_{t = 1}^{m} y_{t}^{i} t}{(1 + m) m / 2} .
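The TWMA prior can be computed directly from the matrix of training labels. In the sketch below the function name and the example labels are hypothetical; `y[i, t - 1]` holds the label of location $s_{i}$ at time $t$.

```python
import numpy as np


def twma_prior(y):
    """TWMA prior probability of class 1 at time m + 1 for each location.

    y: array of shape (n, m) with 0/1 labels. Labels observed at time t
    receive weight t, so recent labels count more than older ones.
    """
    m = y.shape[1]
    weights = np.arange(1, m + 1)
    return (y * weights).sum(axis=1) / (m * (m + 1) / 2)


y_example = np.array([[0, 0, 0, 1],
                      [1, 1, 1, 1],
                      [1, 0, 0, 0]])
priors = twma_prior(y_example)   # 0.4, 1.0 and 0.1
```

A location that carried the same label at all $m$ training times receives a prior of exactly 0 or 1, in which case the log prior odds $γ_{i} (m + 1)$ are not finite.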

The second one adds spatial correlations to the weighting:

π_{1 t s} (s_{i}, m + 1) = \frac{\sum_{t = 1}^{m} y_{t}^{i} t + \sum_{t = 1}^{m} y_{t}^{i_{0}} t r_{i i_{0}}}{(1 + m) m / 2 + (1 + m) m / 2},

where $i_{0}$ denotes the index of the nearest neighbor of $s_{i}$. We denote this method by STWMA. We have compared these four particular cases by calculating $P_{i}$ for $i = 1, \ldots, n$. The numerical illustrations were performed on 20 locations in a two-dimensional Euclidean area; these locations, with $\circ$ and $\blacksquare$ as class label indicators, are depicted in Fig. 1.
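A possible implementation of the STWMA prior is sketched below. It rests on one reading of the formula, which is an assumption: the time-weighted labels of the nearest neighbor $s_{i_{0}}$ are multiplied by the exponential correlation $r_{i i_{0}}$ and added to the numerator, and the denominator is $(1 + m) m / 2 + (1 + m) m / 2$. The function name and the example data are hypothetical.

```python
import numpy as np


def stwma_prior(y, coords, phi):
    """STWMA prior of class 1 at time m + 1 (assumed reading, see text).

    y:      array of shape (n, m) with 0/1 labels,
    coords: array of shape (n, 2) with planar coordinates,
    phi:    range parameter of the exponential correlation.
    """
    n, m = y.shape
    weights = np.arange(1, m + 1)
    half = m * (m + 1) / 2

    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                 # exclude the point itself
    nearest = dist.argmin(axis=1)                  # index i_0 for each i
    r = np.exp(-dist[np.arange(n), nearest] / phi)  # r_{i i_0}

    own = (y * weights).sum(axis=1)
    neighbour = (y[nearest] * weights).sum(axis=1)
    return (own + r * neighbour) / (half + half)


coords_ex = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
y_ex = np.array([[1, 1], [0, 1], [0, 0]])
priors_st = stwma_prior(y_ex, coords_ex, phi=1.0)
```

Under this reading the result always lies in $[0, 1]$, because the correlation $r_{i i_{0}}$ does not exceed 1.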

The averages $AP = \sum_{i = 1}^{20} P_{i} / 20$ for the two models of prior probabilities, for $φ = 2.5$ and various $α_{1}$ at moment $t = 5$, are presented in Fig. 2. As can be seen from Fig. 2, incorporating spatial correlation into the class prior probabilities (i.e., method STWMA) has no advantage over method TWMA for almost all values of the parameter $α_{1}$.

4 Conclusions

In this paper we propose an approach to the classification of spatio-temporal data in the framework of Bayes discriminant analysis for separable spatio-temporal covariances.

Fig. 2.
Average Bayes error rates for $φ = 2.5$ and various $α_{1}$ at moment $t = 5$.

Several simulation studies were conducted to estimate and empirically compare the classifiers for a particular separable stationary spatio-temporal covariance and various prior class probability models. The numerical analysis showed no reason to incorporate spatial correlation into the prior probabilities, since it does not improve the performance of the proposed classifier.

References

[1] P.M. Atkinson, P. Lewis. Geostatistical classification for remote sensing: an introduction. Comput. Geosc., 26(4):361–371, 2000. https://doi.org/10.1016/S0098-3004(99)00117-X.

[2] G. Atluri, A. Karpatne, V. Kumar. Spatio-temporal data mining: a survey of problems and methods. ACM Comput Surv: CSUR, 51(4):83, 2018. https://doi.org/10.1145/3161602.

[3] C. Berrett, C.A. Calder. Bayesian spatial binary classification. Spatial Stat., 16:72–102, 2016. https://doi.org/10.1016/j.spasta.2016.01.004.

[4] S.S. Demel, J. Du. Spatio-temporal models for some data sets in continuous space and discrete time. Stat. Sin., 25:81–98, 2015. https://doi.org/10.5705/ss.2013.223w.

[5] K. Dučinskas, L. Dreižienė. Risks of classification of the Gaussian Markov random field observations. J. Class., 35:422–436, 2018. https://doi.org/10.1007/s00357-018-9269-7.

[6] J. Haslett, A.E. Raftery. Space-time modelling with long memory dependence: Assessing Ireland’s wind power resource. Appl. Stat., 38:1–50, 1989. https://doi.org/10.2307/2347679.

[7] M. Karaliutė, K. Dučinskas. Classification of gaussian spatio-temporal data with stationary separable covariances. Nonlinear Anal. Model. Control, 26(2):363–374, 2021. https://doi.org/10.15388/namc.2021.26.22359.

[8] J. Šaltytė-Benth, K. Dučinskas. Linear discriminant analysis of multivariate spatial-temporal regressions. Scand. J. Stat., 32:281–294, 2005.