Articles
Application of spatial auto-beta models in statistical classification
Erdvinių auto-beta modelių taikymas statistiniame klasifikavime
Lietuvos matematikos rinkinys, vol. 62 Ser. A, pp. 38-43, 2021
Vilniaus Universitetas

Received: 10 July 2021
Published: 15 December 2021
Abstract: In this paper, spatial data specified by auto-beta models are analysed by considering a supervised classification problem: classifying a feature observation into one of two populations. Two classification rules, based on the conditional Bayes discriminant function (BDF) and the linear discriminant function (LDF), are proposed. These classification rules are critically compared by the values of their actual error rates in a simulation study.
Keywords: Bayes discriminant function, linear discriminant function, actual error rate, supervised classification.
Summary: The paper presents new statistical classification rules for spatial auto-beta models. They are based on the conditional Bayes and linear discriminant functions. The problem solved is that of classifying a spatial location into one of two populations, given the feature value and a training set. The populations are defined by regressors, common parameters, and class-specific parameters. All computations are carried out on simulated data with several different sets of model parameters. The proposed classification rules are compared by computing the actual classification error under different estimates of the prior probabilities.
Keywords: Bayes discriminant function, linear discriminant function, actual classification error, supervised classification.
Introduction
An approach for spatial classification using Bayes rules was introduced by Dučinskas [5]. This approach is based on the conditional distributions of the observations to be classified given the training sample, for a continuous spatial index. The case of a discrete spatial index for Gaussian Markov random fields is explored in [4, 6, 7]. General statistical analysis of spatial non-Gaussian data associated with the exponential family and based on generalized linear models has been carried out in [2, 13]. Spatial discrimination based on the BDF for feature observations having elliptically contoured distributions is implemented in [1, 8].
In this paper we focus on auto-beta models introduced by Besag [2] for the case when the sufficient statistic as well as the canonical parameter are one-dimensional.
Møller [12] presented simulation algorithms for several spatial one-parameter auto-models. Specific attention will be paid to the multi-parameter auto-models that are properly studied in [10, 9, 3]. We consider a particular case of spatial auto-beta models for solving the classification problem of a feature observation by using plug-in discriminant functions.
This paper is organized as follows: the problem description and the spatial auto-beta model are presented in Section 1; discriminant functions and error rates are analysed in Section 2; numerical experiments are described in Section 3; and the conclusions are given in the last section.
1 Discriminant functions based on spatial auto-beta model and corresponding error rates
In this paper we consider random fields and as the feature and the class label, respectively. Assume that the feature values belong to (0, 1) and the class label takes the value 1 or 2. Suppose that the set of training locations (STL), where feature observations with known class labels are taken, is fixed; the feature values and class labels are denoted by and , respectively, here . The training sample is denoted by , where .
We focus on spatial auto-beta models (SABE) and on the supervised classification problem with a fixed STL, when the feature observations are given. Then the conditional distribution of an unlabeled observation under the SABE model is
Denote the full conditional density function for the feature by
where the parameters involve a vector of explanatory variables and unknown regression coefficients; denotes the set of all model parameters, and is the Euler Beta function. Spatial auto-beta models have recently been studied by several authors [11].
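The full conditional Beta density of an auto-beta model can be sketched in code. The parametrisation below is illustrative, not the paper's exact one: the first shape parameter is driven by a regression term plus a spatial term proportional to the sum of neighbouring feature values, and the second shape parameter is fixed at 1 for simplicity; the function name and arguments are hypothetical.

```python
import numpy as np
from scipy.stats import beta as beta_dist

def auto_beta_conditional_pdf(z, x, nbr_values, b, eta):
    """Full conditional Beta density at a site of an auto-beta model.

    Illustrative parametrisation (an assumption, not the paper's exact model):
    z          -- feature value in (0, 1) at the focal site
    x          -- vector of explanatory variables at the site
    nbr_values -- feature values at the neighbouring sites
    b          -- regression coefficients
    eta        -- scalar spatial-dependence coefficient
    """
    # exp(.) keeps the shape parameter strictly positive.
    alpha = np.exp(np.dot(x, b) + eta * np.sum(nbr_values))
    return beta_dist.pdf(z, alpha, 1.0)
```

The exponential link is one common way to map the canonical parameter of an exponential-family conditional onto the valid Beta shape range.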
Then the conditional BDF for the SABE model takes the form
where denotes the prior probability in population .
The prior probabilities depend on the location of the focal observation and on the number of neighbours: where is the distance between sites and are the sites belonging to the nearest-neighbourhood set of in population .
So the BDF allocates the observation in the following way: classify the observation given into population if and into population otherwise.
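The inverse-distance priors and the plug-in Bayes allocation above can be sketched as follows. The function names, the two-class label coding (1 and 2), and the use of per-class Beta shape parameters are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import beta as beta_dist

def inverse_distance_priors(s0, train_sites, train_labels):
    """Prior probabilities of populations 1 and 2 at focal site s0,
    proportional to the summed inverse distances to the training sites
    of each class (a sketch of the inverse-distance prior)."""
    d = np.linalg.norm(np.asarray(train_sites, float) - np.asarray(s0, float), axis=1)
    w = 1.0 / d
    labels = np.asarray(train_labels)
    totals = np.array([w[labels == 1].sum(), w[labels == 2].sum()])
    return totals / totals.sum()

def bdf_allocate(z0, priors, shape_params):
    """Allocate z0 to population 1 or 2 by the larger value of
    prior * conditional Beta density (plug-in Bayes rule sketch).
    shape_params[k] = (alpha, beta) for population k+1 (hypothetical)."""
    post = [priors[k] * beta_dist.pdf(z0, *shape_params[k]) for k in (0, 1)]
    return 1 if post[0] >= post[1] else 2
```

Ties are broken in favour of population 1, mirroring the "otherwise" clause of the rule.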
We compare the BDF with the LDF for the SABE model in order to classify the testing samples. In this work a modified LDF is used, in which the class conditional means and variances are used for the estimation. The modified LDF is:
where denotes the prior probability, and and are the conditional means and the variance of the distribution.
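A scalar plug-in LDF of the kind described above can be sketched as below. The exact functional form is an assumption (the standard linear discriminant for a one-dimensional feature with a common variance), since the paper's formula is not reproduced here.

```python
import math

def ldf(z0, m1, m2, var, pi1, pi2):
    """Modified plug-in LDF for a scalar feature (sketch, assumed form):
    positive values allocate z0 to population 1, negative to population 2.

    m1, m2   -- class conditional means
    var      -- common conditional variance
    pi1, pi2 -- prior probabilities of the two populations
    """
    return (z0 - 0.5 * (m1 + m2)) * (m1 - m2) / var + math.log(pi1 / pi2)
```

With equal priors the rule reduces to comparing the feature value against the midpoint of the two class means.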
From statistical decision theory it is known that the Bayes discriminant function ensures the minimum probability of misclassification.
Definition 1. The Bayes error rate for the specified in (2) is defined as
where denotes the Heaviside step function, and the probability measure is based on the conditional Beta distribution with the pdf specified in (1).
The error rate for specified in (3), which is denoted by , is defined as in (4) with replaced by .
In practice parameter estimators are obtained by maximizing the pseudo-likelihood function, i.e.:
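Maximum pseudo-likelihood estimation can be sketched with a general-purpose optimiser: the objective is minus the sum over training sites of the log full conditional density. The two-parameter model below (one regression coefficient `b`, one spatial coefficient `eta`) and the function names are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import beta as beta_dist

def neg_log_pl(theta, z, x, nbr_sums):
    """Negative log pseudo-likelihood: minus the sum over training sites
    of the log full conditional Beta density (illustrative two-parameter
    model; second Beta shape parameter fixed at 1)."""
    b, eta = theta
    alpha = np.exp(b * x + eta * nbr_sums)
    return -np.sum(beta_dist.logpdf(z, alpha, 1.0))

def mpl_estimate(z, x, nbr_sums, theta0=(0.0, 0.0)):
    """Maximum pseudo-likelihood estimate via a derivative-free optimiser."""
    res = minimize(neg_log_pl, theta0, args=(z, x, nbr_sums),
                   method="Nelder-Mead")
    return res.x
```

Because the full conditionals of an auto-model are available in closed form, the pseudo-likelihood avoids the intractable joint normalising constant.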
By replacing the parameters with their estimators in and , we construct their plug-in versions, denoted by and .
Definition 2. The actual error rate for the is
where for
The actual error rate for the which is denoted by is defined in (6), when is replaced by.
2 Numerical experiments
To evaluate the proposed classification procedures, a few different scenarios were chosen that differ in the model shape defined by different parameter values. Based on the chosen parameter scenarios and using the first-order neighbourhood scheme, each site has four neighbours, denoted as , with the obvious neighbour adjustments near the boundary. The conditional natural parameter expressions are chosen for population :
In this case the parameter vector is . First, based on the chosen parameter scenarios, 100 replications of data were generated. Each simulated data set was divided into two parts: an 80% training sample and a 20% testing sample. In the learning stage the training sample, with all feature values and the spatial dependence structure, is used to build the model; in the testing stage the feature value of the site being classified is hidden and the classification rules are compared. At this stage the model parameter vector is treated as unknown and is estimated by the maximum pseudo-likelihood method described in (5). The simulations were conducted on a lattice of size . Two types of parametric structures were chosen: one in which all parameters are fixed except the class-2 mean tendency parameter, and one in which the spatial dependence parameter describing the effect of the north-south neighbouring points varies. The chosen parameter values are presented in Table 1.
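The first-order neighbourhood scheme and the 80/20 division of lattice sites described above can be sketched as follows; the function names and the lattice indexing are assumptions made for illustration.

```python
import numpy as np

def first_order_neighbours(i, j, nrow, ncol):
    """Four nearest neighbours (north, south, west, east) of lattice site
    (i, j), trimmed at the boundary as in the first-order scheme."""
    cand = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(r, c) for r, c in cand if 0 <= r < nrow and 0 <= c < ncol]

def split_sites(nrow, ncol, rng, train_frac=0.8):
    """Randomly divide the lattice sites into an 80% training sample and
    a 20% testing sample, as in the simulation design."""
    sites = [(i, j) for i in range(nrow) for j in range(ncol)]
    idx = rng.permutation(len(sites))
    cut = int(train_frac * len(sites))
    return [sites[k] for k in idx[:cut]], [sites[k] for k in idx[cut:]]
```

Interior sites have four neighbours, edge sites three, and corner sites two, matching the boundary adjustments mentioned above.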
The calculations were performed using three kinds of prior probabilities: 1) when , with the actual error rate denoted by ; 2) using an inverse-distance function over all training sample points; 3) using an inverse-distance function over the neighbouring points of up to fourth order. The actual error rate ratios for the different priors are presented in Fig. 1.
In both cases, when the beta parameter changes and when the eta parameters change, the ratios and are greater than 1, i.e. the smallest estimates are obtained when the priors are calculated by the third method. The actual error rate values were then compared for the different classification rules with the prior probabilities calculated by the third method described above.
The actual error rate ratio curves are presented in part A of Fig. 2. When the beta values are chosen less than 1, the ratio is greater than 1 and the LDF-based classification rule performs better. When , the ratio decreases and the BDF-based classification rule gains the advantage. In part B, when eta is chosen less than 10, the ratio is less than 1 and the BDF-based classification rule performs better. When the eta value is chosen as 25 or greater, the LDF-based classification rule gains the advantage.



3 Discussion and conclusions
In this paper we proposed two classification rules for non-Gaussian spatial data based on auto-beta models, within the frameworks of the Bayes and linear discriminant functions. A simulation study was conducted to estimate and empirically compare the BDF classifier with the LDF classifier for various parametric structures and prior class probability models. Numerical analysis showed that:
The results of the calculations performed in all examples give a strong argument for encouraging users to model non-Gaussian spatial data directly, ignoring various data normalization procedures.
References
A. Batsidis, K. Zografos. Errors of misclassification in discrimination of dimensional coherent elliptic random field observations. Stat. Neerl., 65(4):446–461, 2011.
J. Besag. Spatial interaction and the statistical analysis of lattice systems. J. R. Stat. Soc. B, 36(2):192–236, 1974.
E.D. Cepeda, A.V. Nunez. Spatial double generalized beta regression models: extensions and application to study quality of education in Colombia. J. Educat. Behav. Stat., 38:604–628, 2013.
L. Dreiziene, K. Ducinskas, L. Saltyte-Vaisiauske. Statistical classification of multi- variate conditionally autoregressive Gaussian random field observations. Spat. Stat., 28:216–225, 2018.
K. Ducinskas. Approximation of the expected error rate in classification of the Gaussian random field observations. Stat. Probab. Lett., 79:138–144, 2009.
K. Ducinskas, L. Dreiziene. Risks of classification of the Gaussian Markov random field observations. J. Classif., 35:422–436, 2018.
K. Dučinskas, L. Dreižienė, E. Zikarienė. Multiclass classification of the scalar Gaussian random field observation with known spatial correlation function. Stat. Prob. Lett., 98:107–114, 2015. https://doi.org/10.1016/j.spl.2014.12.008.
K. Dučinskas, E. Zikarienė. Actual error rates in classification of the t-distributed random field observation based on plug-in linear discriminant function. Informatica, 26(4):557–568, 2015.
C. Hardouin, Y. Jian-Feng. Multi-parameter auto-models and their application. Biometrika, 95(2):335–349, 2008.
M.S. Kaiser, N. Cressie, J. Lee. Spatial mixture models based on exponential family conditional distributions. Stat. Sinica, 12:449–474, 2002.
B.M. Lagos-Álvarez, R. Fustos-Toribio, J. Figueroa-Zúñiga. Geostatistical mixed beta regression: a Bayesian approach. Stoch. Environ. Res. Risk Assess., 31:571–584, 2017.
J. Møller. Perfect simulation of conditionally specified models. J. R. Stat. Soc. Ser. B, 61(1):251–264, 1999.
H. Zhang. On estimation and prediction for spatial generalized linear mixed models. Biometrics, 58(1):129–136, 2002.