Statistics

A new class of gamma distribution

Cícero Carlos Brito
Universidade Federal Rural de Pernambuco, Brasil
Frank Gomes-Silva
Universidade Federal Rural de Pernambuco, Brasil
Leandro Chaves Rêgo
Universidade Federal do Ceará, Brasil
Wilson Rosa de Oliveira
Universidade Federal Rural de Pernambuco, Brasil


Acta Scientiarum. Technology, vol. 39, no. 1, pp. 79-87, 2017

Universidade Estadual de Maringá

Received: 20 November 2015

Accepted: 04 May 2016

Abstract: This paper presents a new class of probability distributions generated from the gamma distribution. For the proposed class, we present several statistical properties, such as the risk function, expansions for the density and cumulative distribution functions, the moment generating function, the characteristic function, moments of order m, central moments of order m, the log-likelihood and its partial derivatives, the Rényi entropy, kurtosis, skewness and variance. Some of these properties are derived for a particular distribution within the new class, which is used to illustrate the capability of the class through an application to a real data set, namely the data presented in Choulakian and Stephens (2001). Six models are compared: model selection was based on the Akaike Information Criterion (AIC), and the Cramér-von Mises and Anderson-Darling tests were used to assess the fit of the models. Lastly, the conclusions from the analysis and comparison of the results are presented, along with directions for future research.

Keywords: generalized distribution, statistical properties, quantile function, maximum likelihood estimation, model fit.

Resumo: This article presents a new class of probability distributions generated from the gamma distribution. For the proposed class, we present some statistical properties, such as the risk function, expansions for the density and cumulative distribution functions, the moment generating function, the characteristic function, moments of order m, central moments of order m, the log-likelihood function and its partial derivatives, the Rényi entropy, and measures of kurtosis, skewness and variance. Some of these properties are given for a particular baseline distribution within this new class in order to illustrate the potential of the proposed class through an application to a real data set. The data set presented in Choulakian and Stephens (2001) was used. Six models are compared; the Akaike information criterion was used to select among them, and the Cramér-von Mises and Anderson-Darling tests were used to assess the fit of the models. Finally, we present conclusions based on the analysis and comparison of the results obtained, and we suggest future work.

Palavras-chave: generalized distribution, statistical properties, quantile function, maximum likelihood estimation, model fitting.

Introduction

The gamma distribution is used in a variety of applications, including queueing, financial and weather models. For an integer shape parameter a, it is the distribution of the waiting time until the a-th event of a Poisson process. It is a two-parameter distribution whose density is given by

$$ f(x) = \frac{b^{a}}{\Gamma(a)}\, x^{a-1} e^{-bx}, \qquad x > 0, $$

where a > 0 is a shape parameter and b > 0 is the reciprocal of a scale parameter.

Owing to the importance of this distribution, several new distributions, as well as families of probability distributions based on generalizations of the gamma distribution, have recently been proposed. Given a continuous distribution function G(x), its generalization, or exponentiated form, is obtained as $F(x) = [G(x)]^{a}$, with a > 0 (power parameter). Gupta, Gupta, and Gupta (1998) proposed the exponentiated gamma distribution and studied some of its properties.

Cordeiro, Ortega, and Silva (2011) extended the exponentiated gamma distribution by defining a four-parameter distribution called the exponentiated generalized gamma distribution, which is capable of modeling phenomena with bathtub-shaped failure rates.

Zografos and Balakrishnan (2009) defined a family of probability distributions based on integrating the gamma density as follows:

$$ F(x) = \frac{1}{\Gamma(a)} \int_{0}^{-\log[1-G(x)]} t^{a-1} e^{-t}\, dt, $$

where G(x) is an arbitrary distribution function. When a = n + 1, this distribution coincides with the distribution of the nth upper record value (Alzaatreh, Famoye, & Lee, 2014).

Alternatively, Ristic and Balakrishnan (2012) proposed a new family of probability distributions that is also based on integrating the gamma density. They defined this new family as follows:

$$ F(x) = 1 - \frac{1}{\Gamma(a)} \int_{0}^{-\log G(x)} t^{a-1} e^{-t}\, dt, $$

where G(x) is an arbitrary distribution function. Similarly, when a = n + 1, this distribution coincides with the distribution of the nth lower record value (Alzaatreh et al., 2014).

Following the line of work of Zografos and Balakrishnan (2009) and Ristic and Balakrishnan (2012), our goal in this work is to propose a new family of distributions based on the gamma distribution. The family of distributions proposed here is

$$ H_G(x) = \frac{b^{a}}{\Gamma(a)} \int_{[1-G(x)]/G(x)}^{\infty} t^{a-1} e^{-bt}\, dt, \qquad a > 0,\ b > 0, \tag{1} $$

where G(x) is an arbitrary distribution function and $H_G(x)$ has the same support as G(x). This new family will be called the gamma-[(1-G)/G] class. We derive statistical properties of the new class, such as the mean, variance, standard deviation, mean deviation, kurtosis, skewness, moment generating function and characteristic function, together with a graphical analysis.

Then, to illustrate the applicability of the proposed family, we consider the particular case in which G(x) is the distribution function of an exponential random variable. Using the general results for the gamma-[(1-G)/G] class, we derive statistical properties of this new distribution and, to illustrate its potential, apply it to a real data set: the data presented by Choulakian and Stephens (2001) are used to check how well the models fit. The models are compared using the Akaike information criterion (AIC) and the Cramér-von Mises and Anderson-Darling statistics. Both tests are discussed in detail by Chen and Balakrishnan (1995); they belong to the class of quadratic statistics based on the empirical distribution function, since they are built on the squared differences between the empirical distribution function and the hypothesized distribution function.

Material and methods

Obtaining a class of probability distributions

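The gamma-[(1-G)/G] class is defined by the cumulative distribution function (cdf) (1) (for x > 0), which is equivalent to

$$ H_G(x) = Q\!\left(a, \frac{b\,[1-G(x)]}{G(x)}\right), \tag{2} $$

where $Q(a,z) = \Gamma(a,z)/\Gamma(a)$ is the regularized incomplete gamma function, $\Gamma(a,z) = \int_{z}^{\infty} t^{a-1} e^{-t}\, dt$ is the (upper) incomplete gamma function and $\Gamma(a)$ is Euler's gamma function. If the distribution G(x) has density g(x), the class has probability density function (pdf)

$$ h_G(x) = \frac{b^{a}}{\Gamma(a)}\, \frac{g(x)}{G(x)^{2}} \left[\frac{1-G(x)}{G(x)}\right]^{a-1} \exp\!\left\{-b\,\frac{1-G(x)}{G(x)}\right\}. \tag{3} $$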
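The cdf (2) and pdf (3) can be evaluated directly once Q(a, z) is available. The Python sketch below is illustrative and rests on assumptions of ours: Q(a, z) is implemented with the standard library only (a power series for small z and a continued fraction for large z, in the style of Numerical Recipes), the pdf is computed on the log scale, and the function names (`reg_upper_gamma`, `cdf_class`, `pdf_class`) are hypothetical. The exponential baseline with rate `lam = 0.5` is just one example of G.

```python
import math
from typing import Callable

Func = Callable[[float], float]


def reg_upper_gamma(a: float, z: float) -> float:
    """Q(a, z) = Gamma(a, z) / Gamma(a) for a > 0, z >= 0 (standard-library sketch)."""
    if z <= 0.0:
        return 1.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Power series for the lower function P(a, z); then Q = 1 - P.
        term = total = 1.0 / a
        n = 1
        while abs(term) > 1e-16 * total:
            term *= z / (a + n)
            total += term
            n += 1
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, z), evaluated with the modified Lentz method.
    tiny = 1e-300
    bk = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / bk
    frac = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        bk += 2.0
        d = an * d + bk
        d = tiny if abs(d) < tiny else d
        c = bk + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        frac *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * frac


def cdf_class(x: float, a: float, b: float, G: Func) -> float:
    """H_G(x) = Q(a, b[1 - G(x)]/G(x)), Equation (2)."""
    gx = G(x)
    if gx <= 0.0:
        return 0.0
    return reg_upper_gamma(a, b * (1.0 - gx) / gx)


def pdf_class(x: float, a: float, b: float, G: Func, g: Func) -> float:
    """h_G(x), Equation (3), computed on the log scale."""
    gx, dens = G(x), g(x)
    if gx <= 0.0 or gx >= 1.0 or dens <= 0.0:
        return 0.0  # outside the support, or G(x) rounded to exactly 0 or 1
    u = (1.0 - gx) / gx
    log_h = (a * math.log(b) - math.lgamma(a) + math.log(dens)
             - 2.0 * math.log(gx) + (a - 1.0) * math.log(u) - b * u)
    return math.exp(log_h)


lam = 0.5  # illustrative exponential baseline rate (our choice)


def G_exp(x: float) -> float:
    return -math.expm1(-lam * x) if x > 0 else 0.0


def g_exp(x: float) -> float:
    return lam * math.exp(-lam * x) if x > 0 else 0.0


print(cdf_class(2.0, 2.0, 1.5, G_exp), pdf_class(2.0, 2.0, 1.5, G_exp, g_exp))
```

Any other baseline can be plugged in by passing its own `G` and `g`.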

Equations (2) and (3) can be rewritten as sums of exponentiated distributions. These distributions have been studied by several authors in recent years, for example, Mudholkar and Srivastava (1993) for the exponentiated Weibull and Gupta and Kundu (1999) for the exponentiated exponential distribution.

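Using the power series of the exponential function, we rewrite (3) as

$$ h_G(x) = \frac{b^{a}}{\Gamma(a)} \sum_{k=0}^{\infty} \frac{(-b)^{k}}{k!}\, \frac{g(x)\,[1-G(x)]^{a+k-1}}{G(x)^{a+k+1}}; $$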

it follows that

Since we can rewrite the distribution function as

Therefore

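Next, we present an expression for the gamma-[(1-G)/G] class when G is discrete. If the distribution G(x) is discrete, $H_G(x)$ is also discrete and $P(X = x_l) = H_G(x_l) - H_G(x_{l-1})$. Therefore,

$$ P(X = x_l) = Q\!\left(a, \frac{b\,[1-G(x_l)]}{G(x_l)}\right) - Q\!\left(a, \frac{b\,[1-G(x_{l-1})]}{G(x_{l-1})}\right). $$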
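For a discrete baseline, the probability mass function only needs Q(a, z) at consecutive support points. The sketch below assumes, purely for illustration, a geometric baseline on {0, 1, 2, ...} with success probability p = 0.3; Q(a, z) is again a standard-library implementation of ours, and the function names are hypothetical.

```python
import math
from typing import Callable


def reg_upper_gamma(a: float, z: float) -> float:
    """Q(a, z) = Gamma(a, z) / Gamma(a) for a > 0, z >= 0 (standard-library sketch)."""
    if z <= 0.0:
        return 1.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Power series for P(a, z); Q = 1 - P.
        term = total = 1.0 / a
        n = 1
        while abs(term) > 1e-16 * total:
            term *= z / (a + n)
            total += term
            n += 1
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, z) (modified Lentz method).
    tiny = 1e-300
    bk = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / bk
    frac = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        bk += 2.0
        d = an * d + bk
        d = tiny if abs(d) < tiny else d
        c = bk + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        frac *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * frac


def geometric_cdf(k: int, p: float) -> float:
    """P(Y <= k) = 1 - (1 - p)^(k + 1) for Y geometric on {0, 1, 2, ...}."""
    if k < 0:
        return 0.0
    return -math.expm1((k + 1) * math.log1p(-p))


def pmf_class(k: int, a: float, b: float, G: Callable[[int], float]) -> float:
    """P(X = x_k) = H_G(x_k) - H_G(x_{k-1}), with H_G(x) = Q(a, b[1 - G(x)]/G(x))."""
    def H(j: int) -> float:
        gj = G(j)
        return 0.0 if gj <= 0.0 else reg_upper_gamma(a, b * (1.0 - gj) / gj)
    return H(k) - H(k - 1)


def G_geom(k: int) -> float:
    return geometric_cdf(k, 0.3)  # illustrative success probability


print([round(pmf_class(k, 2.0, 1.5, G_geom), 4) for k in range(6)])
```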

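In addition, the risk function of the gamma-[(1-G)/G] class is

$$ \tau_G(x) = \frac{h_G(x)}{1-H_G(x)} = \frac{b^{a}\, g(x)\, [1-G(x)]^{a-1} \exp\{-b[1-G(x)]/G(x)\}}{\Gamma(a)\, G(x)^{a+1}\left[1-Q\!\left(a, b[1-G(x)]/G(x)\right)\right]}. $$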

By inverting $H_G(x) = u$, with 0 < u < 1, we obtain an explicit expression for the quantile function,

$$ x = G^{-1}\!\left(\frac{b}{b + Q^{-1}(a,u)}\right), $$

where $Q^{-1}(a,u)$ is the inverse of the regularized incomplete gamma function with respect to its second argument.
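The quantile function gives a direct way to compute quantiles and to simulate from the class: x = G^{-1}(b/(b + z)) with z = Q^{-1}(a, u), which for an exponential baseline G(x) = 1 - e^{-λx} reduces to x = log(1 + b/z)/λ. The sketch below is a minimal illustration under our assumptions: exponential baseline, Q^{-1} obtained by bisection rather than a dedicated inversion routine, and hypothetical function names. For sampling, z is replaced by a draw from the gamma distribution with shape a and unit scale, which has the same distribution as Q^{-1}(a, U) for U uniform on (0, 1).

```python
import math
import random


def reg_upper_gamma(a: float, z: float) -> float:
    """Q(a, z) = Gamma(a, z) / Gamma(a) for a > 0, z >= 0 (standard-library sketch)."""
    if z <= 0.0:
        return 1.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        term = total = 1.0 / a  # series for P(a, z); Q = 1 - P
        n = 1
        while abs(term) > 1e-16 * total:
            term *= z / (a + n)
            total += term
            n += 1
        return 1.0 - total * math.exp(log_prefactor)
    tiny = 1e-300  # continued fraction for Q(a, z), modified Lentz method
    bk = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / bk
    frac = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        bk += 2.0
        d = an * d + bk
        d = tiny if abs(d) < tiny else d
        c = bk + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        frac *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * frac


def inv_reg_upper_gamma(a: float, u: float, iters: int = 200) -> float:
    """Solve Q(a, z) = u for z > 0 by bisection; Q(a, .) decreases from 1 to 0."""
    lo, hi = 0.0, max(1.0, a)
    while reg_upper_gamma(a, hi) > u:  # widen the bracket until Q(a, hi) <= u
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if reg_upper_gamma(a, mid) > u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def quantile_gexp(u: float, a: float, b: float, lam: float) -> float:
    """Quantile of the class with G(x) = 1 - exp(-lam x), for 0 < u < 1."""
    z = inv_reg_upper_gamma(a, u)
    # G^{-1}(b/(b+z)) = -log(1 - b/(b+z)) / lam = log(1 + b/z) / lam
    return math.log1p(b / z) / lam


def sample_gexp(n: int, a: float, b: float, lam: float, rng: random.Random) -> list[float]:
    """Draws via z ~ Gamma(shape a, scale 1) in place of Q^{-1}(a, U)."""
    return [math.log1p(b / rng.gammavariate(a, 1.0)) / lam for _ in range(n)]


print(quantile_gexp(0.5, 2.0, 1.5, 0.5), sample_gexp(3, 2.0, 1.5, 0.5, random.Random(2024)))
```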

Using the density and distribution function expansions, it is possible to get the statistical properties of the new class, as discussed below. Equations (4) and (5) are the main results of this subsection.

Moments and moment generating function

Many important characteristics of a probability model, such as central tendency, dispersion, skewness and kurtosis, can be studied through its moments. Below we derive expansions for the moments of order m of the gamma-[(1-G)/G] class. The mth moment of a random variable with cdf (2) can easily be obtained from Equation (4). Hence, we have

Therefore,

where:

Expression (6) is important because it generalizes the well-known probability weighted moments.

In particular, we have the following expansion of the mean for the gamma-[(1-G)/G] class

Next, we derive an expansion for the moment generating function $M(t) = E[e^{tX}]$ of the gamma-[(1-G)/G] class. From Equation (4), we have

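Using the expansion $e^{tx} = \sum_{m=0}^{\infty} t^{m} x^{m}/m!$, we can rewrite $M(t) = \sum_{m=0}^{\infty} \frac{t^{m}}{m!} E[X^{m}]$.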

Therefore, using (6), the last equation can be expressed as

Similarly, one can establish the following expansion for the characteristic function for the gamma-[(1-G)/G] class

Central moments and general coefficient

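We now derive expansions for the central moments of order m of the gamma-[(1-G)/G] class. These can be calculated as

$$ \mu_m = E[(X-\mu)^m] = \int_{-\infty}^{\infty} (x-\mu)^m\, h_G(x)\, dx, $$

or, equivalently, with $\mu = E[X]$,

$$ \mu_m = \sum_{k=0}^{m} \binom{m}{k} (-\mu)^{m-k}\, E[X^{k}]. $$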

Since

it follows that

In particular, for m = 2 we obtain the following expansion for the variance $\sigma^2 = E[X^2] - \mu^2$ of the gamma-[(1-G)/G] class:

A generalization called the general coefficient, which extends the skewness and kurtosis measures, is given by

$$ C_g(m) = \frac{E[(X-\mu)^m]}{\sigma^{m}}. \tag{9} $$

Substituting (7) and (8) in Equation (9), we obtain

In particular, taking m = 3 and m = 4 in $C_g(m)$ yields expansions for the skewness and kurtosis, respectively.
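The general coefficient (9) only needs the first m raw moments, converted to central moments with the binomial formula $\mu_m = \sum_{k=0}^{m} \binom{m}{k} (-\mu)^{m-k} E[X^k]$. The sketch below is a minimal illustration with hypothetical function names. As a check, we feed it the raw moments E[X^k] = k!/λ^k of an exponential distribution (not of the new class), whose skewness 2 and kurtosis 9 are known; for the gamma-[(1-G)/G] class, the raw moments would come from the moment expansion or from numerical integration.

```python
import math


def central_moment(m: int, raw: list[float]) -> float:
    """mu_m = sum_k C(m, k) (-mu)^(m-k) E[X^k], with raw[k] = E[X^k] and raw[0] = 1."""
    mu = raw[1]
    return sum(math.comb(m, k) * (-mu) ** (m - k) * raw[k] for k in range(m + 1))


def general_coefficient(m: int, raw: list[float]) -> float:
    """C_g(m) = mu_m / sigma^m; m = 3 gives skewness and m = 4 kurtosis."""
    variance = central_moment(2, raw)
    return central_moment(m, raw) / variance ** (m / 2)


lam = 2.0
raw_exp = [math.factorial(k) / lam**k for k in range(5)]  # E[X^k] for Exp(rate lam)
print(general_coefficient(3, raw_exp), general_coefficient(4, raw_exp))
```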

Maximum likelihood estimation and Rényi entropy

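Under standard regularity conditions, the maximum likelihood estimates (MLEs) can be obtained by setting the derivatives of the log-likelihood function with respect to each parameter equal to zero. We determine the MLEs of the parameters of the gamma-[(1-G)/G] class from complete samples only. Let $x_1, \ldots, x_n$ be a random sample of size n from the new class, with parameter vector $\theta = (a, b, \xi^{\top})^{\top}$, where ξ is a vector of unknown parameters of the parent distribution $G(x;\xi)$. Earlier we wrote G(x) and g(x); here we write $G(x;\xi)$ and $g(x;\xi)$ to emphasize the parameter vector. The log-likelihood function for θ is

$$ \ell(\theta) = n a \log b - n \log\Gamma(a) + \sum_{i=1}^{n} \log g(x_i;\xi) - (a+1)\sum_{i=1}^{n} \log G(x_i;\xi) + (a-1)\sum_{i=1}^{n} \log[1-G(x_i;\xi)] - b\sum_{i=1}^{n} \frac{1-G(x_i;\xi)}{G(x_i;\xi)}. $$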

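The log-likelihood can be maximized either directly, for example with the NLMIXED procedure of SAS (PROC NLMIXED), or by solving the nonlinear likelihood equations obtained by differentiating ℓ(θ). The components of the score vector are given by

$$ U_a(\theta) = n\log b - n\,\psi(a) + \sum_{i=1}^{n} \log\frac{1-G(x_i;\xi)}{G(x_i;\xi)}, \qquad U_b(\theta) = \frac{na}{b} - \sum_{i=1}^{n} \frac{1-G(x_i;\xi)}{G(x_i;\xi)}, $$

and

$$ U_{\xi_j}(\theta) = \sum_{i=1}^{n} \left[ \frac{\dot g_j(x_i;\xi)}{g(x_i;\xi)} - (a+1)\frac{\dot G_j(x_i;\xi)}{G(x_i;\xi)} - (a-1)\frac{\dot G_j(x_i;\xi)}{1-G(x_i;\xi)} + b\,\frac{\dot G_j(x_i;\xi)}{G(x_i;\xi)^{2}} \right], $$

where $\dot g_j(x;\xi) = \partial g(x;\xi)/\partial \xi_j$, $\dot G_j(x;\xi) = \partial G(x;\xi)/\partial \xi_j$ and ψ(·) is the digamma function.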
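The log-likelihood of the class, obtained by taking logarithms of the density (3), translates directly into code. The sketch below is a minimal illustration under our own assumptions: the baseline is exponential with its parameter ξ = λ held fixed, the observations are hypothetical numbers (not the Wheaton River data), and the function names are ours. It also uses the fact that differentiating ℓ(θ) with respect to b and equating to zero gives na/b = Σ[1 - G(x_i; ξ)]/G(x_i; ξ), so for fixed a and ξ the profile estimate of b is available in closed form and only a and ξ need a numerical search.

```python
import math
from typing import Callable, Sequence


def loglik_class(a: float, b: float, xs: Sequence[float],
                 G: Callable[[float], float], log_g: Callable[[float], float]) -> float:
    """l(theta) = n a log b - n log Gamma(a)
    + sum[log g - (a+1) log G + (a-1) log(1-G) - b (1-G)/G]; G and log_g carry xi."""
    n = len(xs)
    total = n * a * math.log(b) - n * math.lgamma(a)
    for x in xs:
        gx = G(x)
        total += (log_g(x) - (a + 1.0) * math.log(gx)
                  + (a - 1.0) * math.log1p(-gx) - b * (1.0 - gx) / gx)
    return total


def profile_b(a: float, xs: Sequence[float], G: Callable[[float], float]) -> float:
    """Root of n a / b - sum (1-G)/G = 0 for b, with a and xi fixed."""
    return len(xs) * a / sum((1.0 - G(x)) / G(x) for x in xs)


lam = 0.5  # baseline parameter xi, held fixed in this illustration


def G_exp(x: float) -> float:
    return -math.expm1(-lam * x)


def log_g_exp(x: float) -> float:
    return math.log(lam) - lam * x


xs = [0.4, 1.1, 1.7, 2.5, 3.2, 5.0, 7.3]  # hypothetical observations
a = 2.0
b_hat = profile_b(a, xs, G_exp)
print(b_hat, loglik_class(a, b_hat, xs, G_exp, log_g_exp))
```

Because ℓ is concave in b (n a log b minus a linear term), the profile value is the maximizer over b for the given a and ξ.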

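Entropy is a measure of uncertainty: the higher the entropy, the less information is available and the greater the uncertainty, randomness or disorder. Next, we derive an expansion for the entropy of the gamma-[(1-G)/G] class using the Rényi entropy, which is given by

$$ I_R(\gamma) = \frac{1}{1-\gamma} \log \int_{-\infty}^{\infty} h_G(x)^{\gamma}\, dx, \qquad \gamma > 0,\ \gamma \neq 1. $$

Substituting the density given by Equation (3), we have

$$ I_R(\gamma) = \frac{1}{1-\gamma} \log\left\{ \left[\frac{b^{a}}{\Gamma(a)}\right]^{\gamma} \int_{-\infty}^{\infty} \frac{g(x)^{\gamma}\, [1-G(x)]^{\gamma(a-1)}}{G(x)^{\gamma(a+1)}} \exp\!\left[-\gamma b\, \frac{1-G(x)}{G(x)}\right] dx \right\}. $$

Expanding the exponential function in a Taylor series,

$$ \exp\!\left[-\gamma b\, \frac{1-G(x)}{G(x)}\right] = \sum_{k=0}^{\infty} \frac{(-\gamma b)^{k}}{k!} \left[\frac{1-G(x)}{G(x)}\right]^{k}, $$

we have

$$ I_R(\gamma) = \frac{1}{1-\gamma} \log\left\{ \left[\frac{b^{a}}{\Gamma(a)}\right]^{\gamma} \sum_{k=0}^{\infty} \frac{(-\gamma b)^{k}}{k!} \int_{-\infty}^{\infty} \frac{g(x)^{\gamma}\, [1-G(x)]^{\gamma(a-1)+k}}{G(x)^{\gamma(a+1)+k}}\, dx \right\}. $$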

Now, using the following binomial expansion

it follows that

Thus, an explicit expression for the Rényi entropy can be written as

which, in turn, implies that (using Equation (6))
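As a numerical cross-check of the series, the Rényi entropy $I_R(\gamma) = (1-\gamma)^{-1}\log\int h(x)^{\gamma}\,dx$ can also be computed by direct quadrature. The sketch below is our illustration, not the authors' procedure: it uses a composite Simpson rule on a truncated interval (assuming the truncated tail is negligible), validates the routine on the exponential distribution, whose Rényi entropy $-\log\lambda + \log\gamma/(\gamma-1)$ is known in closed form, and then applies it to the density $h(x) = \lambda b^{a} e^{\lambda x}\exp\{-b/(e^{\lambda x}-1)\}/[\Gamma(a)(e^{\lambda x}-1)^{a+1}]$ of the special model studied in the next section. Function names and parameter values are hypothetical.

```python
import math
from typing import Callable


def simpson(f: Callable[[float], float], lo: float, hi: float, n: int = 20000) -> float:
    """Composite Simpson rule with an even number n of subintervals."""
    n += n % 2
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * f(lo + i * h)
    return s * h / 3.0


def renyi_entropy(pdf: Callable[[float], float], gamma: float,
                  lo: float, hi: float, n: int = 20000) -> float:
    """I_R(gamma) = log(integral of pdf^gamma) / (1 - gamma), gamma > 0, gamma != 1."""
    return math.log(simpson(lambda x: pdf(x) ** gamma, lo, hi, n)) / (1.0 - gamma)


def pdf_gexp(x: float, a: float, b: float, lam: float) -> float:
    """Density of the gamma-[(1-Exp)/Exp] distribution, on the log scale."""
    if x <= 0.0:
        return 0.0
    y = lam * x
    # log(e^y - 1), written to avoid overflow for large y
    log_em1 = y + math.log1p(-math.exp(-y)) if y > 30.0 else math.log(math.expm1(y))
    return math.exp(a * math.log(b) + math.log(lam) - math.lgamma(a) + y
                    - (a + 1.0) * log_em1 - b * math.exp(-log_em1))


print(renyi_entropy(lambda x: pdf_gexp(x, 2.0, 1.5, 0.5), 2.0, 0.0, 80.0))
```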

Results and discussion

Special model

In this section, we examine a particular distribution of the gamma-[(1-G)/G] class, obtained when $G(x) = 1 - e^{-\lambda x}$, x > 0, λ > 0, which we call the gamma-[(1-Exp)/Exp] distribution.

The gamma-[(1-Exp)/Exp] distribution

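Taking G(x) in Equation (2) to be the cdf of the exponential distribution with parameter λ, and noting that $[1-G(x)]/G(x) = 1/(e^{\lambda x}-1)$, we obtain the gamma-[(1-Exp)/Exp] distribution:

$$ H(x) = Q\!\left(a, \frac{b}{e^{\lambda x}-1}\right), \qquad x > 0. $$

Differentiating H(x), we get the density function of the gamma-[(1-Exp)/Exp] distribution:

$$ h(x) = \frac{\lambda\, b^{a}}{\Gamma(a)}\, \frac{e^{\lambda x}}{(e^{\lambda x}-1)^{a+1}} \exp\!\left(-\frac{b}{e^{\lambda x}-1}\right), \qquad x > 0. $$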

Figure 1 shows the pdf and cdf of the gamma-[(1-Exp)/Exp] distribution for some values of the parameters.

Figure 1.
The pdf (right) and cdf (left) of the gamma-[(1-Exp)/Exp] distribution for some values of λ.

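We can also obtain the risk function of the gamma-[(1-Exp)/Exp] distribution:

$$ \tau(x) = \frac{h(x)}{1-H(x)} = \frac{\lambda\, b^{a}\, e^{\lambda x} \exp\{-b/(e^{\lambda x}-1)\}}{\Gamma(a)\,(e^{\lambda x}-1)^{a+1}\left[1-Q\!\left(a, b/(e^{\lambda x}-1)\right)\right]}. $$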

Figure 2 shows the risk function of the gamma-[(1-Exp)/Exp] distribution for some parameter values.

Figure 2.
Plots of the risk function for some parameter values.
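The risk function of the special model can be evaluated from its closed-form cdf $H(x) = Q(a, b/(e^{\lambda x}-1))$ and density $h(x) = \lambda b^{a} e^{\lambda x}\exp\{-b/(e^{\lambda x}-1)\}/[\Gamma(a)(e^{\lambda x}-1)^{a+1}]$. The sketch below is illustrative, with assumptions of ours: a standard-library implementation of Q(a, z), parameter values a = 2, b = 1.5 and λ = 0.5, and hypothetical function names. Its test checks the identity $\int_0^x \tau(t)\,dt = -\log[1-H(x)]$ by quadrature.

```python
import math


def reg_upper_gamma(a: float, z: float) -> float:
    """Q(a, z) = Gamma(a, z) / Gamma(a) for a > 0, z >= 0 (standard-library sketch)."""
    if z <= 0.0:
        return 1.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        term = total = 1.0 / a  # series for P(a, z); Q = 1 - P
        n = 1
        while abs(term) > 1e-16 * total:
            term *= z / (a + n)
            total += term
            n += 1
        return 1.0 - total * math.exp(log_prefactor)
    tiny = 1e-300  # continued fraction for Q(a, z), modified Lentz method
    bk = z + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / bk
    frac = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        bk += 2.0
        d = an * d + bk
        d = tiny if abs(d) < tiny else d
        c = bk + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        frac *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * frac


def log_expm1(y: float) -> float:
    """log(e^y - 1) for y > 0, avoiding overflow for large y."""
    return y + math.log1p(-math.exp(-y)) if y > 30.0 else math.log(math.expm1(y))


def cdf_gexp(x: float, a: float, b: float, lam: float) -> float:
    """H(x) = Q(a, b / (e^{lam x} - 1)) for x > 0."""
    if x <= 0.0:
        return 0.0
    return reg_upper_gamma(a, b * math.exp(-log_expm1(lam * x)))


def pdf_gexp(x: float, a: float, b: float, lam: float) -> float:
    """h(x) = lam b^a e^{lam x} exp{-b/(e^{lam x}-1)} / [Gamma(a) (e^{lam x}-1)^{a+1}]."""
    if x <= 0.0:
        return 0.0
    y = lam * x
    le = log_expm1(y)
    return math.exp(a * math.log(b) + math.log(lam) - math.lgamma(a) + y
                    - (a + 1.0) * le - b * math.exp(-le))


def hazard_gexp(x: float, a: float, b: float, lam: float) -> float:
    """Risk function tau(x) = h(x) / [1 - H(x)]; loses accuracy once H(x) is close to 1."""
    return pdf_gexp(x, a, b, lam) / (1.0 - cdf_gexp(x, a, b, lam))


print([round(hazard_gexp(x, 2.0, 1.5, 0.5), 4) for x in (0.5, 1.0, 2.0, 5.0)])
```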

Proceeding as in the expansions of the pdf and cdf of the class, the pdf and cdf of the gamma-[(1-Exp)/Exp] distribution can be rewritten as sums of exponentiated exponential densities and distribution functions, as follows:

and

Various properties of the exponentiated exponential distribution can be found in Gupta and Kundu (1999). Using expansions (11) and (12), mathematical quantities of the special model, such as ordinary and central moments, moment generating and characteristic functions, the general coefficient and the Rényi entropy, can be obtained from the corresponding quantities of the exponentiated exponential distribution. For reasons of space, we consider only the moments. The mth ordinary moment of the special model can be expressed as

In particular, we have that the mean of the gamma-[(1-Exp)/Exp] distribution is given by
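The mean can be cross-checked numerically. For the special model, X ≤ x if and only if T ≥ 1/(e^{λx} - 1), where T follows a gamma distribution with shape a and rate b, so X can be simulated as X = log(1 + 1/T)/λ. The sketch below compares E[X] = ∫ x h(x) dx, computed by Simpson quadrature, with a Monte Carlo average under this representation. It is an illustration under our assumptions (parameter values a = 2, b = 1.5, λ = 0.5, truncation of the integral at x = 80, hypothetical function names), not the series expansion of the mean given above.

```python
import math
import random


def pdf_gexp(x: float, a: float, b: float, lam: float) -> float:
    """Density of the gamma-[(1-Exp)/Exp] distribution, on the log scale."""
    if x <= 0.0:
        return 0.0
    y = lam * x
    # log(e^y - 1), written to avoid overflow for large y
    log_em1 = y + math.log1p(-math.exp(-y)) if y > 30.0 else math.log(math.expm1(y))
    return math.exp(a * math.log(b) + math.log(lam) - math.lgamma(a) + y
                    - (a + 1.0) * log_em1 - b * math.exp(-log_em1))


def mean_by_quadrature(a: float, b: float, lam: float,
                       upper: float = 80.0, n: int = 40000) -> float:
    """E[X] as the integral of x h(x) over (0, upper), composite Simpson rule (n even)."""
    h = upper / n
    s = upper * pdf_gexp(upper, a, b, lam)  # the x = 0 endpoint contributes 0
    for i in range(1, n):
        x = i * h
        s += (4.0 if i % 2 else 2.0) * x * pdf_gexp(x, a, b, lam)
    return s * h / 3.0


def mean_by_simulation(a: float, b: float, lam: float, n: int, rng: random.Random) -> float:
    """Average of X = log(1 + 1/T) / lam with T ~ Gamma(shape a, rate b)."""
    total = 0.0
    for _ in range(n):
        t = rng.gammavariate(a, 1.0 / b)  # gammavariate takes a scale, i.e. 1/rate
        total += math.log1p(1.0 / t) / lam
    return total / n


print(mean_by_quadrature(2.0, 1.5, 0.5), mean_by_simulation(2.0, 1.5, 0.5, 100000, random.Random(7)))
```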

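Let $x_1, \ldots, x_n$ be a sample of size n from $X \sim$ gamma-[(1-Exp)/Exp](a, b, λ). The log-likelihood function for the vector of parameters $\theta = (a, b, \lambda)^{\top}$ is

$$ \ell(\theta) = n a \log b + n \log\lambda - n \log\Gamma(a) + \lambda \sum_{i=1}^{n} x_i - (a+1) \sum_{i=1}^{n} \log(e^{\lambda x_i}-1) - b \sum_{i=1}^{n} \frac{1}{e^{\lambda x_i}-1}. $$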


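The components of the score vector are given by

$$ U_a(\theta) = n\log b - n\,\psi(a) - \sum_{i=1}^{n} \log(e^{\lambda x_i}-1), \qquad U_b(\theta) = \frac{na}{b} - \sum_{i=1}^{n} \frac{1}{e^{\lambda x_i}-1}, $$

and

$$ U_\lambda(\theta) = \frac{n}{\lambda} + \sum_{i=1}^{n} x_i - (a+1)\sum_{i=1}^{n} \frac{x_i e^{\lambda x_i}}{e^{\lambda x_i}-1} + b \sum_{i=1}^{n} \frac{x_i e^{\lambda x_i}}{(e^{\lambda x_i}-1)^{2}}, $$

where ψ(·) is the digamma function.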
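Analytic score functions are easy to get wrong, so a finite-difference check is a cheap safeguard before handing them to a Newton-Raphson routine. The sketch below is our illustration, with hypothetical function names; the sample values are made-up numbers, not the Wheaton River data. It codes the log-likelihood of the gamma-[(1-Exp)/Exp] distribution and its partial derivatives with respect to a, b and λ, and compares them with central differences. Python's standard library has no digamma function, so ψ is approximated by the usual recurrence plus an asymptotic series, which is an assumption of this sketch.

```python
import math
from typing import Sequence, Tuple

Theta = Tuple[float, float, float]


def digamma(x: float) -> float:
    """psi(x), x > 0: shift with psi(x) = psi(x + 1) - 1/x, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - f * (
        1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))


def loglik_gexp(theta: Theta, xs: Sequence[float]) -> float:
    """n a log b + n log lam - n log Gamma(a) + lam sum x
    - (a+1) sum log(e^{lam x} - 1) - b sum 1/(e^{lam x} - 1)."""
    a, b, lam = theta
    n = len(xs)
    em1 = [math.expm1(lam * x) for x in xs]  # e^{lam x} - 1 (assumes lam x below ~700)
    return (n * a * math.log(b) + n * math.log(lam) - n * math.lgamma(a)
            + lam * sum(xs) - (a + 1.0) * sum(math.log(e) for e in em1)
            - b * sum(1.0 / e for e in em1))


def score_gexp(theta: Theta, xs: Sequence[float]) -> Theta:
    """(U_a, U_b, U_lam): partial derivatives of loglik_gexp; e^{lam x} = em1 + 1."""
    a, b, lam = theta
    n = len(xs)
    em1 = [math.expm1(lam * x) for x in xs]
    u_a = n * math.log(b) - n * digamma(a) - sum(math.log(e) for e in em1)
    u_b = n * a / b - sum(1.0 / e for e in em1)
    u_lam = (n / lam + sum(xs)
             - (a + 1.0) * sum(x * (e + 1.0) / e for x, e in zip(xs, em1))
             + b * sum(x * (e + 1.0) / e**2 for x, e in zip(xs, em1)))
    return u_a, u_b, u_lam


xs = [0.4, 1.1, 1.7, 2.5, 3.2, 5.0, 7.3]  # hypothetical observations
print(score_gexp((2.0, 1.5, 0.5), xs))
```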

Application

In this section, we apply the gamma-[(1-Exp)/Exp] distribution to a real data set. The data consist of exceedances of flood peaks (in m³ s⁻¹) of the Wheaton River near Carcross in the Yukon Territory, Canada. Seventy-two exceedances for the years 1958 to 1984 were recorded, rounded to one decimal place. These data were analyzed by Choulakian and Stephens (2001) and are presented in Table 1.

Table 1.
Exceedances of flood peaks (m³ s⁻¹) of the Wheaton River.

It is worth mentioning that this data set has also been analyzed using the Pareto, three-parameter Weibull, generalized Pareto and beta-Pareto distributions (Akinsete, Famoye, & Lee, 2008).

Table 2 reports the maximum likelihood estimates of the parameters, obtained with the Newton-Raphson method implemented in the SAS 9.1 statistical software, together with their standard errors, the Akaike information criterion and the Anderson-Darling (A*) and Cramér-von Mises (W*) statistics for the gamma-[(1-Exp)/Exp] distribution (M1), the gamma-[(1-Exp)/Exp] distribution (proposed model, M2), the exponentiated Weibull (M3), the modified Weibull (M4), the beta Pareto (M5) and the Weibull (M6) distributions. Their densities are given by

and

where:

B(a, b) denotes the beta function, and all of the parameters above are positive real numbers.

For the six distributions in Table 2 fitted to the Wheaton River flood data, the beta-Pareto model (M5), described by Akinsete et al. (2008) as the best-fitting model, performed worse in our study (AIC = 524.398, A* = 2.0412 and W* = 0.3516) than the proposed gamma-[(1-Exp)/Exp] model (M2), which obtained AIC = 505.030, A* = 0.4516 and W* = 0.0757. Table 2 also shows that M2 is the best of the models tested, since it has the lowest AIC, A* and W* values.
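The statistics compared in Table 2 can be computed from the fitted cdf values F(x_i; θ̂) and the maximized log-likelihood. The sketch below follows our reading of the procedure of Chen and Balakrishnan (1995): transform the fitted cdf values to normal scores, standardize them, map them back through the standard normal cdf, compute the Cramér-von Mises W² and Anderson-Darling A² statistics, and apply the corrections W* = W²(1 + 0.5/n) and A* = A²(1 + 0.75/n + 2.25/n²); AIC = 2k - 2ℓ(θ̂), with k the number of estimated parameters. It is an illustration with hypothetical function names, not the SAS code used in the paper, and the cdf values and log-likelihood in the usage line are made up.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def cvm_w2(u: Sequence[float]) -> float:
    """Cramér-von Mises W^2 for values u_i that should look Uniform(0, 1)."""
    u = sorted(u)
    n = len(u)
    return sum((ui - (2 * i - 1) / (2 * n)) ** 2 for i, ui in enumerate(u, start=1)) + 1 / (12 * n)


def ad_a2(u: Sequence[float]) -> float:
    """Anderson-Darling A^2 = -n - (1/n) sum (2i-1)[log u_(i) + log(1 - u_(n+1-i))]."""
    u = sorted(u)
    n = len(u)
    s = sum((2 * i - 1) * (math.log(u[i - 1]) + math.log1p(-u[n - i])) for i in range(1, n + 1))
    return -n - s / n


def chen_balakrishnan(fitted_cdf: Sequence[float]) -> Tuple[float, float]:
    """(A*, W*) from fitted cdf values F(x_i; theta_hat)."""
    nd = NormalDist()
    eps = 1e-12  # keep inv_cdf away from 0 and 1
    y = [nd.inv_cdf(min(max(v, eps), 1.0 - eps)) for v in fitted_cdf]
    y_bar, s_y = mean(y), stdev(y)
    u = [nd.cdf((yi - y_bar) / s_y) for yi in y]
    n = len(u)
    a_star = ad_a2(u) * (1.0 + 0.75 / n + 2.25 / n**2)
    w_star = cvm_w2(u) * (1.0 + 0.5 / n)
    return a_star, w_star


def aic(k: int, max_loglik: float) -> float:
    """Akaike information criterion, 2k - 2 l(theta_hat)."""
    return 2 * k - 2 * max_loglik


fitted = [0.05, 0.2, 0.35, 0.5, 0.62, 0.8, 0.93]  # made-up F(x_i; theta_hat) values
print(chen_balakrishnan(fitted), aic(3, -249.5))
```

Lower values of A*, W* and AIC indicate a better fit, which is the criterion applied to Table 2.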

The fitted gamma-[(1-Exp)/Exp] pdf and the two other best-fitting pdfs are displayed in Figure 3. The plot shows that the gamma-[(1-Exp)/Exp] model behaves similarly to the other distributions and is very competitive for these data.

Table 2.
Maximum likelihood estimates of the parameters (standard errors in parentheses) and the AIC, BIC, HQIC, A* and W* statistics for distributions M1 to M6.

Figure 3.
Fitted distributions for the flood peak data of the Wheaton River.

Conclusion

As concluding remarks, we note that the gamma-[(1-G)/G] class of probability distributions developed in this work is a novel way of generalizing the gamma distribution and can be applied in different areas, depending on the choice of the distribution G. In future research, we intend to carry out more detailed comparisons between the family proposed in this paper and the family investigated by Zografos and Balakrishnan (2009), which is also based on integrating the gamma density.

In this paper, we studied in detail only one distribution of the gamma-[(1-G)/G] class, namely the gamma-[(1-Exp)/Exp] distribution. Some properties of this distribution were derived, and it was applied to a real data set, yielding a better fit than the one obtained in a previous study by Akinsete et al. (2008). As future work, we intend to study other distributions within this class.

We note that adding parameters to a model can improve its fit to a particular phenomenon, owing to its greater flexibility. On the other hand, additional parameters can cause both computational and identifiability problems in parameter estimation. Ideally, one should therefore choose a model that describes the phenomenon or experiment well with as few parameters as possible. The proposed class adds only two parameters to those of the baseline distribution G.

References

Akinsete, A., Famoye, F., & Lee, C. (2008). The beta-Pareto distribution. Statistics, 42(6), 547-563.

Alzaatreh, A., Famoye, F., & Lee, C. (2014). The gamma-normal distribution: properties and applications. Computational Statistics and Data Analysis, 69(1), 67-80.

Chen, G., & Balakrishnan, N. (1995). A general purpose approximate goodness-of-fit test. Journal of Quality Technology, 27(2), 154-161.

Choulakian, V., & Stephens, M. A. (2001). Goodness-of-fit tests for the generalized Pareto distribution. Technometrics, 43(4), 478-484.

Cordeiro, G. M., Ortega, E. M. M., & Silva, G. O. (2011). The exponentiated generalized gamma distribution with application to lifetime data. Journal of Statistical Computation and Simulation, 81(7), 827-842.

Gupta, R. C., Gupta, P. L., & Gupta, R. D. (1998). Modeling failure time data by Lehman alternatives. Communications in Statistics - Theory and Methods, 27(4), 877-904.

Gupta, R. D., & Kundu, D. (1999). Theory & Methods: Generalized exponential distributions. Australian and New Zealand Journal of Statistics, 41(2), 173-188.

Mudholkar, G. S., & Srivastava, D. K. (1993). Exponentiated Weibull family for analyzing bathtub failure-rate data. IEEE Transactions on Reliability, 42(2), 299-302.

Ristic, M. M., & Balakrishnan, N. (2012). The gamma exponentiated exponential distribution. Journal of Statistical Computation and Simulation, 82(8), 1191-1206.

Zografos, K., & Balakrishnan, N. (2009). On families of beta- and generalized gamma-generated distributions and associated inference. Statistical Methodology, 6(4), 344-362.

Author notes

franksinatrags@gmail.com
