Statistics

Ristić-Balakrishnan extended exponential distribution

Frank Gomes-Silva
Universidade Federal Rural de Pernambuco, Brasil
Thiago Alexandro Nascimento de Andrade
Universidade Federal de Pernambuco, Brasil
Marcelo Bourguignon
Universidade Federal do Rio Grande do Norte, Brasil

Acta Scientiarum. Technology, vol. 40, 2018

Universidade Estadual de Maringá

Received: 25 January 2017

Accepted: 19 September 2017

Abstract: In this paper, we introduce and study a new generalization of the extended exponential distribution, called the Ristić-Balakrishnan extended exponential distribution. The new model adds one parameter to the baseline model, and its failure rate function can accommodate both bathtub and inverted bathtub shapes. Important distributions, such as the exponential and Lindley distributions, are obtained as special cases of our model. The main purpose is to define a new flexible distribution that fits survival data sets well. For this reason, we provide a comprehensive mathematical treatment of the new model. Furthermore, we use a real data set to show empirically that the new distribution fits better than other competitive models in the literature.

Keywords: generalized distribution, statistical properties, quantile function, maximum likelihood estimation, model fit.

Introduction

It is hardly necessary to emphasize that a probabilistic model is commonly employed to attack practical situations in which a deterministic model is not feasible. This definition, albeit implicitly, had already been part of common sense since the Renaissance era, in which the notion of probability was unconsciously employed to propose solutions in games of chance (Bernstein, 1996). In fact, this intrinsic sense of probability lies at the heart of scientific methodology. Here it is worth quoting the classic book ‘The logic of scientific discovery’ by Sir Karl Popper:

The most important application of the theory of probability is to what we may call chance-like or random events, or occurrences. These seem to be characterized by a peculiar kind of incalculability which makes one disposed to believe after many unsuccessful attempts that all known rational methods of prediction must fail in their case. We have, as it were, the feeling that not a scientist but only a prophet could predict them. And yet, it is just this incalculability that makes us conclude that the calculus of probability can be applied to these events (Popper, 1959, p. 167).

Taking a leap forward in time, we see that probabilistic models still fascinate applied scholars and researchers. This interest materializes in the large number of works dedicated to proposing new distributions, in particular those dealing with distribution generators. The research presented below concerns the generalization of probabilistic models through generators of distributions. For the generator approach, we refer to the following papers: Marshall and Olkin (1997) for the ‘Marshall-Olkin’ class; Eugene, Lee, and Famoye (2002) for the ‘beta’ class; Zografos and Balakrishnan (2009) for the ‘gamma’ class; Cordeiro and Castro (2011) for the ‘Kumaraswamy’ class; and Cordeiro, Ortega, and Cunha (2013) for the ‘exponentiated generalized’ class of distributions.

Recently, Gómez, Bolfarine, and Gómez (2014) introduced a new extended exponential (EE for short) distribution. For x > 0, its cumulative distribution function (cdf) and probability density function (pdf) are given by Equations 1 and 2:

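$$G(x) = 1 - \frac{(\alpha + \beta + \alpha\beta x)\,e^{-\alpha x}}{\alpha + \beta} \tag{1}$$

$$g(x) = \frac{\alpha^{2}(1 + \beta x)\,e^{-\alpha x}}{\alpha + \beta} \tag{2}$$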

where α > 0 and β ≥ 0. Several mathematical properties of the EE distribution, including expectation, variance, moment generating function (mgf), and skewness and kurtosis coefficients, among others, were studied by Gómez et al. (2014). In particular, they proved that the density of the EE model is a mixture of the exponential and gamma densities.
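The EE model can be sketched numerically as follows. This is a minimal illustration, assuming the EE cdf and pdf take the forms G(x) = 1 − (α + β + αβx)e^(−αx)/(α + β) and g(x) = α²(1 + βx)e^(−αx)/(α + β); the function names are hypothetical. The last function writes the same density as a mixture of an exponential density and a gamma density with shape 2, which is one way to read the mixture property mentioned above.

```python
import math


def ee_pdf(x, alpha, beta):
    """Assumed EE density: alpha^2 (1 + beta x) exp(-alpha x) / (alpha + beta)."""
    if x <= 0:
        return 0.0
    return alpha**2 * (1.0 + beta * x) * math.exp(-alpha * x) / (alpha + beta)


def ee_cdf(x, alpha, beta):
    """Assumed EE cdf: 1 - (alpha + beta + alpha beta x) exp(-alpha x) / (alpha + beta)."""
    if x <= 0:
        return 0.0
    tail = (alpha + beta + alpha * beta * x) * math.exp(-alpha * x) / (alpha + beta)
    return 1.0 - tail


def ee_mixture_pdf(x, alpha, beta):
    """Same density as a mixture of Exp(alpha) and Gamma(shape=2, rate=alpha)."""
    weight = alpha / (alpha + beta)          # weight of the exponential component
    exp_part = alpha * math.exp(-alpha * x)
    gamma_part = alpha**2 * x * math.exp(-alpha * x)
    return weight * exp_part + (1.0 - weight) * gamma_part
```

With β = 0 the density reduces to the exponential density α e^(−αx), and with β = 1 it reduces to the Lindley density with parameter α.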

We believe that adding parameters to the EE model may generate new distributions with good fitting capability and, for this reason, we propose a generalization of it. To this end, we use the ‘Ristić-Balakrishnan’-G (RB-G for short) family defined by Ristić and Balakrishnan (2012) for x ∈ ℝ and a > 0, whose pdf and cdf are given, respectively, by Equations 3 and 4:

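$$f(x) = \frac{1}{\Gamma(a)}\,\{-\log G(x, \xi)\}^{a-1}\,g(x, \xi) \tag{3}$$

$$F(x) = 1 - \frac{\gamma\big(a, -\log G(x, \xi)\big)}{\Gamma(a)} \tag{4}$$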

where g(x, ξ) = dG(x, ξ)/dx, with ξ a parametric vector, Γ(a) = ∫₀^∞ t^(a−1) e^(−t) dt is the gamma function and γ(a, z) = ∫₀^z t^(a−1) e^(−t) dt denotes the lower incomplete gamma function. The main motivation for this family is that, for a = n ∈ ℕ, Equation 3 is the pdf of the nth lower record value of a sequence of independent and identically distributed variables from a population with density g(x, ξ).

In this paper we propose a new lifetime model called the ‘Ristić-Balakrishnan extended exponential’ (RBEE) distribution by inserting Equation 1 into Equation 4. As we will see later, the proposed model is quite flexible and its failure rate function can accommodate both bathtub and inverted bathtub shapes, which are important in reliability, lifetime analysis, and the biological and medical sciences, among others. In addition, the new density may be expressed as a mixture of ‘Erlang’ densities. Thus, many properties can be derived using this simple representation. As will also be clear later, many important distributions are obtained as special cases of our model. Finally, we show empirically that the new model fits real data better than the baseline model and other important models well established in the literature.

Material and methods

The RBEE distribution

Let X be a random variable with support on the positive real line having the RBEE distribution, say X ~ RBEE(a, α, β). The cdf of X is defined by inserting Equation 1 into Equation 4, which gives Equation 5:

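$$F(x) = 1 - \frac{1}{\Gamma(a)}\,\gamma\!\left(a,\, -\log\!\left[1 - \frac{(\alpha + \beta + \alpha\beta x)\,e^{-\alpha x}}{\alpha + \beta}\right]\right) \tag{5}$$

where γ(·, ·) and Γ(·) are the lower incomplete gamma function and the gamma function introduced with Equation 4.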

The density of X, for x > 0, can be reduced to Equation 6:

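$$f(x) = \frac{\alpha^{2}(1 + \beta x)\,e^{-\alpha x}}{(\alpha + \beta)\,\Gamma(a)}\left\{-\log\!\left[1 - \frac{(\alpha + \beta + \alpha\beta x)\,e^{-\alpha x}}{\alpha + \beta}\right]\right\}^{a-1} \tag{6}$$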

We write F(x) = F(x; a, α, β) to simplify the notation by omitting the dependence on the model parameters. Clearly, the EE model is a special case of Equation 5 when a = 1. The exponential and Lindley distributions arise as special cases when β = 0 and β = 1, respectively, in addition to a = 1. If β = 0 or β = 1 with a ≠ 1, we obtain the RB-exponential and RB-Lindley distributions, respectively.
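The construction of the RBEE cdf and pdf can be sketched in code. This is an illustration under stated assumptions, not the authors' implementation: it assumes the RB-G cdf F(x) = 1 − γ(a, −log G(x))/Γ(a) and pdf f(x) = g(x){−log G(x)}^(a−1)/Γ(a) with the EE baseline G, g written as in the sketch's comments. The regularized incomplete gamma function is computed with a plain power series, a technique chosen here only to keep the example free of external libraries; all function names are hypothetical.

```python
import math


def reg_lower_gamma(a, z, tol=1e-14, max_iter=100000):
    """Regularized lower incomplete gamma gamma(a, z)/Gamma(a), by power series."""
    if z <= 0.0:
        return 0.0
    if math.isinf(z):
        return 1.0
    term = 1.0 / a
    total = term
    for n in range(1, max_iter):
        term *= z / (a + n)
        total += term
        if term < tol * total:
            break
    return min(1.0, total * math.exp(-z + a * math.log(z) - math.lgamma(a)))


def ee_tail(x, alpha, beta):
    """Assumed EE survival function 1 - G(x) for x > 0."""
    return (alpha + beta + alpha * beta * x) * math.exp(-alpha * x) / (alpha + beta)


def rbee_cdf(x, a, alpha, beta):
    """Assumed RBEE cdf: 1 - gamma(a, -log G(x)) / Gamma(a)."""
    if x <= 0:
        return 0.0
    tail = ee_tail(x, alpha, beta)
    if tail >= 1.0:          # x so small that G(x) rounds to 0
        return 0.0
    z = -math.log1p(-tail)   # z = -log G(x)
    return 1.0 - reg_lower_gamma(a, z)


def rbee_pdf(x, a, alpha, beta):
    """Assumed RBEE pdf: g(x) {-log G(x)}^(a-1) / Gamma(a)."""
    if x <= 0:
        return 0.0
    tail = ee_tail(x, alpha, beta)
    if tail <= 0.0 or tail >= 1.0:
        return 0.0           # degenerate floating-point edge cases mapped to 0 in this sketch
    z = -math.log1p(-tail)
    g = alpha**2 * (1.0 + beta * x) * math.exp(-alpha * x) / (alpha + beta)
    return g * z ** (a - 1.0) / math.gamma(a)
```

Setting a = 1 makes `rbee_cdf` coincide with the EE cdf, which gives a quick sanity check of the special cases listed above.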

Some plots of the pdf Equation 6 are displayed in Figure 1. These plots reveal that the RBEE pdf is quite flexible and can take various forms reinforcing the importance of the proposed model.

The survival function is Equation 7:

$$S(x) = 1 - F(x) = \frac{\gamma\big(a, -\log G(x)\big)}{\Gamma(a)} \tag{7}$$

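The hazard rate function (hrf) and reversed hazard rate function (rhrf) of X are given by Equations 8 and 9:

$$h(x) = \frac{f(x)}{S(x)} = \frac{g(x)\,\{-\log G(x)\}^{a-1}}{\gamma\big(a, -\log G(x)\big)} \tag{8}$$

$$r(x) = \frac{f(x)}{F(x)} = \frac{g(x)\,\{-\log G(x)\}^{a-1}}{\Gamma(a) - \gamma\big(a, -\log G(x)\big)} \tag{9}$$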

respectively. Some plots of the hrf in Equation 8 are displayed in Figure 2. Non-monotone forms such as bathtub and inverted bathtub are particularly important because of their great practical applicability. For example, human lifetime is one phenomenon for which the bathtub form is applicable.

Figure 1. Plots of the RBEE density function for some parameter values.

Asymptotics and shapes of the RBEE

To study the RBEE model in more detail, we investigate the shapes of its pdf and hrf using their first and second derivatives. The first derivatives of log{f(x)} and log{h(x)} for the RBEE model are given by Equations 10 and 11:

(10)

(11)

Figure 2. Plots of the RBEE hazard function for some parameter values.

Hence, the critical values of f(x) and h(x) are the roots of Equations 12 and 13:

(12)

(13)

respectively. The values x0 and x'0 that solve Equations 12 and 13 may each correspond to a maximum, a minimum or an inflection point. To check this, we evaluate the signs of the second derivatives of log{f(x)} and log{h(x)} at x = x0 and x = x'0, respectively. We have Equations 14 and 15:

(14)

(15)

These roots are usually obtained numerically, with high accuracy, through the optimization routines available in most mathematical and statistical platforms.

Quantile function

For many practical purposes, it is important to make explicit the quantile function (qf) of X. The RBEE qf, say q ( u ) can be obtained by inverting Equation 5 (for 0< u <1) as Equation 16:

(16)

where z is obtained from Q⁻¹, the inverse of the regularized incomplete gamma function, and W(·) denotes the Lambert W-function. In a recent paper, Nadarajah, Bakouch, and Tahmasbi (2011) used the Lambert W-function to derive the qf of the exponentiated Lindley distribution. For any complex t, the Lambert W-function is defined as the inverse of the function g(t) = t e^t. For more details, see http://mathworld.wolfram.com/LambertW-Function.html. An implementation in R is available through the ‘LambertW’ package; see http://cran.r-project.org/web/packages/LambertW/LambertW.pdf. In Mathematica, the Lambert W-function is available through the function ‘ProductLog[z]’, which gives the principal solution for w in z = w e^w. By using the Lagrange inversion theorem, we can write an expansion for the qf of X as Equation 17:

(17)

Note that the above equation can be easily implemented on computational platforms that have elementary numerical routines.

The applications of the qf are diverse and include the calculation of moments, estimation of parameters, simulation, and the calculation of skewness and kurtosis measures, among others. For illustration, we use the qf of X to determine the Bowley skewness (B) (Kenney & Keeping, 1962) and the Moors kurtosis (M) (Moors, 1988). The Bowley skewness is based on quartiles, B = [Q(3/4) − 2Q(1/2) + Q(1/4)]/[Q(3/4) − Q(1/4)], whereas the Moors kurtosis is based on octiles, M = [Q(7/8) − Q(5/8) + Q(3/8) − Q(1/8)]/[Q(6/8) − Q(2/8)]. These two measures are less sensitive to outliers and they exist even for distributions without moments.
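The two quantile-based measures can be computed from any quantile function. The sketch below uses Moors' octile form M = [Q(7/8) − Q(5/8) + Q(3/8) − Q(1/8)]/[Q(6/8) − Q(2/8)] and, purely for illustration, the closed-form quantile function of the exponential distribution (the RBEE special case a = 1, β = 0); the function names are hypothetical, and an RBEE quantile function could be passed in the same way.

```python
import math


def bowley_skewness(q):
    """Bowley skewness from a quantile function q(u), 0 < u < 1."""
    return (q(0.75) - 2.0 * q(0.5) + q(0.25)) / (q(0.75) - q(0.25))


def moors_kurtosis(q):
    """Moors kurtosis from a quantile function q(u), based on octiles."""
    numerator = q(7 / 8) - q(5 / 8) + q(3 / 8) - q(1 / 8)
    return numerator / (q(6 / 8) - q(2 / 8))


def exponential_quantile(alpha):
    """Quantile function of the exponential distribution with rate alpha."""
    return lambda u: -math.log1p(-u) / alpha


q = exponential_quantile(2.0)
B = bowley_skewness(q)
M = moors_kurtosis(q)
```

For the exponential case both measures are free of the rate parameter: B equals log(4/3)/log(3) and M equals log(4.2)/log(3), which gives a convenient check of the implementation.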

In Figures 3 and 4, we present 3D plots of the B and M measures for several parameter values. These plots were obtained using the ‘Wolfram Mathematica’ software. Based on these plots, it is possible to conclude that changes in the additional parameter a have a considerable impact on the skewness and kurtosis of the RBEE distribution, thus showing its greater flexibility.

Figure 3. Plots of the Bowley skewness for the RBEE distribution for some parameter values.

Figure 4. Plots of the Moors kurtosis for the RBEE distribution for some parameter values.

Properties

A useful representation

We provide useful linear representations for Equations 5 and 6 based on the exponentiated class of distributions. Mathematical properties of exponentiated distributions have been published by many authors since the 1990s. See, for example, Gupta and Kundu (1999) for the exponentiated exponential, Nadarajah et al. (2011) for the exponentiated Lindley, Sarhan and Kundu (2009) for the exponentiated linear failure rate and, more recently, Lemonte (2013) for the exponentiated Nadarajah-Haghighi distributions. For an arbitrary baseline cdf G(x), a random variable Y_a has the exp-G class with power parameter a > 0, say Y_a ~ exp-G(a), if its cdf and pdf are given by H_a(x) = G(x)^a and h_a(x) = a g(x) G(x)^(a−1), respectively. For a comprehensive discussion of the exponentiated class, see the recent paper by Tahir and Nadarajah (2015).

By using results presented in Cordeiro and Bourguignon (2016), we can express the pdf f(x) as Equation 18:

(18)

With

(19)

where:

the quantities d_i(a − 1) (for i ≥ 0) are determined by d_0(c) = c/2, d_1(c) = c(3c + 5)/24, d_2(c) = c(c² + 5c + 6)/48, d_3(c) = c(15c³ + 150c² + 485c + 502)/5760, etc.

Note that, by integrating Equation 18, we can express F ( x ) as Equation 20:

(20)

where:

H_{j+1}(x) denotes the exp-EE cumulative distribution with power parameter j + 1. Here, h_{j+1}(x) is the exp-EE density function with power parameter j + 1, given (for j ≥ 0) by h_{j+1}(x) = (j + 1) g(x) G(x)^j. By interchanging by in the last equation, where Equation 21:

(21)

and, after a simple algebraic manipulation, we obtain Equation 22 and 23:

(22)

(23)

where:

π(x; i + 1, (k + 1)α) denotes the pdf of the Erlang distribution with parameters i + 1 (for i ≥ 0) and (k + 1)α. If Z is an Erlang random variable with parameters s (= 1, 2, 3, …) and λ > 0, its pdf is given by π(z; s, λ) = λ^s z^(s−1) e^(−λz)/(s − 1)!. Changing by , the density function reduces to Equation 24:

(24)

where . Equation 24 is the main result of this section.
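Because the RBEE density is a mixture of Erlang densities, quantities such as moments and the mgf reduce to weighted sums of the corresponding Erlang quantities. The sketch below illustrates only these Erlang building blocks (the mixture weights are not reproduced here); the function names are hypothetical. It uses the standard results E(Z^n) = (s + n − 1)!/[(s − 1)! λ^n] and M(t) = [λ/(λ − t)]^s for t < λ, and checks the first against a simple Simpson-rule integration.

```python
import math


def erlang_pdf(z, s, lam):
    """Erlang density lam^s z^(s-1) exp(-lam z) / (s-1)! for integer s >= 1."""
    if z < 0:
        return 0.0
    return lam**s * z ** (s - 1) * math.exp(-lam * z) / math.factorial(s - 1)


def erlang_moment(n, s, lam):
    """n-th ordinary moment of the Erlang(s, lam) distribution."""
    return math.factorial(s + n - 1) / (math.factorial(s - 1) * lam**n)


def erlang_mgf(t, s, lam):
    """Moment generating function, finite only for t < lam."""
    if t >= lam:
        raise ValueError("the mgf is finite only for t < lam")
    return (lam / (lam - t)) ** s


def simpson(func, lower, upper, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (upper - lower) / n
    total = func(lower) + func(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lower + i * h)
    return total * h / 3.0


# Numerical check of the second moment for s = 3, lam = 2 (exact value 3*4/2^2 = 3).
numeric_m2 = simpson(lambda z: z**2 * erlang_pdf(z, 3, 2.0), 0.0, 40.0)
```

With rate λ = (k + 1)α, the condition t < λ in `erlang_mgf` is the same restriction t < (k + 1)α that appears for the mgf of X.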

Moments, incomplete moments and generating function

Then, the n th moment of X and its incomplete moments, respectively, are given by Equation 25 and 26:

(25)

(26)

where Γ(·, ·) denotes the upper incomplete gamma function.

The moment generating function (mgf) of X can be determined from Equation 24, as in Equation 27:

(27)

Then, for all t <( k +1)α, we have Equation 28:

(28)

Order statistics

By using results presented in Cordeiro and Bourguignon (2016), the density function f_{i:n}(x) of the ith order statistic, say X_{i:n}, for i = 1, …, n, from a random sample X_1, …, X_n having the RBEE distribution, can be expressed as Equation 29:

(29)

with

(30)

where w r is defined by Equation 19, and .

Reliability

In reliability, the stress-strength model describes the life of a component which has a random strength X1 and is subjected to a random stress X2. The component fails at the instant that the stress applied to it exceeds the strength, and the component will function satisfactorily whenever X1 > X2. Hence, R = P(X2 < X1) is a measure of component reliability. When X1 and X2 are independent with RBEE(a1, α, β) and RBEE(a2, α, β) distributions, the reliability is defined by R = ∫₀^∞ f1(x) F2(x) dx, where f1 is the pdf of X1 and F2 is the cdf of X2. The pdf of X1 and cdf of X2 are expressed from Equations 18 and 19 as Equation 31:

(31)

where (for s = j , k ; p = i , n and q = 1, 2)

(32)

Thus, we have Equation 33:

(33)

where Ij,k = w j,i( a 1) w k,n( a 2).

Hence, after a simple algebraic manipulation, the reliability of the RBEE distribution is given by Equation 34:

(34)
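A simulation check of R = P(X2 < X1) can be sketched as follows. This is not the series expression above but a Monte Carlo approximation under stated assumptions: the RB-G cdf is taken as F(x) = 1 − γ(a, −log G(x))/Γ(a), so that X = G⁻¹(e^(−Z)) with Z ~ Gamma(a, 1) has the RBEE distribution; the EE cdf is taken as G(x) = 1 − (α + β + αβx)e^(−αx)/(α + β); and G⁻¹ is computed by bisection rather than by the Lambert W-function. All function names are hypothetical.

```python
import math
import random


def ee_cdf(x, alpha, beta):
    """Assumed EE cdf for x > 0."""
    if x <= 0:
        return 0.0
    return 1.0 - (alpha + beta + alpha * beta * x) * math.exp(-alpha * x) / (alpha + beta)


def ee_quantile(p, alpha, beta, iterations=60):
    """Invert the EE cdf by bisection (swapped in for the Lambert W expression)."""
    if p <= 0.0:
        return 0.0
    lower, upper = 0.0, 1.0
    while ee_cdf(upper, alpha, beta) < p:   # expand the bracket until it contains p
        upper *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lower + upper)
        if ee_cdf(mid, alpha, beta) < p:
            lower = mid
        else:
            upper = mid
    return 0.5 * (lower + upper)


def rbee_sample(a, alpha, beta, rng):
    """Draw X = G^{-1}(exp(-Z)) with Z ~ Gamma(a, 1), under the assumed RB-G form."""
    z = rng.gammavariate(a, 1.0)
    return ee_quantile(math.exp(-z), alpha, beta)


def reliability_mc(a1, a2, alpha, beta, n=10000, seed=0):
    """Monte Carlo estimate of R = P(X2 < X1) for independent RBEE variables."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x1 = rbee_sample(a1, alpha, beta, rng)
        x2 = rbee_sample(a2, alpha, beta, rng)
        if x2 < x1:
            hits += 1
    return hits / n


R_hat = reliability_mc(1.0, 2.0, alpha=0.7, beta=1.2)
```

Under this representation the event X2 < X1 is equivalent to Z2 > Z1, so for a1 = 1 and a2 = 2 the estimate should be close to 1 − (1/2)² = 0.75, whatever the values of α and β.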

Entropy

The Rényi entropy is defined (for δ > 0 and δ ≠ 1) by Equation 35:

$$I_{R}(\delta) = \frac{1}{1 - \delta}\,\log\!\left(\int_{0}^{\infty} f(x)^{\delta}\,dx\right) \tag{35}$$
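The definition can be evaluated numerically for any density on the positive real line. The sketch below assumes the standard form I_R(δ) = (1 − δ)⁻¹ log ∫ f(x)^δ dx, integrates with a composite Simpson rule truncated at a user-chosen upper limit, and uses the EE density (in the form assumed in the comments) as an example; the function names and the truncation point are illustrative choices.

```python
import math


def simpson(func, lower, upper, n=20000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (upper - lower) / n
    total = func(lower) + func(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(lower + i * h)
    return total * h / 3.0


def renyi_entropy(pdf, delta, upper):
    """Renyi entropy of a density on (0, upper), with the tail beyond 'upper' ignored."""
    if delta <= 0 or delta == 1:
        raise ValueError("delta must be positive and different from 1")
    integral = simpson(lambda x: pdf(x) ** delta, 0.0, upper)
    return math.log(integral) / (1.0 - delta)


def ee_pdf(x, alpha, beta):
    """Assumed EE density: alpha^2 (1 + beta x) exp(-alpha x) / (alpha + beta)."""
    if x < 0:
        return 0.0
    return alpha**2 * (1.0 + beta * x) * math.exp(-alpha * x) / (alpha + beta)


# Exponential special case (beta = 0): closed form is -log(alpha) + log(delta)/(delta - 1).
entropy_exp = renyi_entropy(lambda x: ee_pdf(x, 2.0, 0.0), delta=2.0, upper=30.0)
```

For the exponential case with α = 2 and δ = 2 the closed form gives exactly 0, which the numerical value should reproduce up to integration error.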

Let be the baseline survival function. Following similar idea given in Nadarajah, Cordeiro, and Ortega (2015) (Section 10), we have Equation 36:

(36)

where:

The constants p j,k can be calculated recursively by Equation 37:

(37)

for k = 1, 2, …, with p_{j,0} = 1 and c_k = (−1)^(k+1) (k + 1)^(−1). By using Equation 36 and the generalized binomial expansion, we obtain the Rényi entropy for the family as Equations 38 and 39:

(38)

Where:

(39)

comes from the baseline distribution. Based on the cdf Equation 1 and pdf Equation 2, we can express Equation 39 as Equation 40:

(40)

where is ‘exponential integral function’.

Estimation and inference

Among estimation methods, the maximum likelihood method stands out because of its good asymptotic properties. The maximum likelihood estimators (MLEs) can be used to construct confidence intervals and regions, as well as test statistics. Let x_1, …, x_n be a random sample of size n from the RBEE(a, α, β) model. The log-likelihood function for the vector of parameters Θ = (a, α, β)^T can be expressed as Equation 41:

(41)

where :

The components of the score vector U(Θ) are given by Equation 42, 43 and 44:

(42)

(43)

(44)

where

ψ(·) is the digamma function and .

The observed information matrix is given by J(Θ) = {−U_rs}, and its elements U_rs(Θ) = ∂²ℓ(Θ)/∂r∂s, for r, s ∈ {a, α, β}, can be obtained from the authors upon request.
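For numerical work, the log-likelihood can be coded directly and handed to any general-purpose optimizer. The sketch below is an illustration under the assumption that the RBEE density has the form f(x) = g(x){−log G(x)}^(a−1)/Γ(a), with G and g the EE cdf and pdf written out in the comments; the function name is hypothetical, and the data are assumed to be strictly positive and not so extreme that G(x) rounds to 0 or 1.

```python
import math


def rbee_loglik(data, a, alpha, beta):
    """Log-likelihood of (a, alpha, beta) under the assumed RBEE density."""
    if a <= 0 or alpha <= 0 or beta < 0:
        return -math.inf                      # outside the parameter space
    total = -len(data) * math.lgamma(a)
    for x in data:
        # 1 - G(x) for the EE baseline
        tail = (alpha + beta + alpha * beta * x) * math.exp(-alpha * x) / (alpha + beta)
        z = -math.log1p(-tail)                # z = -log G(x)
        # log g(x) for the EE baseline
        log_g = (2.0 * math.log(alpha) + math.log1p(beta * x)
                 - alpha * x - math.log(alpha + beta))
        total += (a - 1.0) * math.log(z) + log_g
    return total


# With a = 1 and beta = 0 the model is exponential, so the value must equal
# n*log(alpha) - alpha*sum(x) for this small hypothetical sample.
sample = [0.5, 1.2, 3.4]
value = rbee_loglik(sample, 1.0, 0.8, 0.0)
```

Returning negative infinity outside the parameter space is a design choice that lets unconstrained optimizers reject invalid points without raising errors.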

Results and discussion

Application

We consider an uncensored data set from Murthy, Xie, and Jiang (2004, p. 180), arising in industry and representing the failure times (in weeks) of 50 components put into use at time. The data are: 0.013, 0.065, 0.111, 0.111, 0.163, 0.309, 0.426, 0.535, 0.684, 0.747, 0.997, 1.284, 1.304, 1.647, 1.829, 2.336, 2.838, 3.269, 3.977, 3.981, 4.520, 4.789, 4.849, 5.202, 5.291, 5.349, 5.911, 6.018, 6.427, 6.456, 6.572, 7.023, 7.087, 7.291, 7.787, 8.596, 9.388, 10.261, 10.713, 11.658, 13.006, 13.388, 13.842, 17.152, 17.283, 19.418, 23.471, 24.777, 32.795, 48.105. Table 1 provides some descriptive statistics.

Table 1. Descriptive statistics for the failure times of 50 components.

For the data of Murthy et al. (2004), we compared the RBEE model with its EE (Gómez et al., 2014) and Lindley sub-models and with other models commonly used in survival analysis, namely the log-logistic, Fréchet and Birnbaum-Saunders (BS) distributions. The densities of these models are given, for example, on the Wolfram Alpha website (https://www.wolframalpha.com).

Table 2 gives the MLEs of the parameters of the models fitted to the current data with their corresponding standard errors, in addition to the AIC, BIC and CAIC statistics. In general, lower values of these criteria indicate a better fit to the data.
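The information criteria are simple functions of the maximized log-likelihood. The sketch below uses the usual definitions AIC = 2k − 2ℓ and BIC = k log(n) − 2ℓ, where k is the number of parameters and n the sample size; CAIC is left out because its exact definition is not stated here. The function names and the numbers in the example are hypothetical and are not values from Table 2.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for a model with k parameters."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for k parameters and n observations."""
    return k * math.log(n) - 2.0 * loglik


# Hypothetical maximized log-likelihoods for a 3-parameter and a 2-parameter model.
n_obs = 50
criteria = {
    "three-parameter": (aic(-150.0, 3), bic(-150.0, 3, n_obs)),
    "two-parameter": (aic(-153.5, 2), bic(-153.5, 2, n_obs)),
}
best_by_aic = min(criteria, key=lambda name: criteria[name][0])
```

Because BIC penalizes each parameter by log(n) rather than 2, it favours smaller models more strongly than AIC once n exceeds about 7.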

Table 2. MLEs (and the corresponding standard errors in parentheses), AIC, BIC and CAIC statistics for number of successive failures for the air conditioning system.

Table 3. Mean, Variance, Skewness and Kurtosis for the three main distributions.

Additionally, we took into consideration the Anderson-Darling (A*) and Cramér-von Mises (W*) statistics (Chen & Balakrishnan, 1995). Chen and Balakrishnan (1995) proposed a general approximate goodness-of-fit test for the hypothesis H0: X1, …, Xn is a random sample from F(x; θ), where the form of F is known but the p-vector θ is unknown. To obtain the statistics A* and W*, we can proceed as follows: (1) compute r_i = F(x_i; θ̂), where θ̂ is an estimate of θ and the x_i's are in ascending order, and then y_i = Φ⁻¹(r_i), where Φ(·) is the standard normal cumulative distribution; (2) compute u_i = Φ{(y_i − ȳ)/s_y}, where ȳ = n⁻¹ Σ y_i and s_y² = (n − 1)⁻¹ Σ (y_i − ȳ)²; (3) calculate W² = Σ {u_i − (2i − 1)/(2n)}² + 1/(12n) and A² = −n − n⁻¹ Σ {(2i − 1) log(u_i) + (2n + 1 − 2i) log(1 − u_i)}, and then W* = W²(1 + 0.5/n) and A* = A²(1 + 0.75/n + 2.25/n²), with all sums running over i = 1, …, n. Table 4 lists the values of the A* and W* statistics. In general, lower values of these statistics indicate a better fit to the data.
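The three steps of the Chen and Balakrishnan (1995) procedure can be sketched as below. The function name is hypothetical, and the example fits an exponential model (with its rate estimated by the reciprocal of the sample mean) to the failure-time data only to show the mechanics; it does not reproduce the RBEE values reported in Table 4.

```python
import math
from statistics import NormalDist, mean, stdev


def chen_balakrishnan(data, fitted_cdf):
    """Return (A*, W*) for a sample and a fitted cdf, following steps (1)-(3)."""
    x = sorted(data)
    n = len(x)
    normal = NormalDist()
    # Step 1: r_i = F(x_i) and y_i = Phi^{-1}(r_i)
    y = [normal.inv_cdf(fitted_cdf(xi)) for xi in x]
    # Step 2: u_i = Phi((y_i - y_bar) / s_y), with the (n - 1)-denominator variance
    y_bar, s_y = mean(y), stdev(y)
    u = [normal.cdf((yi - y_bar) / s_y) for yi in y]
    # Step 3: W^2 and A^2, then the small-sample corrections
    w2 = sum((ui - (2 * i - 1) / (2 * n)) ** 2
             for i, ui in enumerate(u, start=1)) + 1.0 / (12 * n)
    a2 = -n - sum((2 * i - 1) * math.log(ui) + (2 * n + 1 - 2 * i) * math.log(1.0 - ui)
                  for i, ui in enumerate(u, start=1)) / n
    w_star = w2 * (1.0 + 0.5 / n)
    a_star = a2 * (1.0 + 0.75 / n + 2.25 / n**2)
    return a_star, w_star


failure_times = [
    0.013, 0.065, 0.111, 0.111, 0.163, 0.309, 0.426, 0.535, 0.684, 0.747,
    0.997, 1.284, 1.304, 1.647, 1.829, 2.336, 2.838, 3.269, 3.977, 3.981,
    4.520, 4.789, 4.849, 5.202, 5.291, 5.349, 5.911, 6.018, 6.427, 6.456,
    6.572, 7.023, 7.087, 7.291, 7.787, 8.596, 9.388, 10.261, 10.713, 11.658,
    13.006, 13.388, 13.842, 17.152, 17.283, 19.418, 23.471, 24.777, 32.795, 48.105,
]
rate = 1.0 / mean(failure_times)              # exponential MLE, used only as an example
a_star, w_star = chen_balakrishnan(failure_times, lambda t: 1.0 - math.exp(-rate * t))
```

Any fitted cdf with values strictly between 0 and 1 at the observations can be passed in place of the exponential one.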

Table 3 presents the mean, variance, skewness and kurtosis for the fitted RBEE, EE and Lindley models. As we can see, the empirical and estimated means and variances do not differ considerably. This suggests that the models are adequate to explain these data.

The values in Tables 2 and 4 reveal that the RBEE model has the lowest AIC, BIC, CAIC, A* and W* values among all fitted models. Thus, the proposed RBEE distribution is the best of the fitted models for explaining these data. Finally, Figure 5 displays the histogram of the data and the estimated pdf and cdf of the RBEE model. These plots reveal that the proposed model is quite suitable for these data.

Table 4. Goodness-of-fit tests.

Conclusion

In this article, we introduce and study a new lifetime model, called the ‘Ristić-Balakrishnan extended exponential’ distribution. The proposed model has three parameters and generalizes important distributions. We provide a comprehensive study of the mathematical and statistical properties of the new model. In addition, the practical utility of the new model was empirically demonstrated. We hope that the RBEE model can be useful for applied statisticians and other researchers who need a model with few parameters that is flexible enough to accommodate data supported on the positive real line.

Acknowledgements

We thank two anonymous referees and the associate editor for their valuable suggestions, which certainly contributed to the improvement of this paper. Additionally, Thiago A. N. de Andrade is grateful for the financial support from CAPES (Brazil).

Figure 5. Estimated pdf of the RBEE model; estimated cdf of the RBEE model.

References

Bernstein, P. L. (1996). Against the gods: the remarkable story of risk. New York, NY: John Wiley & Sons.

Chen, G., & Balakrishnan, N. (1995). A general purpose approximate goodness-of-fit test. Journal of Quality Technology, 27(2), 154-161.

Cordeiro, G. M., & Bourguignon, M. (2016). New results on the Ristić–Balakrishnan family of distributions. Communications in Statistics - Theory and Methods, 45(23), 6969-6988. doi 10.1080/03610926.2014.972573

Cordeiro, G. M., & Castro, M. (2011). A new family of generalized distributions. Journal of Statistical Computation and Simulation, 81(7), 883-893.

Cordeiro, G. M., Ortega, E. M. M., & Cunha, D. C. (2013). The exponentiated generalized class of distributions. Journal of Data Science, 11(1), 1-27.

Eugene, N., Lee, C., & Famoye, F. (2002). Beta-normal distribution and its applications. Communications in Statistics - Theory and Methods, 31(4), 497-512. doi 10.1081/STA-120003130

Gómez, Y. M., Bolfarine, H., & Gómez, H. W. (2014). A new extension of the exponential distribution. Revista Colombiana de Estadística, 37(1), 25-34.

Gupta, R. D., & Kundu, D. (1999). Generalized exponential distributions. Australian and New Zealand Journal of Statistics, 41(2), 173-188.

Kenney, J. F., & Keeping, E. S. (1962). Mathematics of statistics (3rd ed.). Trenton, NJ: Chapman & Hall.

Lemonte, A. J. (2013). A new exponential-type distribution with constant, decreasing, increasing, upside-down bathtub and bathtub-shaped failure rate function. Computational Statistics and Data Analysis, 62, 149-170. doi 10.1016/j.csda.2013.01.011

Marshall, A. W., & Olkin, I. (1997). A new method for adding a parameter to a family of distributions with applications to the exponential and Weibull families. Biometrika, 84(3), 641-652.

Moors, J. J. (1988). A quantile alternative for kurtosis. Journal of the Royal Statistical Society. Series D, 37(1), 25-32.

Murthy, D., Xie, M., & Jiang, R. (2004). Weibull models. Wiley series in probability and statistics. Trenton, NJ: John Wiley and Sons.

Nadarajah, S., Bakouch, H. S., & Tahmasbi, R. (2011). A generalized Lindley distribution. Sankhya B, 73(2), 331-359.

Nadarajah, S., Cordeiro, G. M., & Ortega, E. M. M. (2015). The Zografos-Balakrishnan-G family of distributions: Mathematical properties and applications. Communications in Statistics - Theory and Methods, 44(1), 186-215. doi 10.1080/03610926.2012.740127

Popper, K. (1959). The logic of scientific discovery. Viena: Taylor & Francis e-Library.

Ristić, M. M., & Balakrishnan, N. (2012). The gamma exponentiated exponential distribution. Journal of Statistical Computation and Simulation, 82(8), 1191-1206. doi 10.1080/00949655.2011.574633

Sarhan, A. M., & Kundu, D. (2009). Generalized linear failure rate distribution. Communications in Statistics - Theory and Methods, 38(5), 642-660. doi 10.1080/03610920802272414

Tahir, M. H., & Nadarajah, S. (2015). Parameter induction in continuous univariate distributions: Well-established G families. Anais da Academia Brasileira de Ciências, 87(2), 539-568. doi 10.1590/0001-3765201520140299

Zografos, K., & Balakrishnan, N. (2009). On the families of beta- and gamma-generated generalized distribution and associated inference. Statistical Methodology, 6(4), 344-362. doi 10.1016/j.stamet.2008.12.003

Notes

License information This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Author notes

franksinatrags@gmail.com
