Abstract: Benford's law, or the “first-digit” law, has been used successfully to evaluate epidemiological surveillance systems, especially during epidemics. Conventional statistical methods for this evaluation (χ² and the log-likelihood ratio) are controversial when the sample size is small (n < 7). In this methodological note we propose a new test to evaluate compliance with Benford's law in small samples, which can be applied to biomedical, medical and public health data.
Keywords: Analysis of data, Epidemics, COVID-19 infection, Public health emerging infections.
Methodological note
Benford's Law with small sample sizes: A new exact test useful in health sciences during epidemics
Received: 28 February 2020
Accepted: 04 March 2020
Published: 19 March 2020
Several published articles have proposed Benford's law, or the first-digit law, as a cost-effective tool for evaluating data in biomedicine, medicine and public health (1-6) (see Figure 1). It could be particularly useful in health emergencies such as the COVID-19 epidemic, when epidemiological surveillance systems need to be evaluated rapidly (7). Because its use is controversial when only a few data are available (n < 7), we developed a new exact test to screen for compliance with the Benford distribution. Table 1 gives the expected number of each first digit under this law for sample sizes from n = 1 to 6. We assume that the data come from a sequence of independent events (i.e., the occurrence of a particular digit does not depend on the occurrence of any previous one). Under this assumption, the digit counts follow a multinomial distribution with the known probabilities given by Benford, which serves as the null hypothesis H0: “the data follow Benford's law”. The probability of the observed counts under H0 can be expressed with the equation:
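$$P(X_1 = x_1, \ldots, X_9 = x_9 \mid H_0) = \frac{n!}{x_1!\,x_2!\cdots x_9!}\prod_{i=1}^{9} p_i^{x_i}$$

where $x_i$ is the observed absolute frequency of first digit $i$, $n = x_1 + \cdots + x_9$ is the sample size, and $p_i = \log_{10}(1 + 1/i)$ is the Benford probability of first digit $i$ (0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051 and 0.046 for $i = 1, \ldots, 9$).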
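Data generated by a non-Benford distribution are less likely to produce digit counts that are probable under H0, and are therefore more likely to lead to rejection of H0. Because the probabilities are computed exactly rather than through a large-sample approximation such as the χ² statistic, the test does not rely on a minimum sample size.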
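The probabilities used under H0 can be checked numerically. The R sketch below was written for this note and is not the authors' original script: it computes the Benford probabilities $p_i = \log_{10}(1 + 1/i)$, the expected counts $n \cdot p_i$ for n = 1 to 6 (assuming this is how the expected numbers in Table 1 are defined), and the exact multinomial probability of an observed count vector. The helper `multinom_prob` is hypothetical; it should agree with base R's `dmultinom`.

```r
# Benford first-digit probabilities: p_i = log10(1 + 1/i) for digits i = 1..9
digits <- 1:9
p_benford <- log10(1 + 1 / digits)

# Rounded to three decimals, these are the probabilities used with dmultinom
print(round(p_benford, 3))

# Expected count of each first digit (n * p_i) for sample sizes n = 1..6;
# rows are sample sizes, columns are digits
expected <- outer(1:6, p_benford)
dimnames(expected) <- list(paste0("n=", 1:6), as.character(digits))
print(round(expected, 2))

# Hypothetical helper: exact multinomial probability of the observed
# first-digit counts x under H0, computed on the log scale for stability
multinom_prob <- function(x, p) {
  n <- sum(x)
  exp(lgamma(n + 1) - sum(lgamma(x + 1)) + sum(x * log(p)))
}

# Example counts: four leading 1s and one leading 2 (n = 5)
x <- c(4, 1, 0, 0, 0, 0, 0, 0, 0)
print(multinom_prob(x, p_benford))
print(dmultinom(x, prob = p_benford))  # should print the same value
```

Working on the log scale with `lgamma` avoids overflow of the factorials if the same helper is reused with larger counts.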
To assess the performance of this test, we used data with small sample sizes (n < 7) from a previous publication (6). The test was implemented in R. Exact probabilities were obtained with the base function dmultinom, and p values were then calculated with the EMT package developed by Menzel (8). In the following code, observed holds the absolute frequencies of first digits 1 to 9 (here an example with n = 5):

    # Benford probabilities for first digits 1 to 9 (H0)
    underH0 <- c(0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046)
    # Observed absolute frequencies of first digits 1 to 9
    observed <- c(4, 1, 0, 0, 0, 0, 0, 0, 0)
    # Exact probability of the observed counts under H0 (base R)
    dmultinom(observed, prob = underH0)
    # Exact multinomial test p value (EMT package)
    library(EMT)
    out <- multinomial.test(observed, underH0)
    out$p.value

The results were compared with those of Kuiper's test (9).
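For readers without the EMT package, the exact test can also be sketched in base R for small n. This sketch assumes the usual definition of the exact multinomial p value (the total probability under H0 of every outcome no more likely than the observed one), which we believe is what EMT's `multinomial.test` reports; this should be checked against the package documentation. The helpers `compositions` and `exact_multinomial_test` are hypothetical names introduced here.

```r
# Hypothetical helper: all ways to distribute n observations over k digits
# (each row is a vector of counts summing to n)
compositions <- function(n, k) {
  if (k == 1) return(matrix(n, nrow = 1))
  do.call(rbind, lapply(0:n, function(first) {
    cbind(first, compositions(n - first, k - 1))
  }))
}

# Hypothetical helper: exact multinomial test p value, defined as the total
# probability under H0 of all outcomes no more likely than the observed one
exact_multinomial_test <- function(observed, prob) {
  outcomes <- compositions(sum(observed), length(observed))
  probs <- apply(outcomes, 1, dmultinom, prob = prob)
  p_obs <- dmultinom(observed, prob = prob)
  # small relative tolerance so floating-point ties count as equally likely
  sum(probs[probs <= p_obs * (1 + 1e-7)])
}

underH0 <- c(0.301, 0.176, 0.125, 0.097, 0.079, 0.067, 0.058, 0.051, 0.046)
observed <- c(4, 1, 0, 0, 0, 0, 0, 0, 0)
print(exact_multinomial_test(observed, underH0))
```

Full enumeration is practical here because the number of possible count vectors, choose(n + 8, 8), is at most 3003 for n ≤ 6; it grows quickly for larger samples.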
This new exact test makes it possible to extend the use of Benford's law in biomedicine, medicine and public health to data sets with small sample sizes.
Correspondence: José Moreno Montoya. Address: calle 119 No. 7-74 - Piso 2, Phone number: (57 1) 6030303 ext 1130. Bogotá D.C. Colombia. Email: josemorenomontoya@gmail.com