Review Articles

Spatially-based random sampling and its usefulness in epidemiological research

Muestreo aleatorio de base espacial y su utilidad en la investigación epidemiológica

Angela Marion Zambrana Vera *
IIBISMED, Bolivia
Sergio Alberto Avilés Ribera
SIG-CLAS, Bolivia
Fernando Gumucio Zabalaga
Environmental Engineer, Bolivia
Marcela Luizaga López
IIBISMED, Bolivia
Paul Pineda Gamarra
CLAS, Bolivia
Daniel Illanes Velarde
IIBISMED, Bolivia

Gaceta Médica Boliviana, vol. 43, no. 1, 2020

Universidad Mayor de San Simón

All moral rights belong to the authors and all economic rights to the Gaceta Médica Boliviana

Received: 02 March 2020

Accepted: 26 May 2020

Abstract: The correct application of sampling techniques has become indispensable for research in the field of epidemiology; the challenge for every researcher is that results obtained from a few subjects can be extrapolated to a population. This article is a non-systematic review that provides information on the application of a random sampling method associated with a geographical location, for the study "Reference Values of Grip Force in adults of the department of Cochabamba-Bolivia". First, the introduction reviews the importance of estimating reference population parameters on the basis of descriptive questions. Next, the characteristics of epidemiological research associated with spatiality are described. Third, the methodology and the experiences gained in applying the sampling in the project are detailed. Finally, emphasis is placed on the need for and relevance of this methodology.

Keywords: geographical location, random sampling, epidemiology.

Resumen: La correcta aplicación de las técnicas del proceso de muestreo se ha hecho indispensable para la investigación en el campo de la epidemiología; el desafío de todo investigador es que los resultados de unos cuantos puedan ser extrapolables para una población. El presente artículo es una revisión no sistemática que proporciona información sobre la aplicación de un método de muestreo aleatorio asociado con una ubicación geográfica, para el estudio de "Valores de Referencia de la Fuerza de Agarre en adultos del departamento de Cochabamba-Bolivia". Primeramente, en la introducción se revisa la importancia de la estimación de parámetros poblacionales de referencia, a partir de preguntas descriptivas. Seguidamente se mencionan las características de la investigación epidemiológica asociada a la espacialidad; en tercer lugar, se detalla la metodología y las experiencias que conllevó la aplicación del muestreo en el proyecto. Finalmente se hace hincapié en la necesidad y relevancia del uso de esta metodología.

Palabras clave: ubicación geográfica, muestreo aleatorio, epidemiología.

Doing research in epidemiology is a methodological and practical challenge: the results must be objectively valid and applicable. It is important to avoid biased answers when investigating the variability and association of a fact or phenomenon. To that end, the literature recommends studying all the subjects that constitute a population of interest. However, it is rarely possible to observe and measure an entire population; the limiting factors include financial resources, qualified human resources, time and geographical difficulties. According to Rothman, this population may even be unlimited, in which case complete coverage of the population is a logical impossibility 1.

An analytical or experimental study deals with a finite population of interest, whereas in a descriptive epidemiological study the general population is infinite, i.e. generic. This is called the "reference population", and it is characterised by features defined by the researcher, e.g. diabetic populations, obese patients, obese hypertensive patients, etc.

As these are generic populations, with a potentially unlimited number of elements, it is mathematically impossible to determine their parameters: the denominator of any ratio will always be infinite and the fractions calculated will always be zero.

This mathematical concept corresponds to the epistemological concept of the impossibility of making general statements based on particular observations, known as the fallacy of induction or the fallacy of the consequent 2.

Despite the aforementioned mathematical impossibility of determining reference parameters of a population, when scientific knowledge is generated, mathematical relationships are established that explain the phenomena that occur in these generic populations.

To solve this problem, there are sampling methods, techniques, models and processes for selecting the subjects that define a population. The premise is that the sample, or subset of people, adequately represents the composition and variability of the constituent elements, so that the observations and conclusions can be used to generalise or extrapolate the results. This is inductive inference: an effort to reason from the particular to the general 3.

The estimation of reference population parameters should therefore be based primarily on descriptive questions, and the sampling process should be random as far as possible, since only probability sampling removes selection biases and allows an unbiased estimation of population parameters. Accordingly, the aim of this non-systematic review article is to describe the use of a probability mechanism designed from the spatial component, which ensures that the probability of an element with a particular characteristic being included in the sample is proportional to the frequency of that characteristic in the reference population.

Epidemiological Research Associated With Spatiality

Since the basic coordinates of epidemiology (time, place and person 4) were established, spatiality has been part of the research context. The fundamental premise is that populations and their living conditions, health and disease are not chaotically arranged over a territory, but follow well-defined geographical, socio-economic and cultural patterns 5.

Monitoring geographical variation in disease distribution and research to understand the underlying reasons for such variation is usually an important starting point. The study of the geographical distribution and spatial association of health events can be referred to as spatial epidemiology 6.

When dealing with health events, epidemiologists and other health professionals should pay special attention to the first law of geography, defined by Waldo Tobler in 1970: "everything is related to everything else, but things close together are more related than things far apart" 7. Under this law, individuals are considered to cluster spatially in a characteristic pattern, and they can be located and related by a pair of coordinates, an address or an area; this has been termed "Spatial Analysis".

In epidemiology, spatial analysis processes georeferenced events or entities to create new information that can be represented on maps 8. The data used for this can be point or clustered, depending on the unit of analysis studied. When the unit of analysis is the individual, the data are called point data, representing on the map the points with the exact geographical location where the event of interest occurred, be it disease, death, or any other event. If the unit of analysis is the geographical area, the data are called clustered or ecological and the variable of interest corresponds to characteristics of the geographical area, not of individuals 9. With this type of assessment it is possible to make predictions of spatio-temporal patterns of future disease behaviour and to monitor and identify areas where actions should be directed, allowing the spatial component to be captured and used in epidemic data 10.

The spatial component is linked to the position within a reference system established by objects and the spatial relationships between them 11. It makes the information geographic, because without it there is no location, and therefore no geographic framework. The spatial component addresses three questions: 1) the geographical location, 2) the spatial properties of objects or individuals, and 3) the spatial relationships that exist between them. On this basis, once the geographical area under study is known in a Geographic Information System (i.e. the data on it are available), it can be associated with epidemiological variables, either in the methodological framework or in the results on the behaviour of the event studied 12.

Spatially Based Random Sampling

Etymologically, the Spanish word muestra (sample: representative part, specimen, prototype) comes from the Latin monstrare (to show), and from it are derived the words muestreo (sampling: the technique of collecting parts that represent the quality of a whole) and muestrear (to sample: applying the action of sampling).

Within sampling theory, there is a vast number of studies and academic resources that provide detailed information on the use, classifications, developments, advantages and disadvantages of sampling 13. When deciding on a type of sampling, it must be assumed that the researcher is familiar with the variables he/she wishes to study, because obtaining results that can be generalised to the universe requires capturing the diversity of relationships, which is called "structural heterogeneity". In operational terms, the structural levels that define the heterogeneity of the sample are the spatial axis, the socio-economic axis and the temporal axis 14.

In the city of Cochabamba, Bolivia, there is a history of using sampling methodologies such as LQAS (Lot Quality Assurance Sampling), which was used in a departmental study of knowledge and practices 15. This methodology was originally developed in the 1920s 16 as a method of quality control in industrial production.

In modern research, LQAS has become an accepted sampling method in public health, equivalent to stratified sampling. Sample sizes can be smaller, and it is seen as a valuable tool for routine monitoring of geographic coverage 17. For example, in 1996 the World Health Organisation used this method (Figure A1) to assess vaccination coverage 18. LQAS consists of dividing the intervention domain into monitoring areas (lots), often with samples of 19 per lot, for an acceptable precision of at least 92% in "dichotomous" conclusions 19.
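The dichotomous LQAS decision described above can be sketched in a few lines of Python. This is only an illustration: the lot size of 19 comes from the text, but the decision rule (13 or more successes), the 80% target coverage and the 50% lower threshold are assumptions chosen for the example, not values reported in this article.

```python
from math import comb

N_PER_LOT = 19       # sample size per lot, as mentioned in the text
DECISION_RULE = 13   # assumed: minimum successes to accept a lot
UPPER = 0.80         # assumed target coverage
LOWER = 0.50         # assumed unacceptably low coverage


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least(d, n, p):
    """Probability of observing d or more successes in n trials."""
    return sum(binom_pmf(k, n, p) for k in range(d, n + 1))


def classify_lot(successes, d=DECISION_RULE):
    """Dichotomous conclusion for one lot."""
    return "acceptable" if successes >= d else "below target"


# Risk of rejecting a lot whose true coverage meets the target.
alpha = 1 - prob_at_least(DECISION_RULE, N_PER_LOT, UPPER)
# Risk of accepting a lot whose true coverage is at the lower threshold.
beta = prob_at_least(DECISION_RULE, N_PER_LOT, LOWER)

print(f"alpha = {alpha:.3f}, beta = {beta:.3f}")
print(classify_lot(15), "/", classify_lot(9))
```

Under these assumed thresholds both misclassification risks stay below 10%, which is the kind of trade-off that makes small lot samples usable for routine monitoring.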

Figure A1: Example of clustered "LQAS" sampling proposed in Nigeria (Global Polio Eradication Initiative, 2012)

The research study "Reference Values of Grip Strength in adults in Cochabamba" aimed to determine the variability of grip strength in both hands of men and women aged 20 to over 60 years by means of dynamometric measurements. As a descriptive, cross-sectional epidemiological study, it needed a representative sample and a random sampling process for the reference population in the city of Cochabamba, Bolivia. To develop the random sampling method associated with a geographic location, we have worked since 2017 under an agreement with the Centre for Aerospace Surveys and Applications in Geographic Information Systems (CLAS) of the Universidad Mayor de San Simón, in Cochabamba.

The method required basic disaggregated and aggregated coordinate information from the urban and rural areas of the communities, municipalities and provinces of the department of Cochabamba. During the first months, the necessary steps were taken to obtain this information from the WFS (Web Feature Service) services of the Statistical Geographic Information System for Development (SIGED*). WFS services are designed to deliver geographic information (points, lines and polygons) over the web, and allow it to be downloaded in shapefile format.
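For readers unfamiliar with WFS, the sketch below builds a standard key-value GetFeature request URL. The base URL and layer name are placeholders invented for the example; they are not the SIGED endpoint, whose access details were not available to the authors.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_getfeature_url(base_url, type_name, version="1.1.0"):
    """Build a WFS GetFeature request using the standard key-value parameters."""
    params = {
        "service": "WFS",
        "version": version,
        "request": "GetFeature",
        "typeName": type_name,  # layer to download (hypothetical name below)
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint and layer, for illustration only.
url = build_getfeature_url("https://example.org/wfs", "demo:municipalities")
print(url)
```

A real server would normally also require authentication and may offer several output formats; both depend on the server and are left out of this sketch.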

A shapefile is a simple, non-topological format used to store the geometric location and attribute information of geographic entities 20.

*(http://geo.ine.gob.bo/cartografia/index.php/visualizador_controller/visualizador_i3geo/)

Access to the SIGED WFS requires a username and password, which must be provided by the National Institute of Statistics (INE). In our case, the corresponding request was made, but the credentials were never provided.

It should be noted that SIGED runs on i3Geo 6.0 (Integrated Interface for Internet Geoprocessing Tools), an application for developing interactive web maps that integrates several open-source applications, mainly MapServer and OpenLayers, in a single development platform. The program was created by the Brazilian Ministry of the Environment (MMA) in 2004 and is distributed under the GPL (GNU General Public License), which allows access to the source code and allows the software to be modified, distributed and shared 21.

i3Geo can export maps of freely-managed areas to SVG (Scalable Vector Graphics), a vector format that is little known but very useful online because of its flexibility and its ability to offer high-quality graphics.

The research team reviewed specialised literature on export methodologies, specifically from SVG to ShapeFile.

The process involved two intermediate format conversions: from SVG to PDF (Portable Document Format), and from PDF to DWG (Drawing), a binary file format used to store two- and three-dimensional design data, mainly by the AutoCAD program 22.

Once the map of the area was in DWG format, it was verified in AutoCAD that the polygons and lines had the corresponding street separations. The map was then exported to shapefile format with ArcGIS v10 to obtain the basic coordinate information for the mapped area.

Coordinates were obtained for 47 municipalities. Places with a population density of less than 15 inhabitants per km² were geographically excluded, which left 30 municipalities (Figure A2). The spatial data, i.e. the coordinates, were then associated with the variables on which grip strength depends (age and gender), forming the database for the sampling process.

Figure A2: Survey sampling points, by municipality, Cochabamba
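The density-based exclusion of municipalities can be expressed as a simple filter. The sketch below uses three invented municipalities with made-up populations and areas; only the threshold of 15 inhabitants per km² is taken from the text.

```python
# Hypothetical records: (name, population, area in km²). Not real data.
MUNICIPALITIES = [
    ("Municipality A", 120_000, 300.0),
    ("Municipality B", 4_000, 900.0),
    ("Municipality C", 30_000, 1_500.0),
]

MIN_DENSITY = 15.0  # inhabitants per km², threshold stated in the text


def density(population, area_km2):
    """Population density in inhabitants per km²."""
    return population / area_km2


def included_municipalities(records, min_density=MIN_DENSITY):
    """Keep municipalities whose density is at least the threshold."""
    return [
        name
        for name, population, area_km2 in records
        if density(population, area_km2) >= min_density
    ]


print(included_municipalities(MUNICIPALITIES))
```

In this toy data set, "Municipality B" has fewer than 15 inhabitants per km² and is dropped, mirroring how the study went from 47 to 30 municipalities.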

The sample size of the study was 384 subjects, calculated with a margin of error of 5%, a confidence level of 95% and a variability of 50%, and stratified into five age groups of 10 years each (20-29, 30-39, 40-49, 50-59, 60+). Random points were created throughout the department of Cochabamba, on the premise that each point corresponds to a single area and no point is repeated.

This methodology was called "SPATIALLY-BASED RANDOM SAMPLING".

Google Earth was used to visualise the random points in real time. Fieldwork started by finding the geographical location of the random point with a Garmin MAPS64 GPS device. If there were more than 5 adults per household at the location, the Kish method was used. This is a sampling method for selecting an individual at random within a household: it uses a pre-determined table to select an individual based on the total number of individuals living in the household 26. On the same day, the informed consent form, the selection criteria form and the nutritional dynamometric survey corresponding to the study were completed.

When no individuals were found at the random reference point, a point replacement method was used, with randomisation within 8 metres of the GPS point.
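The figure of 384 subjects is consistent with the usual sample size formula for estimating a proportion in a large population, n = z² · p · (1 − p) / e². The article does not state which formula was used, so the sketch below is an assumption that simply reproduces the reported number from the stated parameters.

```python
from statistics import NormalDist


def sample_size(confidence, margin_of_error, p=0.5):
    """Sample size for a proportion in a large population: z^2 * p * (1 - p) / e^2."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z**2 * p * (1 - p) / margin_of_error**2


n = sample_size(confidence=0.95, margin_of_error=0.05, p=0.5)
print(round(n))
```

The unrounded result is slightly above 384; many texts round up to 385, while the study reports 384.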
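The core idea, drawing unique random points inside a study area, can be sketched with rejection sampling: draw points in the bounding box and keep those that fall inside the polygon. The authors worked with GIS software, so this pure-Python version is only an illustration, and the polygon below is a made-up shape, not the boundary of the department of Cochabamba.

```python
import random

# Made-up polygon as (longitude, latitude) vertices. NOT a real boundary.
STUDY_AREA = [(-67.0, -18.5), (-64.5, -18.6), (-64.3, -16.5), (-66.0, -16.0), (-67.2, -17.0)]


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_points_in_polygon(polygon, n_points, rng):
    """Draw n_points distinct random points inside the polygon."""
    xs = [vertex[0] for vertex in polygon]
    ys = [vertex[1] for vertex in polygon]
    points = set()  # a set guarantees that no point is repeated
    while len(points) < n_points:
        x = round(rng.uniform(min(xs), max(xs)), 5)
        y = round(rng.uniform(min(ys), max(ys)), 5)
        if point_in_polygon(x, y, polygon):
            points.add((x, y))
    return sorted(points)


points = random_points_in_polygon(STUDY_AREA, 384, random.Random(2020))
print(len(points), points[0])
```

Passing a seeded `random.Random` instance makes the draw reproducible, which helps when the same list of points has to be shared between the office and the field team.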
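The idea of selecting one adult from a pre-determined table can be sketched as follows. The table here is generated by a simple rule and is a simplified stand-in written for this example; it is not the published Kish grid. The ordering of adults, the use of the questionnaire number to pick a table row, and the cap of six adults are likewise assumptions.

```python
from collections import Counter

MAX_ADULTS = 6  # households with more adults are treated as having six
N_ROWS = 60     # a multiple of 1..6, so each adult is equally likely


def build_selection_table(n_rows=N_ROWS, max_adults=MAX_ADULTS):
    """Pre-determined table: (row, household size) -> position of the adult to interview."""
    return {
        (row, size): (row % size) + 1
        for row in range(n_rows)
        for size in range(1, max_adults + 1)
    }


SELECTION_TABLE = build_selection_table()


def select_adult(adults_in_fixed_order, questionnaire_number):
    """Pick one adult using the table row assigned to this questionnaire."""
    size = min(len(adults_in_fixed_order), MAX_ADULTS)
    row = questionnaire_number % N_ROWS
    position = SELECTION_TABLE[(row, size)]
    return adults_in_fixed_order[position - 1]


# Hypothetical household, listed in a fixed order agreed before fieldwork.
print(select_adult(["Ana", "Luis", "Marta"], questionnaire_number=7))

# Across all rows, each adult in a four-person household is chosen equally often.
print(Counter(SELECTION_TABLE[(row, 4)] for row in range(N_ROWS)))
```

Because the table is fixed in advance, the interviewer has no discretion over who is selected, which is the property that matters for avoiding selection bias inside the household.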
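A replacement point within 8 metres can be drawn uniformly from a disc around the original coordinates. The sketch below is one possible way to do this and is not necessarily the procedure used in the field; it relies on a flat-earth approximation of about 111,320 metres per degree of latitude, which is adequate at this scale, and the coordinates are illustrative.

```python
import math
import random

METRES_PER_DEGREE_LAT = 111_320.0  # approximate value


def replacement_point(lat, lon, radius_m, rng):
    """Random point uniformly distributed in a disc of radius_m around (lat, lon)."""
    # The square root keeps the density uniform over the disc area.
    r = radius_m * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    d_north = r * math.sin(theta)
    d_east = r * math.cos(theta)
    new_lat = lat + d_north / METRES_PER_DEGREE_LAT
    new_lon = lon + d_east / (METRES_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return new_lat, new_lon


def offset_metres(lat, lon, new_lat, new_lon):
    """Approximate distance in metres between two nearby points."""
    d_north = (new_lat - lat) * METRES_PER_DEGREE_LAT
    d_east = (new_lon - lon) * METRES_PER_DEGREE_LAT * math.cos(math.radians(lat))
    return math.hypot(d_north, d_east)


rng = random.Random(8)
origin = (-17.39, -66.16)  # illustrative coordinates only
candidate = replacement_point(*origin, radius_m=8.0, rng=rng)
print(candidate, round(offset_metres(*origin, *candidate), 2))
```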

Discussion

Working with spatial data has a number of implications that were carefully considered before any analysis was carried out. Spatial data are defined as any data that have a geographic reference associated with them, so that we can locate exactly "where" something happens on a map 23. This definition includes field data (surfaces) and data associated with objects such as points, lines or polygons.

Following Tobler's first law of geography, we took spatial autocorrelation into account, that is, the finding that an element is related to what lies in its vicinity. We correlated the spatial variable (coordinates) with the important variables on which grip strength depends, in order to avoid the Modifiable Areal Unit Problem (MAUP) 24.

A particular problem related to the MAUP is the so-called "ecological fallacy" 25, which consists in assuming that the values calculated for an area unit can be applied to the individuals of the population living in that area. This is only valid when there is complete homogeneity for the variable analysed, which is very rarely the case.

Much of the information generated in a GIS may originally be collected at a given scale and is sometimes grouped into larger units for management purposes.

These effects are closely related. Generally speaking, smaller units contain fewer elements each, which lowers statistical reliability; larger units are statistically more reliable but lose all local variation. Spatial autocorrelation is also useful for processes such as interpolation, since it may allow us to estimate values at locations where we have no data.
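One common way to quantify spatial autocorrelation is Moran's I. The article does not say which statistic was used, so the sketch below should be read as an illustration of the concept rather than as the authors' analysis. The weights matrix (neighbours along a line) and the values are invented.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Cross-products of deviations, weighted by spatial proximity.
    numerator = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    denominator = sum(d * d for d in deviations)
    return (n / total_weight) * (numerator / denominator)


def chain_weights(n):
    """Binary weights for n locations on a line: 1 for adjacent neighbours, else 0."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


weights = chain_weights(6)
smooth_gradient = [1, 2, 3, 4, 5, 6]  # neighbours are similar
alternating = [1, 6, 1, 6, 1, 6]      # neighbours are dissimilar

print(morans_i(smooth_gradient, weights))
print(morans_i(alternating, weights))
```

Positive values indicate that nearby locations tend to have similar values, as Tobler's law suggests; negative values indicate that neighbours tend to differ.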

The limitations of this process, apart from those strictly associated with material resources, often include: 1) the lack of correspondence between the territorial units in which the different databases are generated, 2) the different administrative sectors, 3) the difference in formats, 4) the lack of digital formats, and mainly 5) the dispersion of the necessary information and the cost of obtaining or building the databases. Spatial information, maps and the search for shapefiles do not eliminate but rather reproduce the deficiencies of health information systems, as well as those of systems in other sectors 27.

When working with spatial units (countries, cities, municipalities, census regions, companies) there are effects of population distribution that are not conditioned by the spatial component; since we worked with a sample stratified by age groups, there will be variability and/or gaps in these groups.

Furthermore, approaching the subject from a spatial point of view brings up other variables that may be of interest in this type of epidemiological study; for example, the use of space or the activity carried out in a geographical space may affect the results. Conducting the survey or sampling process in a residential area during working hours, for example, will mean that subjects in the age ranges of interest are absent because they are at work, giving the impression of a high population density. It is important to analyse these aspects in order to adjust the models to better reflect reality.

In conclusion, research is never perfect. One can design the best randomisation method for a descriptive study to achieve the coveted inference, but the behaviour of the population of interest differs once fieldwork begins. Consideration should be given to population replacement methods that do not affect randomisation.

Spatial data present particularities of great importance for the analysis, the most relevant being the existence of a structure, the presence of edge effects or scale effects, and derived issues such as the so-called Modifiable Areal Unit Problem.

Spatial autocorrelation is another element that must always be taken into account when studying spatial data in epidemiology, as the degree of autocorrelation conditions the results of the analyses.

Although this is the first time this methodology has been used and validated in our setting, it is a new alternative for multidisciplinary work that opens the way to an operational methodology for population representativeness, which contributes to improving the quality of research in the area of epidemiology.

References

1. Rothman KJ, Greenland S, Lash TL. Validity in epidemiologic studies. In: Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. 3rd ed. Philadelphia: Wolters Kluwer Health/Lippincott Williams & Wilkins; 2008. p. 758

2. Morales AR, Zárate LEM. Epidemiología clínica: investigación clínica aplicada. 1st ed. Colombia: Editorial Médica Panamericana; 2004. p. 73

3. Rodríguez AR. Probabilidad e inferencia científica. 1st ed. Anthropos Editorial; 1991. p. 33

4. Alegret Rodríguez M, Herrera M, Grau Abalo R. Las técnicas de estadística espacial en la investigación salubrista: caso síndrome de Down. Rev Cubana Sal Públ [Internet]. 2008 [cited 13 August 2013];34(4). Available at: http://scielo.sld.cu/scielo.php?script=sci_arttext&pid=s0864-34662008000400003&lng=es

5. Lemus JD, Aragües y Oroz V, Lucioni MC. Administración hospitalaria y de organizaciones de atención de la salud. 2nd ed. Buenos Aires: Corpus; 2014. p. 345-.

6. Pina MF, Ferreira Alves S, Correia Ribeiro AS, Castro Olhero A. Epidemiología espacial: nuevos enfoques para viejas preguntas. Univ Odontol. 2010 Jul-Dec; 29(63):47-65. Available at: http://www.javeriana.edu.co/universitasodontologica.

7. Tobler WR. A computer movie simulating urban growth in the Detroit region. Economic Geography. 1970 Jun; 46(suppl.): 234-40.

8. Ocaña-Riola R, Sánchez-Cantalejo C. Información Estadística y Cartográfica de Andalucía. 2012; 2: 146-153.

9. Ocaña-Riola R, Mayoral JM, Sánchez-Cantalejo C, Toro S, Fernández A, Méndez C. Atlas interactivo de mortalidad en Andalucía (AIMA). Revista Española de Salud Pública. 2008; 82: 379-394

10. Bailey TC. Spatial statistical methods in health. Cad Saúde Pública. 2001 Sep-Oct; 17(5): 1083-98.

11. Gutiérrez Puebla J, Gould M. SIG: sistemas de información geográfica. Madrid: Síntesis; 1994.

12. Gatrell AC, Bailey TC, Diggle PJ, Rowlingson BS. Spatial point pattern analysis and its application in geographical epidemiology. Trans Inst Br Geogr. 1996; 21:256-74.

13. Álvaro Dávila G. Aplicación del muestreo sistemático en áreas rurales de poca accesibilidad de la Amazonía ecuatoriana: el uso de la fotografía aérea en el muestreo sistemático. Revista Universitaria de Geografía. 2018; 27(1): 29-48. Retrieved 30 August 2018 from http://www.scielo.org.ar/scielo.php?script=sci_arttext&pid=s1852-42652018000100003&lng=es&tlng=es.

14. Cantoni RM. Muestreo y determinación del tamaño de la muestra de investigación cuantitativa. Revista Argentina de Humanidades y Ciencias Sociales. 2009; 7(2). Retrieved from http://www.sai.com.ar/metodologia/rahycs/rahycs_v7_n2_06.htm

15. Mamani Ortiz Y, Olivera Quiroga V, Luizaga López M, Illanes Velarde DE. Conocimientos y prácticas sobre lactancia materna en Cochabamba-Bolivia: un estudio departamental. Gac Med Bol [Internet]. 2017 Dec [cited 12 August 2018]; 40(2): 12-21. Available at: http://www.scielo.org.bo/scielo.php?script=sci_arttext&pid=s1012-29662017000200004&lng=es.

16. Dodge HF, Romig HG. A method of sampling inspection. Bell System Technical Journal. 1929; 8: 613-31. Available at: https://archive.org/details/bstj8-4-613

17. Piot B, Mukherjee A, Navin D, Krishnan N, Bhardwaj A, Sharma V, Marjara P. Lot quality assurance sampling for monitoring coverage and quality of a targeted condom social marketing programme in traditional and non-traditional outlets in India. Sexually Transmitted Infections. 2010; 86(Suppl 1): i56-i61. http://doi.org/10.1136/sti.2009.038356

18. Robertson S. Monitoreo de los servicios de inmunización utilizando la técnica de calidad del lote. Geneva: World Health Organization; 1996. Available at: https://extranet.who.int/ivb_docs/documents/1049

19. Robertson SE, Anker M, Roisin AJ, Macklai N, Engstrom K, LaForce FM. The lot quality technique: a global review of applications in the assessment of health services and disease surveillance. World Health Stat Q. 1997; 50(3-4): 199-209. PMID: 9477550

20. Desktop.arcgis.com. 2020. ¿Qué es un shapefile? Ayuda | ArcGIS for Desktop. [online] Available at: <https://desktop.arcgis.com/es/arcmap/10.3/manage-data/shapefiles/what-is-a-shapefile.htm> [Accessed 1 August 2020].

21. "Mrsid No I3geo - I3geo". Softwarepublico.Gov.Br, 2008, https://softwarepublico.gov.br/social/i3geo/historico-de-foruns/geral-usuarios-e-desenvolvedores/mrsid-no-i3geo. Accessed 01 Aug 2020.

22. «Detalles de la extensión de archivo .dwg». FileInfo - The File Extension. Computer Knowledge.

23. Haining R (2003). "Spatial data analysis: theory and practice." Cambridge University Press.

24. Openshaw S (1983). "The modifiable areal unit problem." Geobook, pp. 127-144. Pion.

25. Foote KE, Huebner DJ. 1995. Error, accuracy, and precision. Department of Geography, University of Texas at Austin. Retrieved 5 November 2014

26. Ferrero F. Una solución alternativa al problema de la selección controlada. Revista de Economía y Estadística, tercera época, vol. 16, trimestre, pp. 47-64. Available at: https://revistas.unc.edu.ar/index.php/reye/article/view/3683

27. García González JA, Cebrián Abellán F. La interpolación como método de representación cartográfica para la distribución de la población: aplicación a la provincia de Albacete. In: XII Congreso Nacional de Tecnologías de la Información Geográfica. Granada, 2006. p. 1923. Available at: http://www.age-geografia.es/tig/docs/xii_1/012%20-%20garcia%20y%20cebrian.pd

Author notes

* Correspondence to: Angela Marion Zambrana Vera

E-mail: angelamzv@gmail.com

Conflict of interest declaration

The authors declare that they have no conflicts of interest.

Additional information

Acknowledgements: For project funding, we thank the institutional support programme of AI-BELGIUM ARES/CCD and the scholarship programme of the Swedish International Development Cooperation Agency (SIDA). For the multidisciplinary work, we thank the Centro de Levantamiento Aeroespacial y aplicaciones en sistemas de información geográfica (CLAS) and the Instituto de Investigaciones Biomédicas e Investigación Social (IIBISMED) of the Universidad Mayor de San Simón, Cochabamba-Bolivia. We also thank all the staff involved in the process.
