Abstract:
Portals that provide access to public domain data are part of the public policies that some government entities have adopted in response to the establishment of open government: transparent, multidirectional, collaborative and focused on citizen participation, both in monitoring and in making public decisions. However, published data must meet certain characteristics to be considered open and of good quality. For this reason, studies have emerged that propose methodologies and indicators for measuring the quality of portals and their data. For this paper, reference sources from the last six years on the evaluation of data quality and open data portals in Spain, Brazil, Costa Rica, Taiwan and the European Union were reviewed in order to gather the inputs needed to formulate the methodology presented in the document.
Keywords: data portals, data quality, evaluation methodologies, metadata, open data, open data portals.
Resumen:
La disposición de portales que sirven como fuente de acceso y disponibilidad de datos de dominio público forma parte de la adopción de políticas que algunas entidades gubernamentales han implementado como respuesta a la instauración de un gobierno abierto, transparente, multidireccional, colaborativo y orientado a la participación de los ciudadanos, tanto en el seguimiento como en la toma de decisiones públicas. Sin embargo, la publicación de estos datos debe cumplir con ciertas características para considerarse abiertos y de calidad. Por este motivo surgen estudios que se enfocan en el planteamiento de metodologías e indicadores que miden la calidad de los portales y de sus datos. Para fines de esta investigación se llevó a cabo la búsqueda de fuentes referenciales de los últimos seis años acerca de la evaluación de la calidad de datos y de portales de datos abiertos en España, Brasil, Costa Rica, Taiwán y la Unión Europea, con el objetivo de reunir los elementos necesarios para el planteamiento de la metodología que se presenta en el documento.
Palabras clave: calidad de datos, datos abiertos, metadatos, metodologías de evaluación, portales de datos, portales de datos abiertos.
Resumo:
A disposição de portais que servem como fonte de acesso e disponibilidade de dados de domínio público forma parte da adoção de políticas que algumas entidades governamentais têm implementado como resposta à instauração de um governo aberto, transparente, multidireccional, colaborativo e orientado à participação dos cidadãos, tanto no seguimento como na tomada de decisões públicas. Porém, a publicação destes dados deve cumprir com certas características para considerar-se abertos e de qualidade. Por este motivo surgem estudos que se enfocam na abordagem de metodologias e indicadores que meçam a qualidade dos portais e de seus dados. Para fins desta pesquisa realizou-se a busca de fontes referenciais dos últimos seis anos acerca da avaliação da qualidade de dados e de portais de dados abertos na Espanha, Brasil, Costa Rica, Taiwan e na União Europeia, com o objetivo de reunir os elementos necessários para a abordagem da metodologia que se apresenta no documento.
Palavras-chave: qualidade de dados, dados abertos, metodologias de avaliação, portais de dados.
Papers
Proposal for the Evaluation of Open Data Portals
Propuesta para la evaluación de portales de datos abiertos
Proposta para a avaliação de portais de dados abertos
Received: 06 August 2019
Accepted: 18 October 2019
Published: 31 October 2019
Open data helps government institutions disseminate information of interest to civil society in order to provide transparency and social control and thus empower citizens through access to information. Today this philosophy of openness has spread to other areas, such as academia and research institutes, which seek to develop and improve services, plans, programs, projects and standards through collaborative participation among state, citizens and companies.
To be considered “open”, data must have technical and legal characteristics that allow any person or entity to use, reuse and redistribute it without restriction. These parameters are stipulated in the International Open Data Charter [1].
In support of this initiative, some countries have implemented standards and portals to promote the use of open data. For example, in Colombia, Law 1712 of 2014 obliges all public entities to disclose their data, and since 2016 the nation has adopted the principles established in the International Open Data Charter, making the Colombian State Data Portal available as a space for disseminating public information in the country [2]. Likewise, portals were created at the departmental and municipal levels so that each entity would have its own space for opening data. In 2019, a total of 30 portals focused on the dissemination of and access to open data were registered.
However, quality open data portals must play a dynamic role in the data life cycle and establish a relationship among data producers, publishers and consumers through interaction mechanisms that help to identify the demand for data, publish data of interest to specific users, gather feedback on the data sets and the portal, and improve their quality.
At the international level, experts in the area have put forward several proposals for evaluating open data portals, each with different dimensions, factors or aspects. The portal is the means by which the quality of the published data is guaranteed, users' searches are facilitated, data is made available in usable formats, and data is published in response to a specific demand so as to meet needs that generate value. It is therefore necessary to have a comprehensive evaluation methodology with criteria and dimensions proposed by experts in preliminary work.
To establish the basis for an evaluation proposal that covers the different perspectives and provides a broader and more complete evaluation mechanism, a documentary review of works and research on portal evaluation from recent years was carried out.
Most of the works implement Tim Berners-Lee's five-star model, which evaluates the openness of data in terms of its accessibility and reuse through five levels, represented by stars: 1. The data is published in any format under an open license; 2. The data is structured; 3. The data is in a non-proprietary format; 4. URIs are used to access specific data directly; and 5. The data is linked to other data to provide context [3].
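The cumulative nature of the five levels can be illustrated with a minimal sketch. It assumes that each level requires all the previous ones, as the star scheme is usually read; the `Dataset` class and its field names are hypothetical and are not taken from the reviewed works.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published data set (field names are assumptions)."""
    open_license: bool     # level 1: published in any format under an open license
    structured: bool       # level 2: the data is structured
    non_proprietary: bool  # level 3: the format is non-proprietary
    uses_uris: bool        # level 4: URIs give direct access to specific data
    linked: bool           # level 5: linked to other data to provide context


def five_star_level(ds: Dataset) -> int:
    """Return the number of stars (0 to 5); counting stops at the first unmet level."""
    checks = [ds.open_license, ds.structured, ds.non_proprietary, ds.uses_uris, ds.linked]
    stars = 0
    for passed in checks:
        if not passed:
            break
        stars += 1
    return stars
```

Under this reading, a data set that is openly licensed, structured and non-proprietary but does not use URIs stays at three stars even if it links to other data.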
In the case of the Barcelona Open Data portal, the authors evaluated the quality of the portal's data according to its reuse. They complemented the five-star model by proposing additional factors, such as update frequency and geolocation of the data, and related the number of downloads and the themes to the number of stars obtained with the model [4]. A similar case is the evaluation of European Union portals, which emphasizes the analysis of the state of the data sets and of the standards under which they were published at the time, in order to issue recommendations and improve the portals in general [5].
Other works propose more robust portal evaluation models that largely complement the model proposed by Berners-Lee, enriching aspects of data and portal quality [6], or use indicators proposed by organizations such as the Open Knowledge Foundation (OKF) for open data programs [7]. Nevertheless, some factors that are necessary for evaluating the quality of data and portals are left out of the scope of these studies or are not covered in depth, for example, the evaluation of metadata and of the communication channels offered by the portals.
Other studies present methodologies such as Meloda, which is used exclusively to evaluate data reuse [8]; the evaluation of metadata in terms of its use, availability, completeness, openness and addressability [9]; the analysis of the structural composition of the portal based on its organization and categorization [10]; and the evaluation of national portals through the general characteristics of the portals and their data sets [11].
Among the evaluation methodologies, models and standards found, those presented in Table 1 stand out.
Table 2 shows a consolidation of the dimensions measured by each of the methodologies described in Table 1.
Taking the methodologies presented in Table 1 as a reference, the evaluation of open data is proposed from two approaches: 1) Published data, covering quality, use and metadata, and 2) Portal, highlighting aspects of its structure, usability and communication mechanisms. Each dimension is composed of several factors, whose general criteria are explained in Table 3.
As part of the proposed methodology, a quantitative measurement system is used to score each of the criteria presented in Table 3. Each approach (data and portal) has a maximum score of 100 points, distributed as shown in Table 4. The final score is:
$$\text{Score} = (\text{Data score} \times 0.6) + (\text{Portal score} \times 0.4)$$
That is, the data score accounts for 60% of the final score and the portal score for 40%. Although some of the proposed criteria may involve qualitative considerations, the methodology takes a quantitative approach to evaluating the factors, in line with the use of indicators to evaluate open data initiatives by organizations such as the World Wide Web Foundation, with the Open Data Barometer, and the Open Knowledge Foundation, with the Global Open Data Index.
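The weighting can be expressed as a short function. This is a minimal sketch: the 0.6 and 0.4 weights and the 0 to 100 range come from the text, while the function and constant names, and the decision to reject out-of-range inputs, are assumptions made for illustration.

```python
DATA_WEIGHT = 0.6    # share of the final score given to the data approach
PORTAL_WEIGHT = 0.4  # share of the final score given to the portal approach


def final_score(data_score: float, portal_score: float) -> float:
    """Combine the two approach scores (each on a 0-100 scale) into the final score."""
    for name, value in (("data_score", data_score), ("portal_score", portal_score)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {value}")
    return data_score * DATA_WEIGHT + portal_score * PORTAL_WEIGHT
```

Because the two weights add up to 1, the final score stays on the same 0 to 100 scale as the two partial scores. For a data score of 50 and a portal score of 63, the function returns approximately 55.2.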
The maximum score for each criterion that makes up each factor is presented in the cells of Table 4. The criteria related to the data are:
Quality:
1. Availability:
A) The data set is available for viewing.
B) The data set can be used without any restrictions.
C) Data that was completed in response to requests for completeness and improvement is accessible.
2. Update:
A) There is a record of how often the data set is updated.
B) The data set is updated at intervals appropriate to the subject and purpose of its publication.
3. Accessibility:
A) The data set is downloadable.
B) It is possible to access the data set through an API.
4. Visualization:
A) Data is available in tables or other graphic representations that allow a better understanding of the data set.
B) Data can be exported to different formats that allow its use.
5. Publication formats:
A) Data is in non-proprietary formats.
B) Data is in machine-processable formats that allow its use.
6. Completeness:
A) The data set has a sufficient number of records for studies and analysis.
B) The data set has no empty or null fields.
C) The values in each field are consistent with the purpose of its column and with the data set as a whole.
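Criterion 6B (no empty or null fields) lends itself to an automated check. The sketch below measures the share of empty cells in a CSV file using Python's standard `csv` module. The methodology only states the criterion; the ratio, the function name and the assumption that the first row is a header are illustrative choices.

```python
import csv
import io


def empty_field_ratio(csv_text: str) -> float:
    """Return the share of cells that are empty or whitespace-only.

    Assumes the first row is a header and excludes it from the count.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    data_rows = rows[1:]
    total = sum(len(row) for row in data_rows)
    if total == 0:
        return 0.0  # nothing to evaluate
    empty = sum(1 for row in data_rows for cell in row if not cell.strip())
    return empty / total
```

A ratio of 0 would satisfy the criterion as written; how partial completeness translates into points is defined by the scoring table, not by this sketch.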
Use:
1. Defined demand: the audience to which the data is directed is clearly known.
2. Number of views: it is possible to determine the number of people who have viewed the data set.
3. Download:
A) The data set has been downloaded at least once.
B) The data set has a significant average number of downloads.
4. API: queries can be made through parameterizable addresses that allow specific fields of a data set to be obtained.
5. Resulting products:
A) The data set references products and applications derived from its use.
B) The use of the data set for the creation of products and services is described in detail.
C) The data set references graphs and reports made by users.
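The API criterion in this list refers to parameterizable addresses. As a rough illustration of what such an address looks like, the sketch below builds a query URL with Python's standard `urllib.parse` module. The base URL and the parameter names `fields` and `limit` are hypothetical; each portal's API defines its own names.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, fields: list[str], limit: int) -> str:
    """Build an address that requests only some fields of a data set.

    The parameter names ("fields", "limit") are assumptions for illustration.
    """
    query = urlencode({"fields": ",".join(fields), "limit": limit})
    return f"{base_url}?{query}"


# Hypothetical endpoint; not the address of any real portal.
url = build_query_url("https://example.org/api/datasets/abc123.json", ["name", "year"], 10)
```

An evaluator could treat the criterion as met when a request to an address of this kind returns only the requested fields.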
Metadata:
1. Use: metadata is used to detail the characteristics of the data sets.
2. Completeness:
A) The audience to which the data set is directed is explicitly defined, as well as its purpose.
B) The purpose of the fields is defined in detail and without exceptions.
C) The contact information of the data author is available.
3. Recoverability:
A) The topic to which the data belongs is specifically defined, allowing it to be related to similar sets or to sets in the same category.
B) Keywords are specified that allow the data set to be retrieved in subsequent searches.
In relation to the portal:
Structure:
1. Categorization: data sets are consistent with, and similar to, the other sets classified in the same category.
Usability:
1. Search:
A) Searches by entities or publishers are available.
B) It is possible to search for data by theme, topic or category.
C) It is possible to search by period in order to obtain data from a specific time.
2. Navigability:
A) The portal has a navigation map available to users that shows the structure of the site.
B) The portal has simple navigation that allows users to move through the portal and find information quickly.
C) The portal implements different elements to facilitate navigation, such as help buttons, contact buttons, navigation bars and a general menu.
3. Use / consumption / data download:
A) The portal offers the possibility of visualizing data in order to facilitate its analysis and understanding.
B) It is possible to download the data from the portal, without any restriction, in different formats that allow versatile use.
C) The portal makes available to users at least one API that allows data to be consumed and queried.
D) The portal offers statistics about users' use of the data.
Communication mechanisms:
1. Comments and discussion:
A) It is possible to comment on the data sets at their place of publication.
B) The portal has spaces where users can discuss topics related to the data available on the portal.
C) The portal provides mechanisms for users to support one another through forum-like spaces.
D) A space is offered for users to view and learn about the products resulting from the use of data published on the portal.
2. Source-user:
A) Publishers are notified when comments are received about the data sets they have published.
B) Users are notified when the data sets in which they showed interest are updated or modified.
C) The portal provides the contact information of the entities or publishers.
D) The portal offers direct communication between the publisher and the end user, contributing to the improvement of data quality.
3. Requests:
A) Users can make direct requests for specific data sets through the portal.
B) Users are notified when there is a response to their request.
If the score is a decimal value, it must be rounded. Table 5 shows the scores with their corresponding classification.
For example, if the evaluation of a portal yields a data score of 50 points and a portal score of 63 points, the following is obtained:
$$\text{Score} = (50 \times 0.6) + (63 \times 0.4)$$
$$\text{Score} = 30 + 25.2 = 55.2$$
According to the classification proposed in the methodology, the portal would have an acceptable quality.
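The rounding step can be sketched as follows. The text does not say which rounding rule applies, so rounding halves up is an assumption here. Python's built-in `round()` rounds halves to the nearest even integer, so the sketch uses the standard `decimal` module instead; the function name `round_score` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_score(score: float) -> int:
    """Round a score to the nearest integer, with halves rounded up (assumed rule)."""
    # Converting through str() avoids carrying binary floating-point noise into Decimal.
    return int(Decimal(str(score)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
```

With this rule, the 55.2 of the example becomes 55, and a borderline value such as 52.5 becomes 53 rather than 52.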
To evaluate the methodology, it was applied to the open data portal provided by the Colombian government (https://www.datos.gov.co/), based on the experience of a group of users, both expert and inexperienced. The scores obtained are presented in Table 6, which also summarizes the main aspects that justify the evaluation of each factor or criterion.
All of the above gives the portal the following score:
$$\text{Score} = (48 \times 0.6) + (59 \times 0.4)$$
$$\text{Score} = 28.8 + 23.6 = 52.4$$
Consequently, according to Table 5, the portal has an acceptable rating. This indicates that, although it offers different functionalities, control points need to be added to provide greater satisfaction to the end user, such as eliminating data sets that do not meet minimum quality conditions or allowing users to rate a data set.
The use of methodologies and models to determine data quality contributes to improving the data by identifying its status and the flaws that may occur. It also supports the continuation of the open data life cycle, whose processes are constantly being improved.
Each methodology takes a different approach depending on how its evaluation criteria are formulated, which may lead to the element under study (portal or data) receiving different quality levels depending on the methodology used. However, a more realistic quality result is obtained by combining and complementing methodologies and models so that a greater number of aspects is covered.
Open data portals play an important role in data opening initiatives, since they are the main point of access to and availability of data, mainly published by government entities. The quality of the data, of the portal's structure and of the features it provides to its users can therefore determine its level of use, impact and reputation. This is why portals are also responsible for improving constantly in order to offer users the highest possible quality.
Interaction with the Open Data portal of the Colombian State showed that a large number of data sets are available, but that many of them present inconsistencies or other flaws that hinder their use. This evidences the need to evaluate the portal with regard to its data and structure, since such shortcomings may raise questions about the use of portal resources.
Herrera-Melo carried out the documentary review and the evaluation of the state of the art of the evaluation methodologies. González-Sanabria validated the proposed methodology and applied it to the case study. Both authors wrote and validated the document.
The authors have declared that no competing interests exist.
Citation: C.-A. Herrera-Melo, J.-S. González-Sanabria, “Proposal for the Evaluation of Open Data Portals,” Revista Facultad de Ingeniería, vol. 29 (54), e10194, 2020. https://doi.org/10.19053/01211129.v29.n0.2020.10194