Building a gold standard dataset to identify articles about geographic information science

López-Vázquez, Carlos - Hochsztain, Esther - Resnichenko, Yuri

Abstract:

Knowing the overall regional or international scientific output is vital in many areas of knowledge. In interdisciplinary areas such as Geographic Information Science (GISc), however, it is not enough to count the papers published in specific journals. Most such journals, including the International Journal of Remote Sensing (IJRS), welcome GISc papers but are not exclusive to that area, so the output attributable to authors in a region must take into account not only affiliation but also whether each paper falls within the scope of GISc. IJRS publishes far more papers than any other GISc journal, so it is important to assess quantitatively how many of them belong to GISc. In this work, a representative sample of IJRS articles published over a period of almost 30 years was analyzed using a specific definition of GISc. The sampled articles were manually classified by a panel of experts, and the resulting dataset was built, analyzed, and statistically tested. As a result, we estimate that between 47% and 76% of IJRS articles can be considered GISc, at a 95% confidence level. Beyond this primary goal, the dataset could serve as a gold standard for future classification tasks. It is the first GISc dataset of its kind and may be used to train artificial intelligence systems capable of performing the same classification automatically and at scale. A similar procedure could be applied to other interdisciplinary fields of knowledge as well.
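
To illustrate how an interval estimate of this kind can be obtained from a classified sample, the following minimal sketch computes a Wilson score confidence interval for a proportion. It is not the authors' procedure: the abstract does not state the sample size or the statistical test used, so the function name `wilson_interval` and the counts in the usage line are hypothetical and serve only to show the arithmetic.

```python
import math


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion.

    successes: number of sampled articles judged to be GISc (hypothetical input)
    n:         total number of sampled articles (hypothetical input)
    z:         standard normal quantile; the default corresponds to 95% confidence
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")

    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical counts, chosen only for illustration (not taken from the paper).
low, high = wilson_interval(successes=31, n=50)
print(f"95% CI for the GISc share: {low:.2f} to {high:.2f}")
```

The Wilson interval is used here because it behaves reasonably for small samples; a study that pools judgments from several experts may need a different estimator.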


Bibliographic Details
2022
Gold standard
Manual classification
Indexer consistency
Geographic information science
English
Universidad de la República
COLIBRI
https://hdl.handle.net/20.500.12008/39730
Open access
Creative Commons Attribution License (CC BY 4.0)