A FAIR and AI-ready Higgs Boson Decay Dataset. (arXiv:2108.02214v1 [hep-ex])

To enable the reusability of massive scientific datasets by humans and
machines, researchers aim to create scientific datasets that adhere to the
principles of findability, accessibility, interoperability, and reusability
(FAIR) for data and artificial intelligence (AI) models. This article provides
a domain-agnostic, step-by-step assessment guide for evaluating whether a
given dataset meets each FAIR principle. We then demonstrate how to use this
guide to evaluate the FAIRness of an open simulated dataset produced by the CMS
Collaboration at the CERN Large Hadron Collider. This dataset consists of Higgs
boson decays and quark and gluon background, and is available through the CERN
Open Data Portal. We also use other available tools to assess the FAIRness of
this dataset, and incorporate feedback from members of the FAIR community to
validate our results. This article is accompanied by a Jupyter notebook that
facilitates understanding and exploration of the dataset, including
visualization of its elements. This study is the first in a planned series
of articles that will guide scientists in creating and quantifying
FAIRness in high energy particle physics datasets and AI models.

Source: https://arxiv.org/abs/2108.02214
