Snow Mountain: Dataset of Audio Recordings of The Bible in Low Resource Languages. (arXiv:2206.01205v1 [eess.AS])

Automatic Speech Recognition (ASR) has increasing utility in the modern
world. There are many ASR models available for languages with large amounts
of training data like English. However, low-resource languages are poorly
represented. In response, we create and release an open-licensed and formatted
dataset of audio recordings of the Bible in low-resource northern Indian
languages. We set up multiple experimental splits and train and analyze two
competitive ASR models to serve as the baseline for future research using this



