Analysis and Classification of the Rusyn Language Using the OpenAI Whisper ASR Model
DOI: https://doi.org/10.12797/RRB.20.2024.20.10
Keywords: Rusyn language, Phonetics, Classification, Assimilation, AI, ANN, ASR
Abstract
The paper presents a linguistic analysis of the Rusyn language, focusing on its complex and dynamic aspects, such as pronunciation and its individual, regional, and historical variation. The study employed a neural network based on the OpenAI Whisper automatic speech recognition (ASR) model. Although the model was trained on data covering most official state languages, it had no direct training on Rusyn samples, because Rusyn is a localized minority language. Consequently, speech samples were classified using the closest available language labels, which made it possible to identify similarities between Rusyn and other Slavic languages. The study covered a diverse range of speakers across gender, age, and location (Poland, Ukraine, Slovakia, Serbia) and revealed significant similarities to the dominant languages of the respective countries. The research also highlights correlations between the identified linguistic similarities and the age of the speakers.
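The "closest available label" step can be illustrated with a minimal sketch. It is not the authors' pipeline, which the abstract does not describe. It assumes each speech sample has already been reduced to a dictionary of per-language probabilities, such as the one returned by `detect_language` in the open-source `openai-whisper` package. The label set `SLAVIC`, the helper names, and the sample values are all hypothetical and made up for illustration.

```python
from collections import defaultdict

# Assumption: a hand-picked set of Whisper language codes treated as Slavic labels.
SLAVIC = {"pl", "uk", "sk", "sr", "ru", "cs", "be", "bg", "hr", "sl", "mk", "bs"}


def closest_slavic_label(probs):
    """Return the most likely Slavic label and its probability,
    renormalized over the Slavic labels only."""
    slavic = {lang: p for lang, p in probs.items() if lang in SLAVIC}
    total = sum(slavic.values())
    if total == 0:
        return None, 0.0
    best = max(slavic, key=slavic.get)
    return best, slavic[best] / total


def label_counts_by_group(samples):
    """Count the closest label per group (for example, the speaker's country)."""
    counts = defaultdict(lambda: defaultdict(int))
    for group, probs in samples:
        label, _ = closest_slavic_label(probs)
        if label is not None:
            counts[group][label] += 1
    return {group: dict(labels) for group, labels in counts.items()}


# Made-up probabilities, for illustration only (not data from the study).
samples = [
    ("Poland", {"pl": 0.6, "uk": 0.2, "en": 0.2}),
    ("Poland", {"pl": 0.5, "sk": 0.3, "de": 0.2}),
    ("Slovakia", {"sk": 0.7, "uk": 0.2, "en": 0.1}),
]
result = label_counts_by_group(samples)
print(result)
```

Grouping the same counts by age band instead of country would give the kind of table needed to look for the age correlations mentioned in the abstract.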