Development of a System for Searching and Indexing the Content of Audio Recordings

Abstract

The article presents a system for searching and indexing audio files, built on Automatic Speech Recognition (ASR) and Elasticsearch. Existing systems for transcribing Russian-language audio were analyzed, and Whisper was selected as the best of those compared. An algorithm that speeds up transcription by processing files in parallel was developed, and its effectiveness was demonstrated. A system based on a microservice architecture was built that indexes the content of audio files together with their metadata for search. The results show that the proposed approach can be used to build efficient and flexible systems for searching and analyzing audio information.
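
The abstract does not give the details of the parallel transcription algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each file can be transcribed independently and that one search document is built per file. The names `transcribe_stub` and `build_documents`, the document fields, and the choice of a thread pool are all hypothetical; in a real system the stub would be replaced by a call to an ASR model such as Whisper, and a process pool might suit CPU-bound transcription better.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable, List


def transcribe_stub(path: str) -> str:
    """Hypothetical stand-in for an ASR call; returns placeholder text."""
    return f"transcript of {Path(path).name}"


def build_documents(
    paths: Iterable[str],
    transcribe: Callable[[str], str] = transcribe_stub,
    max_workers: int = 4,
) -> List[Dict[str, str]]:
    """Transcribe files concurrently and return one document per file."""
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so texts align with paths.
        texts = list(pool.map(transcribe, path_list))
    # Each document pairs the transcript with simple file metadata
    # (assumed fields; the article's actual schema is not given here).
    return [
        {"filename": Path(p).name, "path": p, "text": t}
        for p, t in zip(path_list, texts)
    ]


docs = build_documents(["a.wav", "b.mp3"])
```

Documents of this shape could then be sent to an Elasticsearch index, for example with the bulk helper of the official Python client, so that both the transcript text and the file metadata become searchable.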
