Abstract:
The article is devoted to the development of a search and indexing system for audio files using Automatic Speech Recognition (ASR) and Elasticsearch. Current Russian-language audio file transcription systems have been analyzed, and Whisper has been chosen as the best one. An algorithm for optimizing transcription speed using parallelization of file processing processes has been developed, and its effectiveness has been demonstrated. A microservice architecture-based system has been built, capable of indexing audio file content and their metadata for search purposes. The research results show that the proposed approach can be applied to create efficient and flexible systems for searching and analyzing audio information.