DATA PROCESSING AND ANALYSIS
E.G. Zhilyakov, A.S. Belov, S.P. Belov, A.A. Medvedeva. Detection of Pauses Between Word Fragments of Speech Recordings
Abstract. 

The paper considers the problem of segmenting speech recordings into intervals produced while speech is present (word segments) and the pauses between them. This segmentation is an important stage in identifying speech components from their features. The signal segments that fall in pauses are assumed to be samples of a stationary random sequence (pause noise). As the main characteristic of the pause noise, the paper proposes to use estimates, obtained from a training sample, of the mathematical expectations of the energy parts of noise segments of a fixed finite duration in predetermined frequency bands (subband analysis). It is shown that a decision function based on the maximum ratio of the energy parts of the currently analyzed segment to the corresponding expected energy parts of the noise takes a possible speech component into account to the greatest extent. This is equivalent to maximizing the signal-to-noise ratio, so the proposed decision function is optimal in this sense.
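
The decision rule outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: band energies are computed here from an FFT periodogram rather than with the paper's subband method, and the function names, the band count, the segment length, the threshold value, and the synthetic test signal are all assumptions chosen for illustration.

```python
import numpy as np

def band_energies(segment, n_bands):
    """Energy of `segment` in each of `n_bands` roughly equal-width frequency bands.

    Assumption: an FFT periodogram stands in for the paper's subband analysis.
    """
    power = np.abs(np.fft.rfft(segment)) ** 2
    return np.array([band.sum() for band in np.array_split(power, n_bands)])

def split_segments(signal, seg_len):
    """Cut `signal` into consecutive non-overlapping segments of `seg_len` samples."""
    n_segs = len(signal) // seg_len
    return np.asarray(signal)[: n_segs * seg_len].reshape(n_segs, seg_len)

def train_noise_profile(noise, seg_len, n_bands):
    """Estimate the expected band energies of pause noise from a training sample."""
    segments = split_segments(noise, seg_len)
    return np.mean([band_energies(s, n_bands) for s in segments], axis=0)

def decision_statistic(segment, noise_profile):
    """Maximum over bands of (segment band energy) / (expected noise band energy)."""
    return np.max(band_energies(segment, len(noise_profile)) / noise_profile)

def detect_pauses(signal, noise_profile, seg_len, threshold):
    """Return a boolean array with True for segments classified as pauses."""
    stats = np.array([decision_statistic(s, noise_profile)
                      for s in split_segments(signal, seg_len)])
    return stats <= threshold

# Synthetic demonstration (all values are hypothetical).
rng = np.random.default_rng(0)
fs, seg_len, n_bands, sigma = 8000, 256, 16, 0.01

training_noise = rng.normal(0.0, sigma, 200 * seg_len)
profile = train_noise_profile(training_noise, seg_len, n_bands)

n = 10 * seg_len
t = np.arange(n) / fs
pause_a = rng.normal(0.0, sigma, n)
speech_like = 0.5 * np.sin(2 * np.pi * 200 * t) + rng.normal(0.0, sigma, n)
pause_b = rng.normal(0.0, sigma, n)
recording = np.concatenate([pause_a, speech_like, pause_b])

# The threshold is an arbitrary illustrative value, not one taken from the paper.
is_pause = detect_pauses(recording, profile, seg_len, threshold=5.0)
```

Taking the maximum over bands means that a speech component concentrated in a single band is enough to push the statistic above the threshold, even when the total energy of the segment is only moderately above the noise level. In practice the threshold would be chosen from the training noise, for example to meet a target false-alarm rate.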

Keywords: segmentation of speech recordings, subband analysis, optimal decision function.

PP. 40-46.

DOI 10.14357/20718632220105 
 