Speech and Computer - 26th International Conference, SPECOM 2024, Belgrade, Serbia, November 25-28, 2024, Proceedings, Part I


Alexey Karpov 0001, Vlado Delic, editors, Speech and Computer - 26th International Conference, SPECOM 2024, Belgrade, Serbia, November 25-28, 2024, Proceedings, Part I. Volume 15299 of Lecture Notes in Computer Science, Springer, 2025.

Conference: specom2025


Preserving Language Heritage Through Speech Technology: The Case of Upper Sorbian. Ivan Kraljevski, Frank Duckhorn, Daniel Sobe, Constanze Tschöpe, Matthias Wolff. 3-22

Retrospective and Perspectives of TTS & STT Technology Development and Implementation for South Slavic Under-Resourced Languages. Milan Secujski, Branislav M. Popovic, Darko Pekar, Niksa Jakovljevic, Edvin Pakoci, Sinisa Suzic, Tijana V. Nosek, Nikola Simic, Vuk Stanojev, Vlado Delic. 23-42

Comparison of Well and Lower-Resourced Self-training in ASR. Yue Luo, Péter Mihajlik. 45-56

Towards a Livvi-Karelian End-to-End ASR System. Irina S. Kipyatkova, Ildar Kagirov, Mikhail Dolgushin, Alexandra Rodionova. 57-68

Advances in OpenASR21 Evaluation with Increased Temporal Resolution for Speech Self-supervised Learning Models. Vishwa Gupta. 69-81

Benchmarking Whisper Under Diverse Audio Transformations and Real-Time Constraints. Sergei Katkov, Antonio Liotta, Alessandro Vietti. 82-91

AutoMode-ASR: Learning to Select ASR Systems for Better Quality and Cost. Ahmet Gunduz, Yunsu Kim 0005, Kamer Ali Yuksel, Mohamed Al-Badrashiny, Thiago Castro Ferreira, Hassan Sawaf. 92-103

Pre-training and Adverse Audio Samples for Data-Efficient Wake Word Detection. Manuel Torralbo, Ariane Méndez, Maia Agirre, Arantza del Pozo. 104-118

Cross-Lingual Summarization of Speech-to-Speech Translation: A Baseline. Pranav Karande, Balaram Sarkar, Chandresh Kumar Maurya. 119-133

The ParlaSpeech Collection of Automatically Generated Speech and Text Datasets from Parliamentary Proceedings. Nikola Ljubesic, Peter Rupnik, Danijel Korzinek. 137-150

ESC Corpus of Spoken Russian: Everyday Student Conversations Captured Through Continuous Speech Recording in Natural Communicative Environments. Tatiana Y. Sherstinova, Irina Petrova 0001. 151-162

OpenAV: Bilingual Dataset for Audio-Visual Voice Control of a Computer for Hand Disabled People. Denis Ivanko, Dmitry Ryumin, Alexandr Axyonov, Alexey M. Kashevnik, Alexey Karpov 0001. 163-173

Bulgarian Speech Resources in the CHILDES System. Velka Popova, Dimitar Popov. 174-186

Multiword Units in Russian Everyday Speech: Empirical Classification and Corpus-Based Studies. Natalia Bogdanova-Beglarian, Olga Blinova, Maria Khokhlova, Tatiana Y. Sherstinova, Tatiana I. Popova. 187-200

Neurophysiological Correlates of Textual Modulation in Visual Stimuli: An Experimental Study of Russian and English Memes. Rodmonga Potapova, Vsevolod Potapov, Ekaterina Karimova, Leonid Motovskikh, Nikolay Bobrov. 201-215

End-to-End Speech Synthesis for the Serbian Language Based on Tacotron. Tijana V. Nosek, Sinisa Suzic, Milan Secujski, Vuk Stanojev, Darko Pekar, Vlado Delic. 219-229

ChildTinyTalks (CTT): A Benchmark Dataset and Baseline for Expressive Child Speech Synthesis. Shaimaa Alwaisi, Mohammed Salah Al-Radhi, Géza Németh. 230-240

Multidimensional Rhythm: Comparing Rhythmic Properties of Australian and New Zealand Monologues. Anna Borzykh, Tatiana Shevchenko. 241-250

Influence of Linguistic and Sociolinguistic Factors on Speech Rate Perception. Anastasia Ananeva, Uliana E. Kochetkova. 251-264

Human and Machine Keyphrase Perception in Russian Text and Speech. Daria Guseva, Olga Mitrofanova, Mikhail Dolgushin. 265-280

Assessment of Children's Ability to Manifest Emotions in Facial Expressions, Voice and Speech by Humans, Automatic, and on a Likert Scale. Elena E. Lyakso, Olga V. Frolova, Anton Matveev, Aleksandr Nikolaev, Ruban Nersisson. 281-294

Investigating the Utility of wav2vec 2.0 Hidden Layers for Detecting Multiple Sclerosis. Gábor Gosztolya, László Tóth 0001, Veronika Svindt, Judit Bóna, Ildikó Hoffmann. 297-308

Cross-Cultural Automatic Depression Detection Based on Audio Signals. Danila Mamontov, Sebastian Zepf, Alexey Karpov 0001, Wolfgang Minker. 309-323

Depression Classification Using Token Merging-Based Speech Spectrotemporal Transformer. Lokesh Kumar, Kumar Kaustubh, S. R. Mahadeva Prasanna. 324-335

Detecting Depression from Audio Data. Mary Idamkina, Andrea Corradini. 336-351

Binary and Multiclass Classification of Dysphonia Using Whisper Encoder and One-Dimensional Convolutional Neural Network. Dosti Aziz, Dávid Sztahó. 352-366

Approach to Assessing the Quality of Syllable Pronunciation by Patients in the Process of Speech Rehabilitation Based on Comparison with Healthy Speakers. German Egle, Dariya Novokhrestova, Svetlana Tomilina, Evgeny Kostyuchenko. 367-376

A Comparative Study for Contextualized Spoken Answer Classification in German Medical Questionnaires. Philipp L. Harnisch, Daniel Schuhmann, Stefan Hillmann. 377-391
