Artificial Generation of Realistic Voices
(1) Mumbai University
(2) Mumbai University
(3) Mumbai University
(4) Mumbai University
(5) Mumbai University
(*) Corresponding Author
Abstract
In this paper, we propose an end-to-end text-to-speech (TTS) system in which a user supplies input text that is synthesized into an artificial voice at the output. The system is a text-to-speech model, that is, a model capable of generating speech after training on speech datasets. The pipeline is organized into three stages: a speaker encoder, a synthesizer, and a vocoder. Trained on these datasets, the model generates voice while maintaining naturalness of speech throughout; to preserve naturalness for unseen speakers, we implement a zero-shot adaptation technique. The primary capability of the model is voice regeneration, which has a variety of applications in the advancement of speech synthesis. With the help of the speaker encoder, the model can synthesize speech in the user's own voice: the user records a short sample through the microphone provided in the GUI, and the model then generates similar voice waveforms for any input text.
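The three-stage pipeline described above can be sketched in code. The sketch below uses toy stand-in components with hypothetical names (`SpeakerEncoder`, `Synthesizer`, `Vocoder`, `text_to_speech`); it illustrates only the data flow of the architecture, not the paper's actual neural models.

```python
# Toy sketch of the pipeline: speaker encoder -> synthesizer -> vocoder.
# All class and function names here are illustrative assumptions, not the
# paper's implementation.

from dataclasses import dataclass
from typing import List


@dataclass
class SpeakerEncoder:
    """Maps a reference utterance to a fixed-size speaker embedding."""
    dim: int = 4

    def embed(self, reference_audio: List[float]) -> List[float]:
        # Toy embedding: simple summary statistics of the reference waveform.
        n = max(len(reference_audio), 1)
        mean = sum(reference_audio) / n
        energy = sum(x * x for x in reference_audio) / n
        return [mean, energy, float(n), 1.0][: self.dim]


@dataclass
class Synthesizer:
    """Produces a spectrogram-like frame sequence from text + speaker embedding."""

    def synthesize(self, text: str, speaker_embedding: List[float]) -> List[List[float]]:
        # Toy version: one "frame" per character, conditioned on the embedding.
        return [[ord(c) / 255.0 + e for e in speaker_embedding] for c in text]


@dataclass
class Vocoder:
    """Converts the frame sequence into a raw waveform."""

    def generate(self, frames: List[List[float]]) -> List[float]:
        # Toy version: flatten frames into a sample stream.
        return [sample for frame in frames for sample in frame]


def text_to_speech(text: str, reference_audio: List[float]) -> List[float]:
    """End-to-end flow: embed the reference voice, condition synthesis on it,
    then vocode the frames into a waveform."""
    embedding = SpeakerEncoder().embed(reference_audio)
    frames = Synthesizer().synthesize(text, embedding)
    return Vocoder().generate(frames)
```

The key design point the sketch captures is that only the speaker encoder sees the user's recorded voice; the synthesizer and vocoder are conditioned on its fixed-size embedding, which is what enables zero-shot adaptation to a new speaker from a short reference sample.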
DOI: https://doi.org/10.24071/ijasst.v3i1.2744
Publisher: Faculty of Science and Technology
Society/Institution: Sanata Dharma University
This work is licensed under a Creative Commons Attribution 4.0 International License.