I am a recent graduate of Brooklyn College with a bachelor's degree in Computer Science. Last summer, I participated in the DREU program and was matched with Professor Julia Hirschberg, the chair of the Computer Science Department at Columbia University. Under the guidance of PhD student Erica Cooper, I worked in the Spoken Language Lab, where I contributed to a text-to-speech (TTS) project. My work involved processing audio and text files to prepare them for speech synthesis. I also collected web data for use in developing speech systems for low-resource languages. This summer, I am back in the lab, working on stage II of last year's project.
- Yocheved Levitan -
I am privileged to work under the guidance of a wonderful mentor, Dr. Julia Hirschberg, the Chair of Columbia's Computer Science Department. Dr. Hirschberg is a renowned researcher in the field of computational linguistics. She conducts research in TTS, HCI, NLP, various types of speech analysis, and speech summarization. Aside from fostering a collaborative work environment and managing the speech lab as a whole, she is concerned with each individual's progress and career. I am indebted to her for her consistent support throughout my internship and beyond. Although she is currently on sabbatical, she remains fully involved in the lab and its research, coming in to work each day and meeting with us weekly. In addition to working under Dr. Hirschberg's leadership, I am fortunate to be guided by Erica Cooper, a PhD student here at Columbia University. Her primary research interest is the construction of robust TTS systems for low-resource languages. She is a wealth of knowledge and is always available to answer questions. The advice and skills she has shared with me have helped me tremendously.
The project we are working on this summer applies data selection techniques to speech synthesis, with a focus on low-resource languages. Languages such as English are considered high-resource languages because there are sufficient resources available to speakers of the language, including automatic speech recognition (ASR), text-to-speech (TTS), and keyword search (KS) systems. Other languages are deemed low-resource because speakers of these languages do not have the aforementioned tools available to them. The root of the divide between the have and have-not languages is the difference in the quantity of obtainable data. Telugu, for example, is a widely spoken language in India, but the lack of recorded audio and transcripts complicates the development of Telugu-compatible speech systems. We hypothesize that TTS tools can be built with limited data by applying data selection methods: by selecting a small base of high-quality data, we can achieve respectable performance. The data subsets will be created according to features extracted from the corpus's audio clips. Initially, we will experiment with English corpora so that we can evaluate our results. We hope to identify the key features that determine a successful synthesis. Success will be measured by the naturalness and intelligibility of the voices synthesized with the selected data. Segments of the synthesized audio will be uploaded to Amazon Mechanical Turk, where they will be rated by a diverse population of listeners. Once we discover the defining features, we will apply the same approach to low-resource languages (LRLs) to determine whether our findings hold across languages.
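To give a feel for what feature-based data selection can look like, here is a minimal sketch in Python. The feature used (variance in speaking rate as a stand-in for "quality") and the greedy ranking strategy are illustrative assumptions on my part, not the actual criteria or algorithm used in the lab's experiments.

```python
# Hypothetical sketch: pick a small, high-quality subset of a corpus
# for TTS training. Feature names and the selection rule are
# illustrative assumptions, not the lab's actual method.

def select_utterances(utterances, target_hours=1.0):
    """Greedily select utterances with the lowest speaking-rate
    variance until the subset reaches the target total duration."""
    # Rank candidates by the (assumed) quality feature, best first.
    ranked = sorted(utterances, key=lambda u: u["rate_variance"])
    selected, total_sec = [], 0.0
    for utt in ranked:
        if total_sec >= target_hours * 3600:
            break
        selected.append(utt)
        total_sec += utt["duration_sec"]
    return selected

# Toy corpus: each entry is one audio clip with pre-extracted features.
corpus = [
    {"id": "utt1", "duration_sec": 1800, "rate_variance": 0.20},
    {"id": "utt2", "duration_sec": 2400, "rate_variance": 0.05},
    {"id": "utt3", "duration_sec": 1200, "rate_variance": 0.50},
]

subset = select_utterances(corpus, target_hours=1.0)
print([u["id"] for u in subset])  # → ['utt2', 'utt1']
```

In a real pipeline the features would be extracted from the audio itself, and the selected subset would then be handed to the synthesis toolkit for voice building and listener evaluation.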