Cynthia And The Beginnings Of An Intelligent Reading Machine

During my first year as a graduate student in computer science at UGA, I experimented with different software and hardware for reading text. While using various products, I made notes of their limitations and how far they were from solving my reading problems.

Over the Summer of 2000, I wrote an outline of features for what I called the intelligent reading machine for the blind and sight-impaired. This plan guided my next year of work: the linguistics courses that I chose to take, the way that I studied for them, and, of course, my work on Cynthia, my speech synthesizer. Central to my intelligent reading machine were the ability to listen to text read aloud from any source and the ability to navigate the audio based on content, not just by skipping a certain number of words or seconds.

As mentioned earlier, I listened to many of my undergraduate textbooks read on tape. The tape player provided by Recording For The Blind And Dyslexic had a knob to control the playback speed. I had trained my ear so that I could listen to and understand a person reading at a fast rate for a long period of time, often between an hour and a half and two hours without a break. After an hour and a half, I would be tired, but I could still pay attention. This was not possible for me when I first started listening to my computer read to me. After about 30 minutes, my mind would wander and it was very difficult to stay focused. As I later learned, the intelligibility of synthesized speech was known to degrade faster over time than human speech.

The intelligibility of speech generated by a machine also degrades faster than natural speech as the speech is sped up. This adds another disadvantage for a listener compared to a reader. The average reader reads between 200 and 240 words per minute silently but between 140 and 160 words per minute aloud. The average listener can concentrate on text being read at a faster rate than that when the text is for a general audience, but listening comprehension generally drops for technical material, requiring a slower speaking rate. For general text, a listener may be able to speed up an audio tape without sacrificing much understanding, but the same person may struggle to understand text read faster than 150 words per minute by a computer.

My belief was that improving the quality and naturalness of the machine’s speech was the most important problem to solve in creating my intelligent reading machine. I was optimistic that the content-based navigation features I had in mind would take less time to develop than a high-quality speech synthesizer, and that they were secondary to the need for the listener to listen longer and faster. Thus, I devoted my energy and spare time to Cynthia.

I wrote the first lines of code for Cynthia on Friday night, January 26, 2001. She spoke her first words by the end of the weekend. This initial test version split an input sentence into individual words and looked for each word in a dictionary of pronunciations (the CMU Pronouncing Dictionary). If a word was not in the dictionary then Cynthia would sound the word out based on the spelling. Over the next month leading up to Cynthia’s public debut at the Athens Linux Fest, an annual computer show held on campus, I would develop a more robust algorithm for sounding out an unknown word much like a human would by using information about syllable structure, prefixes, suffixes, etc.
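That first lookup-then-fallback step can be sketched roughly like this in Python. The dictionary here is a tiny stand-in for the actual CMU Pronouncing Dictionary, and the letter-by-letter fallback is deliberately far cruder than the syllable- and affix-aware algorithm described above; all names and the letter-to-phoneme rules are illustrative, not Cynthia's actual code:

```python
# A tiny stand-in for the CMU Pronouncing Dictionary, which maps
# words to ARPAbet phoneme sequences.
CMUDICT = {
    "computer": ["K", "AH", "M", "P", "Y", "UW", "T", "ER"],
    "reading": ["R", "IY", "D", "IH", "NG"],
}

# Toy one-letter-to-one-phoneme rules, used only for unknown words.
LETTER_TO_PHONE = {
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH",
    "f": "F", "g": "G", "h": "HH", "i": "IH", "j": "JH",
    "k": "K", "l": "L", "m": "M", "n": "N", "o": "AA",
    "p": "P", "q": "K", "r": "R", "s": "S", "t": "T",
    "u": "AH", "v": "V", "w": "W", "x": "K", "y": "Y", "z": "Z",
}

def pronounce(word):
    """Return a phoneme list: dictionary hit first, spell-out fallback second."""
    word = word.lower()
    if word in CMUDICT:
        return CMUDICT[word]
    # Fallback: sound the word out letter by letter, ignoring any
    # character we have no rule for (punctuation, digits, etc.).
    return [LETTER_TO_PHONE[ch] for ch in word if ch in LETTER_TO_PHONE]
```

The interesting work, of course, lives in replacing that fallback with rules that respect syllable structure, prefixes, and suffixes.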

Pronunciations were divided into individual speech sounds (phonemes). For example, the pronunciation of the word “computer” as transcribed in the CMU dictionary is /K AH M P Y UW T ER/. The pronunciation for each word was sent to a program called MBROLA which searched a database of thousands of speech sounds taken from recordings of an actual person to return the best recording to match the requested sound. All of the MBROLA recordings were combined into one audio file containing the speech.
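MBROLA's text input format is a sequence of timed phoneme targets, one per line: the phoneme symbol, a duration in milliseconds, and optional pairs giving a position within the phoneme (as a percentage) and a pitch target in Hz. A minimal sketch of building that input, assuming placeholder uniform durations and a single flat pitch target per phoneme (real durations vary by phoneme, and MBROLA voice databases typically use SAMPA symbols rather than the CMU dictionary's ARPAbet, so a symbol mapping would also be needed):

```python
def to_pho(phonemes, duration_ms=100, pitch_hz=120):
    """Render a phoneme list as MBROLA .pho lines.

    Each line is: symbol, duration in ms, then (position %, pitch Hz)
    pairs. Giving every phoneme the same single pitch target produces
    monotone speech, like Cynthia's first version.
    """
    lines = ["_ 200"]  # leading silence ("_" is the pause symbol)
    for ph in phonemes:
        lines.append(f"{ph} {duration_ms} 50 {pitch_hz}")
    lines.append("_ 200")  # trailing silence
    return "\n".join(lines) + "\n"
```

The resulting text would be piped to the `mbrola` program along with a voice database, which concatenates the matching recorded sound units into one audio file.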

The first version of Cynthia only had monotone speech as I was concentrating on pronunciation, stress, and rhythm. The next step was to add intonation (the rise and fall of pitch) to all parts of Cynthia’s speech. I had also become interested in what makes human speech sound natural and why we are naturally able to hear the difference between natural speech and speech generated by a computer.

Cynthia started getting some publicity around campus and in the local newspaper. Over the next year, I was invited to give several lectures around the UGA campus about Cynthia and speech synthesis. My project was mentioned on the MBROLA website (under my name) in their list of speech systems that used any of their tools.

In February of 2001, one of my linguistics professors urged me to go to the Linguistic Society of America’s (LSA) Summer Institute of Linguistics being held that Summer at the University of California at Santa Barbara. This six-week Summer school, which the LSA hosted every other Summer, moved from university to university and invited prominent leaders in diverse areas of linguistics to teach classes on their subjects. Graduate students, industry researchers, and even some faculty came from around the world to participate as students in this program.

I arrived at the institute in early June with the goal of learning as much as I could about the aspects of linguistics that I would need for the Cynthia project, so I enrolled in classes on Phonetics, Phonology, Intonation, and Cognitive Linguistics. There were a few hundred students; we stayed in dorms on campus and were able to spend most of each day thinking and talking about linguistics.

That was an important Summer because there were so many leading experts around who were willing, and even eager, to talk to graduate students and answer our many questions. Each day was a new opportunity to figure something out. I returned to UGA in the Fall more excited about my intelligent reading machine project than ever. I had found answers to many of the questions I had going into the Summer, but those six weeks left me with many more new questions and ideas about how a machine could be taught to understand what it reads.
