How We Read: Text Skimming And Reading Aids

I began working on my Ph.D. in computer science at Cambridge in 2003. The Cambridge Ph.D. is a research-only course, so I was through with taking classes. My task for the first year was to read as much background literature as I could in the area of computational linguistics and to state a research problem that I would undertake to solve over the next few years.

My research goal was to have a machine simulate the text skimming process of a skilled, sighted reader, and to give that machine an accessible interface so that any reader could enjoy the benefits of skimming. Text skimming comes naturally to many people, and not much had been written about exactly what humans do when they skim a text. On top of the shortage of literature on the subject, I had never been able to skim text for myself because of my vision, so I lacked an intuition for the process. However, I did have an intuition about what I needed an intelligent reading machine to be able to do. I started reading about the linguistics behind reading and reading strategies with my own intuitions in mind.

Reading strategies

There are a few different reading strategies that are commonly used by skilled sighted readers. I was most interested in these four:

1. Text scanning is the simplest of the reading strategies and is the process of searching a text for a particular word or phrase. When scanning, a reader looks at the words on each page (generally in linear order) but does not internalize them all. The reader is primarily looking for letter patterns that match the desired words. This reading strategy is fast but does not give the reader information about the content of the text other than whether or not the search word was found.

2. Text skimming is the process of looking over each page for content phrases. With this strategy, words are read but sentences are not processed. The reader notes potentially relevant phrases in the text along with where they occur. A reader who is good at skimming will also notice which phrases are used together. Text skimming gives the reader a way to build a conceptual structure for a text. Since sentences are not actually read and the reader is not searching to confirm or rule out the presence of a word, pages and passages can be skimmed nonlinearly without hurting the conceptual structure being formed. Although skimming takes longer than scanning, it is still a fast process.

3. Casual reading is a strategy of reading many or most of the sentences while skimming the rest. If a sentence is read, and its syntax and semantics are understood, then the reader gains more detailed information than the conceptual overview that skimming produces. Casual reading takes longer than skimming but can still be done quickly, depending on how many sentences are read and how many are skimmed. Although more detailed information can be obtained by casual reading than by skimming, the reader’s comprehension of the text may only be 60%-80%.

4. Careful reading is the strategy of reading each sentence for understanding. The goal of careful reading is to have a high comprehension of an entire passage. Of these strategies, careful reading takes the most time and requires the most concentration, especially for a long text, but it can be used in conjunction with the others.

Text skimming gives a skilled reader a compromise between searching (scanning) and careful reading. As you are skipping over text, you can slow down when you come across themes that are relevant or interesting. When you find what may be a useful section, you may read it rather than skimming it. If you find a relevant theme and you want to find other portions of the text that discuss that theme, then you can scan for it.
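The difference between scanning and skimming can be made concrete with a minimal Python sketch. The function names, the tiny stop-word list, and the use of word frequency as a stand-in for "content phrases" are all assumptions made for illustration; this is a crude approximation of the two strategies, not the method used in Skimcast.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
              "that", "for", "on", "with", "as", "by"}

def scan(text, word):
    """Scanning: report where a specific word occurs, and nothing else.

    Returns the character offsets of whole-word, case-insensitive matches.
    """
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [match.start() for match in pattern.finditer(text)]

def skim(text, top_n=5):
    """Skimming (roughly): note which content words stand out in a passage.

    Sentences are never parsed; words are simply counted, and the most
    frequent non-stop-words are returned as (word, count) pairs.
    """
    words = re.findall(r"[a-z']+", text.lower())
    content_words = [w for w in words if w not in STOP_WORDS]
    return Counter(content_words).most_common(top_n)

if __name__ == "__main__":
    passage = "The index lists terms. The index points to pages. Terms matter."
    print(scan(passage, "index"))  # offsets of the search word
    print(skim(passage, top_n=2))  # the most prominent content words
```

Scanning answers only "is the word here, and where?", while the skimming sketch returns a small vocabulary that hints at what the passage is about. Because neither function depends on sentence order, the skimming sketch could be run on passages in any order, which mirrors the nonlinear quality of skimming described above.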

Reading aids

In addition to the various reading strategies, readers often enjoy the benefits of search and navigation help from authors and publishers of books. Authors can help readers by providing a detailed table of contents and a back-of-book index. These two utilities are provided explicitly as part of a book and can be used regardless of a user’s proficiency at the reading strategies mentioned above. I saw this as a starting point for how to build text skimming into a reading machine.

The table of contents is a navigation tool that divides a text conceptually and provides links to the beginning of each division (or chapter). The usefulness of the table of contents depends on the diligence of the author but even a basic chapter list provides more navigation help than simply navigating using the page numbers. Adding section headings to divide each chapter provides a hierarchical navigation tool that can enable more effective skimming.
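A hierarchical table of contents is essentially a tree of headings, each pointing at the place where its division begins. The Python sketch below shows one way to represent it; the `Heading` class, the `outline` function, and the sample titles and page numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Heading:
    title: str
    start: int  # where the division begins (here, a page number)
    children: list = field(default_factory=list)  # nested section headings

def outline(headings, depth=0):
    """Flatten the tree into (depth, title, start) rows, in reading order."""
    rows = []
    for heading in headings:
        rows.append((depth, heading.title, heading.start))
        rows.extend(outline(heading.children, depth + 1))
    return rows

if __name__ == "__main__":
    # Made-up contents: one chapter with two sections, one without any.
    contents = [
        Heading("Reading strategies", 1, [
            Heading("Scanning", 2),
            Heading("Skimming", 5),
        ]),
        Heading("Reading aids", 9),
    ]
    for depth, title, start in outline(contents):
        print("  " * depth + f"{title} ... {start}")
```

A flat chapter list is just the case where no heading has children; adding sections deepens the tree without changing how it is navigated.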

A drawback of the table of contents is that the reader must rely on the assumption that the author chose useful divisions for chapters and that each chapter title is descriptive of the chapter’s content. As well-intentioned as the author may be, the author is biased in creating an outline of a book: the author will emphasize the topics that they see as most important in the text and will break up the chapters and sections accordingly. However, what the author sees as most important and what is actually written in the text may not line up perfectly. When you skim a text, you are constructing your own conceptual organization of what the text really says.

The back-of-book index provides a means of searching the text of a book with a keyword search. Each keyword in the index has a link to each sentence in the text containing the word. The reader uses the technique of scanning to find keywords of interest. Another feature of the back-of-book index is that it is browsable. That is, a reader can make use of the index without having a specific query. The keywords used in the index typically correspond to important terms in the text.
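As a rough sketch of the idea, an index can be modeled as a mapping from each keyword to the places where it occurs. The Python below follows the description above and records sentence numbers; `build_index` and the sample sentences are hypothetical, and the keyword list is assumed to be supplied by hand, as an author or publisher would supply it.

```python
import re
from collections import defaultdict

def build_index(sentences, keywords):
    """Map each keyword to the 1-based numbers of the sentences containing it."""
    wanted = {keyword.lower() for keyword in keywords}
    index = defaultdict(list)
    for number, sentence in enumerate(sentences, start=1):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        for keyword in wanted & words:
            index[keyword].append(number)
    return dict(index)

if __name__ == "__main__":
    sentences = [
        "Skimming builds an overview.",
        "Scanning finds a word.",
        "Skimming is slower than scanning.",
    ]
    index = build_index(sentences, ["skimming", "scanning"])
    # Browsing: list the entries alphabetically, with no query in mind.
    for keyword in sorted(index):
        print(keyword, index[keyword])
```

Looking up a single entry corresponds to a keyword search, while iterating over the sorted entries corresponds to browsing the index without a specific query.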

When the table of contents and the back-of-book index are used together, they can be quite effective. The terms in the index make up the content of the text describing the major themes listed in the table of contents. The biggest shortcoming of these two approaches is that they are only available for books. The second biggest drawback is that they both rely on work done by the author or the publisher. If you want to read a long chapter from a book, or some other long document like a report, you may wish to benefit from the ability to navigate the text by theme or browse the text by vocabulary terms.

After two years of researching the linguistics of reading and after conducting my own experiments to see how humans organize text into topics, I began designing the first prototype for what is now Skimcast. Since text skimming enables a reader to construct a conceptual overview of any text, my task was to develop a technology that would generate such an overview automatically and give users a way to navigate any text by concept using this structure. What I needed next was a way to teach a machine how to figure out enough about the meanings of words to discover concept themes in text.

IMAGE: Table of contents from The Doctrine Of Fluxions by W. Emerson, printed in 1757. Scan courtesy of Google Books.

