What is Reading Chunker?

Reading Chunker uses natural language processing (NLP) via the Python Natural Language Toolkit (NLTK) to parse sentences into "chunks" and facilitate the cognitive processes that make reading more efficient. Research shows that native speakers tend to process text in chunks such as noun phrases, verb phrases, prepositional phrases, and clauses. English learners who are proficient readers also make use of this bottom-up processing strategy when reading. However, less proficient English learners may not yet have the fluency to chunk words into phrases on their own, so a supporting tool that chunks sentences for them can be helpful. This was the rationale for creating Reading Chunker.

Various ways of parsing sentences exist within the field of NLP, including more advanced machine-learning models that use statistics and probability to predict the most likely noun or verb phrases. One is Google's cheekily named Parsey McParseface. However, such tools are generally used for tasks such as parsing textual data to infer the sentiment of news articles, finding out what internet users are interested in so they can be targeted by ads, or displaying relevant search results. There seems to be little work applying these NLP techniques to the field of language learning.

Reading Chunker operates using a modified "chunking grammar." It does not chunk sentences into traditional noun phrases, verb phrases, and prepositional phrases (as most chunkers in NLTK and spaCy do). Instead, it aims to identify the most useful chunks for readers and English learners. These are generally prepositional phrases, noun clauses, verb phrases (sometimes with an object), and perhaps adjective (relative) clauses. In fact, depending on the learner's level, a teacher may wish to adjust the chunking method to create longer chunks for lower-level English learners or shorter chunks for higher-level ones. A beginner learner may simply need clauses chunked into subject, verb, and object (or even subject-predicate). An advanced learner may wish to chunk only noun and verb phrases. Perhaps chunking phrasal verbs would also be helpful!
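As a rough illustration of how adjusting a chunking grammar changes chunk size, here is a minimal sketch using NLTK's RegexpParser. The two grammars, the chunk() helper, and the example sentence are hypothetical and are not the grammar Reading Chunker actually uses. The sentence is tagged by hand (Penn Treebank tags) so the example runs without downloading any NLTK data.

```python
import nltk

# Hand-tagged example sentence (assumption: Penn Treebank part-of-speech tags).
TAGGED = [
    ("The", "DT"), ("student", "NN"), ("read", "VBD"),
    ("a", "DT"), ("book", "NN"),
    ("in", "IN"), ("the", "DT"), ("library", "NN"),
]

# Hypothetical grammar producing shorter chunks: the verb stands alone.
FINE_GRAMMAR = r"""
  NP: {<DT>?<JJ>*<NN.*>+}
  PP: {<IN><NP>}
  VP: {<VB.*>+}
"""

# Hypothetical grammar producing longer chunks: the verb absorbs its object.
COARSE_GRAMMAR = r"""
  NP: {<DT>?<JJ>*<NN.*>+}
  PP: {<IN><NP>}
  VP: {<VB.*>+<NP>?}
"""


def chunk(tagged_tokens, grammar):
    """Return the top-level chunks of a tagged sentence as plain strings."""
    tree = nltk.RegexpParser(grammar).parse(tagged_tokens)
    chunks = []
    for child in tree:
        if isinstance(child, nltk.Tree):
            # A chunk (possibly nested): join all of its words.
            chunks.append(" ".join(word for word, _tag in child.leaves()))
        else:
            # An unchunked (word, tag) pair stays as its own chunk.
            chunks.append(child[0])
    return chunks


print(chunk(TAGGED, FINE_GRAMMAR))
print(chunk(TAGGED, COARSE_GRAMMAR))
```

With the first grammar the verb "read" and its object "a book" come out as separate chunks; with the second they are merged into one. Swapping rules in and out this way is one possible approach to tuning chunk length for a given learner level.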

What's under the hood?

Reading Chunker operates using NLTK tokenization, followed by a regex pattern to chunk sentences into noun, verb, and prepositional phrases. Since a prepositional phrase consists of a preposition followed by a noun phrase, this sometimes results in very large chunks. For this reason, the user can adjust the chunking pattern. Teachers can use this adjustment to make chunks larger or smaller according to their students' level. If the option is selected, the app searches the text for entries from one or more lists of frequent formulaic expressions (n-grams) and/or a list of phrasal verbs before chunking. These are then chunked as well.
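The expression-matching step described above could look something like the sketch below. It is only an assumption about how such a lookup might work, not the app's actual code: the function name mark_expressions, the sample expression list, and the greedy longest-match strategy are all hypothetical. It uses only the Python standard library and simple whitespace tokenization for brevity.

```python
def mark_expressions(tokens, expressions):
    """Group known multiword expressions into single chunks.

    Uses greedy longest-match: at each position, try the longest
    candidate span first and fall back to a single token.
    """
    # Store each expression as a tuple of lowercase words for fast lookup.
    known = {tuple(e.lower().split()) for e in expressions}
    max_len = max((len(e) for e in known), default=1)

    result = []
    i = 0
    while i < len(tokens):
        match_len = 1  # fallback: the token stands alone
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(t.lower() for t in tokens[i:i + n])
            if candidate in known:
                match_len = n
                break
        result.append(" ".join(tokens[i:i + match_len]))
        i += match_len
    return result


# Hypothetical list mixing formulaic expressions and a phrasal verb.
EXPRESSIONS = ["as a result", "in order to", "look up"]

tokens = "As a result she had to look up the word".split()
print(mark_expressions(tokens, EXPRESSIONS))
```

Here "As a result" and "look up" are each kept together as one chunk, while the remaining words are left for the regular chunker. Note that this simple version matches exact word forms only, so an inflected phrasal verb such as "looked up" would need extra handling (for example, lemmatization).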

- Brendon Albertson, May 2021

albertsonbrendon@gmail.com