Linguists Tackle Computational Analysis of Grammar

A visualization of how a given word may be used.

The University of Chicago's Research Computing Center is helping linguists visualize the grammar of a given word in bodies of language containing millions or billions of words.

Credit: Ricardo Aguilera/Research Computing Center

University of Chicago (UC) researchers are studying natural language morphology in an attempt to develop computers that are better at understanding human language.

The researchers are using the Research Computing Center's (RCC) Midway supercomputing cluster to analyze corpora, which are standard bodies of written language that can contain billions of words taken from many different genres of writing. "A typical scenario for us is that, given some raw data, we have some intuition about certain patterns in the data, and we collaborate with RCC to create visualization tools to display data in a way that enables us to explore these patterns," says UC researcher Jackson Lee.

The visualization shows which words occur most often before and after a given word in a natural language corpus. "The construction of this visualization tool grew out of the observation that overall word distribution patterns are sensitive to the specific distribution of individual words, and we need a tool to 'see' what the grammar of a given word really looks like," Lee says.
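
The article does not describe how the tool is implemented, but the underlying idea of tallying a word's neighbors can be illustrated with a minimal Python sketch. It assumes that "before and after" means the directly adjacent words and that the text is tokenized by simple whitespace splitting; the function name `neighbor_counts` and the toy sentence are hypothetical and not taken from the researchers' work.

```python
from collections import Counter


def neighbor_counts(tokens, target, top_n=3):
    """Return the most common words directly before and after `target`.

    Illustrative only: adjacency and whitespace tokenization are
    assumptions, not the method used by the UC researchers.
    """
    before, after = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        if i > 0:                      # there is a word to the left
            before[tokens[i - 1]] += 1
        if i + 1 < len(tokens):        # there is a word to the right
            after[tokens[i + 1]] += 1
    return before.most_common(top_n), after.most_common(top_n)


# Toy corpus, made up for illustration.
text = "the dog saw the cat and the cat saw the dog and the cat ran"
tokens = text.lower().split()

before, after = neighbor_counts(tokens, "cat")
print("before 'cat':", before)
print("after  'cat':", after)
```

On a real corpus of millions or billions of words, the same counts would presumably be computed in a streaming or distributed fashion and then passed to a plotting layer; this sketch covers only the counting step.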

He notes that a better understanding of natural language morphology can lead to better-designed human-machine interfaces and better ways to search large databases.

