Google’s library of over five million digitized books is a well-known public resource. A lesser-known feature of that resource is the accompanying searchable database, which allows users to search for words or phrases and track their frequency over time. The Google Ngram Viewer draws on books published as early as the sixteenth century and as recently as 2010. The database has given sociologists and students of cultural trends unique insight into the birth and lifespan of trends and cultural norms. It also sheds light on the rise and fall of public figures and well-known places in terms of cultural relevance.
For example, in 1636, Boston was mentioned in published works one hundred times more frequently than New York, and over a thousand times more frequently than Philadelphia. Thirty years later, in 1666, authors mentioned Philadelphia ten times more frequently than either of the other two cities. Historians can use this information to trace the formation of political ideas, the growth of colonial unrest, and the development of economic and cultural centers. From the American Revolution through the Civil War, the three cities were mentioned with almost equal frequency. After the Civil War, coinciding with the Industrial Revolution, New York quickly emerged as the most referenced of the three, reflecting the shift of economic power in a new economy. Because popular culture can skew perceptions of which historical events were significant, information such as that provided by the Ngram Viewer is exceptionally valuable.
In a 2011 paper published in the academic journal Science, Harvard University researchers Jean-Baptiste Michel and Erez Lieberman Aiden labeled the analysis of data from the Ngram Viewer “culturomics.” They chose the term to reflect the practice of drawing research conclusions from trends in the data. Michel and Aiden further proposed the Ngram Viewer as a new way to gather evidence for research in the humanities, and suggested that the availability of such data could lead to new fields within the humanities.
By graphing what people found important enough to write about over the centuries, researchers can connect new technologies or ideas with how long they took to be adopted. For example, the term “radio” first appears in print around 1900, probably in reference to Marconi’s wireless radios. The term’s popularity climbs dramatically until 1944, around the time television was introduced, and then begins to drop off. However, “television” does not appear in print more frequently than “radio” until 1982. These trends indicate that although the popularity of radio declined as that of television increased, the shift in cultural interest was anything but sudden. The same pattern appears when comparing the words “horse” and “car.” Although mentions of horses in print peaked in 1860, their decline was so gradual that cars were not mentioned more frequently until 1950. Does this data prove that more people owned horses than cars until 1950? Probably not. It does, however, demonstrate that writers and the media maintained a steady interest in horse travel even as the popularity of automobiles skyrocketed in the post-WWII era.
Speaking of wars, WWI was popularly referred to as the Great War until World War II made it necessary to distinguish between two large-scale conflicts. Predictably, use of the term “the Great War” peaked in 1921, shortly after the war ended. However, the term “WWI” did not appear in print more frequently than “the Great War” until 1976. For more than fifty years after the end of WWI, through WWII, the Korean War, and the Vietnam War, writers showed a preference for the older term.
Researchers in the humanities can point to several reasons for this slow adoption of the newer term. First, “the Great War” is more dramatic, and writers of war histories likely prefer it for its ironic appeal. Second, for some time after the war, most writers may have lived through it themselves and used the term that was current in their own time. Finally, WWI has been called the most literary war of the modern era. Writers and poets such as Ernest Hemingway and Alan Seeger gained prominence through their wartime experiences, and their works are referenced in many later works, which could account for the preservation of the term.
There are also some surprising trends. For example, “immigration” was a popular topic in the early 1600s, but interest dropped off dramatically to almost no mentions by the end of the century, and the term did not become as popular again until 1904. Of course, this does not mean there was no immigration during those three centuries. In fact, the term “emigration” grew in popularity from the early 1600s until the mid-1800s. These data reflect the point of view of most writers of the period: until after the American Civil War, most writers on the subject of migration were likely writing from the perspective of Europeans leaving home rather than Americans receiving newcomers.
There is some evidence that applying the Ngram Viewer’s approach to smaller, more focused collections of text could make it useful as a predictive tool. For example, mining large quantities of digital text has shown promise in predicting political events. Researchers tracking media reports were able to predict the 2011 Arab Spring by measuring how often local media used specific terms that had previously appeared in coverage of other regions where political unrest had recently emerged.
Despite the interesting data that can be plotted from one or two words at a time, some in academia are less enamored of the technology. In a recent paper, one researcher pointed out limitations that could skew the data: “…Many voices - already lost to time - lie forever beyond our reach.” The comment refers to books that have never been digitized, newspapers that were never archived, and volumes of private or professional writing that were never preserved. One must consider what criteria determined which texts were preserved in the first place, and again which were chosen for a second, digital life. In fact, the Ngram Viewer is biased by decisions made decades or even centuries ago about which writing was worthy of the ages.
The author uses the examples of radio and television, and of horse and car, to illustrate what important fact about new ideas or technologies?