How Big Data can help us learn about the words we use and the world we live in
Did you know that the majority of folks in Midland America use “uh” over “um”? That “gin and juice” has been in sharp decline since 1994? Or what about the fact that the American Dream didn’t exist until the late 1920s?
That’s not to say that Americans hadn’t dreamed of bigger and better before 1930, only that the phrase itself started gaining popularity around that time. Not to get too far into the semiotic weeds, but the phrase “American Dream” is what’s called a signifier: a word or phrase, and nothing more. The other side of that equation, the signified, is the rags-to-riches story that so often comes to mind when we read or hear the phrase. It’s an interesting correlation that, as the country was sliding into what would become the Great Depression, the wish for riches began to manifest itself in our common language. The question then is, what does any of this have to do with Big Data?
The short answer is: everything. Linguists have long studied trends in language but have found their data sets too small and incomplete to give a clear picture of the path of certain words or phrases. Now, thanks to projects like the Google Books Ngram project and Rap Genius, and to researchers like Jack Grieve (of the “uhs” and “ums”), we have the ability not only to access those massive amounts of data, but to search across the breadth of them as well. And when we say massive, we’re talking about Google’s repository of over 4 million books spanning a 200-year period, thousands upon thousands of crowdsourced rap lyrics, and some 6 billion words searched for all those “uhs” and “ums.”
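To make the idea of searching for a phrase across years of text a little more concrete, here is a minimal sketch in Python. It is not how Google Books Ngram, Rap Genius, or Jack Grieve’s research actually work; the function name `phrase_share_by_year` and the three-line toy corpus are hypothetical, made up purely for illustration. It counts how often a phrase appears per million words in each year.

```python
from collections import Counter

def phrase_share_by_year(corpus, phrase):
    """Return {year: occurrences of `phrase` per million words}.

    `corpus` is a list of (year, text) pairs. Matching is a naive,
    case-insensitive comparison on whitespace-separated words, so
    punctuation attached to a word will prevent a match.
    """
    target = phrase.lower().split()
    n = len(target)
    hits = Counter()    # phrase occurrences per year
    totals = Counter()  # total word count per year
    for year, text in corpus:
        words = text.lower().split()
        totals[year] += len(words)
        # Slide a window of n words over the text and count exact matches.
        hits[year] += sum(
            1 for i in range(len(words) - n + 1) if words[i:i + n] == target
        )
    return {
        year: hits[year] / totals[year] * 1_000_000
        for year in sorted(totals)
        if totals[year]
    }

# Hypothetical toy data, invented for this example only.
corpus = [
    (1925, "the land of opportunity rewards hard work"),
    (1931, "the american dream is a dream of a better life"),
    (1931, "he wrote about the american dream"),
]

trend = phrase_share_by_year(corpus, "american dream")
for year, per_million in trend.items():
    print(year, per_million)
```

Dividing by the total word count for each year matters because far more text survives from some years than others; raw counts alone would mostly show how much was published, not how popular the phrase was.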
So why does any of this matter? It matters because it’s through the study of language and trends within language that we gain insights into much more than just words. We’re able to see the stories behind the trends, be they as big as the advent of the American Dream or as small as the decline of a popular topic in rap music. We’re able to see the things that are most important to a culture and how long they stayed important. We’re able to see that the American Dream is as relevant now as it was in the 1930s, while our interest in Snoop Dogg’s drink of choice has waned a bit.
The big question is what we do not only with the information about which phrases are trending, but with the Zeitgeist associated with those phrases. The short answer is: a lot. Take a recently familiar term, binge watching. While the idea of marathon-watching a season’s worth of TV shows has been around since the release of VHS box sets, it’s only recently that the term binge watching has come into the mainstream, thanks in large part to Netflix. While the company released all of Lilyhammer in early 2012, it was the first season of House of Cards in 2013 that brought both the term and the associated action to the forefront of a lot of people’s minds.
The ability to recognize the trend of both the phrase and the action of binge watching has proven valuable to some businesses. Over the past two years Comcast has been allowing viewers unlimited access to entire seasons of shows during its annual Watchathon, an “epic week of binge viewing.” The week is timed to coincide with a lot of shows that are either ending their current seasons or on the verge of a season premiere. What the company has found is that shows featured in the promotion saw as much as a 69% increase in live viewing numbers for the premiere.
Their success comes as a result of a handful of vital elements. The first is the identification of a trending phrase: that’s the signifier, the collection of letters and words. The second key piece is to understand what emotional concept exists in tandem with the phrase: the pleasure that comes from being able to catch up on an entire show’s history or burn through a just-released series in a weekend.
Companies such as Comcast that are able to bring those two pieces of the puzzle together are able not only to increase their viewership for live premieres, but to ensure that their customers are happy and satisfied.
Photo Credit: Matheus Almeida