Location
CC 229
Start Date
31-7-2014 8:30 AM
End Date
31-7-2014 9:30 AM
Description
As part of the Google Books digitization project, the Google Books Ngram Viewer (https://books.google.com/ngrams) enables public querying of a “shadow dataset” created from millions of digitized books. The name comes from the computer science term n-gram, meaning a contiguous sequence of n items (such as words or characters) in a particular order. The Ngram Viewer offers insights into diverse topics such as fame, collective forgetting, language usage, technological innovations, and more. It can answer queries that would be virtually impossible to answer any other way, and its plotted line charts are also used for witty data visualizations (such as querying “chicken” and “egg” together to see which came first). The raw datasets for the respective ngrams are also available for download, and the findings are released under the intellectual property policy. This presentation introduces this semi-controversial tool and some creative applications for research and learning.
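The following is a minimal Python sketch of two ideas from the description: splitting text into n-grams, and turning raw per-year ngram counts into the kind of relative frequencies a line chart plots. It assumes tab-separated rows of the form `ngram, year, match_count, volume_count` (the layout used by the downloadable 2012 files), plus per-year totals supplied separately, and it only approximates the Viewer's normalization (count divided by the total ngrams of that length in that year). The helper names `ngrams`, `yearly_share`, and `leader` are hypothetical, and the sample rows and totals are invented for illustration, not real Ngram data.

```python
from collections import defaultdict


def ngrams(text, n):
    """Split text on whitespace and return its contiguous n-word sequences, in order."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def yearly_share(lines, totals):
    """Convert assumed raw rows 'ngram<TAB>year<TAB>match_count<TAB>volume_count'
    into {ngram: {year: match_count / total ngrams in that year}}."""
    share = defaultdict(dict)
    for line in lines:
        ngram, year, match_count, _volume_count = line.rstrip("\n").split("\t")
        year = int(year)
        # Accumulate in case the same ngram appears on several rows for one year.
        share[ngram][year] = share[ngram].get(year, 0.0) + int(match_count) / totals[year]
    return dict(share)


def leader(shares, words, year):
    """Return whichever of `words` has the highest share in `year` (0.0 if absent)."""
    return max(words, key=lambda w: shares.get(w, {}).get(year, 0.0))


# Hypothetical sample rows and per-year totals, invented for illustration.
SAMPLE = [
    "chicken\t1900\t30\t10",
    "egg\t1900\t45\t12",
    "chicken\t1950\t60\t20",
    "egg\t1950\t50\t18",
]
TOTALS = {1900: 1000, 1950: 2000}

shares = yearly_share(SAMPLE, TOTALS)
for year in sorted(TOTALS):
    print(year, leader(shares, ["chicken", "egg"], year))  # 1900 egg, then 1950 chicken
```

Dividing by the yearly total, rather than plotting raw counts, keeps years with far more digitized books from dominating the chart.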
Exploring the Google Books Ngram Viewer for 'Big Data' text corpus visualizations