We live in a world where we are constantly being bombarded with information. Not only do we consume insane amounts of data, we are also providing other people and businesses with information about ourselves. Signing up for online mailing lists, ordering magazine subscriptions, and even making dinner reservations, information about our habits and preferences is constantly being left behind, a concept that Charlie Berger refers to as data exhaust in a podcast from October 10, 2017 on Software Engineering Radio. The larger concept that he is describing is what is known as ‘datafication’, a buzz-word in the data science and big data spheres that refers to the collecting and storing information about social actions that can be used to perform predictive analyses and targeted marketing.
Specific to the computer science discipline, datafication has implications on the development of predictive applications. In the podcast episode, Berger presents the simple yet extremely effective example of an ATM machine as lacking in the predictive application sense. Berger wonders why each time that he uses the ATM he is asked which language he would like to use, and why such preferences are not somehow tracked and stored, making for a more seamless and personalized ATM experience. Berger even suggests that the ATM track more than language preferences, offering withdrawal suggestions based on previous transaction data from a similar day of the week or time of the day.
While it may not be terribly inconvenient to have to choose a language each time you use the ATM, the concept of predictive applications and the advantages associated with creating and using these types of applications becomes much more apparent when considering larger-scale operations. Retailers can use predictive applications to make important decisions about things like advertising and merchandising. Berger mentions the well-known “parable of the beer and diapers,” where an interesting and entirely unexpected correlation was found between purchases of diapers and beer. While some versions of the tale include the retailer moving the two correlated items next to one another in order to drive increases in sales, this may or may not be factual. Regardless, such examples of generating useful information based on querying data is a perfect example of the power the predictive applications have.
Berger repeatedly stresses the importance of moving the algorithm to the data, not vice-versa. By moving the algorithm to the data, we avoid all of the dangers of bypassing security and encryption. Developing applications that perform queries and compile information that is usable and useful to not only data scientists, but normal people as well, is a perfect example of how machine learning and predictive applications can make everyones jobs easier.
As a student, I took one of Berger’s closing remarks under careful consideration. Berger states that it is much easier for a programmer to learn how to make a program that interprets data than for a data scientist to translate his specific, one-off analyses into programs. With a newfound understanding of why predictive applications are so important to our data-obsessed society, I look forward to exploring how I can begin developing applications that take advantage of machine learning.