Mining of Massive Datasets

2014

This book, written by Anand Rajaraman and Jeffrey David Ullman, is based on the Stanford University course Mining Massive Datasets. It focuses on mining very large amounts of data, with examples drawn mostly from data extracted from the web.

The book is free to download from here: Mining of Massive Datasets

A hard copy can be purchased from here: The Mining of Massive Datasets book

An online course is available here: The MOOC (Massive Open Online Course)

Excerpt

"Statisticians were the ﬁrst to use the term “data mining.” Originally, “data mining” or “data dredging” was a derogatory term referring to attempts to extract information that was not supported by the data. Section 1.2 illustrates the sort of errors one can make by trying to extract what really isn’t in the data. Today, “data mining” has taken on a positive meaning. Now, statisticians view data mining as the construction of a statistical model, that is, an underlying distribution from which the visible data is drawn."

Sources

Rajaraman, A. and Ullman, J. D. (2014). Mining of Massive Datasets. Cambridge University Press, New York. Retrieved from: http://www.mmds.org/
