Mining of Massive Datasets

This book, written by Anand Rajaraman and Jeffrey David Ullman, is based on the Stanford University course Mining Massive Datasets. It focuses on mining very large amounts of data, with examples generally drawn from data extracted from the web.

The book is free to download from here: Mining of Massive Datasets

A hard copy can be purchased from here: The Mining of Massive Datasets book

An online course is available here: The MOOC (Massive Open Online Course)

Excerpt

"Statisticians were the first to use the term “data mining.” Originally, “data mining” or “data dredging” was a derogatory term referring to attempts to extract information that was not supported by the data. Section 1.2 illustrates the sort of errors one can make by trying to extract what really isn’t in the data. Today, “data mining” has taken on a positive meaning. Now, statisticians view data mining as the construction of a statistical model, that is, an underlying distribution from which the visible data is drawn."

Contents

  • Data Mining
  • MapReduce and the New Software Stack
  • Finding Similar Items
  • Mining Data Streams
  • Link Analysis
  • Frequent Itemsets
  • Clustering
  • Advertising on the Web
  • Recommendation Systems
  • Mining Social-Network Graphs
  • Dimensionality Reduction
  • Large-Scale Machine Learning

Sources

Rajaraman, A. and Ullman, J. D. (2014). Mining of Massive Datasets. Cambridge University Press, New York. Retrieved from: http://www.mmds.org/

Related links