text mining

Data mining

The term data mining has two distinct meanings. In its pejorative sense, it refers to ad hoc searching of a data set for statistically significant correlations that appear to support the researcher’s existing views. This is often associated with a reluctance among researchers to report (or publish) non-significant correlations, a practice that appears to be widespread according to research by John Ioannidis (author of “Why Most Published Research Findings Are False”). The underlying problem is that when enough correlations are tested, some will reach statistical significance by chance alone. In its more neutral sense, data mining refers to the systematic discovery of patterns in data sets using computer algorithms. An algorithm is a step-by-step procedure for calculation, often involving many iterations, that always has an end point at which results become available. Computerisation allows complex algorithms to be applied to large data sets, so that results can be generated very quickly and at negligible cost.
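The pejorative sense can be illustrated with a small simulation. The sketch below is not taken from any cited study: it generates an outcome and 1,000 candidate variables that are all pure random noise, then counts how many candidates nonetheless correlate "significantly" with the outcome. The function names, the sample size of 30, and the seed are illustrative assumptions; the threshold 0.361 is the approximate critical value of Pearson's r for 30 observations at the two-tailed 5% level, as given in standard tables.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def count_spurious_hits(n_obs=30, n_candidates=1000, critical_r=0.361, seed=0):
    """Count noise variables that pass a 5% significance threshold.

    All names and default values here are illustrative assumptions.
    critical_r is the approximate two-tailed 5% cutoff for n_obs = 30.
    """
    rng = random.Random(seed)
    # The outcome is random noise, so no candidate truly predicts it.
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_candidates):
        # Each candidate is generated independently of the outcome.
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(pearson_r(candidate, outcome)) > critical_r:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_spurious_hits()
    print(f"{hits} of 1000 unrelated variables look 'significant' at the 5% level")
```

Because each test has roughly a 5% chance of a false positive, something in the region of 50 of the 1,000 unrelated variables should pass the threshold. A researcher who reports only those hits, and none of the misses, would appear to have found real relationships in data that contain none. The loop is also a simple instance of an algorithm in the sense described above: a fixed procedure, repeated many times, that terminates with a result.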