Chapman and Hall/CRC, 2010. — 305 p. — ISBN 1439810184, 9781439810187.
The versatile capabilities and large set of add-on packages make R an excellent alternative to many existing and often expensive data mining tools. Exploring this area from the perspective of a practitioner, Data Mining with R: Learning with Case Studies uses practical examples to illustrate the power of R and data mining.
Assuming no prior knowledge of R or data mining/statistical techniques, the book covers a diverse set of problems that pose different challenges in terms of size, type of data, goals of analysis, and analytical tools. To present the main data mining processes and techniques, the author takes a hands-on approach that utilizes a series of detailed, real-world case studies:
Predicting algae blooms
Predicting stock market returns
Detecting fraudulent transactions
Classifying microarray samples
For each of these case studies, the author supplies all the necessary steps, code, and data.
Web Resource
A supporting website mirrors the do-it-yourself approach of the text. It offers a collection of freely available R source files that encompass all the code used in the case studies. The site also provides the data sets from the case studies as well as an R package of several functions.
How to Read This Book
A Short Introduction to R
A Short Introduction to MySQL
Predicting Algae Blooms
Problem Description and Objectives
Data Description
Loading the Data into R
Data Visualization and Summarization
Unknown Values
Obtaining Prediction Models
Model Evaluation and Selection
Predictions for the 7 Algae
Predicting Stock Market Returns
Problem Description and Objectives
The Available Data
Defining the Prediction Tasks
The Prediction Models
From Predictions into Actions
Model Evaluation and Selection
The Trading System
Detecting Fraudulent Transactions
Problem Description and Objectives
The Available Data
Defining the Data Mining Tasks
Obtaining Outlier Rankings
Classifying Microarray Samples
Problem Description and Objectives
The Available Data
Gene (Feature) Selection
Predicting Cytogenetic Abnormalities
Index of Data Mining Topics
Index of R Functions