Introduction to Data Analytics
Paul Dixon
This text is a practical introduction to data exploration and modelling. Chapters are sequenced according to the CRISP-DM framework for data mining: business understanding, data understanding, data preparation, modelling, evaluation and deployment. Significant attention is given to exploring messy data, beginning with formatting a data file to improve readability and identifying gaps or anomalies. Excel pivot tables and charts are used to identify patterns and obtain insights that inform modelling. The model-building process is explored in some depth using multiple regression from a data science perspective rather than a classical statistics perspective. Several classification models (logistic regression, decision trees, naïve Bayes, and k-nearest neighbours) are introduced to demonstrate the diversity of approaches to modelling. Probability is used to address the evaluation of classification models and the risks associated with applying them. The final chapter addresses issues of implementation and communication. The text assumes no mathematical background beyond simple functions. Although inferential statistics is not addressed, variability in data and in estimates is emphasized. The standard error of estimates is given significant attention, and the concepts of confidence intervals and hypothesis tests are implicit in discussions of model estimation, without the introduction of statistical jargon. Students are not expected to have prior knowledge of Excel. The text contains videos demonstrating each Excel activity that students will use. Assignment and lab exercises using Excel with large data sets are available from the author.

Top Hat Interactive eText
Requires a join code from the instructor
$40.00
All prices are net prices and do not include third-party markups

Table of Contents for Introduction to Data Analytics
- Textbook
- Data Files