Math for Data Science

Summary

Math for Data Science covers the portions of linear algebra, calculus, probability, and statistics that are prerequisite to data science, and applies these topics to two central problems in the field: principal component analysis (PCA) and neural network training. While PCA relies primarily on linear algebra, neural network training combines all of these mathematical tools.
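
For concreteness, here is a minimal PCA sketch in Python (illustrative, not taken from the book): the data are centered, the sample covariance matrix is formed, and the data are projected onto the eigenvectors belonging to the largest eigenvalues. The toy dataset and the choice of two components are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))          # toy dataset: 200 samples, 5 features

    Xc = X - X.mean(axis=0)                # center each feature
    C = Xc.T @ Xc / (len(X) - 1)           # sample covariance matrix (5 x 5)
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues in ascending order

    k = 2                                  # keep the top two components
    top = eigvecs[:, ::-1][:, :k]          # eigenvectors of the k largest eigenvalues
    scores = Xc @ top                      # projected data (200 x 2)
    print(scores.shape)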

The highlight of the book is the machine learning chapter, where the results of the previous chapters are applied to neural network training and stochastic gradient descent. Also included in this last chapter are advanced topics such as accelerated gradient descent and logistic regression trainability.

Nine appendices covering background material and 392 exercises are included. Examples are supported by Python code, with accompanying Jupyter notebooks and CSV files available online. A list of errata is also available online.

History

A neural network is a function defined by parameters, or weights. Given a large dataset, the goal is to train the network: to adjust the weights so that the network's outputs closely match the dataset's targets. This is achieved by using gradient descent to navigate the error landscape in weight space, thereby minimizing the error between outputs and targets.
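
As a sketch of this idea, the following Python snippet (illustrative, not from the book) trains the simplest possible "network", a single linear layer y = Xw, by repeatedly stepping against the gradient of the mean squared error. The dataset, learning rate, and step count are arbitrary choices.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 3))           # toy dataset: 100 samples, 3 features
    w_true = np.array([2.0, -1.0, 0.5])
    y = X @ w_true                          # dataset targets

    w = np.zeros(3)                         # initial weights
    lr = 0.1                                # learning rate (step size)
    for _ in range(200):
        residual = X @ w - y                # outputs minus targets
        grad = 2 * X.T @ residual / len(X)  # gradient of the mean squared error
        w -= lr * grad                      # step downhill on the error landscape

    print(w)                                # close to w_true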

Historically, training neural networks at scale was impractical due to the large number of weights involved. A breakthrough came with stochastic gradient descent (SGD), first introduced in the 1950s and widely applied to neural networks in the 1980s. SGD can converge to a minimum of the error even though it follows only noisy approximations of the true gradient.

While computing the full gradient requires summing over the entire dataset, SGD estimates the gradient from small random subsets of the data, known as minibatches. Each step is far cheaper to compute and convergence is maintained, albeit typically at the cost of more update steps. Despite this trade-off, SGD has made large-scale neural network training feasible, paving the way for deep learning and AI.
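
The following sketch (again illustrative, with arbitrary batch size and learning rate) modifies the gradient descent loop above to estimate the gradient from a random minibatch at each step, so the per-step cost no longer depends on the dataset size.

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(10_000, 3))                # larger toy dataset
    w_true = np.array([2.0, -1.0, 0.5])
    y = X @ w_true + 0.1 * rng.normal(size=10_000)  # noisy targets

    w = np.zeros(3)
    lr, batch = 0.05, 32                            # step size and minibatch size
    for _ in range(2_000):
        idx = rng.integers(0, len(X), size=batch)   # sample a random minibatch
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ w - yb) / batch     # noisy estimate of the full gradient
        w -= lr * grad                              # each step touches only 32 samples

    print(w)                                        # approaches w_true despite the noise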