Machine Learning

Instructor: Dr. Aric LaBarr


Introduction

Machine learning can be a hard concept to define. The Merriam-Webster dictionary defines machine learning this way:

a computational method that is a sub-field of artificial intelligence and that enables a computer to learn to perform tasks by analyzing a large data set without being explicitly programmed.

The Oxford dictionary is similar in nature:

a type of artificial intelligence in which computers use huge amounts of data to learn how to do tasks rather than being programmed to do them.

That sounds complicated. However, if you look up a list of common machine learning algorithms, linear and logistic regression will be on it. We take the approach that machine learning algorithms are exactly what the definitions above state, but we focus more on the predictive nature of these algorithms than on their interpretability. Although a tremendous amount of work has been done in recent years to make these algorithms more interpretable, they were originally created for predictive value, often at the cost of added complexity, rather than for interpretation. We will discuss some of the ways to interpret these algorithms at the end of this code deck. For a more interpretable, statistical modeling approach to solving problems, feel free to look at the Logistic Regression code deck to learn more!

Machine learning algorithms can be either supervised or unsupervised. In supervised learning, predictor variables are used to predict or explain a known target variable (the example below shows a binary target variable).

Supervised Classification Modeling

Unsupervised learning models have no target variable to relate the predictor variables to; they work with the variables alone.

Unsupervised Modeling
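To make the contrast concrete, here is a minimal pure-Python sketch on hypothetical toy data: one predictor `x` and a binary target `y`. The supervised half fits a logistic regression by plain gradient descent, which uses `y`; the unsupervised half runs k-means with two clusters, which only ever sees `x`. The data, learning rate, epoch count, and the min/max starting centers are illustrative assumptions rather than settings from this deck, and the loops are hand-written only to keep the example self-contained.

```python
import math

# Hypothetical toy data: one predictor x and a binary target y.
x = [-3.0, -2.5, -2.0, -1.5, 1.5, 2.0, 2.5, 3.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, y, lr=0.1, epochs=2000):
    """Supervised: fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by batch
    gradient descent on the average log-loss. Uses the target y."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = sigmoid(b0 + b1 * xi) - yi  # predicted minus observed
            g0 += err
            g1 += err * xi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1


def kmeans_1d(x, iters=100):
    """Unsupervised: split x into two clusters with k-means (k = 2).
    Never looks at y; the cluster numbers 0/1 are arbitrary labels."""
    centers = [min(x), max(x)]  # simple deterministic starting centers
    labels = None
    for _ in range(iters):
        # Assign each point to its nearest center.
        new_labels = [0 if abs(xi - centers[0]) <= abs(xi - centers[1]) else 1
                      for xi in x]
        if new_labels == labels:
            break  # assignments stopped changing, so k-means has converged
        labels = new_labels
        # Move each center to the mean of its assigned points.
        for k in (0, 1):
            members = [xi for xi, lab in zip(x, labels) if lab == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = sum(members) / len(members)
    return labels, centers


b0, b1 = fit_logistic(x, y)
predicted = [1 if sigmoid(b0 + b1 * xi) >= 0.5 else 0 for xi in x]
print("supervised predictions:", predicted)

clusters, centers = kmeans_1d(x)
print("unsupervised clusters:", clusters, "centers:", centers)
```

On this toy data both halves split the points the same way, but only the supervised model knows which group is called 1; the k-means cluster numbers carry no meaning beyond "same group".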

This code deck focuses only on supervised machine learning algorithms. We cover all of the algorithms in this code deck because we cannot know ahead of time which algorithm will work best for any specific data set. Stephen Lee from the University of Idaho has a great visual that summarizes this idea:

Comparing Algorithms (Lee, University of Idaho)
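A minimal pure-Python sketch of that idea: the same two candidate algorithms are fit on training rows and scored on held-out validation rows, once for each of two data sets. Everything here is an illustrative assumption rather than part of this deck: noiseless toy functions stand in for real data, even `x` values form the training set and odd ones the validation set, and the candidates are least-squares linear regression and k-nearest-neighbors regression with a hypothetical `k = 2`.

```python
# Hypothetical toy setup: the same two candidate algorithms are compared
# on two different data sets, each split into training and validation rows.

def fit_linear(xs, ys):
    """Ordinary least squares with a single predictor: y ≈ b0 + b1 * x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return lambda x: b0 + b1 * x


def fit_knn(xs, ys, k=2):
    """k-nearest-neighbors regression: average y of the k closest training x's."""
    train = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k

    return predict


def mse(model, xs, ys):
    """Mean squared error of a fitted model on held-out rows."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def compare(f):
    """Fit both algorithms on even x, score them on odd x, return the winner."""
    xs = list(range(12))
    train_x = [x for x in xs if x % 2 == 0]
    valid_x = [x for x in xs if x % 2 == 1]
    train_y = [f(x) for x in train_x]
    valid_y = [f(x) for x in valid_x]
    scores = {
        "linear": mse(fit_linear(train_x, train_y), valid_x, valid_y),
        "knn": mse(fit_knn(train_x, train_y), valid_x, valid_y),
    }
    return min(scores, key=scores.get), scores


straight = compare(lambda x: 2 * x + 1)   # a straight-line relationship
curved = compare(lambda x: (x - 5) ** 2)  # a U-shaped relationship
print(straight)
print(curved)
```

With these toy functions, linear regression has the lower validation error on the straight-line data, while k-nearest neighbors wins on the U-shaped data, so neither candidate is best everywhere. Scoring on rows the model never saw during fitting is what makes the comparison fair.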

This markdown file shows how to build many different machine learning algorithms, along with all of their components, in R and Python. In each section you can toggle back and forth between the code and output you want to view.

The R and Python libraries will be loaded as they are needed throughout the code.