Load iris dataset
import numpy as np
from portfolio.visualize import Plot
from sklearn.datasets import load_iris
iris = load_iris()
Keep only types 1 and 2 so the data suits binary classifiers, and label them as -1 and 1
x = iris.data[iris.target > 0]
y = np.where(iris.target[iris.target > 0] == 1, -1, 1)
This section contains code equivalent to that in the example run in the project documentation, to show that it meets the specifications.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from portfolio.perceptron import Perceptron, plot_decision_regions, IRIS_OPTIONS
from portfolio.visualize import Plot
Download and parse the dataset
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', **IRIS_OPTIONS)
Extract the first 100 labels (which are the first two types)
y = df.iloc[0:100, 4].values
Since we selected only the first two types, the classes are either Iris-setosa or Iris-versicolor. Label Iris-setosa as -1 and everything else as 1.
y = np.where(y == 'Iris-setosa', -1, 1)
The first and third features are separable, so select only those for the samples labeled in y
X = df.iloc[0:100, [0, 2]].values
Plotting the data, we can clearly see that these two features are separable.
plt.scatter(X[:50, 0], X[:50, 1], color='red', marker='o', label='setosa')
plt.scatter(X[50:100, 0], X[50:100, 1], color='blue', marker='x', label='versicolor')
plt.xlabel('sepal length')  # column 0 of the iris data
plt.ylabel('petal length')  # column 2 of the iris data
plt.legend(loc='upper left')
Initialize the perceptron with a learning rate of 0.1 and a maximum of 1000 iterations.
pn = Perceptron(0.1, 1000)
fit runs the perceptron algorithm on the given data. The number of errors per iteration is stored in errors. Since this took only 6 iterations to converge, it dropped out early instead of running all 1000.
pn.fit(X, y)
This data can be seen plotted below.
plt.plot(range(1, len(pn.errors) + 1), pn.errors, marker='o')
plt.ylabel('# of misclassifications')
Running it on the original data in order, we can see that it correctly classifies all of the samples.
Here is a visualization of the computed decision boundary:
plot_decision_regions(X, y, pn)
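To make the training loop and the early exit concrete, here is a minimal NumPy sketch of a perceptron. It is not the code in portfolio.perceptron: the names perceptron_fit and perceptron_predict, the zero initial weights, and the toy data are assumptions made for illustration only.

```python
import numpy as np


def perceptron_fit(X, y, rate=0.1, max_iterations=1000):
    """Train a perceptron on labels -1/1 (hypothetical helper).

    Returns the weights (bias first) and the misclassification count per pass.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    weights = np.zeros(X.shape[1] + 1)  # weights[0] is the bias term
    errors = []
    for _ in range(max_iterations):
        mistakes = 0
        for sample, target in zip(X, y):
            predicted = 1 if weights[1:] @ sample + weights[0] >= 0 else -1
            if predicted != target:
                # Move the boundary toward the misclassified sample.
                weights[1:] += rate * target * sample
                weights[0] += rate * target
                mistakes += 1
        errors.append(mistakes)
        if mistakes == 0:  # converged, so stop before max_iterations
            break
    return weights, errors


def perceptron_predict(X, weights):
    """Classify each row of X as -1 or 1 using the trained weights."""
    scores = np.asarray(X, dtype=float) @ weights[1:] + weights[0]
    return np.where(scores >= 0, 1, -1)


# Toy, clearly separable data (not taken from iris).
X_demo = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y_demo = np.array([1, 1, -1, -1])
demo_weights, demo_errors = perceptron_fit(X_demo, y_demo)
```

The length of demo_errors plays the same role as len(pn.errors) above: it is the number of passes made before a pass produced no misclassifications.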
The other data I tested against:
Data whose classes lie really close together. 701 iterations seems like a really big number, but since the classes are so close I'm assuming it could be right.
X1 = df.iloc[0:100, [0, 1]].values
pn.fit(X1, y)
plot_decision_regions(X1, y, pn)
And data with a very wide separation. Here we can see it dropping out after only 3 iterations.
X2 = df.iloc[0:100, [2, 3]].values
pn.fit(X2, y)
plot_decision_regions(X2, y, pn)
Close all the figures we opened:
For single dimension data, linear regression is defined as $w = A^{-1}b$, where A is
The linear regression code may be run on the iris dataset with the following:
from portfolio.linear_regression import LinearRegression
x_sepal_width = x[:, 1]
regression_plot = Plot(x_sepal_width, y)
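As a worked illustration of solving $w = A^{-1}b$, the sketch below assumes the standard least-squares normal equations, with $A = X^T X$ and $b = X^T y$ for a design matrix $X$ that has a column of ones for the intercept. That choice of A and b, the name fit_line, and the toy data are assumptions, and this is not the portfolio.linear_regression API.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones for the intercept, then the feature.
    design = np.column_stack([np.ones_like(x), x])
    A = design.T @ design  # assumed definition of A
    b = design.T @ y       # assumed definition of b
    # Solving A w = b gives the same w as inv(A) @ b without forming the inverse.
    return np.linalg.solve(A, b)


# Toy data that lies exactly on y = 1 + 2x.
x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = 2.0 * x_demo + 1.0
intercept, slope = fit_line(x_demo, y_demo)
```

Using np.linalg.solve instead of an explicit inverse is a common design choice because it is cheaper and numerically better behaved.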
Decision stumps are a type of weak learner. They consist of a decision tree with only a single node. My implementation only accepts a binary split of classes labeled -1 and 1, although in theory a decision stump can split based on threshold into any number of classes.
I couldn’t figure out how to translate the pseudocode for the efficient
Since the decision stump needs a threshold
The other part of the decision stump is the dimension of the data against which it will classify. This implementation considers all given dimensions and selects the most suitable one.
The last part is the error function. The error function I used is a count of the differences between the predicted class and the expected class.
For each of the combinations of dimensions and possible thresholds, the decision stump checks the error function. The one with the lowest error is selected as the dimension to classify against and the
Shown here is the decision boundary, along with the values plotted for the chosen dimension, with the label on the y-axis. As you can see, these are not separable, but the optimal boundary was still found.
from portfolio.decision_stumps import DecisionStumps
x_sepal_width = x[:, 1]
regression_plot = Plot(x, y)
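The search described above, trying every dimension and candidate threshold and keeping the combination with the fewest misclassifications, can be sketched as follows. This is a brute-force illustration and not the DecisionStumps class: the name fit_stump, the use of observed feature values as candidate thresholds, and the polarity flag are all assumptions.

```python
import numpy as np


def fit_stump(X, y):
    """Brute-force decision stump for labels -1/1 (hypothetical helper).

    Returns (error, dimension, threshold, polarity) for the best split found.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    best = None
    for dimension in range(X.shape[1]):
        # Assumption: every observed value is tried as a threshold.
        for threshold in np.unique(X[:, dimension]):
            for polarity in (1, -1):
                # Values at or above the threshold get `polarity`, the rest get the opposite.
                predicted = np.where(X[:, dimension] >= threshold, polarity, -polarity)
                # Error function: count of predictions that differ from the labels.
                error = int(np.sum(predicted != y))
                if best is None or error < best[0]:
                    best = (error, dimension, float(threshold), polarity)
    return best


# Toy data: only the second feature separates the classes.
X_demo = np.array([[5.0, 1.0], [4.0, 2.0], [6.0, 3.0], [5.0, 4.0]])
y_demo = np.array([-1, -1, 1, 1])
stump_error, stump_dimension, stump_threshold, stump_polarity = fit_stump(X_demo, y_demo)
```

On this toy data the first feature cannot do better than one mistake, so the search settles on the second feature.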
SVMs are a linear classification model. Unlike the perceptron, SVMs optimize for the widest margin between classes.
In addition, since this is a soft-margin SVM, it uses a hinge loss function. This allows it to accept data that is not linearly separable and still produce a reasonable classification boundary.
Optimization of this loss function is implemented via stochastic gradient descent (SGD).
For example, this data from iris is not separable.
from portfolio import svm
y = df.iloc[50:, 4].values
y = np.where(y == 'Iris-versicolor', -1, 1)
X = df.iloc[50:, [0, 3]].values
np.c_[y, X]
Running the soft-margin SVM classifier on it, we get a reasonably accurate classification of the data.
plot = Plot(X, y)
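To show how hinge loss and SGD fit together, here is a minimal sketch of a soft-margin linear SVM trained by stochastic subgradient steps on the regularized hinge loss. It is not the portfolio.svm implementation: the function name svm_sgd_fit, the hyperparameter values, and the toy data are assumptions.

```python
import numpy as np


def svm_sgd_fit(X, y, rate=0.01, regularization=0.01, epochs=200, seed=0):
    """Soft-margin linear SVM via SGD on the hinge loss (hypothetical helper).

    Minimizes regularization / 2 * ||w||^2 + max(0, 1 - y * (w . x + b))
    one sample at a time. Labels must be -1 or 1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    weights = np.zeros(X.shape[1])
    bias = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):  # visit samples in random order
            margin = y[i] * (X[i] @ weights + bias)
            if margin < 1:
                # Sample is inside the margin or misclassified: the hinge term is active.
                weights += rate * (y[i] * X[i] - regularization * weights)
                bias += rate * y[i]
            else:
                # Only the regularization term contributes to the subgradient.
                weights -= rate * regularization * weights
    return weights, bias


# Toy, separable data (not taken from iris).
X_demo = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y_demo = np.array([1, 1, -1, -1])
svm_weights, svm_bias = svm_sgd_fit(X_demo, y_demo)
svm_predictions = np.where(X_demo @ svm_weights + svm_bias >= 0, 1, -1)
```

Because samples that violate the margin are penalized rather than forbidden, the same loop still runs on data that is not linearly separable and settles on a compromise boundary.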
The k-nearest neighbors (KNN) classifier simply classifies a sample based upon the closest training samples.
At least in this implementation, although also in general to a lesser extent, KNN tends to be very computationally intensive for large datasets, since it must compute the distance to every training point in order to find the nearest ones.
from portfolio.knn import KNN
y = df.iloc[50:, 4].values
y = np.where(y == 'Iris-versicolor', -1, 1)
X = df.iloc[50:, [0, 3]].values
np.c_[y, X]
knn = KNN(X, y)
knn.predict([[0, 0]]).item()
knn.predict([[3, 8]]).item()
Given the same data as the SVM example, we can see that KNN is able to classify this data entirely, as opposed to the lossy linear classifier. This isn't necessarily an argument in KNN's favor, since it doesn't take into account the possibility of overfitting, to which KNN is prone.
Given certain datasets, including this one, the performance is very good.
from portfolio import visualize  # the module is referenced below but was never imported
fig = plt.figure()
visualize.plot_decision_regions(fig.add_subplot(), X, y, knn)
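The distance computation that makes KNN expensive can be sketched in a few lines of NumPy. This is an illustration and not the portfolio.knn.KNN class: the function name knn_predict, the Euclidean metric, the majority vote over k neighbors, and the toy data are assumptions.

```python
import numpy as np


def knn_predict(train_X, train_y, samples, k=1):
    """Classify samples by majority vote of the k nearest training points.

    Hypothetical helper; labels must be -1 or 1, and an odd k avoids ties.
    """
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y)
    samples = np.asarray(samples, dtype=float)
    # distances[i, j] is the Euclidean distance from samples[i] to train_X[j].
    # Every sample is compared with every training point, hence the cost.
    distances = np.linalg.norm(samples[:, None, :] - train_X[None, :, :], axis=2)
    nearest = np.argsort(distances, axis=1)[:, :k]
    # With -1/1 labels, the sign of the summed labels is the majority vote.
    votes = train_y[nearest].sum(axis=1)
    return np.where(votes >= 0, 1, -1)


# Toy data with two well-separated clusters (not taken from iris).
train_X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
train_y_demo = np.array([-1, -1, 1, 1])
knn_predictions = knn_predict(train_X_demo, train_y_demo, [[0.2, 0.3], [5.5, 5.0]], k=3)
```

A larger k smooths the decision boundary, which is one common way to reduce the overfitting mentioned above, at the price of blurring small clusters.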