Let's wrap up this Deep Learning section by taking a quick look at the effectiveness of Neural Nets!
We'll use the Bank Note Authentication Data Set from the UCI repository.
The data consists of 5 columns:

- variance of Wavelet Transformed image (continuous)
- skewness of Wavelet Transformed image (continuous)
- curtosis of Wavelet Transformed image (continuous)
- entropy of image (continuous)
- class (integer)

Where class indicates whether or not a Bank Note was authentic.
This sort of task is perfectly suited for Neural Networks and Deep Learning! Just follow the instructions below to get started!
Use pandas to read in the bank_note_data.csv file
import numpy as np
import pandas as pd
df = pd.read_csv('bank_note_data.csv')
Check the head of the Data
df.head()
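As a quick optional sanity check (not part of the original exercise), confirm there are no missing values and see how balanced the classes are:

# Column types and non-null counts, then the count of each class label
df.info()
df['Class'].value_counts()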
We'll just do a few quick plots of the data.
Import seaborn and set matplotlib inline for viewing
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
%matplotlib inline
Create a Countplot of the Classes (Authentic 1 vs Fake 0)
sns.countplot(x = 'Class', data = df, palette = 'viridis')
Create a PairPlot of the Data with Seaborn, set Hue to Class
sns.pairplot(data = df, hue = 'Class', palette = 'viridis')
from sklearn.preprocessing import StandardScaler
Create a StandardScaler() object called scaler.
scaler = StandardScaler()
Fit scaler to the features.
scaler.fit(X = df.drop('Class', axis = 1))
Use the .transform() method to transform the features to a scaled version.
scaled_data = scaler.transform(X = df.drop('Class', axis = 1))
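As an aside, scikit-learn also provides fit_transform(), which combines the two calls above into one:

# Equivalent one-liner: fit the scaler and transform in a single step
scaled_data = scaler.fit_transform(df.drop('Class', axis = 1))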
Convert the scaled features to a dataframe and check the head of this dataframe to make sure the scaling worked.
df_scaled = pd.DataFrame(data = scaled_data, columns = df.columns[:-1])
df_scaled.head()
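To verify that the scaling actually worked, each feature should now have mean ≈ 0 and standard deviation ≈ 1 (a quick optional check):

# Each column should show a mean close to 0 and a std close to 1
df_scaled.describe().loc[['mean', 'std']]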
Create two objects X and y which are the scaled feature values and labels respectively.
X = df_scaled
y = df['Class']
Use the .as_matrix() method on X and y and reset them equal to this result. We need to do this in order for TensorFlow to accept the data as NumPy arrays instead of pandas objects.
Note from Serhan Mete 14/7/2018: .as_matrix() is now obsolete and should be replaced w/ .values
X = X.values
y = y.values
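A quick optional check that the conversion worked: X and y should now be plain NumPy arrays with matching lengths.

# X should be an (n_samples, 4) float array, y an (n_samples,) label array
print(type(X), X.shape)
print(type(y), y.shape)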
Use SciKit Learn to create training and testing sets of the data as we've done in previous lectures:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 101)
# Cast the labels to int32 so downstream libraries see them as integer class labels
y_test = (y_test==1).astype(np.int32)
y_train = (y_train==1).astype(np.int32)
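Optionally, confirm the labels are integer 0/1 arrays and that the split kept both classes:

# Both classes should appear in the training labels after the split
print(y_train.dtype, np.bincount(y_train))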
Note from Serhan Mete 14/7/2018: tf.contrib.learn is now deprecated. Instead of interacting with TensorFlow directly, we'll use Keras, which provides a high-level API that makes life quite a bit easier.
We're going to build a sequential model with three hidden layers in a [10, 20, 10] structure and a single sigmoid output for binary classification, and train it for 30 epochs with a batch size of 20.
from keras.models import Sequential
from keras.layers import Dense
# A sequential model where we stack layers on top of each other
model = Sequential()
# Stack 3 hidden layers and 1 output layer
model.add(Dense(units = 10, activation='relu', input_dim=4))
model.add(Dense(units = 20, activation='relu'))
model.add(Dense(units = 10, activation='relu'))
model.add(Dense(units = 1, activation='sigmoid'))
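Before compiling, it can be helpful to inspect the architecture and parameter counts (optional):

# Print a layer-by-layer overview of the network
model.summary()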
# Now compile the model.
model.compile(loss='binary_crossentropy',
optimizer='adam',
metrics=['accuracy'])
# Now fit the model, keeping the returned History object so we can inspect training
history = model.fit(X_train, y_train, epochs = 30, batch_size = 20)
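Since fit() returns a History object, you can plot the per-epoch training loss to check convergence (a quick optional sketch; note that the accuracy key is 'acc' in older Keras versions and 'accuracy' in newer ones, so we stick to the loss here):

# Visualize the per-epoch training loss recorded during fit()
plt.plot(history.history['loss'])
plt.xlabel('Epoch')
plt.ylabel('Training loss')
plt.show()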
Use the predict method from the classifier model to create predictions from X_test
Now create a classification report and a Confusion Matrix. Does anything stand out to you?
# Note: predict_classes() was removed in later Keras versions; thresholding the
# sigmoid output is equivalent. ravel() flattens the (n, 1) outputs to 0/1 labels.
y_predict = (model.predict(X_test) > 0.5).astype(np.int32).ravel()
from sklearn.metrics import confusion_matrix, classification_report
print(confusion_matrix(y_test,y_predict))
print(classification_report(y_test,y_predict))
# Let's also look at the scores - obvious but still...
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss: %.3f'%(score[0]))
print('Test accuracy: %.3f'%(score[1]))
You should have noticed extremely accurate results from the DNN model. Let's compare this to a Random Forest Classifier for a reality check!
Use SciKit Learn to Create a Random Forest Classifier and compare the confusion matrix and classification report to the DNN model
from sklearn.ensemble import RandomForestClassifier
rc = RandomForestClassifier()
rc.fit(X_train, y_train)
y_predict = rc.predict(X_test)
print(confusion_matrix(y_test,y_predict))
print(classification_report(y_test,y_predict))
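One advantage of the Random Forest is that it exposes feature importances, so you can see which wavelet features drive the classification (a quick optional look):

# Map each importance back to its feature name
for name, importance in zip(df.columns[:-1], rc.feature_importances_):
    print('%s: %.3f' % (name, importance))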
It should also have done very well, though not quite as well as the DNN model. Hopefully you've seen the power of DNNs!