The Self-Organizing Map, or self-organizing feature map (SOFM), algorithm can be used to create a model of a typical cardholder's behavior and to analyze how far individual transactions deviate from it, thus flagging suspicious transactions. It is a type of artificial neural network (ANN) that is trained using unsupervised learning.
Problem: identify potentially fraudulent credit card applications by modeling typical customer behavior with a SOM and flagging the applications that deviate from it.
Dataset:
Source: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients
Figure: an illustration of the training of a self-organizing map. The blue blob is the distribution of the training data, and the small white disc is the current training datum drawn from that distribution. At first (left) the SOM nodes are arbitrarily positioned in the data space. The node (highlighted in yellow) which is nearest to the training datum is selected. It is moved towards the training datum, as (to a lesser extent) are its neighbors on the grid. After many iterations the grid tends to approximate the data distribution (right).
Like most artificial neural networks, SOMs operate in two modes: training and mapping. "Training" builds the map using input examples (a competitive process, also called vector quantization), while "mapping" automatically classifies a new input vector.
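As a rough sketch of a single training step (the grid size and hyperparameters here are made up for illustration; the actual training below uses the minisom library), the best-matching unit is found by Euclidean distance, and it and its grid neighbors are pulled toward the sample with a Gaussian neighborhood:
import numpy as np

rng = np.random.default_rng(0)
weights = rng.random((10, 10, 15))  # 10x10 grid of nodes, 15-dimensional inputs
x = rng.random(15)                  # one training sample
lr, sigma = 0.5, 1.0                # learning rate and neighborhood radius

# Best-matching unit (BMU): the node whose weight vector is closest to the sample
dists = np.linalg.norm(weights - x, axis = 2)
bmu = np.unravel_index(np.argmin(dists), dists.shape)

# Move the BMU, and to a lesser extent its grid neighbors, towards the sample
rows, cols = np.indices((10, 10))
grid_dist_sq = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
h = np.exp(-grid_dist_sq / (2 * sigma ** 2))  # Gaussian neighborhood function
weights += lr * h[..., None] * (x - weights)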
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Credit_Card_Applications.csv')
# Check dataset
dataset.head()
# Label the array values of independent and dependent variables.
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
# Check dimension
X.shape, y.shape
# Feature Scaling
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range = (0, 1))
X = sc.fit_transform(X)
# Training the SOM
from minisom import MiniSom
som = MiniSom(x = 10, y = 10, input_len = 15, sigma = 1.0, learning_rate = 0.5)  # 10x10 grid; sigma = neighborhood radius; input_len matches the 15 columns of X
som.random_weights_init(X)
som.train_random(data = X, num_iteration = 100)
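As an optional sanity check (not part of the original walkthrough), minisom exposes the quantization error, the average distance between each sample and its best-matching unit; lower values mean the map fits the data more tightly:
# Optional: average distance between each input vector and its winning node
print(som.quantization_error(X))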
# Visualizing the results
from matplotlib.pyplot import bone, pcolor, colorbar, plot, show
bone()
pcolor(som.distance_map().T)  # mean inter-neuron distance (U-matrix): light cells are far from their neighbors, i.e. outliers
colorbar()
markers = ['o', 's']
colors = ['r', 'g']
for i, x in enumerate(X):
    w = som.winner(x)  # winning node (BMU) for this customer
    plot(w[0] + 0.5,
         w[1] + 0.5,
         markers[y[i]],
         markeredgecolor = colors[y[i]],
         markerfacecolor = 'None',
         markersize = 10,
         markeredgewidth = 2)
show()
# Finding the frauds
# The node coordinates (7,1) and (9,1) below were read off the distance map
# above: cells with a high mean inter-neuron distance (light colors) are
# outliers. These coordinates will differ from run to run.
mappings = som.win_map(X)
frauds = np.concatenate((mappings[(7,1)], mappings[(9,1)]), axis = 0)
frauds = sc.inverse_transform(frauds)
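Since those coordinates were picked by eye, a hedged alternative is to threshold the distance map programmatically; the 0.9 cutoff here is an assumption for illustration and would need tuning on each run:
# Assumed alternative: collect customers mapped to any node whose mean
# inter-neuron distance exceeds 0.9 (threshold chosen for illustration)
distance_map = som.distance_map()
outlier_nodes = [node for node, d in np.ndenumerate(distance_map) if d > 0.9]
fraud_lists = [mappings[node] for node in outlier_nodes if len(mappings[node]) > 0]
frauds_auto = sc.inverse_transform(np.concatenate(fraud_lists, axis = 0))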
# Creating the matrix features
customers = dataset.iloc[:, 1:].values
# Creating the dependent variable
is_fraud = np.zeros(len(dataset))
for i in range(len(dataset)):
    if dataset.iloc[i, 0] in frauds[:, 0]:  # match on the customer ID column
        is_fraud[i] = 1
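The loop above can also be written without explicit Python iteration; this one-liner (an equivalent reformulation, not from the original) produces the same labels:
# Equivalent vectorized labeling: 1 where the customer ID appears in frauds
is_fraud = np.isin(dataset.iloc[:, 0].values, frauds[:, 0]).astype(float)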
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
customers = sc.fit_transform(customers)
# Check dimension
customers.shape, is_fraud.shape
# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense
# Initialising the ANN
classifier = Sequential()
# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 2, kernel_initializer = 'uniform', activation = 'relu', input_dim = 15))
# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))
# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
# Fitting the ANN to the Training set
classifier.fit(customers, is_fraud, batch_size = 1, epochs = 2)
# Predicting fraud probabilities for every customer
# (the model is scored on the same data it was trained on; there is no separate test set here)
y_pred = classifier.predict(customers)
# concatenate customerID with the probability values
y_pred = np.concatenate((dataset.iloc[:, 0:1].values, y_pred), axis = 1)
y_pred
# Sort customers by their predicted probability of fraud, lowest to highest
y_pred = y_pred[y_pred[:, 1].argsort()]
# First column is the customer ID, second column is the probability of fraud, sorted from lowest to highest
y_pred
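For a quick look at the riskiest cases, the sorted array can be wrapped in a DataFrame; the column names here are illustrative, not taken from the dataset:
# Show the ten customers with the highest predicted probability of fraud
# (column names are assumptions for display only)
top10 = pd.DataFrame(y_pred[-10:], columns = ['CustomerID', 'FraudProbability'])
print(top10)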
The bank should then investigate the customers in this list and cross-check whether each one is actually fraudulent. The exercise shows that a Self-Organizing Map is effective at surfacing unusual patterns that cannot easily be identified by hand.