Case Study: Self-Organizing Maps In Fraud Detection

Introduction

A self-organizing map (SOM), also called a self-organizing feature map (SOFM), can be used to build a model of a typical cardholder's behavior and to measure how far each transaction deviates from it, which flags suspicious transactions. It is a type of artificial neural network (ANN) that is trained using unsupervised learning.

Problem:

Create a self-organizing map to identify potentially fraudulent cases, then train an artificial neural network to estimate each customer's probability of fraud.

Dataset:

The dataset contains 16 columns: a customer ID, 14 anonymized variables (A1 to A14), and a class variable. The variables are anonymized to protect the privacy of the customers, as the dataset is in the public domain. The dataset can be found at the source below. In the target variable, '0' corresponds to non-fraudulent cases and '1' corresponds to fraudulent cases.

Source: https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

Review

An illustration of the training of a self-organizing map. The blue blob is the distribution of the training data, and the small white disc is the current training datum drawn from that distribution. At first (left) the SOM nodes are arbitrarily positioned in the data space. The node (highlighted in yellow) which is nearest to the training datum is selected. It is moved towards the training datum, as (to a lesser extent) are its neighbors on the grid. After many iterations the grid tends to approximate the data distribution (right).

Like most artificial neural networks, SOMs operate in two modes: training and mapping. "Training" builds the map using input examples (a competitive process, also called vector quantization), while "mapping" automatically classifies a new input vector.
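The two modes can be sketched in a few lines of NumPy. This is a minimal illustration and not MiniSom's implementation: the function names `train_som` and `winner`, the Gaussian neighborhood, and the linearly decaying learning rate are assumptions made for the example.

```python
import numpy as np

def winner(weights, x):
    """Mapping: return the grid coordinates of the node closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return tuple(int(i) for i in np.unravel_index(np.argmin(dist), dist.shape))

def train_som(data, rows=10, cols=10, sigma=1.0, learning_rate=0.5,
              num_iteration=100, seed=0):
    """Training: return node weights of shape (rows, cols, n_features)."""
    rng = np.random.default_rng(seed)
    # Initialise every node with a randomly chosen training sample.
    idx = rng.integers(0, len(data), size=rows * cols)
    weights = data[idx].reshape(rows, cols, data.shape[1]).astype(float)
    # Grid coordinates of each node, used by the neighbourhood function.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)
    for t in range(num_iteration):
        x = data[rng.integers(len(data))]       # current training datum
        bmu = winner(weights, x)                # nearest node
        # Gaussian neighbourhood centred on the nearest node.
        d2 = ((grid - np.array(bmu)) ** 2).sum(axis=-1)
        h = np.exp(-d2 / (2 * sigma ** 2))
        # Linearly decaying learning rate (an assumption of this sketch).
        lr = learning_rate * (1 - t / num_iteration)
        # Move the nearest node, and its neighbours to a lesser extent,
        # towards the training datum.
        weights += lr * h[..., None] * (x - weights)
    return weights
```

After training, `winner(weights, x)` maps any new input vector to a node on the grid, which is the "mapping" mode described above.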

Import Libraries And Data

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Credit_Card_Applications.csv')

# Check dataset
dataset.head()

Create Self-Organizing Maps

Data Preprocessing

# Split the data into features X (all columns except the last) and labels y (the last column)
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Check dimension
X.shape, y.shape

((690, 15), (690,))

# Feature Scaling
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range = (0, 1))
X = sc.fit_transform(X)
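To make the scaling step concrete, the sketch below applies min-max scaling by hand with NumPy and then inverts it. The helper names `min_max_scale` and `min_max_inverse` are hypothetical, and the sample values are columns A2 and A3 of the first three rows of the dataset.

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column to [0, 1]; also return what is needed to invert."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # guard against constant columns
    return (X - col_min) / col_range, col_min, col_range

def min_max_inverse(X_scaled, col_min, col_range):
    """Map scaled values back to the original units."""
    return X_scaled * col_range + col_min

# Columns A2 and A3 of the first three rows of the dataset
sample = np.array([[22.08, 11.46],
                   [22.67, 7.00],
                   [29.58, 1.75]])
scaled, lo, span = min_max_scale(sample)
restored = min_max_inverse(scaled, lo, span)
```

The inverse step matters later in the notebook, where the scaled rows of suspected frauds are converted back to their original values to recover the customer IDs.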

Model Training

# Training the SOM
from minisom import MiniSom
som = MiniSom(x = 10, y = 10, input_len = 15, sigma = 1.0, learning_rate = 0.5) #Sigma = radius
som.random_weights_init(X)
som.train_random(data = X, num_iteration = 100)

Model Visualization

# Visualizing the results
from pylab import bone, pcolor, colorbar, plot, show
bone()
pcolor(som.distance_map().T)
colorbar()
markers = ['o', 's']
colors = ['r', 'g']
for i, x in enumerate(X):
    w = som.winner(x)
    plot(w[0] + 0.5,
         w[1] + 0.5,
         markers[y[i]],
         markeredgecolor = colors[y[i]],
         markerfacecolor = 'None',
         markersize = 10,
         markeredgewidth = 2)
show()

Create Artificial Neural Network

Data Preprocessing

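# Finding the frauds
# The node coordinates below are read off the map plotted above (the outlying
# nodes with the highest distance-map values) and will differ between training runs.
mappings = som.win_map(X)
frauds = np.concatenate((mappings[(7,1)], mappings[(9,1)]), axis = 0)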
frauds = sc.inverse_transform(frauds)
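Because the outlying nodes change from one training run to the next, hard-coded coordinates are fragile. One alternative, sketched below, is to pick every node whose distance-map value exceeds a threshold. The helper `collect_outliers` and the threshold of 0.9 are assumptions for illustration; the toy inputs stand in for `som.distance_map()` and `som.win_map(X)`.

```python
import numpy as np

def collect_outliers(distance_map, mappings, threshold=0.9):
    """Return the outlying nodes and the samples mapped to them.

    distance_map: 2-D array of mean inter-neuron distances, one per node.
    mappings: dict from node coordinates (i, j) to a list of sample vectors.
    """
    nodes = [tuple(int(v) for v in rc)
             for rc in np.argwhere(distance_map > threshold)]
    # Skip outlying nodes that did not win any sample.
    samples = [np.asarray(mappings[n]) for n in nodes
               if len(mappings.get(n, [])) > 0]
    if not samples:
        return nodes, np.empty((0, 0))
    return nodes, np.concatenate(samples, axis=0)

# Toy stand-ins for som.distance_map() and som.win_map(X)
toy_map = np.array([[0.20, 0.95],
                    [0.50, 0.97]])
toy_mappings = {(0, 1): [np.array([0.1, 0.2]), np.array([0.3, 0.4])],
                (1, 0): [np.array([0.9, 0.9])]}
nodes, outliers = collect_outliers(toy_map, toy_mappings)
```

In the notebook, the returned samples would still need `sc.inverse_transform` to recover the original customer IDs, and the threshold would need tuning against the plotted map.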

# Creating the matrix features
customers = dataset.iloc[:, 1:].values

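# Creating the dependent variable: 1 if the customer ID is among the suspected frauds
fraud_ids = np.rint(frauds[:, 0])  # customer IDs are in the first column; round off scaling error
is_fraud = np.zeros(len(dataset))
for i in range(len(dataset)):
    if dataset.iloc[i, 0] in fraud_ids:
        is_fraud[i] = 1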

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
customers = sc.fit_transform(customers)

# Check dimension
customers.shape, is_fraud.shape

((690, 15), (690,))

Create Model Architecture

# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense

# Initialising the ANN
classifier = Sequential()

# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 2, kernel_initializer = 'uniform', activation = 'relu', input_dim = 15))

# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Fitting the ANN to the Training set
classifier.fit(customers, is_fraud, batch_size = 1, epochs = 2)

Epoch 1/2
690/690 [==============================] - 1s 1ms/step - loss: 0.4433 - accuracy: 1.0000
Epoch 2/2
690/690 [==============================] - 1s 1ms/step - loss: 0.0933 - accuracy: 1.0000

<tensorflow.python.keras.callbacks.History at 0x2399f6f53a0>

Making Predictions And Evaluating The Model

# Predicting the probability of fraud for every customer
y_pred = classifier.predict(customers)

# concatenate customerID with the probability values
y_pred = np.concatenate((dataset.iloc[:, 0:1].values, y_pred), axis = 1)
y_pred

array([[1.57761560e+07, 4.54278290e-02],
       [1.57395480e+07, 3.25773954e-02],
       [1.56628540e+07, 5.87960482e-02],
       ...,
       [1.56754500e+07, 2.30326653e-02],
       [1.57764940e+07, 1.19131533e-02],
       [1.55924120e+07, 6.10088110e-02]])

# Sort the customer IDs by their probability of being a fraud, from lowest to highest
y_pred  = y_pred[y_pred[:, 1].argsort()]

# The first column is the customer ID and the last column is the probability of being a fraud, from lowest to highest
y_pred

array([[1.57997850e+07, 2.55444646e-03],
       [1.56548590e+07, 4.91350889e-03],
       [1.55858550e+07, 6.42305613e-03],
       ...,
       [1.57355720e+07, 1.11779273e-01],
       [1.57101380e+07, 1.42052442e-01],
       [1.55941330e+07, 1.44720882e-01]])
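For an investigation list it is often more convenient to read the ranking from the top. The sketch below returns the k customers with the highest predicted probability; the helper `top_suspects` is hypothetical, and the sample values are the six rows visible in the unsorted prediction output above.

```python
import numpy as np

def top_suspects(ids, probabilities, k=3):
    """Return the k (customer ID, probability) pairs with the highest probability."""
    ids = np.asarray(ids)
    probabilities = np.asarray(probabilities, dtype=float)
    # argsort is ascending, so reverse it and keep the first k indices.
    order = np.argsort(probabilities)[::-1][:k]
    return [(int(ids[i]), float(probabilities[i])) for i in order]

# The six rows visible in the unsorted prediction output
ids = [15776156, 15739548, 15662854, 15675450, 15776494, 15592412]
probs = [4.54278290e-02, 3.25773954e-02, 5.87960482e-02,
         2.30326653e-02, 1.19131533e-02, 6.10088110e-02]

suspects = top_suspects(ids, probs, k=2)
```

With the full arrays, the same call would be `top_suspects(y_pred[:, 0], y_pred[:, 1], k=...)`, where the choice of k depends on how many cases the bank can review.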

Conclusion

The bank should investigate the customers ranked highest in this list and cross-check whether each one is in fact fraudulent. The self-organizing map is useful here because it surfaces unusual patterns in the data that are hard to identify by direct inspection, and it does so without needing fraud labels.

First five rows of the dataset, as returned by dataset.head():

   CustomerID  A1     A2     A3  A4  A5  A6     A7  A8  A9  A10  A11  A12  A13   A14  Class
0    15776156   1  22.08  11.46   2   4   4  1.585   0   0    0    1    2  100  1213      0
1    15739548   0  22.67   7.00   2   8   4  0.165   0   0    0    0    2  160     1      0
2    15662854   0  29.58   1.75   1   4   4  1.250   0   0    0    1    2  280     1      0
3    15687688   0  21.67  11.50   1   5   3  0.000   1   1   11    1    2    0     1      1
4    15715750   1  20.17   8.17   2   6   4  1.960   1   1   14    0    2   60   159      1
