Customer-Complaint-Analyses by avinsrid

Customer Complaint Analyses

Insights into issues plaguing the banking sector

Authors

@abhaar @avinsrid @nachirau @ss91

Motivation

One of the biggest challenges for banks is minimizing the customer attrition rate, which depends directly on customer satisfaction.
Customers are inclined to choose banks that they can trust for their services.
Banks make their decisions based on only a subset of their data because scalable solutions are lacking.

In this project, we propose a scalable design to counter the above problems!

Our Work

Classification
Performance Metric Analyses
Data Analysis using Hive

What have we achieved?

Used machine learning libraries such as Apache Mahout to perform classification on raw data sets for banks and states, so that they gain a better understanding of customer sentiment.
Performed data analysis on the data sets using Hive to give a detailed overview of the banks’ performance from a customer sentiment perspective.
Developed a novel metric system that assigns priorities to customers’ complaints. This helps banks prioritize customers’ problems based on specific constraints such as response time.

With our new metric system, banks can rank complaints relative to one another and decide which to resolve first!
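
To make the idea of a complaint priority concrete, here is a minimal sketch of a weighted score. It is not the project's actual metric: the function name `prioritize`, the three-column input layout (`complaint_id,response_days,disputed`), and the weight of 5 for a disputed complaint are all assumptions chosen for illustration.

```shell
# Hypothetical scoring sketch; NOT the project's actual metric.
# Input (stdin): CSV lines "complaint_id,response_days,disputed"
#   (assumed layout: no header row, no quoted commas).
# Score = response_days, plus 5 if the customer disputed the response
#   (the weight 5 is illustrative only).
# Output: "score,complaint_id", highest score first.
prioritize() {
    awk -F',' '{
        score = $2 + ($3 == "Yes" ? 5 : 0)
        print score "," $1
    }' | sort -t',' -k1,1nr -k2,2
}
```

For example, `printf 'c1,2,No\nc2,1,Yes\nc3,10,No\n' | prioritize` lists `c3` first, then `c2`, then `c1`: the slowest response ranks highest, and the disputed complaint outranks the quick undisputed one.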

DEMO of our Work

Compilation Instructions

Classification [Note: Ensure Hadoop 2.5.1 and Maven are installed; both are mandatory for the build]

$ git clone https://github.com/Sapphirine/Customer-Complaint-Analyses.git
$ cd Customer-Complaint-Analyses/PROJECT_CODE/
$ mvn clean install
$ hadoop jar target/Classification-Files-Big-Data-Project-1.0.jar com.bigdata.complaintanalysis.ClassificationAutomator data/Consumer_Complaints.csv

The sequence files will be stored in HDFS under the classification directory

$ hdfs dfs -ls data/classification/$state_name

Execute Mahout Naive Bayes Classification

$ MAHOUT_PATH/bin/mahout seq2sparse -i data/classification/$state_name -o $state_name-vectors
$ MAHOUT_PATH/bin/mahout split -i $state_name-vectors/tfidf-vectors --trainingOutput train-vectors --testOutput test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential
$ MAHOUT_PATH/bin/mahout trainnb -i train-vectors -el -li labelindex -o model -ow -c
$ MAHOUT_PATH/bin/mahout testnb -i train-vectors -m model -l labelindex -ow -o $state_name-testing -c
$ MAHOUT_PATH/bin/mahout testnb -i test-vectors -m model -l labelindex -ow -o $state_name-testing -c
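
The Mahout steps above can be collected into one helper so they can be repeated per state without retyping. This is a minimal sketch and not part of the repository: the function names `run` and `classify_state`, the `DRY_RUN` switch, and the `/opt/mahout` default for `MAHOUT_PATH` are assumptions for illustration. The Mahout subcommands and flags are copied from the steps above, and the input path follows the `data/classification/$state_name` directory listed earlier.

```shell
# Hypothetical helper (not part of the repository): runs the Mahout
# Naive Bayes steps for one state. The MAHOUT_PATH default is an assumption.
MAHOUT_PATH="${MAHOUT_PATH:-/opt/mahout}"

# Print the command when DRY_RUN=1 (the default here), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# classify_state STATE_NAME: vectorize, split, train, then test on both sets.
# Each step runs only if the previous one succeeded.
classify_state() {
    state_name="$1"
    mahout="$MAHOUT_PATH/bin/mahout"
    run "$mahout" seq2sparse -i "data/classification/$state_name" -o "$state_name-vectors" &&
    run "$mahout" split -i "$state_name-vectors/tfidf-vectors" --trainingOutput train-vectors --testOutput test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential &&
    run "$mahout" trainnb -i train-vectors -el -li labelindex -o model -ow -c &&
    run "$mahout" testnb -i train-vectors -m model -l labelindex -ow -o "$state_name-testing" -c &&
    run "$mahout" testnb -i test-vectors -m model -l labelindex -ow -o "$state_name-testing" -c
}
```

Calling `classify_state CA` in the default dry-run mode only prints the five commands for review; setting `DRY_RUN=0` executes them.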

View the Confusion Matrix

Performance Metric Analysis [NOTE: Please download the required CsvReader jar file from the following link: http://javacsv.sourceforge.net/com/csvreader/CsvReader.html]

$ cd Customer-Complaint-Analyses/
$ javac -cp /path/to/jar ProblemClustering.java
$ java -cp /path/to/jar:. ProblemClustering
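
When `-cp` is given, `java` searches only the listed entries, so the directory holding the compiled `ProblemClustering.class` generally needs to be on the class path alongside the CsvReader jar. Below is a small sketch of one way to assemble it. The function name `classpath_for` is hypothetical, `/path/to/jar` remains a placeholder, and `:` is the separator on Linux and macOS (Windows uses `;`).

```shell
# Hypothetical helper: build a class path containing the CsvReader jar
# and the current directory (javac writes ProblemClustering.class next
# to the source file when no -d option is given).
classpath_for() {
    jar="$1"
    if [ ! -f "$jar" ]; then
        echo "jar not found: $jar" >&2
        return 1
    fi
    printf '%s:.\n' "$jar"
}

# Usage (the jar path is a placeholder):
#   cp="$(classpath_for /path/to/jar)" &&
#   javac -cp "$cp" ProblemClustering.java &&
#   java -cp "$cp" ProblemClustering
```

Failing early on a missing jar avoids a less obvious compiler error about unresolved `CsvReader` symbols.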

Data Analysis using Hive Click Here!

Future Work

Resolution Methodology Recommender
1. Build a recommender engine that can derive the best “first response” for a complaint
2. More rigorous data analysis and research into complaint resolution methodologies are required
Quantitatively gauge impact of various classes of complaints on various products to gain insights into customer outlook towards a specific product

Contact Us

Feel free to shoot the authors an email at the following email IDs: