# Customer Complaint Analyses: Insights into issues plaguing the banking sector
Authors
@abhaar @avinsrid @nachirau @ss91
Motivation
- One of the biggest challenges for banks is minimizing customer attrition, which depends directly on customer satisfaction.
- Customers tend to choose banks whose services they can trust.
- Banks base their decisions on a subset of the data because scalable solutions are lacking.
In this project, we propose a scalable design to address these problems!
Our Work
- Classification
- Performance Metric Analyses
- Data Analysis using Hive
What have we achieved?
- Used machine learning libraries such as Apache Mahout to classify the raw data sets for banks and states, giving them a better understanding of customer sentiment.
- Performed data analysis on the data sets using Hive to give a detailed overview of the banks’ performance from a customer sentiment perspective.
- Developed a novel metric system that assigns priorities to customers’ complaints. This helps banks prioritize customers’ problems based on specific constraints such as response time.
With our new metric system, banks can rank complaints relative to one another and decide which to resolve first!
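The metric itself is not spelled out here, so the following is only a minimal sketch of how a priority score of this kind could be computed, assuming a simple weighted sum over a few complaint attributes. The class name `ComplaintPriority`, the fields, and the weights are all hypothetical and are not taken from the project's code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: ranks complaints by a weighted priority score. */
final class ComplaintPriority {

    /** Minimal complaint record; the fields are assumptions for illustration. */
    static final class Complaint {
        final String id;
        final int daysOpen;            // days since the complaint was received
        final boolean timelyResponse;  // did the bank respond on time?
        final boolean disputed;        // did the customer dispute the response?

        Complaint(String id, int daysOpen, boolean timelyResponse, boolean disputed) {
            this.id = id;
            this.daysOpen = daysOpen;
            this.timelyResponse = timelyResponse;
            this.disputed = disputed;
        }
    }

    // Illustrative weights only; a real metric would be tuned on the data.
    static final double WEIGHT_PER_DAY_OPEN = 1.0;
    static final double PENALTY_LATE_RESPONSE = 10.0;
    static final double PENALTY_DISPUTED = 5.0;

    /** Higher score means the complaint should be handled sooner. */
    static double score(Complaint c) {
        double score = WEIGHT_PER_DAY_OPEN * c.daysOpen;
        if (!c.timelyResponse) {
            score += PENALTY_LATE_RESPONSE;
        }
        if (c.disputed) {
            score += PENALTY_DISPUTED;
        }
        return score;
    }

    /** Returns a new list sorted from highest to lowest priority. */
    static List<Complaint> rank(List<Complaint> complaints) {
        List<Complaint> sorted = new ArrayList<>(complaints);
        sorted.sort((a, b) -> Double.compare(score(b), score(a)));
        return sorted;
    }

    public static void main(String[] args) {
        List<Complaint> complaints = new ArrayList<>();
        complaints.add(new Complaint("A", 2, true, false));
        complaints.add(new Complaint("B", 1, false, true));
        for (Complaint c : rank(complaints)) {
            System.out.println(c.id + " -> " + score(c));
        }
    }
}
```

A weighted sum keeps the ranking easy to explain to a bank: each constraint (such as response time) contributes a visible, adjustable amount to the final priority.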
DEMO of our Work
Compilation Instructions
- Classification [Note: Hadoop 2.5.1 and Maven must be installed; both are required for the build]
$ git clone https://github.com/Sapphirine/Customer-Complaint-Analyses.git
$ cd Customer-Complaint-Analyses/PROJECT_CODE/
$ mvn clean install
$ hadoop jar target/Classification-Files-Big-Data-Project-1.0.jar com.bigdata.complaintanalysis.ClassificationAutomator data/Consumer_Complaints.csv
Sequence files will be stored in HDFS under the classification directory
$ hdfs dfs -ls data/classification/$state_name
Execute Mahout Naive Bayes Classification
$ MAHOUT_PATH/bin/mahout seq2sparse -i data/classification/$state_name -o $state_name-vectors
$ MAHOUT_PATH/bin/mahout split -i $state_name-vectors/tfidf-vectors --trainingOutput train-vectors --testOutput test-vectors --randomSelectionPct 40 --overwrite --sequenceFiles -xm sequential
$ MAHOUT_PATH/bin/mahout trainnb -i train-vectors -el -li labelindex -o model -ow -c
$ MAHOUT_PATH/bin/mahout testnb -i train-vectors -m model -l labelindex -ow -o $state_name-testing -c
$ MAHOUT_PATH/bin/mahout testnb -i test-vectors -m model -l labelindex -ow -o $state_name-testing -c
View the Confusion Matrix
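When reading the confusion matrix, overall accuracy is the sum of the diagonal divided by the total count, and per-class recall is the diagonal entry divided by its row total. The sketch below works through that arithmetic on a made-up 2x2 matrix; it assumes rows are actual classes and columns are predicted classes, which should be checked against the header of the printed matrix. The class name `ConfusionMatrixStats` is hypothetical.

```java
/** Hypothetical helper for summarizing a confusion matrix. */
final class ConfusionMatrixStats {

    /** Fraction of all instances that fall on the diagonal (correctly classified). */
    static double accuracy(int[][] matrix) {
        long correct = 0;
        long total = 0;
        for (int i = 0; i < matrix.length; i++) {
            for (int j = 0; j < matrix[i].length; j++) {
                total += matrix[i][j];
                if (i == j) {
                    correct += matrix[i][j];
                }
            }
        }
        return total == 0 ? 0.0 : (double) correct / total;
    }

    /** Fraction of instances of class cls (row cls) that were predicted as cls. */
    static double recall(int[][] matrix, int cls) {
        long rowTotal = 0;
        for (int value : matrix[cls]) {
            rowTotal += value;
        }
        return rowTotal == 0 ? 0.0 : (double) matrix[cls][cls] / rowTotal;
    }

    public static void main(String[] args) {
        // Made-up counts for two classes; not results from this project.
        int[][] matrix = {
            {40, 10},
            {5, 45}
        };
        System.out.println("accuracy = " + accuracy(matrix));
        System.out.println("recall(class 0) = " + recall(matrix, 0));
        System.out.println("recall(class 1) = " + recall(matrix, 1));
    }
}
```

Comparing these numbers for the run on train-vectors against the run on test-vectors gives a quick indication of how well the model generalizes.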
- Performance Metric Analysis [Note: Download the required CsvReader jar file from http://javacsv.sourceforge.net/com/csvreader/CsvReader.html]
$ cd Customer-Complaint-Analyses/
$ javac -cp /path/to/jar ProblemClustering.java
$ java -cp /path/to/jar:. ProblemClustering
- Data Analysis using Hive Click Here!
Future Work
- Resolution Methodology Recommender
  - Build a recommender engine that can derive the best “first response” for a complaint
  - More rigorous data analysis and research into complaint resolution methodologies required
- Quantitatively gauge impact of various classes of complaints on various products to gain insights into customer outlook towards a specific product
Contact Us
Feel free to shoot the authors an email at the following email IDs: