Consumer Behaviour Analysis
Proposal
Load required packages
Goal
Our main motivation for selecting the Amazon Consumer Behaviour Dataset, which we came across on Kaggle, was to uncover customer insights that can be used to enhance the customer experience or improve business operations.
Dataset
This dataset, collected from Kaggle for analyzing the behavior of Amazon’s consumers, is a comprehensive collection of customer interactions and browsing patterns within the Amazon ecosystem. It includes a wide range of variables such as customer demographics, user interactions, and reviews. The dataset aims to provide insights into customer preferences, shopping habits, and decision-making processes on the Amazon platform. By analyzing it, researchers and analysts can gain a deeper understanding of consumer behavior, identify trends, optimize marketing strategies, and improve the overall customer experience on Amazon. The dataset contains N = 602 observations.
Examine data
Using dlookr’s describe() and diagnose() for some basic EDA:
described_variables | n | na | mean | sd | se_mean | IQR | skewness | kurtosis | p00 | p01 | p05 | p10 | p20 | p25 | p30 | p40 | p50 | p60 | p70 | p75 | p80 | p90 | p95 | p99 | p100 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
age | 602 | 0 | 30.790698 | 10.1932760 | 0.41544691 | 13 | 1.0078573 | 0.5563217 | 3 | 16 | 21 | 22 | 23 | 23 | 23 | 25 | 26 | 32 | 34 | 36 | 39.8 | 45 | 50 | 60 | 67 |
Customer_Reviews_Importance | 602 | 0 | 2.480066 | 1.1852257 | 0.04830619 | 2 | 0.3033064 | -0.7098681 | 1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 3 | 3 | 3 | 3 | 3.0 | 4 | 5 | 5 | 5 |
Personalized_Recommendation_Frequency…18 | 602 | 0 | 2.699336 | 1.0420284 | 0.04246991 | 1 | 0.2272009 | -0.2877537 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3.0 | 4 | 5 | 5 | 5 |
Rating_Accuracy | 602 | 0 | 2.672757 | 0.8997441 | 0.03667083 | 1 | 0.1827622 | 0.2772995 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 3.0 | 4 | 4 | 5 | 5 |
Shopping_Satisfaction | 602 | 0 | 2.463455 | 1.0121525 | 0.04125225 | 1 | 0.2785844 | -0.4033100 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3.0 | 4 | 4 | 5 | 5 |
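The describe() rows above are internally consistent: for age, IQR = p75 - p25 = 36 - 23 = 13, and se_mean = sd / sqrt(n) = 10.1932760 / sqrt(602) ≈ 0.4154. The base-R sketch below recomputes a few of these columns for a single numeric vector. It is an approximation for checking numbers, not dlookr’s implementation; we assume n counts non-missing values and that quantiles use R’s default (type 7) method.

```r
# Recompute a few describe()-style statistics for one numeric vector.
# Assumptions: n excludes missing values; quantiles use R's default type 7.
describe_base <- function(x, probs = c(0.25, 0.50, 0.75)) {
  na <- sum(is.na(x))
  x <- x[!is.na(x)]
  n <- length(x)
  s <- sd(x)
  q <- quantile(x, probs = probs, names = FALSE)
  out <- c(n = n, na = na, mean = mean(x), sd = s,
           se_mean = s / sqrt(n), IQR = IQR(x))
  # Name the quantiles p25, p50, p75 to match the table headers
  c(out, setNames(q, paste0("p", round(100 * probs))))
}

# Toy vector (not the survey data)
describe_base(c(1, 2, 3, 4, 5))

# Cross-check against the age row: sd / sqrt(n) reproduces se_mean
10.1932760 / sqrt(602)
```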
variables | types | missing_count | missing_percent | unique_count | unique_rate |
---|---|---|---|---|---|
age | numeric | 0 | 0.0000000 | 50 | 0.083056478 |
Gender | character | 0 | 0.0000000 | 4 | 0.006644518 |
Purchase_Frequency | character | 0 | 0.0000000 | 5 | 0.008305648 |
Purchase_Categories | character | 0 | 0.0000000 | 29 | 0.048172757 |
Personalized_Recommendation_Frequency…6 | character | 0 | 0.0000000 | 3 | 0.004983389 |
Browsing_Frequency | character | 0 | 0.0000000 | 4 | 0.006644518 |
Product_Search_Method | character | 2 | 0.3322259 | 5 | 0.008305648 |
Search_Result_Exploration | character | 0 | 0.0000000 | 2 | 0.003322259 |
Customer_Reviews_Importance | numeric | 0 | 0.0000000 | 5 | 0.008305648 |
Add_to_Cart_Browsing | character | 0 | 0.0000000 | 3 | 0.004983389 |
Cart_Completion_Frequency | character | 0 | 0.0000000 | 5 | 0.008305648 |
Cart_Abandonment_Factors | character | 0 | 0.0000000 | 4 | 0.006644518 |
Saveforlater_Frequency | character | 0 | 0.0000000 | 5 | 0.008305648 |
Review_Left | character | 0 | 0.0000000 | 2 | 0.003322259 |
Review_Reliability | character | 0 | 0.0000000 | 5 | 0.008305648 |
Review_Helpfulness | character | 0 | 0.0000000 | 3 | 0.004983389 |
Personalized_Recommendation_Frequency…18 | numeric | 0 | 0.0000000 | 5 | 0.008305648 |
Recommendation_Helpfulness | character | 0 | 0.0000000 | 3 | 0.004983389 |
Rating_Accuracy | numeric | 0 | 0.0000000 | 5 | 0.008305648 |
Shopping_Satisfaction | numeric | 0 | 0.0000000 | 5 | 0.008305648 |
Service_Appreciation | character | 0 | 0.0000000 | 8 | 0.013289037 |
Improvement_Areas | character | 0 | 0.0000000 | 18 | 0.029900332 |
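In the diagnose() table, unique_rate is unique_count / N (50 / 602 ≈ 0.0831 for age) and missing_percent is 100 × missing_count / N (2 / 602 ≈ 0.33% for Product_Search_Method, the only column with missing values). A minimal base-R sketch that produces the same columns, useful for re-checking after cleaning; it assumes NA is counted as one distinct value (as unique() does), which we believe matches diagnose() but have not confirmed.

```r
# Rebuild the diagnose() columns with base R.
# Assumption: unique_count counts NA as one distinct value, like unique().
diagnose_base <- function(df) {
  n <- nrow(df)
  out <- data.frame(
    variables       = names(df),
    types           = vapply(df, function(x) class(x)[1], character(1)),
    missing_count   = vapply(df, function(x) sum(is.na(x)), integer(1)),
    missing_percent = vapply(df, function(x) 100 * mean(is.na(x)), numeric(1)),
    unique_count    = vapply(df, function(x) length(unique(x)), integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
  out$unique_rate <- out$unique_count / n
  out
}

# Toy data frame (made-up values, not the survey data)
toy <- data.frame(age = c(23, 25, 23, NA),
                  Gender = c("Female", "Male", "Female", "Female"),
                  stringsAsFactors = FALSE)
diagnose_base(toy)
```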
Checking the number of rows and columns with nrow() and ncol():
- We have 602 data points in total in the Amazon Consumer Behaviour dataset. One important point to note here is that some rows contain multiple entries for
- The dataset has 22 columns.
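Two of the 22 columns share the header Personalized_Recommendation_Frequency, which is why the tables above show them as Personalized_Recommendation_Frequency…6 (character) and …18 (numeric). These suffixes appear to come from readr’s repair of duplicated column names, which appends the column position. A sketch that gives the two columns distinct names before analysis; the _cat and _num suffixes are hypothetical labels chosen from each column’s type, not names from the original dataset.

```r
# Rename the two duplicated Personalized_Recommendation_Frequency columns.
# The "_cat"/"_num" suffixes are hypothetical, chosen from the column type.
rename_duplicate_prf <- function(df) {
  base <- "Personalized_Recommendation_Frequency"
  nm <- names(df)
  # Any column that starts with the base name but carries a suffix (e.g. "...6")
  dup <- which(startsWith(nm, base) & nm != base)
  for (i in dup) {
    nm[i] <- paste0(base, if (is.numeric(df[[i]])) "_num" else "_cat")
  }
  # Guard: stop if both columns had the same type and collided
  stopifnot(!anyDuplicated(nm))
  names(df) <- nm
  df
}

# Toy one-row frame with readr-style repaired names
toy <- data.frame(23, "Yes", 3)
names(toy) <- c("age",
                "Personalized_Recommendation_Frequency...6",
                "Personalized_Recommendation_Frequency...18")
names(rename_duplicate_prf(toy))
```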
Categorical variable summary
Using gtsummary for a table summary (tbl_summary()) of selected categorical columns:
Characteristic | N = 602¹ |
---|---|
Browsing_Frequency | |
Few times a month | 199 (33%) |
Few times a week | 249 (41%) |
Multiple times a day | 77 (13%) |
Rarely | 77 (13%) |
Purchase_Frequency | |
Few times a month | 203 (34%) |
Less than once a month | 124 (21%) |
Multiple times a week | 56 (9.3%) |
Once a month | 107 (18%) |
Once a week | 112 (19%) |
Purchase_Categories | |
Beauty and Personal Care | 106 (18%) |
Beauty and Personal Care;Clothing and Fashion | 46 (7.6%) |
Beauty and Personal Care;Clothing and Fashion;Home and Kitchen | 42 (7.0%) |
Beauty and Personal Care;Clothing and Fashion;Home and Kitchen;others | 8 (1.3%) |
Beauty and Personal Care;Clothing and Fashion;others | 12 (2.0%) |
Beauty and Personal Care;Home and Kitchen | 21 (3.5%) |
Beauty and Personal Care;Home and Kitchen;others | 5 (0.8%) |
Beauty and Personal Care;others | 7 (1.2%) |
Clothing and Fashion | 106 (18%) |
Clothing and Fashion;Home and Kitchen | 27 (4.5%) |
Clothing and Fashion;Home and Kitchen;others | 16 (2.7%) |
Clothing and Fashion;others | 14 (2.3%) |
Groceries and Gourmet Food | 14 (2.3%) |
Groceries and Gourmet Food;Beauty and Personal Care | 7 (1.2%) |
Groceries and Gourmet Food;Beauty and Personal Care;Clothing and Fashion | 10 (1.7%) |
Groceries and Gourmet Food;Beauty and Personal Care;Clothing and Fashion;Home and Kitchen | 14 (2.3%) |
Groceries and Gourmet Food;Beauty and Personal Care;Clothing and Fashion;Home and Kitchen;others | 32 (5.3%) |
Groceries and Gourmet Food;Beauty and Personal Care;Clothing and Fashion;others | 1 (0.2%) |
Groceries and Gourmet Food;Beauty and Personal Care;Home and Kitchen | 4 (0.7%) |
Groceries and Gourmet Food;Beauty and Personal Care;others | 3 (0.5%) |
Groceries and Gourmet Food;Clothing and Fashion | 6 (1.0%) |
Groceries and Gourmet Food;Clothing and Fashion;Home and Kitchen | 4 (0.7%) |
Groceries and Gourmet Food;Clothing and Fashion;Home and Kitchen;others | 3 (0.5%) |
Groceries and Gourmet Food;Clothing and Fashion;others | 2 (0.3%) |
Groceries and Gourmet Food;Home and Kitchen | 5 (0.8%) |
Groceries and Gourmet Food;Home and Kitchen;others | 6 (1.0%) |
Home and Kitchen | 24 (4.0%) |
Home and Kitchen;others | 9 (1.5%) |
others | 48 (8.0%) |
¹ n (%)
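Because respondents can list several categories separated by semicolons, the 29 rows of Purchase_Categories are combinations rather than single categories. The sketch below takes the counts directly from the table above, credits each combination’s count to every category it contains, and formats the result as n (%) of the 602 respondents, mimicking the rounding visible in the table (whole percent at 10% and above, one decimal below). Since one respondent can fall under several categories, the per-category counts overlap and sum to more than 602.

```r
# Respondent counts per Purchase_Categories combination, copied from the table above
combo_counts <- c(
  "Beauty and Personal Care" = 106,
  "Beauty and Personal Care;Clothing and Fashion" = 46,
  "Beauty and Personal Care;Clothing and Fashion;Home and Kitchen" = 42,
  "Beauty and Personal Care;Clothing and Fashion;Home and Kitchen;others" = 8,
  "Beauty and Personal Care;Clothing and Fashion;others" = 12,
  "Beauty and Personal Care;Home and Kitchen" = 21,
  "Beauty and Personal Care;Home and Kitchen;others" = 5,
  "Beauty and Personal Care;others" = 7,
  "Clothing and Fashion" = 106,
  "Clothing and Fashion;Home and Kitchen" = 27,
  "Clothing and Fashion;Home and Kitchen;others" = 16,
  "Clothing and Fashion;others" = 14,
  "Groceries and Gourmet Food" = 14,
  "Groceries and Gourmet Food;Beauty and Personal Care" = 7,
  "Groceries and Gourmet Food;Beauty and Personal Care;Clothing and Fashion" = 10,
  "Groceries and Gourmet Food;Beauty and Personal Care;Clothing and Fashion;Home and Kitchen" = 14,
  "Groceries and Gourmet Food;Beauty and Personal Care;Clothing and Fashion;Home and Kitchen;others" = 32,
  "Groceries and Gourmet Food;Beauty and Personal Care;Clothing and Fashion;others" = 1,
  "Groceries and Gourmet Food;Beauty and Personal Care;Home and Kitchen" = 4,
  "Groceries and Gourmet Food;Beauty and Personal Care;others" = 3,
  "Groceries and Gourmet Food;Clothing and Fashion" = 6,
  "Groceries and Gourmet Food;Clothing and Fashion;Home and Kitchen" = 4,
  "Groceries and Gourmet Food;Clothing and Fashion;Home and Kitchen;others" = 3,
  "Groceries and Gourmet Food;Clothing and Fashion;others" = 2,
  "Groceries and Gourmet Food;Home and Kitchen" = 5,
  "Groceries and Gourmet Food;Home and Kitchen;others" = 6,
  "Home and Kitchen" = 24,
  "Home and Kitchen;others" = 9,
  "others" = 48
)
stopifnot(sum(combo_counts) == 602)

# Split each combination on ";" and credit its count to every listed category
cats <- strsplit(names(combo_counts), ";", fixed = TRUE)
per_category <- sapply(split(rep(unname(combo_counts), lengths(cats)), unlist(cats)), sum)

# n (%) formatting: whole percent at 10% and above, one decimal below
fmt_n_pct <- function(n, total) {
  pct <- 100 * n / total
  paste0(n, " (", ifelse(round(pct, 1) >= 10, sprintf("%.0f", pct), sprintf("%.1f", pct)), "%)")
}
setNames(fmt_n_pct(per_category, sum(combo_counts)), names(per_category))
# Gives: Beauty and Personal Care 318 (53%), Clothing and Fashion 343 (57%),
# Groceries and Gourmet Food 111 (18%), Home and Kitchen 220 (37%), others 166 (28%)
```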
Questions
Question 1
In our first question, “What are the factors influencing a customer’s decision to abandon a purchase in their cart on Amazon?”, we attempt to understand why customers abandon the purchases in their cart, with the aim of increasing Amazon’s conversion rate (the percentage of users who actually complete a purchase). The answer will also help enhance the customer experience by making the app and website more user-friendly and intuitive, so that users can find the right product and complete their purchase effortlessly.
Question 2
For our second question, “Which demographic (on the basis of gender and age) is most likely to purchase a particular product category?”, we attempt to determine which demographic, by age and gender, is most likely to purchase each product category. This will help companies tailor their marketing strategies so that their messages reach the right group of customers, making better use of their marketing budget. By identifying the right demographic to target, Amazon can gain a competitive advantage by attracting a larger share of its target audience. This can also lead to higher customer satisfaction, which in turn supports customer loyalty and repeat purchases.
Analysis plan
Approach for question 1
Our analysis will begin with a thorough exploratory data analysis (EDA) in R, focusing on key variables such as Purchase_Frequency, Product_Search_Method, and Customer_Reviews_Importance to uncover their potential impact on cart abandonment. During data cleaning, we will ensure that columns like age, Gender, and Browsing_Frequency are correctly formatted and free of missing values. We will then apply statistical techniques, such as logistic regression or decision trees, to assess the influence of factors like Personalized_Recommendation_Frequency and Search_Result_Exploration on cart abandonment behavior.
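A minimal sketch of the logistic-regression option, run on simulated data so that it is self-contained. Assumptions: the binary outcome abandoned is hypothetical (how it will be derived from Cart_Completion_Frequency or Cart_Abandonment_Factors is still to be decided), the two Search_Result_Exploration labels are placeholders, and every value is randomly generated; only the Purchase_Frequency levels are taken from the summary table. Exponentiated coefficients are read as odds ratios for abandoning.

```r
set.seed(42)
n <- 602

# Simulated predictors; Purchase_Frequency levels come from the summary table,
# the Search_Result_Exploration labels are placeholders
sim <- data.frame(
  Purchase_Frequency = factor(sample(c("Less than once a month", "Once a month",
                                       "Few times a month", "Once a week",
                                       "Multiple times a week"), n, replace = TRUE)),
  Customer_Reviews_Importance = sample(1:5, n, replace = TRUE),
  Search_Result_Exploration = factor(sample(c("First page", "Multiple pages"),
                                            n, replace = TRUE))
)

# Hypothetical 0/1 outcome generated from an arbitrary linear predictor
lin <- -0.5 + 0.3 * (sim$Customer_Reviews_Importance - 3) +
  0.4 * (sim$Search_Result_Exploration == "Multiple pages")
sim$abandoned <- rbinom(n, 1, plogis(lin))

# Logistic regression: log-odds of abandoning as a function of the predictors
fit <- glm(abandoned ~ Purchase_Frequency + Customer_Reviews_Importance +
             Search_Result_Exploration,
           family = binomial, data = sim)

# Odds ratios (values > 1 mean higher odds of abandoning)
round(exp(coef(fit)), 2)
```

On the real data, Purchase_Frequency and the other survey answers would replace the simulated columns; a decision tree could be fitted to the same formula as a comparison.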
For visualization, we will use R’s ggplot2 package to create clear, interpretable graphics, such as correlation heat-maps and bar charts, showing the relationship between cart abandonment and various customer behavior metrics. This approach is designed to provide robust insights that help us enhance the shopping experience and reduce cart abandonment rates.
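A correlation heat-map needs the correlation matrix in long form (one row per pair of variables). The sketch below assumes Spearman correlation, since the numeric columns are 1 to 5 ratings rather than continuous measurements; that choice is ours, not fixed by the plan. The data are simulated stand-ins for the Likert columns listed in the describe() table, and the ggplot2 step runs only if the package is installed.

```r
set.seed(1)
n <- 602

# Simulated stand-ins for the 1-5 rating columns (not the survey data)
likert <- data.frame(
  Customer_Reviews_Importance = sample(1:5, n, replace = TRUE),
  Rating_Accuracy             = sample(1:5, n, replace = TRUE),
  Shopping_Satisfaction       = sample(1:5, n, replace = TRUE)
)

# Rank-based correlation suits ordinal ratings
cm <- cor(likert, method = "spearman")

# Long format: columns Var1, Var2, rho (one row per variable pair)
heat_df <- as.data.frame(as.table(cm), responseName = "rho")

# Heat-map with ggplot2, if available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(heat_df, ggplot2::aes(Var1, Var2, fill = rho)) +
    ggplot2::geom_tile() +
    ggplot2::geom_text(ggplot2::aes(label = round(rho, 2))) +
    ggplot2::labs(x = NULL, y = NULL, fill = "Spearman rho")
}
```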
Approach for question 2
We will then continue our analysis by focusing on key features such as age, Gender, Purchase_Categories, and Cart_Completion_Frequency to predict which product categories are of particular interest to each demographic. Again, we will make sure the data for these features is clean and address any missing or incorrect values.
Next, we will compute summary statistics for key variables such as Purchase_Categories. Once the data is clean, we will use statistical analysis tools to assess the likelihood of purchasing each product category based on demographic variables (age and gender). For visualization, we will use heat-maps and bar charts to illustrate the relationships between the target variables.
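A first descriptive step toward this question is the share of each age-group and gender cell that buys each category. The sketch below turns the semicolon-separated Purchase_Categories into one logical indicator per category and averages it within each cell. The age bands are hypothetical, the data are simulated (only the category names and combination format come from the table above), and Gender is reduced to two placeholder values although the real column has 4 distinct values. With 602 respondents some cells may be small, so these shares should be read alongside the cell sizes.

```r
set.seed(7)
n <- 602

# Simulated respondents; combination strings follow the table's format
combos <- c("Beauty and Personal Care", "Clothing and Fashion",
            "Beauty and Personal Care;Clothing and Fashion",
            "Groceries and Gourmet Food;Home and Kitchen", "others")
sim <- data.frame(
  age = sample(16:67, n, replace = TRUE),
  Gender = sample(c("Female", "Male"), n, replace = TRUE),
  Purchase_Categories = sample(combos, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical age bands: (0,24], (24,34], (34,44], (44,Inf)
sim$age_group <- cut(sim$age, breaks = c(0, 24, 34, 44, Inf),
                     labels = c("<25", "25-34", "35-44", "45+"))

# One TRUE/FALSE indicator per single category
categories <- c("Beauty and Personal Care", "Clothing and Fashion",
                "Groceries and Gourmet Food", "Home and Kitchen", "others")
split_cats <- strsplit(sim$Purchase_Categories, ";", fixed = TRUE)
for (category in categories) {
  sim[[category]] <- vapply(split_cats, function(x) category %in% x, logical(1))
}

# Mean of a logical indicator = share of the cell that buys the category
rates <- aggregate(sim[categories], by = sim[c("age_group", "Gender")], FUN = mean)
rates
```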
Variables of focus for both questions
Variable | Description |
---|---|
Purchase_Frequency | How frequently does the user make purchases on Amazon? |
Product_Search_Method | How does the user search for products on Amazon? |
Customer_Reviews_Importance | How important are customer reviews in the user’s decision-making process? |
age | User’s age |
Gender | User’s gender |
Browsing_Frequency | How often does the user browse Amazon’s website or app? |
Purchase_Categories | What product categories does the user typically purchase on Amazon? |
Cart_Completion_Frequency | How often does the user complete the purchase after adding products to their cart? |
Personalized_Recommendation_Frequency | How often does the user receive personalized product recommendations from Amazon? |
Search_Result_Exploration | Does the user tend to explore multiple pages of search results or focus on the first page? |
Organization
Plan of Attack
Week | Weekly Tasks | Persons in Charge | Backup |
---|---|---|---|
until November 8th | Explore and finalize the dataset and the problem statements | Everyone | Everyone |
- | Complete the proposal and assign some high-level tasks | Everyone | Everyone |
November 9th to 15th | Exploratory Data Analysis | Shashwat | Bharath |
- | Data cleaning and Data pre-processing based on EDA | Bharath | Pranshu |
- | Question specific exploration and identify initial trends and patterns | Joel | Vishal |
November 16th to 22nd | Model training for Q1 | Vishal | Shashwat |
- | Model training for Q2 | Pranshu | Joel |
November 23rd to 29th | Continue Model training and testing for Q1 and Q2 | Vishal | Pranshu |
- | Improving the models if there is a need | Joel | Bharath |
November 30th to December 6th | Refining the code for code review with comments | Bharath | Vishal |
- | Generate insights from the model output | Shashwat | Joel |
December 7th to 13th | Review the generated models | Pranshu | Shashwat |
- | Write-up and presentation for the project | Everyone | Everyone |
Repo Organization
The project repository contains the following folders:
‘data/’: Used for storing any necessary data files for the project, such as input files.
‘images/’: Used for storing image files used in the project.
‘presentation_files/’: Folder for presentation-related files.
‘_extra/’: Used for brainstorming analyses that won’t affect our project workflow.
‘_freeze/’: Stores the frozen computational output generated while rendering, so that documents with executable code do not need to be re-run on every render.
‘_site/’: Folder used to store the generated static website files after Quarto renders the project.
‘.github/’: Folder for storing GitHub templates and workflows.
We will create a few folders inside images/ for storing question-specific and presentation-related images generated throughout the project: images/Q1, images/Q2, and images/Presentation.
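A small sketch of how the image folders could be created from R and how plots could be routed into them. The helper names create_image_dirs() and image_path() are hypothetical, and root defaults to the working directory, which we assume is the project root.

```r
# Create images/Q1, images/Q2 and images/Presentation under the project root
create_image_dirs <- function(root = ".") {
  dirs <- file.path(root, "images", c("Q1", "Q2", "Presentation"))
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  invisible(dirs)
}

# Build the path for a figure belonging to one question (or "Presentation")
image_path <- function(question, file, root = ".") {
  file.path(root, "images", question, file)
}

# Usage from the project root, e.g. with a ggplot object p:
# create_image_dirs()
# ggplot2::ggsave(image_path("Q1", "abandonment_factors.png"), p)
```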