Consumer Behaviour Analysis

INFO 523 - Project Final

Authors: Pattern Pioneers - Vishal, Joel, Pranshu, Shashwat, Bharath

Affiliation: School of Information, University of Arizona

Abstract

This project analyzes consumer behavior using the Amazon Consumer Behavior Dataset obtained from Kaggle. By examining customer interactions, browsing patterns, and reviews, we aim to identify insights that businesses can use to improve the customer experience and refine their strategy. The study combines statistical analysis with visualization to give e-commerce businesses findings they can act on.

Introduction

In online shopping, understanding what customers want is crucial for success. The Amazon Consumer Behavior Dataset records how people shop online. It goes beyond purchases to cover how customers browse, search, and interact throughout their shopping experience. This analysis focuses on two questions: why some customers abandon their shopping carts without buying, and which demographic groups prefer which types of products. The goal is to give businesses information they can use to make better decisions in online commerce.

Question 1: What are the factors influencing the customer’s decision to abandon a purchase in their cart on Amazon?

The first question addresses the factors influencing a customer’s decision to abandon a purchase in their cart on Amazon. The analysis commences with Exploratory Data Analysis (EDA) in R, focusing on key variables such as Purchase Frequency, Product Search Method, and Customer Reviews Importance.

Approach

Our analysis begins with exploratory data analysis (EDA) in R, focusing on key variables such as Purchase_Frequency, Product_Search_Method, and Customer_Reviews_Importance to understand their relationship with cart abandonment. During data cleaning, we ensure proper formatting and address missing values in age, Gender, and Browsing_Frequency. We then apply predictive models, namely neural networks and decision trees, to assess the influence of factors such as Personalized_Recommendation_Frequency and Search_Result_Exploration on cart abandonment. For visualization we use ggplot2 to create correlation heatmaps and bar charts that show how cart abandonment relates to customer behavior metrics. The aim is to find ways to improve the shopping experience and reduce cart abandonment rates.

Analysis

Our objective is to identify potential reasons for cart abandonment using the available data. We first examined outliers and ran a preliminary analysis of several feature columns, covering improvement areas, purchase frequency, and the age distribution of the dataset. From the available features, we selected purchase frequency, product search method, customer reviews importance, and cart abandonment factors, and explored their correlations to identify patterns related to cart abandonment. The correlation plot showed that product search method correlates more strongly with cart abandonment factors than the other features do. We then used this feature to model cart abandonment with a neural network (nnet) and a decision tree.
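The report does not state how the correlation between these categorical features was measured. One common choice for two categorical columns is Cramér's V, which rescales the chi-squared statistic to the range 0 to 1. The sketch below is a minimal standard-library Python illustration of that measure, not the R code used in the project; the function name `cramers_v` and the toy values are assumptions made for the example.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two equal-length categorical sequences.

    Returns a value in [0, 1]: 0 means no association, 1 means one
    variable fully determines the other.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed count for each (x, y) pair
    row_totals = Counter(xs)       # marginal counts for x
    col_totals = Counter(ys)       # marginal counts for y

    # Chi-squared statistic: sum over all cells of (observed - expected)^2 / expected
    chi2 = 0.0
    for r, r_total in row_totals.items():
        for c, c_total in col_totals.items():
            expected = r_total * c_total / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected

    # Normalise by n * (min(#rows, #cols) - 1); undefined if either variable is constant
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical toy values, not taken from the dataset
search_method = ["Keyword", "Keyword", "Filter", "Filter"]
abandon_reason = ["Price", "Price", "Changed mind", "Changed mind"]
print(cramers_v(search_method, abandon_reason))
```

Computing this value for each candidate feature against the cart abandonment column gives a ranking comparable to what a correlation heatmap displays.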

Despite these efforts, we achieved an accuracy of only 43 percent, which indicates that the model's performance may be limited by various factors. Potential reasons include, but are not limited to:

Limited Data: The dataset may be too small or insufficiently representative to capture the complexity of the problem.

Model Complexity: The chosen neural network architecture may not be suitable for the problem at hand. Experimentation with different architectures could be beneficial.

Feature Engineering: Feature selection or engineering techniques might need further exploration to improve the model’s ability to learn relevant patterns.

Moving forward, efforts will be focused on refining the model architecture, exploring different neural network configurations, experimenting with additional features, and potentially acquiring more data to improve the model’s accuracy. This result, while not meeting the desired performance threshold, serves as a valuable starting point for further iterations and enhancements to achieve better predictive capabilities.
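An accuracy figure is easier to interpret next to a simple reference point. One such reference is the majority-class baseline: the accuracy obtained by always predicting the most frequent class in the training data. The report does not give the class distribution of the cart abandonment factors, so the labels below are hypothetical. This is a standard-library Python sketch of the comparison, not part of the original R analysis.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def majority_baseline_accuracy(y_train, y_test):
    """Accuracy of always predicting the most common training label."""
    majority_label = Counter(y_train).most_common(1)[0][0]
    return accuracy(y_test, [majority_label] * len(y_test))


# Hypothetical labels standing in for cart abandonment factors
y_train = ["Price", "Price", "Price", "Shipping", "Changed mind"]
y_test = ["Price", "Shipping", "Price", "Changed mind"]

baseline = majority_baseline_accuracy(y_train, y_test)
print(f"majority-class baseline accuracy: {baseline:.2f}")
```

A model whose accuracy is close to this baseline has learned little beyond the class frequencies, whereas a clear margin above it suggests the features carry some predictive signal.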

Question 2: Which demographic (on the basis of gender and age) is most likely to purchase a particular product category?

The second question focuses on determining which demographic (based on gender and age) is most likely to purchase a particular product category on Amazon. The team follows a structured approach involving data cleaning, summary statistics, visualization, and statistical modeling using logistic regression to predict the likelihood of purchasing different product categories based on demographic variables.

Approach

We took the following approach to determine which demographic is most likely to purchase a particular product category:

  • Data Cleaning: We checked the dataset for inconsistencies such as missing values or incorrect data. We then examined the outliers and removed them where necessary.

  • Summary statistics: After data cleaning, we computed summary statistics for important columns such as purchase category, age group, and gender, which helped us understand the characteristics and distribution of the data.

  • Visualization: We first transformed the data by splitting each row that listed multiple purchase categories into several rows, one per category, with the values in the other columns repeated. We then used ggplot to draw bar plots, which helped us identify trends and patterns in the data.

  • Modelling: In the final part, we performed statistical modelling to assess the likelihood of purchasing different product categories based on demographic variables such as age and gender. We used logistic regression models to predict the probability that a given demographic purchases each category.
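The row-splitting step in the visualization bullet can be sketched as follows. This is a standard-library Python illustration rather than the project's R code (in R, `tidyr::separate_rows` performs the same reshaping). The column name `Purchase_Categories`, the `;` separator, and the sample rows are assumptions made for the example.

```python
def split_categories(rows, column="Purchase_Categories", sep=";"):
    """Expand each row into one row per category listed in `column`.

    `rows` is a list of dicts. All other columns are copied unchanged.
    """
    expanded = []
    for row in rows:
        for category in row[column].split(sep):
            category = category.strip()
            if category:                 # skip empty pieces such as a trailing separator
                new_row = dict(row)      # copy so the input row is not modified
                new_row[column] = category
                expanded.append(new_row)
    return expanded


# Hypothetical sample rows
rows = [
    {"age": 23, "Gender": "Female",
     "Purchase_Categories": "Beauty and Personal Care;Clothing and Fashion"},
    {"age": 31, "Gender": "Male",
     "Purchase_Categories": "Groceries"},
]

for row in split_categories(rows):
    print(row)
```

After this step each row represents a single (customer, category) pair, so counts per category can be grouped by age and gender without double handling of multi-category entries.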

Analysis

Analyzing the relationship between age, gender, and purchase category benefits companies in several ways. It gives a deeper understanding of customer preferences and behavior, which allows companies to make informed decisions about product development, pricing, and positioning. It also guides targeted marketing campaigns aimed at specific demographics. Using logistic regression, we achieved an accuracy of 72.6% in predicting the product category that a particular demographic is likely to be interested in buying.
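To illustrate the kind of model described here, the sketch below fits a binary logistic regression for a single product category with plain gradient descent. It is a standard-library Python illustration only: the report does not say which R function was used (`glm` with `family = binomial` is a common choice, but that is an assumption), and the feature encoding, learning rate, and toy data are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Predicted probability that a customer with features x buys the category."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(weights, bias, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += error * xij
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical toy data: features are [age / 100, is_female],
# label is 1 if the customer bought the category and 0 otherwise.
X = [[0.22, 1], [0.30, 1], [0.41, 1], [0.25, 0], [0.35, 0], [0.50, 0]]
y = [1, 1, 1, 0, 0, 0]

weights, bias = fit_logistic(X, y)
print("P(buy | age 30, female):", round(predict_proba(weights, bias, [0.30, 1]), 3))
print("P(buy | age 30, male):  ", round(predict_proba(weights, bias, [0.30, 0]), 3))
```

Fitting one such model per product category yields, for every demographic group, a predicted purchase probability for each category, and the category with the highest probability can be taken as the prediction.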

Discussion

Companies need to know who is most likely to buy their products so they can advertise effectively. By understanding the age and gender of their customers, companies can save money by focusing their marketing on the right people, and can gain an edge over competitors by reaching more of the customers they want. Better targeting can also improve the customer experience, which can lead to higher customer satisfaction, greater loyalty, recurring purchases, and a more efficient marketing budget.