Descriptive Analysis of Grade Outcomes

Proposal Final Project

Author

The Null Wranglers Team

Installed Packages
# Load Libraries
library(knitr)
if (!require(pacman))
  install.packages(pacman)

pacman::p_load(tidyverse,   # Data wrangling
               dlookr,      # Exploratory Data Analysis
               formattable, # Present neat table format
               gt,          # Alternating formatting for the tables
               gtsummary)   # Summary for the tables

Introduction

Low student success rates have a profound impact on an educational institution’s academic offerings, their ability to provide financial support and scholarships, as well as their overall professional reputation. Much is unknown about how the corona-virus pandemic has impacted student success in higher education. Thus, in order to better understand this impact, this study seeks to conduct a detailed examination of the grade value outcomes during this time. By examining the grade value outcomes for the years before, during and after the pandemic, patterns and trends may be discovered that might help guide student support services in the future. The analysis will breakdown the grade values outcomes by college, department, division level and course level using clustering and classification techniques. Anomalies will be isolated for a more fine grained inspection that includes course attributes. These course attributes will be used in an effort to create a model for predictive grade value outcomes.

Goals

The high-level goal of this study is to analyze and understand the grade outcomes at the University of Arizona (UofA) pre-, during, and after the COVID-19 pandemic. This investigation revolves around two primary questions:

  1. Pattern Analysis – What if any patterns or relationships exist between colleges, departments, or course level and grade outcomes for enrolled students?

  2. Predictive Modeling – Can a predictive model be developed that accurately forecasts grade outcomes based on course attributes?

Dataset

The first data set used for this analysis, is known as the DEW Rates, contains information about the breakdown of grade value outcomes during the academic years of 2018-19, 2019-20, and 2020-21 at the University of Arizona. The consists of 35697 observations of courses with 23 variables related to enrollment and grade values. Below is a description of the data:

Grade Outcomes Data
Variable Type Description
College character Identifies the college at the University of Arizona
Department character Identifies which department within given college
Subject.Code character Identifies the subject of the course with a 3 or 4 letter code
Catalog.Number character Identifies the catalog number of the course
Course.Description character Identifies the title or the course
Course.Level character Identifies if the course is lower, upper or graduate division
Total.Student.Count Integer The total enrollment for the course
D_GRADE_COUNT Integer The number of enrolled students that earned a D grade in the course
FAIL_GRADE_COUNT Integer The number of enrolled students that earned a E(Fail) grade in the course
WITHDRAW_GRADE_COUNT Integer The number of enrolled students that withdrew from the course
DEW_COUNT Integer The total of D, E, and W grades earned by enrolled students in the course
PASS_GRADE_COUNT Integer The number of enrolled students that received a C or greater grade value
WITHDRAW_FULLMED_GRADE_COUNT Integer The number of enrolled students that took a full medical withdrawal
INCOMPLETE_UNGRADED_COUNT Integer The number of enrolled students that did not receive grades
TERM_LD character The semester that the course was offered
ACAD_YR_SID Integer The academic year that the course was offered
Percent.D.Grade numeric The percentage of enrolled students that received a D grade in the course
Percent.E.Grade numeric The percentage of enrolled students that received a E grade in the course
Percent.W.Grade numeric The percentage of enrolled students that received a W grade in the course
Percent.DEW numeric The percentage of enrolled students that received a D, E, or W grade in the courseThe percentage of enrolled students that received a D, E, or W grade in the course
Percent.Passed numeric The percentage of enrolled students that received a C or greater grade value in the course
Per.Full..Medical.Withdrawal numeric The percentage of enrolled students that had a full medical withdraw in the course
Per.Ungraded..Incomplete numeric The percentage of enrolled students that did not receive a grade value in the course
Load Data 1
# read in data
dew_counts <- read.csv("data/DEW Rates.csv")

kable(head(dew_counts))
College Department Subject.Code Catalog.Number Course.Description Course.Level Total.Student.Count D_GRADE_COUNT FAIL_GRADE_COUNT WITHDRAW_GRADE_COUNT DEW_COUNT PASS_GRADE_COUNT WITHDRAW_FULLMED_GRADE_COUNT INCOMPLETE_UNGRADED_COUNT TERM_LD ACAD_YR_SID Percent.D.Grade Percent.E.Grade Percent.W.Grade Percent.DEW Percent.Passed Per.Full..Medical.Withdrawal Per.Ungraded..Incomplete
College of Engineering Aerospace & Mechanical Engr ABE 489A Fab Tech Micro+Nanodevic Upper Division 10 0 0 0 0 10 0 0 Fall 2018 2019 0.0 0.0 0.0 0.0 100.0 0.0 0
College of Engineering Aerospace & Mechanical Engr AME 105 Introduction to MATLAB I Lower Division 196 17 2 1 20 173 3 0 Fall 2018 2019 8.7 1.0 0.5 10.2 88.3 1.5 0
College of Engineering Aerospace & Mechanical Engr AME 105 Introduction to MATLAB I Lower Division 160 22 17 3 42 115 3 0 Fall 2019 2020 13.8 10.6 1.9 26.3 71.9 1.9 0
College of Engineering Aerospace & Mechanical Engr AME 105 Introduction to MATLAB I Lower Division 111 11 4 5 20 82 9 0 Fall 2020 2021 9.9 3.6 4.5 18.0 73.9 8.1 0
College of Engineering Aerospace & Mechanical Engr AME 105 Introduction to MATLAB I Lower Division 254 11 5 6 22 231 1 0 Spring 2019 2019 4.3 2.0 2.4 8.7 90.9 0.4 0
College of Engineering Aerospace & Mechanical Engr AME 105 Introduction to MATLAB I Lower Division 213 3 9 19 31 182 0 0 Spring 2020 2020 1.4 4.2 8.9 14.6 85.4 0.0 0

The second data set used for this analysis, is known as the SY20 and SY21, contains information about course attributes for the academic years of 2019-20, and 2020-21 at the University of Arizona. Unfortunately the academic year of 2018-19 was unable to obtained for this study. There are over 260,000 observations of courses with 27 variable per observation.

Course Schedule Overview
Variable Type Description
Term Character The term the course was offered
Campus Character The campus the course was offered at
Session Character The type of session for the course - regular, first 7 weeks, etc,
Subject Character The 3 or 4 letter subject code of the course
Cat . . Character The course catalog code
Section Integer The section number of the course
Class . . Integer The class identifier code
Start.Date Date The start date of the course
End.Date Date The end of the course
Meet . . Integer The number of meetings for each course
Req.Desig Character A course type identifier
P.F. Opt Character A flag that indicates if the course was a P or F option
Component Character Category/Type of course
Units Integer The number of units of the course
Min.Units Integer The minimum number of units allowed for the course
Max.Units Integer The maximum number of units allowed for the course
Course Character The name of the course
Combined.Section Character Identifier if the course is combined with another course
Meeting.Days Character The list of days the courses meets on
Start Character The start time of day that the course meets
End Character The end time of day that the course meets
Facility Character The location or room of the course meetings
Total.Enroll Integer The total number of students enrolled
Max.Enroll Integer The maximum number of students allowed to enrolled
Rm.Cap Integer The maximum number of students allowed in the meeting space
Enrollment.Status Integer Identifier if the course was an open or closed enrollment
Mode Character The modelity of the course
Load Data 2
# read in two csv files
sy20_data <- read.csv("data/Course Schedule Overview - SY20.csv")

kable(head(sy20_data))
Term Campus Session Subject Cat.. Section Class.. Start.Date End.Date Meet.. Req.Desig P.F.Opt Component Units Min.Units Max.Units Course Combined.Section Meeting.Days Start End Facility Total.Enroll Max.Enroll Rm.Cap Enrollment.Status Mode
Fall 2019 MAIN Regular Academic Session ABS 593A 1 45315 8/26/2019 12/11/2019 1 - Ind Study 1 1 9 Internship in Applied Biosci - 12:00 AM 12:00 AM NA 0 0 0 Closed In Person
Fall 2019 MAIN Regular Academic Session ABS 593A 2 45394 8/26/2019 12/11/2019 1 - Ind Study 1 1 9 Internship in Applied Biosci - 12:00 AM 12:00 AM NA 1 5 0 Open In Person
Fall 2019 MAIN Regular Academic Session ABS 593A 3 45395 8/26/2019 12/11/2019 1 - Ind Study 1 1 9 Internship in Applied Biosci - 12:00 AM 12:00 AM NA 0 5 0 Open In Person
Fall 2019 MAIN Regular Academic Session ABS 593A 4 45397 8/26/2019 12/11/2019 1 - Ind Study 1 1 9 Internship in Applied Biosci - 12:00 AM 12:00 AM NA 1 5 0 Open In Person
Fall 2019 MAIN Regular Academic Session ABS 593A 5 45398 8/26/2019 12/11/2019 1 - Ind Study 1 1 9 Internship in Applied Biosci - 12:00 AM 12:00 AM NA 0 5 0 Open In Person
Fall 2019 MAIN Regular Academic Session ABS 593A 6 47657 8/26/2019 12/11/2019 1 - Ind Study 1 1 9 Internship in Applied Biosci - 12:00 AM 12:00 AM NA 0 5 0 Open In Person

Exploratory Question

What patterns and variations exist in grade value outcomes across colleges, departments, and course levels during and after the COVID-19 pandemic?

Predictive Question

How can course attributes associated with poor grade outcomes during this time pandemic predict future grade values for similar courses?

Analysis plan

Data Collection

  • Historical Grade Data: Collect historical grade data from the University of Arizona for the years before, during, and after the COVID-19 pandemic. This data should include information on grade value outcomes for courses by semester.

  • Course Attributes Data: Gather data on course attributes such as class size, instructional format (online, hybrid, in-person), faculty information, time and day of the class offering, and any other relevant features that might influence grade outcomes.

Exploratory Data Analysis (EDA)

  • Descriptive Statistics: Calculate summary statistics for overall grade distributions, college-wise, department-wise, and course-level. Examine mean, median,sums, and any potential outliers.

Pattern Analysis

  • Clustering: Utilize clustering techniques to group colleges, departments, and courses based on similar grade outcomes. Evaluate if specific clusters show distinct patterns during the pandemic.

  • Classification: Apply classification algorithms to identify factors influencing grade outcomes. Investigate if colleges, departments, or specific course attributes significantly contribute to predicting grade results.

  • Anomaly Detection: Identify anomalies in grade distributions, focusing on courses or departments that experienced unexpected changes during the pandemic. Conduct a fine-grained inspection of these anomalies.

  • Time Series Decomposition: Decompose the time series data to identify trends, seasonality, and residual components. Analyze if any irregularities in the residual component coincide with the pandemic period.

Predictive Modeling

  • Feature Selection: Select relevant course attributes for predictive modeling. Use domain knowledge and statistical methods to choose features that significantly impact grade outcomes.

  • Model Development: Build predictive models using machine learning algorithms such as regression or classification. Split the data into training and testing sets to evaluate model performance.

    • Decision Tree Analysis: Employ decision tree analysis to understand the hierarchy of factors influencing grade outcomes.

    • Regression: Utilize regression for predictive model

  • Model Evaluation: Assess the accuracy, precision, recall, and F1-score of the developed models. Compare models’ performance across different colleges, departments, and course levels.

  • Interpretation: Interpret the results of the predictive models to understand the key factors influencing grade outcomes. Provide insights into how colleges, departments, and course attributes contribute to student success.

Reporting

  • Prepare a comprehensive report summarizing the methodology, findings, recommendations and potential future research. This report will include visualizations such as charts and graphs to enhance the presentation of results. Clearly communicate the findings of the study in an effort to student support services and academic planning at the University of Arizona.

Weekly Work Plan and Responsibilities

Week 1 Nov 12th - Nov 18th

  • Finalize proposal - group
  • Clean data - Group
  • Exploratory Data Analysis
    • Grade Value Outcomes - Kristi, Rohit
    • Course Attributes - Dong, Anjani, Utkarsha
  • Clustering and Classification - Group

Week 2 Nov 19th - Nov 25th

  • Anomaly Detection - Rohit, Kristi
  • Time Series Analysis - Anjani, Utkarsha, Dong

Week 3 Nov 26th - Dec 2nd

  • Decision Tree Analysis - Group
  • Regression Predictive Model - Group 
  • Rough Draft of Write-Up - Dong and Kristi
  • Rough Draft of Abstract - Dong and Kristi
  • Rough Draft of Presentation - Rohit, Anjani, Utkarsha

Week 4 Dec 3rd - Dec 11th

  • Finalize Write-Up - Group
  • Finalize Abstract - Group
  • Finalize Presentation - Group
  • Deliver Final Project Materials and Presentation - Group

Repo Organization

The following are the folders involved in the Project repository.

data/

Used for storing any necessary data files for the project, such as input files.

images/

Used for storing image files used in the project.

extra/

Used to brainstorm our analysis which won’t impact our project workflow.

freeze/

This folder is used to store the generated files during the build process. These files represent the frozen state of the website at a specific point in time.

github/

Folder for storing github templates and workflow.

.git

hidden directory at the root of the repository that contains the internal data structure and configuration files

about.qmd

contains information about Project title and team members.

presentation.qmd

contains the presentation for final project

proposal.qmd

contains the proposal for the final project

index.qmd

Contains the abstract for the project.

presentation.qmd

includes information about the presentations of the project

EDA_Grade_Outcomes

Contains the Exploratory Data Analysis code chunks and plots

Time_Series_Analaysis.qmd

Contains the Time series analysis for the project

Decision_Tree.qmd

Contains Decision Tree Analysis execution

Regression.qmd

Contains regression analysis and regularization using lasso and ridge models