# Load Librarieslibrary(knitr)if (!require(pacman))install.packages(pacman)pacman::p_load(tidyverse, # Data wrangling dlookr, # Exploratory Data Analysis formattable, # Present neat table format gt, # Alternating formatting for the tables gtsummary) # Summary for the tables
Introduction
Low student success rates have a profound impact on an educational institution’s academic offerings, their ability to provide financial support and scholarships, as well as their overall professional reputation. Much is unknown about how the corona-virus pandemic has impacted student success in higher education. Thus, in order to better understand this impact, this study seeks to conduct a detailed examination of the grade value outcomes during this time. By examining the grade value outcomes for the years before, during and after the pandemic, patterns and trends may be discovered that might help guide student support services in the future. The analysis will breakdown the grade values outcomes by college, department, division level and course level using clustering and classification techniques. Anomalies will be isolated for a more fine grained inspection that includes course attributes. These course attributes will be used in an effort to create a model for predictive grade value outcomes.
Goals
The high-level goal of this study is to analyze and understand the grade outcomes at the University of Arizona (UofA) pre-, during, and after the COVID-19 pandemic. This investigation revolves around two primary questions:
Pattern Analysis – What if any patterns or relationships exist between colleges, departments, or course level and grade outcomes for enrolled students?
Predictive Modeling – Can a predictive model be developed that accurately forecasts grade outcomes based on course attributes?
Dataset
The first data set used for this analysis, is known as the DEW Rates, contains information about the breakdown of grade value outcomes during the academic years of 2018-19, 2019-20, and 2020-21 at the University of Arizona. The consists of 35697 observations of courses with 23 variables related to enrollment and grade values. Below is a description of the data:
Grade Outcomes Data
Variable
Type
Description
College
character
Identifies the college at the University of Arizona
Department
character
Identifies which department within given college
Subject.Code
character
Identifies the subject of the course with a 3 or 4 letter code
Catalog.Number
character
Identifies the catalog number of the course
Course.Description
character
Identifies the title or the course
Course.Level
character
Identifies if the course is lower, upper or graduate division
Total.Student.Count
Integer
The total enrollment for the course
D_GRADE_COUNT
Integer
The number of enrolled students that earned a D grade in the course
FAIL_GRADE_COUNT
Integer
The number of enrolled students that earned a E(Fail) grade in the course
WITHDRAW_GRADE_COUNT
Integer
The number of enrolled students that withdrew from the course
DEW_COUNT
Integer
The total of D, E, and W grades earned by enrolled students in the course
PASS_GRADE_COUNT
Integer
The number of enrolled students that received a C or greater grade value
WITHDRAW_FULLMED_GRADE_COUNT
Integer
The number of enrolled students that took a full medical withdrawal
INCOMPLETE_UNGRADED_COUNT
Integer
The number of enrolled students that did not receive grades
TERM_LD
character
The semester that the course was offered
ACAD_YR_SID
Integer
The academic year that the course was offered
Percent.D.Grade
numeric
The percentage of enrolled students that received a D grade in the course
Percent.E.Grade
numeric
The percentage of enrolled students that received a E grade in the course
Percent.W.Grade
numeric
The percentage of enrolled students that received a W grade in the course
Percent.DEW
numeric
The percentage of enrolled students that received a D, E, or W grade in the courseThe percentage of enrolled students that received a D, E, or W grade in the course
Percent.Passed
numeric
The percentage of enrolled students that received a C or greater grade value in the course
Per.Full..Medical.Withdrawal
numeric
The percentage of enrolled students that had a full medical withdraw in the course
Per.Ungraded..Incomplete
numeric
The percentage of enrolled students that did not receive a grade value in the course
Load Data 1
# read in datadew_counts <-read.csv("data/DEW Rates.csv")kable(head(dew_counts))
College
Department
Subject.Code
Catalog.Number
Course.Description
Course.Level
Total.Student.Count
D_GRADE_COUNT
FAIL_GRADE_COUNT
WITHDRAW_GRADE_COUNT
DEW_COUNT
PASS_GRADE_COUNT
WITHDRAW_FULLMED_GRADE_COUNT
INCOMPLETE_UNGRADED_COUNT
TERM_LD
ACAD_YR_SID
Percent.D.Grade
Percent.E.Grade
Percent.W.Grade
Percent.DEW
Percent.Passed
Per.Full..Medical.Withdrawal
Per.Ungraded..Incomplete
College of Engineering
Aerospace & Mechanical Engr
ABE
489A
Fab Tech Micro+Nanodevic
Upper Division
10
0
0
0
0
10
0
0
Fall 2018
2019
0.0
0.0
0.0
0.0
100.0
0.0
0
College of Engineering
Aerospace & Mechanical Engr
AME
105
Introduction to MATLAB I
Lower Division
196
17
2
1
20
173
3
0
Fall 2018
2019
8.7
1.0
0.5
10.2
88.3
1.5
0
College of Engineering
Aerospace & Mechanical Engr
AME
105
Introduction to MATLAB I
Lower Division
160
22
17
3
42
115
3
0
Fall 2019
2020
13.8
10.6
1.9
26.3
71.9
1.9
0
College of Engineering
Aerospace & Mechanical Engr
AME
105
Introduction to MATLAB I
Lower Division
111
11
4
5
20
82
9
0
Fall 2020
2021
9.9
3.6
4.5
18.0
73.9
8.1
0
College of Engineering
Aerospace & Mechanical Engr
AME
105
Introduction to MATLAB I
Lower Division
254
11
5
6
22
231
1
0
Spring 2019
2019
4.3
2.0
2.4
8.7
90.9
0.4
0
College of Engineering
Aerospace & Mechanical Engr
AME
105
Introduction to MATLAB I
Lower Division
213
3
9
19
31
182
0
0
Spring 2020
2020
1.4
4.2
8.9
14.6
85.4
0.0
0
The second data set used for this analysis, is known as the SY20 and SY21, contains information about course attributes for the academic years of 2019-20, and 2020-21 at the University of Arizona. Unfortunately the academic year of 2018-19 was unable to obtained for this study. There are over 260,000 observations of courses with 27 variable per observation.
Course Schedule Overview
Variable
Type
Description
Term
Character
The term the course was offered
Campus
Character
The campus the course was offered at
Session
Character
The type of session for the course - regular, first 7 weeks, etc,
Subject
Character
The 3 or 4 letter subject code of the course
Cat . .
Character
The course catalog code
Section
Integer
The section number of the course
Class . .
Integer
The class identifier code
Start.Date
Date
The start date of the course
End.Date
Date
The end of the course
Meet . .
Integer
The number of meetings for each course
Req.Desig
Character
A course type identifier
P.F. Opt
Character
A flag that indicates if the course was a P or F option
Component
Character
Category/Type of course
Units
Integer
The number of units of the course
Min.Units
Integer
The minimum number of units allowed for the course
Max.Units
Integer
The maximum number of units allowed for the course
Course
Character
The name of the course
Combined.Section
Character
Identifier if the course is combined with another course
Meeting.Days
Character
The list of days the courses meets on
Start
Character
The start time of day that the course meets
End
Character
The end time of day that the course meets
Facility
Character
The location or room of the course meetings
Total.Enroll
Integer
The total number of students enrolled
Max.Enroll
Integer
The maximum number of students allowed to enrolled
Rm.Cap
Integer
The maximum number of students allowed in the meeting space
Enrollment.Status
Integer
Identifier if the course was an open or closed enrollment
Mode
Character
The modelity of the course
Load Data 2
# read in two csv filessy20_data <-read.csv("data/Course Schedule Overview - SY20.csv")kable(head(sy20_data))
Term
Campus
Session
Subject
Cat..
Section
Class..
Start.Date
End.Date
Meet..
Req.Desig
P.F.Opt
Component
Units
Min.Units
Max.Units
Course
Combined.Section
Meeting.Days
Start
End
Facility
Total.Enroll
Max.Enroll
Rm.Cap
Enrollment.Status
Mode
Fall 2019
MAIN
Regular Academic Session
ABS
593A
1
45315
8/26/2019
12/11/2019
1
-
Ind Study
1
1
9
Internship in Applied Biosci
-
12:00 AM
12:00 AM
NA
0
0
0
Closed
In Person
Fall 2019
MAIN
Regular Academic Session
ABS
593A
2
45394
8/26/2019
12/11/2019
1
-
Ind Study
1
1
9
Internship in Applied Biosci
-
12:00 AM
12:00 AM
NA
1
5
0
Open
In Person
Fall 2019
MAIN
Regular Academic Session
ABS
593A
3
45395
8/26/2019
12/11/2019
1
-
Ind Study
1
1
9
Internship in Applied Biosci
-
12:00 AM
12:00 AM
NA
0
5
0
Open
In Person
Fall 2019
MAIN
Regular Academic Session
ABS
593A
4
45397
8/26/2019
12/11/2019
1
-
Ind Study
1
1
9
Internship in Applied Biosci
-
12:00 AM
12:00 AM
NA
1
5
0
Open
In Person
Fall 2019
MAIN
Regular Academic Session
ABS
593A
5
45398
8/26/2019
12/11/2019
1
-
Ind Study
1
1
9
Internship in Applied Biosci
-
12:00 AM
12:00 AM
NA
0
5
0
Open
In Person
Fall 2019
MAIN
Regular Academic Session
ABS
593A
6
47657
8/26/2019
12/11/2019
1
-
Ind Study
1
1
9
Internship in Applied Biosci
-
12:00 AM
12:00 AM
NA
0
5
0
Open
In Person
Exploratory Question
What patterns and variations exist in grade value outcomes across colleges, departments, and course levels during and after the COVID-19 pandemic?
Predictive Question
How can course attributes associated with poor grade outcomes during this time pandemic predict future grade values for similar courses?
Analysis plan
Data Collection
Historical Grade Data: Collect historical grade data from the University of Arizona for the years before, during, and after the COVID-19 pandemic. This data should include information on grade value outcomes for courses by semester.
Course Attributes Data: Gather data on course attributes such as class size, instructional format (online, hybrid, in-person), faculty information, time and day of the class offering, and any other relevant features that might influence grade outcomes.
Exploratory Data Analysis (EDA)
Descriptive Statistics: Calculate summary statistics for overall grade distributions, college-wise, department-wise, and course-level. Examine mean, median,sums, and any potential outliers.
Pattern Analysis
Clustering: Utilize clustering techniques to group colleges, departments, and courses based on similar grade outcomes. Evaluate if specific clusters show distinct patterns during the pandemic.
Classification: Apply classification algorithms to identify factors influencing grade outcomes. Investigate if colleges, departments, or specific course attributes significantly contribute to predicting grade results.
Anomaly Detection: Identify anomalies in grade distributions, focusing on courses or departments that experienced unexpected changes during the pandemic. Conduct a fine-grained inspection of these anomalies.
Time Series Decomposition: Decompose the time series data to identify trends, seasonality, and residual components. Analyze if any irregularities in the residual component coincide with the pandemic period.
Predictive Modeling
Feature Selection: Select relevant course attributes for predictive modeling. Use domain knowledge and statistical methods to choose features that significantly impact grade outcomes.
Model Development: Build predictive models using machine learning algorithms such as regression or classification. Split the data into training and testing sets to evaluate model performance.
Decision Tree Analysis: Employ decision tree analysis to understand the hierarchy of factors influencing grade outcomes.
Regression: Utilize regression for predictive model
Model Evaluation: Assess the accuracy, precision, recall, and F1-score of the developed models. Compare models’ performance across different colleges, departments, and course levels.
Interpretation: Interpret the results of the predictive models to understand the key factors influencing grade outcomes. Provide insights into how colleges, departments, and course attributes contribute to student success.
Reporting
Prepare a comprehensive report summarizing the methodology, findings, recommendations and potential future research. This report will include visualizations such as charts and graphs to enhance the presentation of results. Clearly communicate the findings of the study in an effort to student support services and academic planning at the University of Arizona.
Weekly Work Plan and Responsibilities
Week 1 Nov 12th - Nov 18th
Finalize proposal - group
Clean data - Group
Exploratory Data Analysis
Grade Value Outcomes - Kristi, Rohit
Course Attributes - Dong, Anjani, Utkarsha
Clustering and Classification - Group
Week 2 Nov 19th - Nov 25th
Anomaly Detection - Rohit, Kristi
Time Series Analysis - Anjani, Utkarsha, Dong
Week 3 Nov 26th - Dec 2nd
Decision Tree Analysis - Group
Regression Predictive Model - Group
Rough Draft of Write-Up - Dong and Kristi
Rough Draft of Abstract - Dong and Kristi
Rough Draft of Presentation - Rohit, Anjani, Utkarsha
Week 4 Dec 3rd - Dec 11th
Finalize Write-Up - Group
Finalize Abstract - Group
Finalize Presentation - Group
Deliver Final Project Materials and Presentation - Group
Repo Organization
The following are the folders involved in the Project repository.
data/
Used for storing any necessary data files for the project, such as input files.
images/
Used for storing image files used in the project.
extra/
Used to brainstorm our analysis which won’t impact our project workflow.
freeze/
This folder is used to store the generated files during the build process. These files represent the frozen state of the website at a specific point in time.
github/
Folder for storing github templates and workflow.
.git
hidden directory at the root of the repository that contains the internal data structure and configuration files
about.qmd
contains information about Project title and team members.
presentation.qmd
contains the presentation for final project
proposal.qmd
contains the proposal for the final project
index.qmd
Contains the abstract for the project.
presentation.qmd
includes information about the presentations of the project
EDA_Grade_Outcomes
Contains the Exploratory Data Analysis code chunks and plots
Time_Series_Analaysis.qmd
Contains the Time series analysis for the project
Decision_Tree.qmd
Contains Decision Tree Analysis execution
Regression.qmd
Contains regression analysis and regularization using lasso and ridge models