Analyze Student Default Rates: Uncovering Causes and Consequences w/Regression Analysis

REGRESSION ANALYSIS OF COHORT DEFAULT RATES

BY: BAILEY HUNT

WARTBURG COLLEGE

MA 461

Introduction

Student loan default rates have been a persistent problem over the past several decades. They peaked at 22.4% in 1990 and then trended down to a low of 4.5% in 2003. Since then, they have trended back up to the current level of 11.5%. There is no question that student debt is a major problem, with total student loan debt now over $1.5 trillion. This is frightening for prospective college students because they invest enormous amounts of money with no guarantee of a job after graduation. Is this what is causing student default rates to increase so much? Although default rates seem simple on the surface, their causes and consequences turn out to be complex and difficult to understand.

Rising student default rates are a problem because the default rate acts as a gauge for several underlying conditions, and an increase signals that those conditions are getting worse. First, it helps the government gauge the type of student being accepted into colleges; an increase in default rates suggests that colleges are taking advantage of government loans to enroll more students. That said, it is hard for colleges to know which students will have loan problems in the future. Second, an increase in student default rates shows that students are unable to pay their loans back within the contract time. This can happen for a variety of reasons, but it primarily means those students are struggling financially. So why are student default rates rising, and what does this mean for the future of higher education? Which variables drive the increase in student default rates? These are important questions because answering them helps us find the source of the problem and potentially solve it.

In this paper, I analyze the correlation between seventeen variables and student default rates across 50 different colleges. The variables and colleges, along with the reasons for choosing them, are discussed farther along in the paper. The question I am trying to answer is: which variables are correlated with student default rates among those colleges? In addition, I create a linear regression model to predict the student default rate at a given college using the variables that show significant correlation. Ultimately, this will allow us to predict default rates from those variables and see why students default on their loans. I do this by collecting data for each variable and running correlation tests in statistical software. I am limited to 50 colleges and universities throughout the United States for lack of data and resources. Therefore, I must assume that these colleges and universities substantially represent the population of colleges within the country. In the next section, I give further background on student default rates.

Literature review

To fully understand this analysis, it is important to know the details of defaulting: what it means, why it happens, and what the consequences are. In this section, I give background on defaulting and then go into the history and details of cohort default rates.

Delinquency is an important term to understand when exploring what it means to default. Delinquency means that you are late on a loan payment [2]. This can have many negative effects. If you are delinquent for more than 90 days, your loan provider will report this to all three credit bureaus, which lowers your credit score [2]. Having a poor credit score can lead to a number of problems in the future. When default occurs depends on the type of loan. For a basic loan, the individual defaults if he or she has not paid off the loan by the date it is due in full. This often happens because people are not financially able to pay for these loans. Students in particular are taking on more debt because of the drastic increase in the cost of higher education, which makes it harder to pay back loans. Taking on too much debt is a huge problem in our society, and it leads to default and sometimes bankruptcy.

Defaulting on a loan can have major consequences. Depending on the case, several things can happen. The entire balance of the loan and the interest becomes due immediately. The individual loses several benefits, such as the ability to choose a repayment plan. An obvious but significant consequence is the loss of additional federal student aid. As with delinquency, the default is reported to the three credit bureaus, which severely damages the individual's credit score. A poor credit score makes it difficult to get loans in the future, and to rebuild it the individual must be disciplined in their borrowing for several years. Unlike in bankruptcy, an individual who defaults is still required to repay the loan, so the government can take several actions to ensure that this happens. First, it can withhold all tax refunds and federal benefits and apply them to the defaulted loan. In addition, the individual's employer can withhold part of their wages and apply that to the defaulted loan. Another negative effect is that the loan holder can take the individual to court, which can cost a lot in court fees [2]. Some of these consequences can affect an individual for a long time. Defaulting on a loan is therefore not something to be taken lightly. Next, we will explore the background behind cohort default rates.

For the majority of higher education history, the primary means of financing was grants. In the United States, a shift from grants to loans began in the mid 1970s. This was spurred on by fiscal policies that broadened eligibility for subsidized loans, increased loan limits, and opened the unsubsidized loan program to all students. However, this caused concern among many people that the student default rate on loans would increase. Once the criteria for receiving a loan were broadened, many colleges began recruiting low income individuals to attend on loans, knowing that these students might not be able to pay the loans back. This caused default rates to increase drastically. To combat this, the government implemented cohort default rates. Each year, the government would evaluate the student default rate for each cohort at every school. If a school had a cohort default rate over 30% for three years in a row, or over 40% in any one year, it would lose access to federal grants and loans. A loan was considered defaulted if the individual failed to make a payment within 180 days. Each cohort would be evaluated three years after graduation.
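The sanction thresholds described above can be written as a small check. This is only an illustrative sketch of the rule as stated in this paper (over 30% for three consecutive years, or over 40% in any single year); the function name `loses_federal_aid` and the sample rates are hypothetical, and the actual regulation contains details not covered here.

```python
def loses_federal_aid(cohort_rates):
    """Apply the sanction rule as described in the text.

    cohort_rates: cohort default rates in percent, one per year, oldest first.
    Returns True if any single year is over 40%, or if three consecutive
    years are each over 30%.
    """
    if any(rate > 40 for rate in cohort_rates):
        return True
    consecutive = 0  # length of the current run of years over 30%
    for rate in cohort_rates:
        consecutive = consecutive + 1 if rate > 30 else 0
        if consecutive >= 3:
            return True
    return False


# Hypothetical schools, for illustration only.
print(loses_federal_aid([31.0, 32.5, 33.0]))        # three years in a row over 30%
print(loses_federal_aid([31.0, 29.0, 33.0, 35.0]))  # the run is broken by 29.0
print(loses_federal_aid([12.0, 41.0]))              # one year over 40%
```

Writing the rule this way makes the two separate triggers explicit: a single very bad year is enough, while moderately bad years only count when they occur back to back.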

Cohort default rates worked well at first, as illegitimate schools that were just looking for money were forced to close for lack of income. However, many colleges worried that the strict rules would also affect legitimate schools. Because of this concern, the government changed the definition of default from 180 days without a payment to 270 days, or nine months, without a payment. Cohort default rates are public record and can be found on the Federal Student Aid office website. Now that we have explored default rates and their meaning, we can turn to how we will find which variables are correlated with them.

Methodology

As stated earlier, the goal of this analysis is to find correlation between variables and cohort default rates so that we can ultimately find the source of the default rate problem and perhaps present solutions. To do this, we must first decide which variables to use. Next, the data for those variables must be collected so the tests can be run. The testing itself is done by regression. Throughout the methodology section, I explain the steps taken in completing this research.

To find variables for the analysis, I took several factors into consideration. The first and most obvious was the relevance of the variable to the topic and whether, in theory, it would be correlated with default rates. After brainstorming several variables and starting to collect data, I realized that another factor was the ease and accessibility of the data. For example, one variable I was very interested in testing was the family wealth or income of the student. However, it was very difficult to find data on this, so unfortunately I had to abandon my research on that variable. On the topic of ease, the government's College Navigator website gives many data points for every college, ranging from enrollment to demographics. Based on the information available on this website, I made a list of the 17 most relevant variables that might show correlation. The list is as follows:

In state tuition, out of state tuition, acceptance percentage, average ACT score, first year retention rate, four year graduation rate, percentage of male students, percentage of students that are white, percentage of students over 25 years old, percentage of students from in state, percentage of full time students, percentage of students receiving federal loans, percentage of students receiving financial aid, total enrollment, average amount of loan per student, average amount of financial aid per student, average income after graduation.

After deciding on the variables, I had to decide which universities to include in the analysis. I was originally very ambitious and wanted to run the analysis on all colleges in the United States, but I soon realized that time would be a major issue. I therefore narrowed my research to fewer colleges and decided to use the 50 flagship universities, which are the main public universities from each state. There are several reasons behind this choice. The first is that they offer good geographical diversity, so the universities are not all from one region. The second is that these universities differ widely in their data for all of the variables I had selected; whether it is the default rate itself or any one of the variables, the values are well spread out. The last reason is that flagship universities are similar in structure while, as noted, differing in their statistics and locations. After deciding on the variables and schools, it was just a matter of collecting the data and then running the tests.

When I began my research, I intended only to do a correlation analysis so that I would know which variables were most closely related to default rates. During my research I decided to take that one step further and create a linear regression model to estimate default rates for schools. I thought this would be useful because it would help me better understand which variables affect student default rates. A linear regression model also offers several advantages. It shows how strongly each variable is correlated and whether the correlation is positive or negative. It also makes it possible to predict future default rates from the variables that are correlated. This is helpful because it can indicate which schools are in danger of losing their federal funding due to a high default rate. The testing itself was not very complicated.

For the testing, I used Minitab 17 statistical software. I took all of the data from the Excel sheet that I created and put it into Minitab. Minitab looks much like Excel, so the data carried over cleanly. Next I began my testing. As stated earlier, I used regression. The regression output in Minitab gave me several statistics, and the first I evaluated was the p-value assigned to each variable. To arrive at my model, I removed the variable with the highest p-value above .05, because it showed very little correlation. I then re-ran the regression and repeated the process until every remaining variable had a p-value under .05. This ensured that the variables in the final linear regression model had strong correlation with default rates.
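The elimination procedure described above can be sketched in code. The analysis in this paper was run in Minitab 17; the Python below is only an illustrative stand-in, assuming NumPy and SciPy are available. The data are synthetic, and the names `ols_p_values`, `backward_eliminate`, and the column labels are hypothetical.

```python
import numpy as np
from scipy import stats


def ols_p_values(X, y):
    """Least-squares fit of y on X plus a constant.

    Returns (coefficients, two-sided p-values); index 0 is the constant.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])   # prepend the constant column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    dof = n - design.shape[1]                   # residual degrees of freedom
    sigma2 = resid @ resid / dof                # estimated error variance
    cov = sigma2 * np.linalg.inv(design.T @ design)
    t_stats = coef / np.sqrt(np.diag(cov))
    p_values = 2 * stats.t.sf(np.abs(t_stats), dof)
    return coef, p_values


def backward_eliminate(X, y, names, alpha=0.05):
    """Remove the predictor with the largest p-value, refit, and repeat
    until every remaining predictor has a p-value below alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        _, p = ols_p_values(X[:, keep], y)
        p_predictors = p[1:]                    # skip the constant
        worst = int(np.argmax(p_predictors))
        if p_predictors[worst] < alpha:
            break
        del keep[worst]                         # drop one variable per pass
    return [names[i] for i in keep]


# Synthetic data: 50 "schools", two real signals and three noise columns.
rng = np.random.default_rng(0)
names = ["signal_a", "signal_b", "noise_1", "noise_2", "noise_3"]
X = rng.normal(size=(50, 5))
y = 10 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=50)

kept = backward_eliminate(X, y, names)
print(kept)
```

Removing only one variable per pass matters because each refit changes the p-values of the variables that remain.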

Minitab also reported the coefficient for each variable, which together form the linear regression model. After arriving at the final model with the most strongly correlated variables, I tested its validity by plotting the residuals. The purpose is to show that the residuals are random and follow no trend. To get the residuals, I plugged the data from each university in my research into the regression model. I then subtracted the estimated default rate from the actual default rate for each university. The formula can be written as $e_i = y_i - \hat{y}_i$, where $y_i$ is the actual default rate for university $i$ and $\hat{y}_i$ is the estimated default rate. After calculating all of the residuals, I made a scatter plot of each correlated variable against the residuals for every university in my test. As noted above, I did this to make sure that the residuals show no pattern.
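As a minimal sketch, the residual calculation described above is one subtraction per university. The numbers below are made up purely for illustration and are not taken from the data set; the function name `residuals` is hypothetical.

```python
def residuals(actual, estimated):
    """Residual for each school: actual default rate minus estimated rate."""
    return [a - e for a, e in zip(actual, estimated)]


# Made-up default rates (in percent) for three hypothetical universities.
actual_rates = [5.0, 7.5, 3.0]
estimated_rates = [4.0, 8.0, 3.0]

for school, r in enumerate(residuals(actual_rates, estimated_rates), start=1):
    # A positive residual means the model underestimated that school's rate.
    print(f"university {school}: residual = {r}")
```

Each residual is then paired with the school's value for a predictor to produce one point on the scatter plot.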

Now that I have explained how I did my testing and shown validity in my model, I will explain my results.

Results

Correlation

As described in the methodology section, I began by running the regression on the full set of variables. The following table shows the p-value and coefficient for each term in that first run.

| Term | Coefficient | P-value |
|---|---|---|
| Constant | 8.9 | 0.564 |
| Percent of students over 25 | 22.1 | 0.036 |
| Enrollment | 0.000086 | 0.059 |
| First year retention rate | -20.2 | 0.098 |
| In state tuition | -0.000066 | 0.671 |
| Out of state tuition | -0.000008 | 0.93 |
| Acceptance percentage | -0.79 | 0.784 |
| Average ACT | -0.298 | 0.409 |
| 4 Year graduation rate | -1.7 | 0.682 |
| Percent of students male | -7.06 | 0.418 |
| Percent of students white | 2.02 | 0.508 |
| Percent of In state students | 1.31 | 0.598 |
| Percent full time students | 9.42 | 0.266 |
| Percent of students receiving federal loans | 2.22 | 0.638 |
| Percent of students receiving financial aid | 3.32 | 0.234 |
| Average amount of loan | 0.000491 | 0.611 |
| Average amount of financial aid | 0.000134 | 0.427 |

Table i

As explained in the methodology section, I then took out the variable with the highest p-value and re-ran the test. In this case, the first variable that I took out was acceptance percentage. After repeating this several times, I got down to three variables with p-values less than .05. It is worth noting that removing one variable from a regression changes the p-values of all the other variables, so it was important to remove variables one at a time and observe the effect. The following table shows the three remaining variables along with their p-values and coefficients in the linear regression model.

| Term | Coefficient | P-value |
|---|---|---|
| First year retention rate | -22.64 | 0 |
| Percent of students over 25 | 14.23 | 0 |
| Enrollment | 0.000059 | 0.024 |

Table ii

As table ii shows, first year retention rate and percent of students over 25 years old show strong correlation, each with a p-value that rounds to 0. Enrollment has a p-value of .024. Other variables, such as percent of students receiving financial aid, had comparatively low p-values but did not fall under the .05 threshold.

Regression Model

Based on the three variables that showed sufficient correlation, plus a constant, the regression testing provided a linear regression model that estimates default rates. Using the coefficients in table ii, the model is as follows:

$$\hat{y} = b_0 + 14.23\,x_1 + 0.000059\,x_2 - 22.64\,x_3$$

where,

$\hat{y}$ = Estimated default rate

$b_0$ = Constant term (not listed in table ii)

$x_1$ = Percent of students over 25

$x_2$ = Total enrollment

$x_3$ = First year retention rate
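To illustrate how the model is applied, the sketch below evaluates it with the coefficients from table ii. Two things are assumptions rather than facts from the analysis: the constant is passed in by the caller because its fitted value is not listed in table ii, and the percentage variables are assumed to be entered as fractions between 0 and 1. The function name and the sample inputs are hypothetical.

```python
# Coefficients taken from table ii.
COEF_OVER_25 = 14.23
COEF_ENROLLMENT = 0.000059
COEF_RETENTION = -22.64


def estimate_default_rate(constant, share_over_25, enrollment, retention_rate):
    """Evaluate the linear model for one school.

    constant: the fitted constant term (not given in table ii; supplied by caller).
    share_over_25, retention_rate: assumed to be fractions between 0 and 1.
    enrollment: total number of students.
    """
    return (constant
            + COEF_OVER_25 * share_over_25
            + COEF_ENROLLMENT * enrollment
            + COEF_RETENTION * retention_rate)


# Effect of raising first year retention from 80% to 90% for a hypothetical
# school, holding the other inputs fixed. The constant cancels in the difference.
low = estimate_default_rate(0.0, 0.10, 30000, 0.80)
high = estimate_default_rate(0.0, 0.10, 30000, 0.90)
print(round(high - low, 3))
```

Because the model is linear, the change in the estimate from moving one variable depends only on that variable's coefficient, not on the constant or the other inputs.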

After getting the model finalized, I tested the validity of it with the residuals. The following graphs represent the scatter plots of the residuals compared to each variable in the model.

As the scatter plots show, the residuals appear random with no visible trend, which supports the use of a linear regression model. Now that I have presented the results of the testing and shown the validity of the model, I will analyze their meaning and draw conclusions from the data.

Conclusions/Analysis

From the results we can say more about why certain schools have higher default rates. The first and most obvious conclusion concerns which variables affect the default rate. As discussed in the results section, the three variables that affect student default rates are first year retention rate, percent of students over 25, and enrollment. In addition, the linear regression model shows what effect each variable has. Since percent of students over 25 and total enrollment have positive coefficients, the default rate increases as those two variables increase. Since the coefficient for first year retention rate is negative, the higher this variable is, the lower the default rate is. Overall, this tells us which variables are affecting student default rates and how they are affecting them. This knowledge can help schools allocate more resources to those variables and keep them in an appropriate range so that default rates do not become a problem. Default rates have been an enormous problem over the past several decades, and through research and reform we can address this problem.

Works cited

Gross, Jacob P.K., et al. “What Matters in Student Loan Default: A Review of the Research Literature.” Federal Student Aid, National Association of Student Financial Aid Administrators, 2009, files.eric.ed.gov/fulltext/EJ905712.pdf.

National Center for Education Statistics (NCES) Home Page, a Part of the U.S. Department of Education, National Center for Education Statistics, nces.ed.gov/collegenavigator/?s=all&l=93&fv=157085%2B159391%2B166629%2B163286.

“National Student Loan Two-Year Default Rates.” Home, US Department of Education (ED), www2.ed.gov/offices/OSFAP/defaultmanagement/defaultrates.html.

“Official Cohort Default Rates for Schools.” Federal Student Aid, US Department of Education (ED), www2.ed.gov/offices/OSFAP/defaultmanagement/cdr.html.

“Understanding Delinquency and Default.” Federal Student Aid, Federal Student Aid Office, 19 Oct. 2018, studentaid.ed.gov/sa/repay-loans/default#default.

“Why You Need to Check Your Residual Plots for Regression Analysis: Or, To Err Is Human, To Err Randomly Is Statistically Divine.” Minitab Blog, Minitab, 5 Apr. 2012, blog.minitab.com/blog/adventures-in-statistics-2/why-you-need-to-check-your-residual-plots-for-regression-analysis.
