Does firm size affect the UK wage structure for male employees?
1. Introduction
In this project I build an econometric model of the wage structure for UK male employees. The investigation concerns the impact of firm size on the wage structure, measured as the hourly wage paid to males aged 18-65 in the UK. Using explanatory variables such as education level, age, age squared (age2) and firm size, I answer the question with statistical tests.
The wage structure reflects wage differentials that arise for reasons such as gender, race, region or firm size. It reflects the theory that productivity differences exist between workers (Borjas, 2010). The correlation between firm size and wage is reported to be positive and is attributed to various reasons, including: 1) larger firms hire higher-quality workers who have invested more in human capital, and 2) larger firms use higher wages to prevent strikes by unionised staff (Brown & Medoff, 1989).
2. A description of the econometric model
The model is a quadratic semi-log model whose dependent variable is the log of the hourly wage. Taking the log compresses the scale of the wage data in this large cross-sectional sample; it also reduces the impact of heterogeneity.
$$\ln(\text{hourlyW}_i) = \beta_0 + \beta_1\,\text{age}_i + \beta_2\,\text{age2}_i + \beta_3\,\text{manager}_i + \beta_4\,\text{gcse}_i + \beta_5\,\text{alevel}_i + \beta_6\,\text{degree}_i + \beta_7\,\text{hourly}_i + \beta_8\,\text{tenure}_i + \beta_9\,\text{small}_i + u_i$$
The variable age2 (age squared) is included to allow the wage equation to be concave in age: as age increases the hourly wage rises, but at a decreasing rate. The Mincer equation links education level and wages (Patrinos, 2016), so I have included the variables gcse, alevel and degree. The dummy variable small is included so that I can test both the individual significance of firm size and the structural equivalence of the wage equation across firm sizes. The variable married is not included, as there is no observed link between marriage and firm size, although marriage does affect human capital and subsequently wages, with married men earning 15% more than their unmarried counterparts (Hewitt, Western & Baxter, 2002). The variable none is excluded to avoid the perfect multicollinearity of the dummy variable trap. I have included degree because, in theory, graduates are more likely to work for larger firms, which can pay wages that justify the large investment in human capital that a degree represents.
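The dummy variable trap can be made concrete with a short sketch. The helper below is hypothetical (it is not part of the survey data or of the software used for estimation) and the level labels are assumptions taken from the variable names in the model. It turns a worker's highest qualification into the three education dummies, leaving none as the omitted baseline so that the dummies never sum to the constant term.

```python
# Illustrative sketch: encode highest qualification as dummies, with "none"
# as the omitted baseline category (this avoids the dummy variable trap).
LEVELS = ("none", "gcse", "alevel", "degree")  # assumed labels, from the model's variable names
BASELINE = "none"


def education_dummies(level):
    """Return the gcse/alevel/degree dummies for one worker as a dict."""
    if level not in LEVELS:
        raise ValueError(f"unknown qualification level: {level!r}")
    # One dummy per non-baseline level; the baseline maps to all zeros.
    return {name: int(name == level) for name in LEVELS if name != BASELINE}


if __name__ == "__main__":
    for level in LEVELS:
        print(level, education_dummies(level))
```

Because a worker with no qualifications is represented by three zeros, the coefficients on gcse, alevel and degree are each interpreted relative to that baseline group.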
3. Issues with Data, limitations and concerns
One limitation of this model is that variables were omitted to reduce the degree of multicollinearity. Excluding key variables is a form of model misspecification and violates the assumptions of the CLRM. Another is that the survey considers a firm small if it has ≤ 25 employees, whereas the UK Companies Act 2006 considers a business small if it has ≤ 50 employees and £6.5m revenue. It may have been better to use the government's measure, because companies that are legally considered small benefit from tax relief, which could affect their ability to pay higher wages to their staff.
There are several issues that could arise with the data used:
The data set only considers males aged 18-65. This leads to selection bias, as 176,771 16-17 year olds are working (ONS.gov.uk, 2018); the sample regression function (SRF) is therefore not fully representative of the population regression function (PRF).
The QLFS surveys only 3,868 males in the UK, which may under-represent the population that the PRF describes.
Not everyone is paid an hourly wage, and the survey requires those paid on a pro-rata basis to calculate their own hourly wage; rounding errors in these calculations could lead to measurement errors in the observations.
There is no control for the industry in which UK male employees work; the service sector may have greater wage disparity than manufacturing.
Not all factors that affect the hourly wage are accounted for in the model; variables were prioritised on the basis of underlying economic theory.
4. A statement of the hypothesis to be tested
To answer the question properly I will conduct an individual significance test for each explanatory variable in the regression model. I will also carry out a joint significance test to check that the variables are jointly significant in explaining lnhourlyW. Finally, to test for structural equivalence I will run a Chow test on the small variable.
5. Model, specification tests and analysis
5.1 Ordinary least squares (OLS) estimation of the econometric model:
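$$\widehat{\ln(\text{hourlyW}_i)} = 1.1829 + 0.0508\,\text{age}_i - 0.0005\,\text{age2}_i + 0.2131\,\text{manager}_i + 0.1491\,\text{gcse}_i + 0.3009\,\text{alevel}_i + 0.5360\,\text{degree}_i - 0.2458\,\text{hourly}_i + 0.0060\,\text{tenure}_i - 0.1423\,\text{small}_i$$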
5.2 Model misspecification test for omitted variables: Ramsey RESET test
To test whether the model is under-fitted, I carry out a RESET test. First, I estimate the restricted model to obtain the fitted values $\hat{Y}_i$. The unrestricted model is then run, including $\hat{Y}_i^2$ and $\hat{Y}_i^3$ as additional explanatory variables. The $R^2$ from the restricted model is denoted $R^2_{OLD}$ and that from the unrestricted model $R^2_{NEW}$. An F-test is then used to test the significance of the increase in $R^2$.
$$F(2, 3856) = 18.225, \qquad F^{0.05}_{2,3856} = 2.998, \qquad F^{0.01}_{2,3856} = 4.611$$
At both significance levels the F-statistic exceeds the critical value; the null hypothesis is rejected and the model is misspecified. Whilst the RESET test shows that the model is not correctly specified, it does not explain why. One consequence of omitting variables is that the variance of the disturbances ($\sigma^2$) is incorrectly estimated. This leads to incorrect confidence intervals and to hypothesis tests that give misleading results.
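The RESET calculation can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name is hypothetical, and the inputs are the rounded $R^2$ values, sample size and parameter count given in the appendix. With those rounded inputs it returns roughly 18.2, which agrees with the reported 18.225 up to rounding.

```python
def reset_f_statistic(r2_old, r2_new, n_obs, k_new, n_added):
    """F-statistic for the rise in R-squared after adding n_added regressors.

    k_new is the number of parameters in the unrestricted model.
    """
    numerator = (r2_new - r2_old) / n_added
    denominator = (1.0 - r2_new) / (n_obs - k_new)
    return numerator / denominator


# Rounded values from the appendix: R2_OLD = 0.4239, R2_NEW = 0.4293,
# n = 3868 observations, 12 parameters in the unrestricted model, 2 added terms.
f_reset = reset_f_statistic(0.4239, 0.4293, 3868, 12, 2)
CRITICAL_5PCT, CRITICAL_1PCT = 2.998, 4.611  # F(2, 3856) values quoted in the text

print(f"RESET F = {f_reset:.2f}")
print("reject H0 at 5%:", f_reset > CRITICAL_5PCT)
print("reject H0 at 1%:", f_reset > CRITICAL_1PCT)
```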
5.3 Test for heteroscedasticity: White's general test
One of the CLRM assumptions is that the error term has constant variance. To test this I ran White's general test, which is best suited to large samples of cross-sectional data. Using PcGive I ran an auxiliary regression with squares and cross products. The null hypothesis is that no heteroscedasticity is present.
$$\chi^2_{14} = 215.16$$
The chi-squared statistic for the auxiliary regression using squares and cross products is greater than the critical values at 1% and 5% significance, so I reject the null hypothesis; heteroscedasticity is present in the model. Given the large sample of cross-sectional data, some heteroscedasticity was expected due to the scale effect (Gujarati & Porter, 2009).
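White's statistic is the sample size multiplied by the $R^2$ of the auxiliary regression. The sketch below reproduces that step under the figures given in the appendix (n = 3868, auxiliary $R^2$ = 0.05563, 14 degrees of freedom); the function name is hypothetical. With the rounded $R^2$ it gives a value of about 215.2, in line with the reported 215.16.

```python
def white_statistic(n_obs, r2_auxiliary):
    """LM form of White's test: n times the auxiliary-regression R-squared."""
    return n_obs * r2_auxiliary


lm_white = white_statistic(3868, 0.05563)
CHI2_14_5PCT, CHI2_14_1PCT = 23.685, 29.141  # critical values quoted in the appendix

print(f"n * R^2 = {lm_white:.2f}")
print("reject homoscedasticity at 5%:", lm_white > CHI2_14_5PCT)
print("reject homoscedasticity at 1%:", lm_white > CHI2_14_1PCT)
```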
6. Results
6.1 Overall significance
To test for overall significance I ran an F-test with the following distribution:
$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \sim F_{k-1,\,n-k}, \qquad F = 93.901$$
The F value is substantially larger than the critical values at the 1% and 5% significance levels, so the null is rejected; the slope coefficients are not all equal to zero.
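The overall F-statistic follows directly from $R^2$, the number of parameters k and the sample size n. The sketch below (a hypothetical helper) evaluates the formula in two ways. The appendix substitutes 0.4239 squared, which reproduces the reported 93.901. The RESET calculation, however, treats 0.4239 as the $R^2$ itself; on that reading the statistic would be roughly 315. Either value lies far above the 1% critical value of 2.412, so the conclusion is the same.

```python
def overall_f_statistic(r_squared, n_obs, k_params):
    """F-statistic for the joint significance of all slope coefficients."""
    numerator = r_squared / (k_params - 1)
    denominator = (1.0 - r_squared) / (n_obs - k_params)
    return numerator / denominator


N_OBS, K_PARAMS = 3868, 10
CRITICAL_1PCT = 2.412  # F(9, 3858) value quoted in the appendix

# Reading 1: the appendix arithmetic, which squares 0.4239 before substituting.
f_as_reported = overall_f_statistic(0.4239 ** 2, N_OBS, K_PARAMS)
# Reading 2: 0.4239 taken as the R-squared itself, as in the RESET calculation.
f_if_r2 = overall_f_statistic(0.4239, N_OBS, K_PARAMS)

print(f"F with 0.4239 squared: {f_as_reported:.2f}")
print(f"F with R^2 = 0.4239:   {f_if_r2:.1f}")
print("reject H0 at 1% in both cases:", min(f_as_reported, f_if_r2) > CRITICAL_1PCT)
```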
6.2 Individual significance
To test for individual significance I used a t-test with the following distribution:
$$t = \frac{\hat{\beta}_j - \beta_j}{\text{hcse}(\hat{\beta}_j)}$$
I used PcGive to calculate the heteroscedasticity-consistent standard errors (HCSE) for each coefficient. The critical values for the t-statistic are 1.645 at 5% and 2.327 at 1%. Every t-value exceeds the critical values in absolute terms, so every null hypothesis can be rejected at both significance levels.
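Under the null hypothesis that a coefficient is zero, each t-ratio is simply the estimate divided by its HCSE. The sketch below applies this to a selection of the coefficient and standard-error pairs listed in the appendix; the dictionary layout and names are illustrative choices, not output from PcGive. Absolute values are compared with the critical value because some estimates are negative.

```python
# (coefficient, HCSE) pairs as printed in the appendix's individual tests.
ESTIMATES = {
    "age": (0.0508, 0.00388),
    "age2": (-0.0005, 0.000046),
    "manager": (0.2131, 0.0137),
    "gcse": (0.1491, 0.0228),
    "alevel": (0.3009, 0.0254),
    "degree": (0.5360, 0.0256),
    "hourly": (-0.2458, 0.0134),
    "small": (-0.1423, 0.0150),
}
CRITICAL_1PCT = 2.327  # one-tailed value for t(3858) quoted in the text

# t-ratio under H0: beta = 0 is the estimate divided by its standard error.
t_ratios = {name: coef / se for name, (coef, se) in ESTIMATES.items()}

for name, t_value in t_ratios.items():
    verdict = "reject H0" if abs(t_value) > CRITICAL_1PCT else "do not reject H0"
    print(f"{name:8s} t = {t_value:7.2f}  {verdict} at 1%")
```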
6.3 Analysis of coefficients
The variables age and age2 have coefficients of 0.0508 and -0.00051 respectively. The change in sign is expected: it represents diminishing returns to age, so that as age increases the wage increases at a decreasing rate.
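The quadratic in age implies a turning point at which the fitted log wage stops rising, found by setting the derivative $\beta_{age} + 2\beta_{age2}\,age$ to zero. The sketch below uses the two coefficients quoted in this section; the function names are hypothetical. The result is sensitive to rounding of the age2 coefficient, so it should be read as approximate.

```python
def age_turning_point(b_age, b_age2):
    """Age at which the fitted log-wage profile peaks: -b_age / (2 * b_age2)."""
    if b_age2 >= 0:
        raise ValueError("profile is not concave, so there is no interior maximum")
    return -b_age / (2.0 * b_age2)


def marginal_effect_pct(b_age, b_age2, age):
    """Approximate % change in the hourly wage from one more year at a given age."""
    return (b_age + 2.0 * b_age2 * age) * 100.0


B_AGE, B_AGE2 = 0.0508, -0.00051  # coefficients quoted in section 6.3
peak_age = age_turning_point(B_AGE, B_AGE2)

print(f"fitted wage profile peaks at about age {peak_age:.1f}")
for age in (20, 30, 40):
    print(f"age {age}: one more year adds about {marginal_effect_pct(B_AGE, B_AGE2, age):.1f}%")
```

The shrinking marginal effect at ages 20, 30 and 40 is the diminishing return described above.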
Using the formula $(e^{\beta} - 1) \times 100$ we can measure the percentage change in wages when a dummy variable switches from 0 to 1. Those who left education with GCSEs earn 16% more than those with no qualifications, those who left at A-level earn 35.1% more, and those who left at university level earn 70.9% more. This follows on from the work of Mincer, who suggests that higher education levels, which reflect higher levels of human capital, attract higher wages. The greater the investment in human capital, the greater the increase in salary.
Those who are paid on an hourly basis earn 22% less per hour compared to those who are paid on a pro-rata basis. This is consistent with the theory that smaller firms pay on an hourly basis whereas larger firms pay on a pro-rata basis.
Those who work at smaller firms earn 13.4% less per hour than those who work at larger firms. This is consistent with the theory set out in the introduction, which stated that larger firms pay more than smaller firms. Whilst there is a difference in the wage paid, the structural significance of this 13.4% gap is yet to be tested.
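The percentage effects quoted in this section come from exponentiating each dummy coefficient. The sketch below applies that conversion to the dummy coefficients as they appear in the appendix's individual tests; the function name is hypothetical. Because the printed coefficients are rounded to four decimal places, the results can differ from the figures in the text in the first decimal place.

```python
import math


def pct_effect(beta):
    """Percentage change in the wage when a dummy switches from 0 to 1
    in a model whose dependent variable is the log wage."""
    return (math.exp(beta) - 1.0) * 100.0


# Dummy coefficients as printed in the appendix's individual significance tests.
DUMMY_COEFFICIENTS = {
    "gcse": 0.1491,
    "alevel": 0.3009,
    "degree": 0.5360,
    "hourly": -0.2458,
    "small": -0.1423,
}

effects = {name: pct_effect(beta) for name, beta in DUMMY_COEFFICIENTS.items()}
for name, effect in effects.items():
    print(f"{name:7s} {effect:+.1f}%")
```

For small coefficients the conversion barely differs from reading the coefficient as a percentage, but for degree the gap is large: 0.5360 corresponds to roughly 71%, not 54%.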
6.4 Structural equivalence of workers at small firms
To test for structural equivalence between workers at small and large firms, I carried out a Chow test. I ran the original model and obtained its RSS. I then split the sample into two groups: the first where small = 0 (observations 1 to 2798) and the second where small = 1 (observations 2799 to 3868). I ran the model on each sub-sample and obtained the RSS from each.
$$F = \frac{[RSS_R - (RSS_1 + RSS_2)]/k}{(RSS_1 + RSS_2)/(N_1 + N_2 - 2k)} = 1.01$$
$$F(9, 3850) = 1.01, \qquad F^{0.05}_{9,3850} = 1.88, \qquad F^{0.01}_{9,3850} = 2.41$$
My F value of 1.01 is smaller than the critical values at both 5% and 1%, so I fail to reject the null hypothesis of structural equivalence. The Chow test result indicates that there is no statistically significant difference in the wage equation between those at small firms and those at large firms.
Whilst the Chow test checks for structural equivalence, it assumes homoscedasticity. Section 5 established, using White's test, that this model has heteroscedasticity, so the result of the Chow test may not be reliable.
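The Chow statistic can be recomputed from the three residual sums of squares reported in the appendix. The sketch below is an illustration with a hypothetical function name; it takes k = 9 parameters per sub-sample regression and n = 3868 observations, as in the appendix calculation.

```python
def chow_f_statistic(rss_pooled, rss_group1, rss_group2, n_obs, k_params):
    """Chow test F-statistic for equality of coefficients across two groups."""
    rss_split = rss_group1 + rss_group2
    numerator = (rss_pooled - rss_split) / k_params
    denominator = rss_split / (n_obs - 2 * k_params)
    return numerator / denominator


# RSS values from the appendix: pooled model, small = 0 group, small = 1 group.
f_chow = chow_f_statistic(623.48, 436.96, 185.05, n_obs=3868, k_params=9)
CRITICAL_5PCT, CRITICAL_1PCT = 1.88, 2.41  # F(9, 3850) values quoted in the text

print(f"Chow F = {f_chow:.2f}")
print("reject structural equivalence at 5%:", f_chow > CRITICAL_5PCT)
```

The statistic is small because splitting the sample reduces the combined RSS by only 1.47, from 623.48 to 622.01.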
7. Discussion on two issues within results and model
7.1 Measurement error
The variable hourlyW is the rate at which employees are paid per hour; however, the sample includes males who are paid on a pro-rata basis. Those individuals had to calculate their own hourly wage by dividing their pro-rata pay by the average hours they believe they work; human error in rounding or in calculating the wage could affect the data used.
Measurement errors in the dependent variable do not cause the model to break any assumptions of the CLRM, so the OLS estimators remain unbiased and consistent (Textbook). However, they inflate the error variance, which leads to larger standard errors. It is for this reason that measurement errors affect the statistical analysis of the econometric model.
7.2 Industry controls dummy variable
The sample lacked a dummy variable for the industry in which an employee worked, which plays a major role in determining wages. Wages in the financial services sector are higher at larger firms, as larger firms have larger clients who pay higher prices for the services they require, whereas larger retail firms such as Next pay less than smaller boutique retail stores.
Omitting a potentially relevant variable under-fits the model, which is a form of misspecification. This can bias the OLS estimators, so they are no longer BLUE (best linear unbiased estimators). The problem is most serious when the omitted variable is correlated with the included explanatory variables: the estimated coefficients then differ from their true values and are inconsistent. This affects the hypothesis tests and the confidence intervals, making the results inaccurate.
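A small simulation illustrates the bias described here. Everything in the sketch is hypothetical: x stands for an included regressor, z for an omitted one (such as an industry control) that is correlated with x, and the coefficients 2 and 3 are arbitrary. When y is regressed on x alone, the slope converges to $2 + 3 \times 0.5 = 3.5$ rather than the true value of 2, because z is built as 0.5x plus noise.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


def simulate_short_regression(n_obs=20000, seed=0):
    """Regress y on x only, although the true model also contains z."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n_obs)]
    # The omitted variable is correlated with x: z = 0.5 * x + noise.
    z = [0.5 * xi + rng.gauss(0, 1) for xi in x]
    # True model: y = 1 + 2x + 3z + u.
    y = [1 + 2 * xi + 3 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return ols_slope(x, y)


biased_slope = simulate_short_regression()
print(f"true slope on x: 2.0, estimated without z: {biased_slope:.2f}")
```

Increasing the sample size does not remove the gap, which is why the estimator is described as inconsistent as well as biased.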
8. Project extension
Other variables I would have liked to include are:
Dummy variable MULTIPLE, where MULTIPLE = 1 if the firm has more than one office
Measuring firm size by number of offices may be more useful for answering the question, since firms with more offices have more clients.
Dummy variable PAYROLL, where PAYROLL = 1 if the firm has a payroll system in place
Firms without payroll systems are likely to pay cash in hand and below the minimum wage.
Dummy variable REVENUE, where REVENUE = 1 if revenue is above £6.5m a year
This follows on from the government's measure of small firms; revenue is a better measure than number of employees, as revenue has a direct impact on the ability to pay staff.
9. Conclusion
After running the model and analysing the data I have found that as firm size increases the hourly wage also increases. The findings of my model build upon the work of Brown and Medoff as well as the human capital model and, subsequently, wage structure theory. Larger firms typically hire workers who left education later, and are therefore expected by employees to pay a higher wage. Coupled with the fact that those who leave education before attaining GCSEs expect to be paid less than someone with a degree, the wage disparity increases.
Whilst the findings support previous theory and evidence, my model suffers from misspecification, omission of a relevant variable and heteroscedasticity. The OLS estimators may therefore not be BLUE, and the standard errors and variances I used could be wrong, thereby reducing the accuracy of the significance tests.
Appendix
Variables
Assumptions of the CLRM model
The model is specified correctly and parameters are linearly related
The explanatory variables are uncorrelated with the error term [Cov(Xi, ui) = 0]
The mean of the error term = 0
The variance of the error term is constant [Var(ui) = σ²]
The error terms are not auto-correlated
Perfect multi-collinearity doesn’t exist – no two explanatory variables have a perfect linear relationship
The error term has a normal distribution
Summary statistics
Ramsey RESET test
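Restricted model:
$$\ln(\text{hourlyW}_i) = \beta_1 + \beta_2\,\text{age}_i + \beta_3\,\text{age2}_i + \beta_4\,\text{manager}_i + \beta_5\,\text{gcse}_i + \beta_6\,\text{alevel}_i + \beta_7\,\text{degree}_i + \beta_8\,\text{hourly}_i + \beta_9\,\text{tenure}_i + \beta_{10}\,\text{small}_i + u_i$$
Unrestricted model:
$$\ln(\text{hourlyW}_i) = \beta_1 + \beta_2\,\text{age}_i + \beta_3\,\text{age2}_i + \beta_4\,\text{manager}_i + \beta_5\,\text{gcse}_i + \beta_6\,\text{alevel}_i + \beta_7\,\text{degree}_i + \beta_8\,\text{hourly}_i + \beta_9\,\text{tenure}_i + \beta_{10}\,\text{small}_i + \beta_{11}\,\big(\widehat{\ln \text{hourlyW}}_i\big)^2 + \beta_{12}\,\big(\widehat{\ln \text{hourlyW}}_i\big)^3 + u_i$$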
H0: The model is correctly specified
H1: H0 is false
$$F = \frac{(R^2_{NEW} - R^2_{OLD})/m}{(1 - R^2_{NEW})/(n - k_{NEW})} = \frac{(0.4293 - 0.4239)/2}{(1 - 0.4293)/(3868 - 12)} = 18.225$$
Here m is the number of new regressors and $k_{NEW}$ is the number of parameters in the unrestricted model.
$$F^{0.05}_{2,3856} = 2.998, \qquad F^{0.01}_{2,3856} = 4.611$$
As the F-statistic is greater than the critical values at both the 1% and 5% significance levels, I reject the null hypothesis. The econometric model is misspecified.
Heteroscedasticity test: White’s general test
Original model:
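$$\ln(\text{hourlyW}_i) = \beta_1 + \beta_2\,\text{age}_i + \beta_3\,\text{age2}_i + \beta_4\,\text{manager}_i + \beta_5\,\text{gcse}_i + \beta_6\,\text{alevel}_i + \beta_7\,\text{degree}_i + \beta_8\,\text{hourly}_i + \beta_9\,\text{tenure}_i + \beta_{10}\,\text{small}_i + u_i$$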
Auxiliary model:
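$$\hat{u}_i^2 = \beta_1 + \beta_2\,\text{age}_i + \beta_3\,\text{age2}_i + \beta_4\,\text{manager}_i + \beta_5\,\text{gcse}_i + \beta_6\,\text{alevel}_i + \beta_7\,\text{degree}_i + \beta_8\,\text{hourly}_i + \beta_9\,\text{tenure}_i + \beta_{10}\,\text{small}_i + \beta_{11}\,(\text{age2}_i)^2 + \beta_{12}\,(\text{tenure}_i)^2 + \beta_{13}\,(\text{age}_i \times \text{age2}_i) + \beta_{14}\,(\text{age2}_i \times \text{tenure}_i) + \beta_{15}\,(\text{age}_i \times \text{tenure}_i) + v_i$$
H0: Var(ui) = σ²
H1: H0 is false
$$nR^2 \sim \chi^2_{k-1}, \qquad nR^2 = 3868 \times 0.05563 = 215.16, \qquad \chi^2_{14} = 215.16$$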
Critical value at 5% significance level with 14 d.f = 23.685
Critical value at 1% significance level with 14 d.f = 29.141
My test statistic is greater than both critical values, therefore I can reject the null hypothesis at both a 1% and 5% significance level. Heteroscedasticity exists within the model.
Overall significance of the model
H0: β2 = β3 = β4 = β5 = β6 = β7 = β8 = β9 = β10 = 0
H1: H0 is false
$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \sim F_{k-1,\,n-k}, \qquad F = \frac{0.4239^2/(10-1)}{(1-0.4239^2)/(3868-10)} = 93.901$$
Critical value at 5% significance level: 1.882
Critical value at 1% significance level: 2.412
The F-statistic exceeds the critical values at both 1% and 5%, so we reject the null that the coefficients on the explanatory variables are all equal to zero.
Individual significance
$$t = \frac{\hat{\beta}_j - \beta_j}{\text{hcse}(\hat{\beta}_j)}$$
Significance level    0.05     0.01
t(3858)               1.645    2.327
Test for β2 (age):
H0: β2 = 0; H1: β2 > 0. t = 0.0508/0.00388 = 13.09. The t-value exceeds both critical values, so I reject the null.
Test for β3 (age2):
H0: β3 = 0; H1: β3 < 0. t = (-0.0005)/0.000046 = -10.86. The t-value lies below the negative of both critical values, so I reject the null.
Test for β4 (manager):
H0: β4 = 0; H1: β4 > 0. t = 0.2131/0.0137 = 15.6. The t-value exceeds both critical values, so I reject the null.
Test for β5 (gcse):
H0: β5 = 0; H1: β5 > 0. t = 0.1491/0.0228 = 6.54. The t-value exceeds both critical values, so I reject the null.
Test for β6 (alevel):
H0: β6 = 0; H1: β6 > 0. t = 0.3009/0.0254 = 11.85. The t-value exceeds both critical values, so I reject the null.
Test for β7 (degree):
H0: β7 = 0; H1: β7 > 0. t = 0.5360/0.0256 = 20.94. The t-value exceeds both critical values, so I reject the null.
Test for β8 (hourly):
H0: β8 = 0; H1: β8 < 0. t = -0.2458/0.0134 = -18.34. The t-value lies below the negative of both critical values, so I reject the null.
Test for β9 (tenure):
H0: β9 = 0; H1: β9 > 0. t = 0.5360/0.0256 = 20.94. The t-value exceeds both critical values, so I reject the null.
Test for β10 (small):
H0: β10 = 0; H1: β10 < 0. t = -0.1423/0.0150 = -9.49. The t-value lies below the negative of both critical values, so I reject the null.
Structural equivalence of the small variable
H0: Model is structurally equivalent
H1: H0 is false
$$F = \frac{[RSS_R - (RSS_1 + RSS_2)]/k}{(RSS_1 + RSS_2)/(N_1 + N_2 - 2k)}$$
RSS_R = RSS from the original model; RSS_1 = RSS (small = 0); RSS_2 = RSS (small = 1)
$$F = \frac{[623.48 - (436.96 + 185.05)]/9}{(436.96 + 185.05)/(3868 - 18)} = 1.01$$
Significance level    0.05    0.01
F(9, 3850)            1.88    2.41
My F value of 1.01 is smaller than the critical values at both the 5% and 1% significance levels, so I fail to reject the null hypothesis of structural equivalence.
Bibliography:
Borjas, G.J., 2010. Labor economics (pp. 346-382). Boston: McGraw-Hill/Irwin.
Brown, C. and Medoff, J., 1989. The employer size-wage effect. Journal of Political Economy, 97(5), pp.1027-1059.
Hewitt, B., Western, M. and Baxter, J., 2002. Marriage and money: The impact of marriage on men's and women's earnings. Negotiating the Life Course.
Ons.gov.uk. (2018). Employment, unemployment and economic inactivity by age group (seasonally adjusted): A05 SA – Office for National Statistics. [online] Available at: https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/datasets/employmentunemploymentandeconomicinactivitybyagegroupseasonallyadjusteda05sa/current [Accessed 4 Apr. 2018].
Patrinos, H.A., 2016. Estimating the return to schooling using the Mincer equation. IZA World of Labor.