Occurrence of death in Ghanaian female pensioners who retired from 1990 to 2005 at SSNIT

This chapter discusses the scope of the data, data processing, and the actuarial modeling process.
It also discusses the various methods adopted for the study, which focused on pensioners and the occurrence of deaths.
3.1: Scope of Data
The study focuses on the occurrence of death among Ghanaian female pensioners who retired from 1990 to 2005 at SSNIT. These pensioners include those who retired voluntarily between 54 and 60 years of age and those who retired between 60 and 65 years. All the pensioners were exposed to investigation from the day of retirement to 2010. Each pensioner was observed from the age of retirement to 80 years, and the occurrence of death was recorded over each one-year period. The investigation ended at age 80 because the deaths recorded after 80 years were too few to be significant and would have distorted the output of the analysis.
Secondary data from the Social Security and National Insurance Trust (SSNIT), consisting of 2,178 female pensioners, were sampled at five-year intervals of pension years: 1990, 1995, 2000 and 2005. The total number of deaths recorded over these five-year intervals was 424.
The table below describes the selection of the cohort groups.
3.2: Data collection
The study uses a quantitative approach to model the occurrence of death among Ghanaian female pensioners. Secondary data were obtained from SSNIT, consisting of female pensioners aged 55 to 80 years for the period 1990 to 2010. The data contain information on each pensioner's date of birth, date of death (if any) and year of retirement. The life certificates had been updated as at 02/06/2014, the time the data were retrieved for the study. Any pensioner whose life certificate had not been updated as at that date was assumed dead until proved otherwise.
The general pensioner population includes invalidity pensioners, hazardous-work pensioners, old age pensioners and early retirees. For the purpose of this study, the target population comprises both old age and early valid retirees. The old age retirees are individuals who retired at the normal retirement age of 60 years, while the early retirees are individuals who voluntarily retired between the ages of 55 and 59 years.
In order to obtain a homogeneous group of early retirees and old age pensioners for the pension years 1990, 1995, 2000 and 2005, which are five years apart, purposive sampling was employed to select individuals from the general pensioner population to form the cohort groups.
The total sample consists of 2,178 female pensioners. These cohorts were grouped according to the year of retirement and followed from age 55 to 80 years. There were 61 pensioners for the 1990 pension year, 358 for 1995, 670 for 2000 and 1,089 for 2005.
For the purposes of this study, records with major inconsistencies were discarded. These inconsistencies include inaccurate or blank dates of birth, retirement or death, and very late entry into the pensioner category. On average, about 10% of the total population was excluded before arriving at the stated sample size. The general population was about 120,000 pensioners, male and female combined. As at the time of the study, pensioners who had not renewed their life certificates and whose pension payments had been stopped were assumed dead at the date of last update. Of the 2,178 female pensioners selected from the general population, 424 deaths were recorded.
For confidentiality, member identification numbers were removed and the data were regrouped to retain three essential details: date of retirement, date of death or last update, and current age if still alive. The data were further sorted and regrouped to obtain, for each target year, the age at pension, the number of deaths at each age and the exposed to risk at each age. Pensioners were exposed to investigation from the pension year to June 2014 and were observed from ages 55 to 80 years. The investigation stopped at age 80 because reported deaths beyond that age were very scanty and could have produced distorted or misleading results.
3.3: Methodology
The research used secondary data, which give the number of workers who retired at age x to x+1, taken as the exposed to risk (Ex) within the year, and the number of pensioners who died in a particular year (dx). The crude mortality rates (qx) obtained for a particular year are discrete and not smooth. Graduation was therefore carried out with a Poisson model to smooth the crude rates into a continuous curve. However, the female mortality data contain excess zeros, which the Poisson model did not fit, so a zero-inflated Poisson (ZIP) model with a logit component was proposed.
Exposure-to-risk (Ex):
Ex denotes the number of person-years lived during the year by people aged x at the start of the year. Assuming that people who die during a year have on average been alive for half of the year, the exposed-to-risk can be approximated by the number of survivors plus half the number of deaths in this group (Pitacco et al., 2009). The count model accounts for differences in observation periods by including the log of the exposure variable in the model with its coefficient constrained to one. Because this approach uses the correct probability distribution for the counts, it is in many cases superior to analysing rates as response variables. The exposure adjusts the counts of the response variable, and various kinds of rates, indexes or per capita measures can still be used as predictors.
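The half-year approximation described above can be sketched in a few lines of Python. The function name and the counts are hypothetical and are not taken from the SSNIT data; the sketch only illustrates the arithmetic.

```python
def approx_exposed_to_risk(survivors, deaths):
    """Approximate person-years lived at one age.

    Survivors contribute a full year each; those who die are assumed
    to have lived, on average, half of the year.
    """
    return survivors + 0.5 * deaths


# Hypothetical counts for a single age (illustration only).
exposure = approx_exposed_to_risk(survivors=95, deaths=5)
print(exposure)
```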
Production of Crude Mortality Rates for 1990, 1995, 2000 and 2005
The crude mortality rate at a given age in a given year estimates the probability that a person aged x dies in that year. Crude mortality rates are usually calculated by dividing the number of deaths by the number of life-years exposed to the risk of death over the period. The crude mortality rates for each plan year (1990, 1995, 2000 and 2005) were developed accordingly.
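As a minimal sketch of this calculation, the crude rate at each age is the number of deaths divided by the exposed to risk, qx = dx / Ex. The dictionaries below hold hypothetical counts chosen for illustration, and the function name is an assumption rather than part of the study.

```python
def crude_mortality_rates(deaths_by_age, exposure_by_age):
    """Return q_x = d_x / E_x for every age with positive exposure."""
    rates = {}
    for age, exposure in exposure_by_age.items():
        if exposure > 0:
            # Ages with no recorded deaths get a crude rate of zero.
            rates[age] = deaths_by_age.get(age, 0) / exposure
    return rates


# Hypothetical deaths (d_x) and exposed to risk (E_x) by age.
deaths = {60: 2, 61: 0, 62: 3}
exposure = {60: 100.0, 61: 80.0, 62: 60.0}
rates = crude_mortality_rates(deaths, exposure)
```

Ages with zero deaths produce a crude rate of exactly zero, which is how the excess zeros discussed later in this chapter enter the data.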

Description of Female Pension Data
Pension data take the form of the number of deaths and the number of living pensioners exposed to the risk of death, arranged in cells by year of death and age at death. The study focuses on the occurrence of death within a year, which gives a count (discrete) outcome variable. A total of 424 deaths occurred across the five-year interval cohorts between ages 55 and 80 years. The data were cleaned by discarding all records of pensioners over 80 years, since few deaths were recorded at those ages. The R software was then used to analyse the data by computing descriptive statistics for each cohort group. The output showed an excess of zeros with large variation, and this informed the choice of models for the data. The following models were proposed: the zero-inflated negative binomial and the negative binomial. Before discussing them, the Poisson regression model and the zero-inflated Poisson model are considered. The response variable, y, is the number of deaths that occurred in the year, and the predictor variable, x, is the age at which death occurred.
Models
Pension data consist of a count outcome of interest that may contain too many zeros. With such count data, the expected number of deaths is the dependent variable and age is the predictor variable. Several models have been proposed for count data with more zeros than expected: Lambert (1992) described zero-inflated Poisson (ZIP) regression models with an application to defects in manufacturing, and Hall (2000) described the zero-inflated binomial (ZIB) regression model and incorporated random effects into the ZIP and ZIB models.
Many count datasets show both excess zero observations and long right tails relative to the Poisson assumption, features that may be accounted for by over-dispersion in the data (Gurmu and Trivedi, 1996). When the proportion of zeros is too large relative to the Poisson assumption, the negative binomial regression and zero-inflated negative binomial regression models tend to improve the fit. Model selection is done using the likelihood ratio test.
Poisson regression model
The Poisson regression model is used to model count data. The Poisson distribution is a discrete probability distribution for the number of events occurring within a given time interval. Poisson regression models the log of the expected count as a linear function of the observed covariates, which gives the generalized linear model with Poisson response and log link.
If the number of occurrences is a random variable Y that has a Poisson distribution with parameter μ and takes the integer values y = 0, 1, 2, …, then its probability distribution is given by
$$P(Y = y) = \frac{\mu^{y} e^{-\mu}}{y!}, \qquad \mu > 0 \tag{3.1}$$
where μ is the rate parameter, which indicates the average number of events in the given time interval.
The Poisson distribution has mean and variance
$$E(Y) = \operatorname{Var}(Y) = \mu$$
If it is true that the mean is equal to the variance, then any factor that affects one will also affect the other. The Poisson distribution can only be applied under the following assumptions;
1. the event is something that can be counted in whole numbers;
2. occurrences are independent, so that one occurrence neither diminishes nor increases the chance of another
The Poisson regression model links the mean to the predictor through the log link:
$$\log(\mu) = \beta_0 + \beta_1 x \tag{3.3}$$
where x denotes the explanatory variables and β = (β_0, β_1) the vector of regression parameters.
However, this model did not fit the study data: although the data are counts, the mean is not equal to the variance. This was due to the excess zeros in the data, which were genuine outcomes rather than sampling error. A zero-inflated Poisson model was therefore proposed.
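The kind of check that reveals this problem can be sketched as follows. The counts are hypothetical, not the study data. The sketch compares the sample variance with the sample mean and compares the observed share of zeros with the share a Poisson distribution with the same mean would predict, which is exp(-μ).

```python
import math
from statistics import mean, pvariance


def poisson_pmf(y, mu):
    """Poisson probability P(Y = y) with mean mu, as in equation 3.1."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


# Hypothetical yearly death counts with many zeros (illustration only).
counts = [0, 0, 0, 0, 0, 0, 1, 0, 4, 0, 0, 5]

sample_mean = mean(counts)
sample_var = pvariance(counts)

# A Poisson variable has variance/mean equal to 1; values well above 1
# indicate over-dispersion.
dispersion_index = sample_var / sample_mean

observed_zero_share = counts.count(0) / len(counts)
expected_zero_share = poisson_pmf(0, sample_mean)  # exp(-mean) under Poisson
```

When the dispersion index is well above 1 and the observed share of zeros exceeds the Poisson prediction, a plain Poisson model is unlikely to fit.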
Zero-Inflated Poisson (ZIP)
Data with an excess of zero counts are modeled by the ZIP regression model. Theory suggests that the excess zeros are generated by a process separate from the one generating the count values, so the excess zeros are modeled independently. The ZIP model has two parts: a Poisson count model and a logit model that predicts the excess zeros. Zero-inflated models estimate the two equations simultaneously.
$$\Pr(Y_i = 0) = \pi + (1 - \pi)\, e^{-\mu} \tag{3.4}$$
$$\Pr(Y_i = y_i) = (1 - \pi)\, \frac{\mu^{y_i} e^{-\mu}}{y_i!}, \qquad y_i > 0 \tag{3.5}$$
where y_i is the outcome variable taking any non-zero value, μ is the expected Poisson count for the ith individual and π is the probability of the extra zeros. The ZIP regression model has mean (1 - π)μ and variance μ(1 - π)(1 + μπ). The model fits best when, apart from the excess zeros, the counts are not over-dispersed.
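The ZIP probabilities and moments above can be checked numerically. The sketch below uses hypothetical parameter values (μ = 2, π = 0.3), sums the probabilities over a long but finite range of counts, and compares the resulting mean and variance with the closed forms (1 - π)μ and μ(1 - π)(1 + μπ).

```python
import math


def zip_pmf(y, mu, pi):
    """Zero-inflated Poisson probability, following equations 3.4 and 3.5."""
    poisson = math.exp(-mu) * mu ** y / math.factorial(y)
    if y == 0:
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


# Hypothetical parameter values (illustration only).
mu, pi = 2.0, 0.3

# The tail beyond y = 99 is negligible for mu = 2.
support = range(0, 100)
total = sum(zip_pmf(y, mu, pi) for y in support)
zip_mean = sum(y * zip_pmf(y, mu, pi) for y in support)
zip_var = sum((y - zip_mean) ** 2 * zip_pmf(y, mu, pi) for y in support)
```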
Negative Binomial Regression Model (NB)
The negative binomial regression model is a parametric model that is more dispersed than the Poisson and can therefore handle over-dispersion in the data. Let y be the response variable, the number of deaths occurring in a year, and suppose that y follows a Poisson distribution with mean μ, where μ is itself a random variable with a gamma distribution. That is,
$$y \mid \mu \sim \text{Poisson}(\mu) \quad \text{and} \quad \mu \sim \text{Gamma}(\alpha, \beta),$$
where the gamma distribution has mean αβ and variance αβ², with probability density
$$p(\mu) = \frac{1}{\beta^{\alpha}\, \Gamma(\alpha)}\, \mu^{\alpha - 1} \exp\left(-\frac{\mu}{\beta}\right), \qquad \mu > 0 \tag{3.6}$$
Then the unconditional distribution of y is the negative binomial
$$P(y) = \frac{\Gamma(\alpha + y)}{\Gamma(\alpha)\, y!} \left(\frac{\beta}{1 + \beta}\right)^{y} \left(\frac{1}{1 + \beta}\right)^{\alpha}, \qquad y = 0, 1, 2, \dots \tag{3.7}$$
This distribution has mean
$$E(y) = E[E(y \mid \mu)] = E(\mu) = \alpha\beta$$
and variance
$$\operatorname{Var}(y) = E[\operatorname{Var}(y \mid \mu)] + \operatorname{Var}[E(y \mid \mu)] = E(\mu) + \operatorname{Var}(\mu) = \alpha\beta + \alpha\beta^{2}$$
Expressing the negative binomial distribution in terms of the parameters μ = αβ and k = 1/α gives E(y) = μ and Var(y) = μ + kμ², so the variance is a quadratic function of the mean.
Therefore the distribution of y is given by
$$P(y) = \frac{\Gamma(k^{-1} + y)}{\Gamma(k^{-1})\, y!} \left(\frac{k\mu}{1 + k\mu}\right)^{y} \left(\frac{1}{1 + k\mu}\right)^{1/k} \tag{3.8}$$
Note that the negative binomial distribution approaches Poisson(μ) as k → 0.
To model the negative binomial regression, let y_i ~ NB(μ_i, k) with the log link, so that
$$\log \mu_i = \beta_0 + \beta_1 x_1 + \dots \tag{3.9}$$
where the trailing term stands for the offset.
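The properties of the negative binomial in its (μ, k) form can be verified numerically. The sketch below uses hypothetical values μ = 2 and k = 0.5, works on the log scale with `math.lgamma` to avoid overflow, and checks the mean μ, the variance μ + kμ² and the convergence to the Poisson distribution for a very small k.

```python
import math


def nb_pmf(y, mu, k):
    """Negative binomial probability in the (mu, k) form of equation 3.8."""
    r = 1.0 / k
    log_p = (math.lgamma(r + y) - math.lgamma(r) - math.lgamma(y + 1)
             + y * math.log(k * mu / (1 + k * mu))
             - r * math.log1p(k * mu))
    return math.exp(log_p)


def poisson_pmf(y, mu):
    """Poisson probability computed on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


# Hypothetical parameter values (illustration only).
mu, k = 2.0, 0.5

# The tail beyond y = 399 is negligible for these values.
support = range(0, 400)
nb_mean = sum(y * nb_pmf(y, mu, k) for y in support)
nb_var = sum((y - nb_mean) ** 2 * nb_pmf(y, mu, k) for y in support)

# With k close to zero the probabilities are close to Poisson(mu).
gap_small_k = max(abs(nb_pmf(y, mu, 1e-6) - poisson_pmf(y, mu))
                  for y in range(20))
```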
Zero-Inflated Negative Binomial Regression
The zero-inflated model assumes that, in data with excess zeros, the zeros arise from two different processes. The study data concern the occurrence of death among Ghanaian female pensioners, and the occurrence has two processes: in the first, no death occurs, which gives a certain outcome of zero; in the second, deaths may occur, which gives a count outcome. The zero process is modeled by a logit model, whereas the count process is modeled by a negative binomial model. The expected count is expressed as a combination of the two processes:
$$E(\text{number of deaths}) = P(\text{no death}) \times 0 + P(\text{death}) \times E(y \mid \text{death})$$
The zero-inflated negative binomial distribution is a mixture distribution that assigns a mass of p to the extra zeros and a mass of (1 - p) to a negative binomial distribution, where 0 ≤ p ≤ 1. The negative binomial distribution is itself a continuous mixture of Poisson distributions in which the mean μ is gamma distributed, which allows it to model over-dispersion. To better understand the zero-inflated negative binomial regression, recall the negative binomial model:
$$P(Y = y) = \frac{\Gamma(\alpha + y)}{\Gamma(\alpha)\, y!} \left(\frac{\mu}{\alpha + \mu}\right)^{y} \left(\frac{\alpha}{\alpha + \mu}\right)^{\alpha}, \qquad y = 0, 1, 2, \dots;\ \mu, \alpha > 0 \tag{3.10}$$
where μ = E(Y), α is the shape parameter, which quantifies the amount of over-dispersion, Y is the response variable of interest, and the variance of Y is μ + μ²/α. The ZINB distribution is given by
$$P(Y = y) = \begin{cases} p + (1 - p)\left(1 + \dfrac{\mu}{\alpha}\right)^{-\alpha}, & y = 0 \\[2ex] (1 - p)\, \dfrac{\Gamma(y + \alpha)}{y!\, \Gamma(\alpha)} \left(1 + \dfrac{\mu}{\alpha}\right)^{-\alpha} \left(1 + \dfrac{\alpha}{\mu}\right)^{-y}, & y = 1, 2, \dots \end{cases} \tag{3.11}$$
The zero-inflated negative binomial distribution has mean E(Y) = (1 - p)μ and variance Var(Y) = (1 - p)μ(1 + pμ + μ/α). Note that the zero-inflated negative binomial distribution reduces to the Poisson distribution when both 1/α and p approach 0.
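As a numerical check of the ZINB mean and variance, the sketch below evaluates the mixture for hypothetical values μ = 2, α = 2 and p = 0.25. It writes the negative binomial part as (1 + μ/α)^(-α) (1 + α/μ)^(-y) times the gamma-function ratio, which is one standard parameterization with mean μ and variance μ + μ²/α.

```python
import math


def nb_pmf(y, mu, alpha):
    """Negative binomial probability with mean mu and shape alpha."""
    log_p = (math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
             - alpha * math.log1p(mu / alpha)
             - y * math.log1p(alpha / mu))
    return math.exp(log_p)


def zinb_pmf(y, mu, alpha, p):
    """Mixture: mass p on an extra zero, mass (1 - p) on the negative binomial."""
    base = (1 - p) * nb_pmf(y, mu, alpha)
    return p + base if y == 0 else base


# Hypothetical parameter values (illustration only).
mu, alpha, p = 2.0, 2.0, 0.25

# The tail beyond y = 399 is negligible for these values.
support = range(0, 400)
total = sum(zinb_pmf(y, mu, alpha, p) for y in support)
zinb_mean = sum(y * zinb_pmf(y, mu, alpha, p) for y in support)
zinb_var = sum((y - zinb_mean) ** 2 * zinb_pmf(y, mu, alpha, p)
               for y in support)
```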
Model selection
To compare the two models and select the one that best fits the study data, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) were used. The model with the lowest AIC and BIC is selected as the best fit.
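For reference, the two criteria use the standard definitions AIC = 2q - 2 log L and BIC = q log(n) - 2 log L, where log L is the maximized log-likelihood, q the number of estimated parameters and n the number of observations. The sketch below applies them to two placeholder models. The log-likelihoods, parameter counts and sample size are hypothetical and are not results from this study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2q - 2 log L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: q log(n) - 2 log L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical fitted models: name -> (maximized log-likelihood, parameters).
candidates = {"model_a": (-250.0, 3), "model_b": (-240.0, 5)}
n_obs = 104  # hypothetical number of observations

scores = {name: (aic(ll, q), bic(ll, q, n_obs))
          for name, (ll, q) in candidates.items()}

# Lower values indicate the preferred model under each criterion.
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

Because BIC penalizes each extra parameter by log(n) rather than 2, the two criteria can disagree when a more complex model improves the likelihood only slightly.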
Likelihood function
Given a set of parameter values θ and observed outcomes x, the likelihood function is the probability of those observed outcomes given the parameter values:
$$L(\theta \mid x) = P(x \mid \theta)$$
For a parameterized family of probability functions in the discrete case,
$$x \mapsto f(x \mid \theta),$$
where θ is the parameter, the likelihood function is
$$\theta \mapsto f(x \mid \theta),$$
written
$$L(\theta \mid x) = f(x \mid \theta),$$
with x being the observed outcome of the data. When f(x | θ) is viewed as a function of x with θ fixed, it is a probability function, and when viewed as a function of θ with x fixed, it is a likelihood function.
From a geometric standpoint, if we consider f (x, θ) as a function of two variables then the family of probability distributions can be viewed as a family of curves parallel to the x-axis, while the family of likelihood functions are the orthogonal curves parallel to the θ-axis.
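To illustrate the likelihood as a function of θ with the data held fixed, the sketch below evaluates the Poisson log-likelihood of a small hypothetical sample over a grid of θ values. For the Poisson distribution the maximum is known to lie at the sample mean, so the grid search should recover it. The data and the grid are assumptions made for illustration.

```python
import math


def poisson_log_likelihood(theta, observations):
    """Log-likelihood of a Poisson mean theta for fixed observations."""
    return sum(-theta + x * math.log(theta) - math.lgamma(x + 1)
               for x in observations)


# Hypothetical observed counts (illustration only); their mean is 1.2.
observations = [0, 2, 1, 0, 3]

# Evaluate the likelihood on a grid of candidate values 0.01, 0.02, ..., 5.00.
grid = [i / 100 for i in range(1, 501)]
theta_hat = max(grid, key=lambda t: poisson_log_likelihood(t, observations))
```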
