Investigating Hypertension & BMI: Is Hypertension Induced by High BMI?

1.1. How many records are there in the dataset (“frmgham2.csv”)? How many participants are there in the teaching dataset? Explain why these two numbers are different.

The database is a subset of the data collected as part of the Framingham study and includes a set of variables related to the medical history and adjudicated event data on 4,434 participants.

Each participant has 1 to 3 observations depending on the number of exams the subject attended, and as a result there are 11,627 observations on the 4,434 participants.

1.2. Create a file that contains only records related to the First Examination (“PERIOD=1”) and call it “DATASET1”. How many subjects undergo the first examination? Note that, all the questions below (from (Q1.2) onward) will be based on “DATASET1” only.

I used IBM SPSS to process the dataset. Selecting the first-examination records leaves 4,434 subjects (1,944 men and 2,490 women, as reported in Table 1).

The code:

DATASET COPY DATASET1.sav.

DATASET ACTIVATE DATASET1.sav.

FILTER OFF.

USE ALL.

SELECT IF (PERIOD = 1).

EXECUTE.

DATASET ACTIVATE DataSet1.
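
The gap between the number of records and the number of participants, and the PERIOD = 1 selection, can be illustrated with a minimal Python sketch. The six rows and the RANDID participant-identifier column name are assumptions made for illustration only; they are not taken from the file described above.

```python
# Hypothetical miniature of a long-format file: one row per exam attended.
# RANDID (participant ID) is an assumed column name; PERIOD comes from the text.
rows = [
    {"RANDID": 101, "PERIOD": 1},
    {"RANDID": 101, "PERIOD": 2},
    {"RANDID": 101, "PERIOD": 3},
    {"RANDID": 202, "PERIOD": 1},
    {"RANDID": 303, "PERIOD": 1},
    {"RANDID": 303, "PERIOD": 2},
]

# Records: every row counts. Participants: distinct IDs only.
n_records = len(rows)
n_participants = len({row["RANDID"] for row in rows})

# Equivalent of SELECT IF (PERIOD = 1): keep first-examination rows.
dataset1 = [row for row in rows if row["PERIOD"] == 1]

print(n_records, n_participants, len(dataset1))
```

In this toy example there are more records than participants because two participants attended more than one exam, which is the same reason the 11,627 observations exceed the 4,434 participants.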

1.3. Design a table with relevant summary/descriptive statistics, stratified by gender (“SEX”), to describe the population enrolled in the first examination. Write a paragraph (no more than 250 words) to describe the data and any findings from this table. This table should contain the following variables: “AGE”, “BMI”, “SYSBP”, “DIABP”, “CURSMOKE”, “CIGPDAY”, “DIABETES”.

The variables were labeled as specified in the Framingham Heart Study Longitudinal Data Documentation. The summary is displayed in Table 1.

The statistics were obtained through the menu sequence Analyze / Descriptive Statistics / Explore, after which the variables were selected.

The code used:

EXAMINE VARIABLES=AGE BMI SYSBP DIABP CURSMOKE CIGPDAY DIABETES BY SEX

/PLOT BOXPLOT STEMLEAF HISTOGRAM

/COMPARE GROUPS

/PERCENTILES (5,10,25,50,75,90,95) HAVERAGE

/STATISTICS DESCRIPTIVES

/CINTERVAL 95

/MISSING LISTWISE

/NOTOTAL.

The descriptive analysis included 1,923 men and 2,460 women (missing values were excluded listwise). The mean age of men is slightly lower than that of women. 25% of men are 42 years old or younger and 25% are 57 or older; 25% of women are 43 or younger and 25% are 57 or older. Mean Body Mass Index (BMI) is slightly higher in men than in women, but relative variability is higher among women: the coefficient of variation is 17.78% for women and 13.07% for men. 25% of men have a BMI of 23.96 or less, while 25% of women have a BMI of 22.54 or less. The quartiles of systolic blood pressure suggest a high prevalence of arterial hypertension, since 25% of men have values of 141.50 mmHg or more and 25% of women have values of 146.38 mmHg or more. The greatest relative variability is found in the number of cigarettes smoked each day, with a coefficient of variation of 104.3% for men and 158.0% for women. Cigarette smoking is common: 60.4% of men and 40.4% of women are current smokers.
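
The coefficients of variation quoted above can be reproduced from the means and standard deviations in Table 1. The following Python sketch is only a check of that arithmetic (standard deviation divided by mean, expressed as a percentage); the function name is chosen here for illustration.

```python
def coefficient_of_variation(sd, mean):
    """Return the coefficient of variation (SD / mean) as a percentage."""
    return 100.0 * sd / mean

# (mean, SD) pairs copied from Table 1.
bmi = {"men": (26.17, 3.42), "women": (25.59, 4.55)}
cigpday = {"men": (13.22, 13.79), "women": (5.67, 8.96)}

for label, groups in (("BMI", bmi), ("CIGPDAY", cigpday)):
    for sex, (mean, sd) in groups.items():
        print(label, sex, round(coefficient_of_variation(sd, mean), 2))
```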

Table 1. Summary statistics of the population enrolled in the first examination

Characteristic                          Men (n = 1,944)    Women (n = 2,490)
Age at exam (years)
  Mean ± SD                             49.79 ± 8.72       50.02 ± 8.64
  25th percentile                       42.00              43.00
  75th percentile                       57.00              57.00
Systolic Blood Pressure (mmHg)
  Mean ± SD                             131.77 ± 19.33     133.74 ± 24.36
  25th percentile                       118.00             116.00
  75th percentile                       141.50             146.38
Body Mass Index
  Mean ± SD                             26.17 ± 3.42       25.59 ± 4.55
  25th percentile                       23.96              22.54
  75th percentile                       28.34              27.82
Diastolic Blood Pressure (mmHg)
  Mean ± SD                             83.75 ± 11.46      82.56 ± 12.41
  25th percentile                       76.00              74.00
  75th percentile                       90.00              89.00
Current cigarette smoking (%)           60.4               40.4
Number of cigarettes smoked each day
  Mean ± SD                             13.22 ± 13.79      5.67 ± 8.96
  25th percentile                       0.00               0.00
  75th percentile                       20.00              10.00
Diabetes (%)                            3.0                2.5

[Question 2]

2.1. Examine the distribution of the variable “BMI” for those who have Hypertension and those who are Hypertension-free using histogram. Describe the distribution of this variable.

The distribution of Body Mass Index for the hypertension-free group is approximately symmetrical, except for the right tail, which contains atypical or extreme values.

The distribution of Body Mass Index for the group with hypertension (Fig. 2) is bell-shaped, but with greater dispersion than in the previous group: both the range of the data and the interval in which most observations are concentrated are wider.

2.2. Generate a box and whisker plot of “BMI” for those who have Hypertension and those who are Hypertension-free at First Examination . Include the boxplots in your answer. Describe different aspects of the box and whisker plot.

The box-and-whisker plot confirms the presence of numerous atypical and extreme values in both Prevalent Hypertensive groups. Both distributions are positively skewed, since the data extend towards the higher BMI values. Within the interquartile range, however, the distributions are more symmetrical: the median is at a similar distance from the first and third quartiles in each group. The median and the first and third quartiles of BMI are all higher in the group of people with hypertension.

2.3. Using the histogram from (2.1) and boxplots from (2.2), state the relationship between Hypertension and BMI?

People with hypertension tend to have a higher BMI than people who do not have hypertension. Hypertension seems to be associated with higher BMI values.

2.4 Conduct a statistical analysis to study the relationship between having Hypertension and BMI. State your hypotheses and write up your conclusions.

Figure 4. Histograms of Body Mass Index by Prevalent Hypertensive

The histograms of BMI by Prevalent Hypertensive show that both series are approximately normally distributed, and the samples come from independent populations, so the parametric test for a difference of means can be applied.

Table 2. Group statistics for BMI by Prevalent Hypertensive

                  Prevalent Hypertensive   N       Mean      Std. Deviation   Std. Error Mean
Body Mass Index   Free of disease          2,992   25.0044   3.55539          .06500
                  Prevalent disease        1,423   27.6161   4.58385          .12151

Table 3. Independent Samples Test for BMI

                                            Equal variances   Equal variances
                                            assumed           not assumed
Levene’s Test for Equality of Variances
  F                                         61.975
  Sig.                                      .000
t-test for Equality of Means
  t                                         -20.70            -18.95
  df                                        4,413             2,264.035
  Sig. (2-tailed)                           .000              .000
  Mean Difference                           -2.61175          -2.61175
  Std. Error Difference                     .12612            .13781
  95% Confidence Interval of the Difference
    Lower                                   -2.8590           -2.8819
    Upper                                   -2.3645           -2.3415

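Hypotheses: H0: the mean BMI is the same in the hypertensive and hypertension-free groups; H1: the mean BMI differs between the two groups. Because Levene’s test rejects equality of variances (F = 61.975, p < 0.001), the two-sample t-test for independent samples without the equal-variances assumption is used. The null hypothesis that the mean BMI of the two groups is equal is rejected (t = -18.95, df = 2,264.04, p < 0.001). Mean BMI is significantly higher in the hypertensive group, by about 2.61 units (95% CI 2.34 to 2.88).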
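
The unequal-variances (Welch) t statistic and its degrees of freedom can be recomputed from the summary statistics in Table 2. The Python sketch below is a check of the reported values under the standard Welch-Satterthwaite formula, not a reproduction of the SPSS procedure itself; the function name is chosen here for illustration.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch t statistic and Welch-Satterthwaite df from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Free of disease vs. prevalent disease, values from Table 2.
t, df = welch_t(25.0044, 3.55539, 2992, 27.6161, 4.58385, 1423)
print(round(t, 2), round(df, 1))
```

The result should agree with the "equal variances not assumed" row of Table 3 up to rounding.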

2.5. From your statistical investigation of the relationship between BMI and Hypertension above, can you conclude that Hypertension (or no-hypertension) is induced by high (or low) BMI? Explain your reason.

We can conclude that mean BMI is higher in the Prevalent disease group. However, this result does not prove that hypertension is induced by high BMI. The hypothesis test used only allows conclusions about the parameters of the two distributions analyzed, and it does not allow causal relationships to be established.

[Question 3] In Q3 we are interested in the First Examination data only (“PERIOD=1” or “DATASET1”). Using the values of BMI, one can categorize a subject into “underweight”, “normal”, “overweight”, “obese” (4 groups). Create a new variable “BMIGP” using the definitions below…..

Body Mass Index was recoded into the new variable BMIGP with this code:

RECODE BMI (Lowest thru 18.49=1) (18.5 thru 24.99=2) (25 thru 29.99=3) (30 thru Highest=4)

INTO BMIGP.

VARIABLE LABELS BMIGP 'BMI Groups'.

EXECUTE.
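
The same grouping can be sketched in Python. The cut-offs are taken from the RECODE command above; the function name and the use of half-open intervals (for example, below 18.5 rather than up to 18.49) are choices made here for illustration, and they agree with the RECODE ranges when BMI is recorded to two decimal places.

```python
def bmi_group(bmi):
    """Map a BMI value to the group codes used in the RECODE command."""
    if bmi is None:
        return None  # a missing BMI stays missing
    if bmi < 18.5:
        return 1  # underweight
    if bmi < 25:
        return 2  # normal
    if bmi < 30:
        return 3  # overweight
    return 4  # obese

# Boundary values and one missing value.
for value in (18.49, 18.5, 24.99, 25.0, 29.99, 30.0, None):
    print(value, bmi_group(value))
```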

3.1. Display the frequency table of the new variable “BMIGP”. Include any missing values on your table if there any.

Among the participants with a valid BMI, 57 people (1.3% of the valid data) are underweight and 577 people (13.1% of the valid data) are obese. BMI is missing for 19 of the 4,434 participants (0.4%).

Table 4. Frequency distribution of BMIGP

                        Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Underweight   57          1.3       1.3             1.3
          Normal        1,936       43.7      43.9            45.1
          Overweight    1,845       41.6      41.8            86.9
          Obese         577         13.0      13.1            100.0
          Total         4,415       99.6      100.0
Missing   System        19          0.4
Total                   4,434       100.0

3.2. Cross-tabulate “BMIGP” with smoking status (“CURSMOKE”). What is the prevalence of smoking in each BMI group? What can you observe from these results in terms of the relationship between Smoking status and BMI?

As seen in Table 5, the prevalence of smoking is 66.7% in the underweight group, 57.5% in the normal-weight group, 44.6% in the overweight group, and 34.8% in the obese group. The prevalence of smoking therefore decreases as the BMI category increases.

Table 5. Crosstab of BMI Groups by Current cigarette smoking at exam

                                    Current cigarette smoking at exam
BMI Groups                          Not current smoker   Current smoker   Total
Underweight   Count                 19                   38               57
              % within BMI Groups   33.3%                66.7%            100.0%
Normal        Count                 823                  1,113            1,936
              % within BMI Groups   42.5%                57.5%            100.0%
Overweight    Count                 1,023                822              1,845
              % within BMI Groups   55.4%                44.6%            100.0%
Obese         Count                 376                  201              577
              % within BMI Groups   65.2%                34.8%            100.0%
Total         Count                 2,241                2,174            4,415
              % within BMI Groups   50.8%                49.2%            100.0%

3.3. Conduct an analysis to study the relationship between smoking status and BMI groups. State your hypotheses and interpret your findings.

To check for the relationship between smoking status and BMI groups we use the Chi-square test of independence.

As Daniel (2013) indicates, “perhaps the most frequent, use of the Chi-square distribution is to test the null hypothesis that two criteria of classification, when applied to the same set of entities, are independent. We say that two criteria of classification are independent if the distribution of one criterion is the same no matter what the distribution of the other criterion.”

Table 6. Chi-Square tests of BMI Groups by Current cigarette smoking at exam

                               Value         df   Asymp. Sig. (2-sided)
Pearson Chi-Square             123.759 (a)   3    .000
Likelihood Ratio               124.906       3    .000
Linear-by-Linear Association   122.690       1    .000
N of Valid Cases               4,415

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 28.07.

Hypotheses:

H0: BMI Groups and Current cigarette smoking are independent.

H1: The two variables are not independent.

α = 0.05

Based on the Chi-square test, the null hypothesis that the two variables are independent is rejected (χ² = 123.76, df = 3, p < 0.001). There is therefore a statistically significant association between Body Mass Index groups and Current cigarette smoking.
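
The Pearson chi-square statistic can be recomputed by hand from the cross-tabulated counts. The Python sketch below uses the counts from the crosstab of BMI groups by smoking status; the underweight counts (19 and 38) are an assumption derived from the 57 underweight participants and the reported row percentages of 33.3% and 66.7%.

```python
# Observed counts: (not current smoker, current smoker) per BMI group.
# The underweight counts are derived from 57 subjects and the row percentages.
observed = {
    "Underweight": (19, 38),
    "Normal": (823, 1113),
    "Overweight": (1023, 822),
    "Obese": (376, 201),
}

row_totals = {group: sum(counts) for group, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
n = sum(row_totals.values())

# Sum of (observed - expected)^2 / expected over all cells, where the
# expected count under independence is row total * column total / n.
chi2 = 0.0
for group, counts in observed.items():
    for j, obs in enumerate(counts):
        expected = row_totals[group] * col_totals[j] / n
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(col_totals) - 1)
print(round(chi2, 2), df)
```

The statistic obtained this way should agree with the Pearson Chi-Square value reported in Table 6 up to rounding.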

[Question 4]

4.1. Use any software to generate a scatterplot of “SYSBP” (on Y-axis) and “BMI” (on X-axis). Include the plot in the report. Calculate Pearson correlation coefficient. Use visual inspection as well as Pearson coefficient to describe the relationship between BMI and SYSBP.

Using SPSS I generated the scatterplot (Figure 5). Most of the points are concentrated in the bottom left of the plot. The shape of the point cloud suggests a weak positive linear correlation between BMI and Systolic Blood Pressure.

Figure 5. Scatterplot of Systolic Blood Pressure (mmHg) vs. Body Mass index

Table 7. Correlation coefficient between Systolic Blood Pressure (mmHg) and Body Mass Index

                                                       Systolic Blood     Body Mass
                                                       Pressure (mmHg)    Index
Systolic Blood Pressure (mmHg)   Pearson Correlation   1                  .328**
                                 Sig. (2-tailed)                          .000
                                 N                     4,434              4,415
Body Mass Index                  Pearson Correlation   .328**             1
                                 Sig. (2-tailed)       .000
                                 N                     4,415              4,415

**. Correlation is significant at the 0.01 level (2-tailed).

The Pearson correlation coefficient (PCC) is 0.328, a weak positive correlation: as BMI increases, Systolic Blood Pressure tends to go up.

Based on the t-test for the correlation coefficient, the null hypothesis that the population correlation between BMI and Systolic Blood Pressure is zero is rejected (t = 23.07, df = 4,413, p < 0.001; the degrees of freedom are n - 2 for the 4,415 complete pairs in Table 7). We conclude that there is a linear correlation between BMI and Systolic Blood Pressure in the population.
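
The t statistic for a correlation coefficient follows from r and the number of pairs, t = r * sqrt(n - 2) / sqrt(1 - r²) with n - 2 degrees of freedom. The Python sketch below applies this to r = 0.328 with the n = 4,415 complete pairs reported in Table 7; because r is rounded to three decimals, the result only approximates the t value for BMI in Table 10.

```python
import math

def correlation_t(r, n):
    """t statistic and df for testing H0: population correlation = 0."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, n - 2

t, df = correlation_t(0.328, 4415)
print(round(t, 2), df)
```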

4.2. Assume Simple Linear Regression (SLR) analysis is to be used to study the relationship between Systolic Blood Pressure and BMI. Treat “BMI” as the independent variable and “SYSBP” as the dependent variable.

4.2.1. Write down the model/formula of SLR in term of “BMI” and “SYSBP”

The simple linear regression (SLR) formula:

SYSBP = β0 + β1 * BMI + ε

where ε is the error term, which is the difference between the observed value of SYSBP and the value of SYSBP estimated by the model.

Gujarati (2002) points out that this model is called linear because the parameters to be estimated (β0 and β1) are raised only to the first power.

Therefore, it is a model that is linear in the parameters. In addition, the model fits a straight line to the point cloud. The parameter β0 corresponds to the intercept, while the parameter β1 corresponds to the slope of that straight line.

4.2.2. State the estimation method used to find the best linear fit of the data.

I used the method of least squares, and the resulting line is called the least-squares line (Daniel, 2013).

The method consists of minimizing the sum of the squared deviations of the observed values of the dependent variable from the values estimated by the regression line. In other words, it minimizes the sum of the squared vertical distances from each point to the fitted line.
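
For a single predictor, the least-squares minimization has a closed-form solution: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The Python sketch below shows this on hypothetical toy data that lie exactly on a line; it is not the SPSS computation and does not use the Framingham data.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through the means
    return intercept, slope

# Hypothetical toy data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = least_squares_line(xs, ys)
print(intercept, slope)
```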

4.2.3. List the assumptions used behind SLR for SLR to be a valid model for data analysis.

The regression has the following assumptions:

Linear relationship: there is a linear relationship between the variables.

Multivariate normality: the variables are normally distributed.

No autocorrelation: there is little or no autocorrelation in the data.

Homoscedasticity: the variation around the regression line is the same for all values of the independent variable.

4.2.4. Examine all the assumptions listed in (4.2.3) using “SYSBP/BMI” in “DATASET1”. Report whether each assumption has been satisfied or not and justify your answer.

The scatterplot in Figure 5 shows a weak positive linear correlation between the variables considered, so the first assumption is satisfied.

To evaluate the normality of the distributions of the variables, histograms with the normal curve superimposed are shown in Figures 6 and 7. The SYSBP variable is not exactly normally distributed, but it approaches that distribution. BMI is approximately normally distributed.

Figure 6. Histogram of Systolic Blood Pressure

Figure 7. Histogram of Body Mass Index

Figure 8. Boxplot of Systolic Blood Pressure by BMI groups

The boxplot of Systolic Blood Pressure by BMI groups (Figure 8) shows that, except for the values identified as atypical or extreme in each group, Systolic Blood Pressure has a similar variability in the four groups. The intervals over which Systolic Blood Pressure varies overlap. As can be seen in the boxplot, the assumption of equality of variances is satisfied.

The absence of autocorrelation can be seen in Table 8, whose model summary reports the Durbin-Watson d statistic. Since d = 1.947 is very close to 2, there is no evidence of serial correlation. Gujarati (2002) states that “As a rule of thumb, if an application finds that d is equal to 2, it can be assumed that there is no first order autocorrelation, either positive or negative.”
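
The Durbin-Watson d statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The Python sketch below computes it for two hypothetical residual sequences chosen for illustration; the actual regression residuals are not available in this report.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: squared successive differences over squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals: alternating signs (negative autocorrelation)
# and runs of equal signs (positive autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0]
runs = [1.0, 1.0, -1.0, -1.0]
print(durbin_watson(alternating), durbin_watson(runs))
```

Values of d above 2 point towards negative first-order autocorrelation and values below 2 towards positive autocorrelation, which is why a value near 2 is read as no autocorrelation.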

4.2.5. Conduct SLR using any computer software. Write out the estimate regression line. Is it the relationship between BMI and Systolic BP statistically significant at the 5% level? Justify your answer. Use the estimated coefficient(s) to explain the size of effect between BMI and Systolic BP.

Table 10 shows that the estimated regression equation is: SYSBP = 86.667 + 1.789 * BMI

The intercept has no meaningful interpretation, since a person cannot have a BMI equal to zero. The slope indicates that when BMI increases by one unit, Systolic Blood Pressure increases by 1.789 mmHg on average.

The model summary shows that the standard error of the estimate is 21.1249, which measures the dispersion of the observed values around the estimated values. The model explains about 10.7% of the total variability of Systolic Blood Pressure (R² = 0.108, adjusted R² = 0.107).

Table 8. Model summary of regression (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .328 (a)   .108       .107                21.1249                      1.947

a. Predictors: (Constant), Body Mass Index

b. Dependent Variable: Systolic Blood Pressure (mmHg)

Table 9. ANOVA (a)

Model        Sum of Squares   df      Mean Square   F         Sig.
Regression   237,559.912      1       237,559.912   532.334   .000 (b)
Residual     1,969,348.985    4,413   446.261
Total        2,206,908.896    4,414

a. Dependent Variable: Systolic Blood Pressure (mmHg)

b. Predictors: (Constant), Body Mass Index

As seen from the ANOVA results (Table 9), the null hypothesis that the regression coefficient of BMI is equal to zero is rejected (F = 532.33; df = 1, 4,413; p < 0.001; R² = 0.108). BMI therefore explains a statistically significant share of the variation in Systolic Blood Pressure.
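
The F statistic and R² follow directly from the sums of squares in Table 9: F is the regression mean square divided by the residual mean square, and R² is the regression sum of squares divided by the total. The Python sketch below is a check of that arithmetic using the reported values, with the regression degrees of freedom taken as 1 (one predictor).

```python
# Sums of squares and degrees of freedom from Table 9.
ss_regression = 237559.912
ss_residual = 1969348.985
df_regression = 1      # one predictor (BMI)
df_residual = 4413

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual
r_squared = ss_regression / (ss_regression + ss_residual)

print(round(ms_residual, 3), round(f_stat, 2), round(r_squared, 3))
```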

Table 10. Coefficients of linear regression model (a)

                  Unstandardized          Standardized                    95.0% Confidence
                  Coefficients            Coefficients                    Interval for B
Model             B        Std. Error     Beta           t        Sig.   Lower Bound   Upper Bound
(Constant)        86.667   2.029                         42.722   .000   82.690        90.644
Body Mass Index   1.789    .078           .328           23.072   .000   1.637         1.940

a. Dependent Variable: Systolic Blood Pressure (mmHg)

Based on the regression results, the null hypothesis that each regression coefficient is equal to zero is rejected (t = 42.72, p < 0.001 for the constant; t = 23.07, p < 0.001 for BMI). The relationship between BMI and Systolic Blood Pressure is therefore statistically significant at the 5% level.

The regression coefficients for the constant and the independent variable are not equal to zero, so both should be included in the model to predict Systolic Blood Pressure.

References

Daniel, W. W., & Cross, C. L. (2013). A Foundation for Analysis in the Health Sciences.

Gujarati, D. (2002). Basic econometrics: With software disk package. New York: McGraw-Hill.
