Essay: How Variables, Sample Variance, Alpha Level and Chi-Square Affect Data Analysis

Essay details and download:

  • Subject area(s): Sample essays
  • Reading time: 6 minutes
  • Price: Free download
  • Published: 1 April 2019*
  • Last Modified: 23 July 2024
  • File format: Text
  • Words: 1,566 (approx)
  • Number of pages: 7 (approx)

For this assignment I will explain sample variance, statistical significance and alpha level, and chi-square. This will then lead on to my data analysis, where I have created tables and a report to show my results and how I achieved them.

Sample variance

Sample variance is essential if you are trying to determine how varied a sample is. Variance can be defined as “the average of the squared differences from the mean”. In other words, you first work out the mean, then subtract the mean from each value and square the result, and finally work out the average of those squared differences. A sample is a select number of cases taken from a certain population, for example the female population in the UK. Because a sample is only part of the population, the sample variance is normally calculated by dividing the sum of the squared differences by n − 1 rather than by n. Furthermore, statistics are an important factor to consider, as they help to identify any issues within the data and to check whether the relationships that appear in it are significant. Finding the average of the data, also known as the ‘mean’, is one of the main methods of doing this. Another useful statistic is the median, as this gives you the middle value with an equal number of values on either side. Variability, on the other hand, focuses on how spread out the data is (the distance between the mean and each score), which allows you to recognise whether any results are noticeably different from one another. Deviation scores are used to measure this variability. The most straightforward way of calculating a deviation score is to subtract the mean from each score. However, you can’t simply average the deviation scores, because the positive and negative deviations cancel out and their sum is always 0. Squaring the deviations solves this, because every squared deviation is zero or positive.
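The steps described above can be sketched in Python. This is only an illustration: the scores are made-up values and the variable names are assumptions, not part of the essay's data.

```python
from statistics import mean, pvariance, variance

# Hypothetical scores, invented purely for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

m = mean(scores)

# Deviation score = score minus the mean.
deviations = [x - m for x in scores]
squared_deviations = [d ** 2 for d in deviations]

# The deviations always sum to 0, so they cannot measure spread on their own.
sum_of_deviations = sum(deviations)

# Population variance divides by N; sample variance divides by n - 1.
population_variance = sum(squared_deviations) / len(scores)
sample_variance = sum(squared_deviations) / (len(scores) - 1)

print(sum_of_deviations, population_variance, sample_variance)

# The standard library gives the same answers.
print(pvariance(scores), variance(scores))
```

Dividing by n − 1 gives a slightly larger value than dividing by N, which matters most when the sample is small.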

The formula for population variance looks like this:
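In conventional textbook notation (the symbols are assumed here, as the essay does not define them), for a population of N values x_i with mean μ, the population variance is the first line below. The second line is the sample version, which uses the sample mean and divides by n − 1.

```latex
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2

s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2
```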

In terms of the relationship between the standard deviation and the variance, they are both measures of variability: the standard deviation is the square root of the variance.

Statistical significance and alpha level

Hypothesis testing provides a strategy to challenge and test the results collected in order to see if they are valid or not. A hypothesis is a clear statement which should consider variables, and it must be testable. Even if the null hypothesis is correct, the sample mean will usually differ slightly from the population value, because the sample mean is only the average of a section of the whole population. In addition, how much the sample mean varies depends on how different the people in the group are from the mean value of the group. The alpha level is the probability of making the wrong choice, that is, rejecting the null hypothesis when it is actually correct. Setting alpha in advance therefore works as a limit on how much of this risk is accepted, which reduces uncertainty about how to read the results. This relates to critical values, as a critical value is a point on the test distribution which is compared with the test statistic in order to decide whether the null hypothesis should be rejected. For example, with a 90% confidence level alpha is 1 − 0.90 = 0.10, so the higher the confidence level that is subtracted from 1, the lower alpha will be. A lower alpha makes the test stricter. In addition, if the test statistic falls in the critical region, a result that extreme would have a probability below alpha if the null hypothesis were correct, so the null hypothesis is rejected. This relates to statistical significance, as a significant result is one that is unlikely to be due to chance alone, although a sample that does not represent everyone in the population may still lead to inaccurate conclusions.
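The link between alpha, the critical value and the decision can be sketched in Python. This is only an illustration: it assumes a two-tailed z-test, and the test statistic of 1.80 and the function names are made up for the example rather than taken from the essay.

```python
from statistics import NormalDist

def critical_value(alpha):
    """Two-tailed critical z value for a given alpha (assumes a z-test)."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def reject_null(z_statistic, alpha):
    """Reject the null hypothesis if the statistic falls in the critical region."""
    return abs(z_statistic) > critical_value(alpha)

confidence_level = 0.90
alpha = 1 - confidence_level  # 0.10

z_statistic = 1.80  # hypothetical test statistic

for a in (alpha, 0.05, 0.01):
    print(round(a, 2), round(critical_value(a), 3), reject_null(z_statistic, a))
```

With these numbers the null hypothesis is rejected at an alpha of 0.10 but not at 0.05 or 0.01, which shows how a lower alpha makes the test stricter.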

Chi-squared

This statistic is used with data that has been put into groups, and it focuses on frequencies instead of numerical values. In addition, it can be used to test assumptions about the population distribution. Furthermore, it is useful for checking whether there is a statistically significant relationship between variables.

The formula for chi-squared is:
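In conventional notation (the symbols are assumed here, as the essay does not define them), O_i is the observed frequency in category or cell i and E_i is the frequency expected under the null hypothesis:

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
```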

In relation to hypothesis testing, once the chi-square value has been calculated you can check whether or not it falls within the critical region. For the chi-square test for independence, the null hypothesis states that there is no relationship between the variables. If the chi-square value is large, this shows that the expected values and the observed sample values differ considerably. If it is small, this shows less of a difference from what is expected.
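A chi-square test for independence can be sketched in Python as below. The 2 × 2 table of counts is hypothetical and is not taken from the essay's data, the function name is an assumption, and the critical value of 3.841 is the usual table value for 1 degree of freedom at an alpha of 0.05.

```python
def chi_square_independence(observed):
    """Return (chi-square value, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(observed):
        for j, observed_count in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi_square += (observed_count - expected) ** 2 / expected

    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi_square, df

# Hypothetical counts: rows = male / female, columns = yes / no.
observed = [[30, 20],
            [20, 30]]

chi_square, df = chi_square_independence(observed)
CRITICAL_VALUE = 3.841  # df = 1, alpha = 0.05, from a standard chi-square table

reject_null = chi_square > CRITICAL_VALUE
print(chi_square, df, reject_null)
```

Here every expected count is 25, so the chi-square value is 4.0. This is just inside the critical region, so the null hypothesis of no relationship would be rejected at the 0.05 level.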

For this report I will compare the data I have found, explain how I got these results and provide the reasons behind them.

There are four different levels of measurement, which are the nominal scale, ordinal scale, interval scale and ratio scale. Firstly, the nominal scale involves just categories. When there are only two options, it is also known as a ‘dichotomous scale’. For instance, you could have categories such as male and female. The ordinal scale consists of both categories and order. An example of this would be the ranked horses in a race. Ordinal data can use the median but not the mean, whereas nominal data can only use the mode. In addition, the interval scale includes categories, order and equal intervals, like the temperature in a room in Celsius. Lastly, the ratio scale includes the most, which are categories, order, equal intervals and a true zero. An example of this could be a person’s height in centimetres or their age in years. In terms of my data, gender is one of the main points to pick up on within the frequency graphs, which implies that the level of measurement used is nominal. This is due to the fact that there are only two categories, which are yes and no or male and female, for the first part of the frequency tables. However, another type which is also used would be the ordinal scale, which relates to the crosstabulation because the categories rank from strongly agree to strongly disagree.

Furthermore, when collecting all my results I was faced with missing data. To overcome this and make sure the data I had entered was correct, I checked the total percentage in each frequency table. It is essential that the percentages add up to 100%, otherwise this could imply that the data has been entered incorrectly, resulting in inaccurate overall results. Therefore, I went back and checked all the totals to limit the chance of this happening. I also made sure that there were no ‘not stated’ values among the valid categories and that they appeared in the missing column instead.
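The same check can be sketched outside SPSS in Python. The responses and the set of missing codes below are hypothetical, chosen only to show the idea of separating missing values from valid ones before checking that the valid percentages total 100%.

```python
from collections import Counter

# Hypothetical survey responses; None and "not stated" are treated as missing.
responses = ["male", "female", "female", None, "male",
             "female", "not stated", "female"]
MISSING_CODES = {None, "not stated"}

valid = [r for r in responses if r not in MISSING_CODES]
missing_count = len(responses) - len(valid)

# Frequency table of valid answers, with the valid percent for each category.
counts = Counter(valid)
valid_percent = {category: 100 * n / len(valid) for category, n in counts.items()}

total_percent = sum(valid_percent.values())
print(counts, missing_count, round(total_percent, 1))

# The valid percentages should total 100% (allowing for rounding error).
percentages_add_up = abs(total_percent - 100) < 1e-9
print(percentages_add_up)
```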

When forming this data, the statistical software I used was SPSS. This was appropriate for what I was doing because the software allows you to create both tables and graphs for large amounts of data, which I had. In addition, converting the data into tables and graphs makes the data analysis more straightforward and easier to interpret. SPSS includes two different outputs, which are the output outline and the actual output. The output outline includes things like your headings and tables, which keeps everything organised and set out in order.

For the frequency tables it was important to look at the mean, median, mode and range. Firstly, the mean consists of adding all the results together and then dividing this by the number of results there are. Therefore, this means you are finding the average of all your scores. In addition, the median involves putting all the scores into order (e.g. smallest to largest) and then picking the middle value. Moreover, for the mode you simply pick the value with the highest frequency. This is the only option that nominal data can use. The mean, median and mode are all known as measures of central tendency. Lastly, the range is calculated by taking the smallest score away from the largest. Using the mean would probably give you the most informative figure, as it takes every score into account, which can be essential for analysing data. An example from my data is the amount of loans for women, which shows a mean of 395.7627, whereas the median and mode were .0000 and .00. This suggests how much the mean can differ from the other measures, because a median of zero shows that at least half of the women had no loan, while the loans that were taken out raise the mean.
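These four statistics can be sketched in Python. The loan amounts below are invented to mimic the pattern described above (many zeros and a few larger loans) and are not the essay's actual data.

```python
from statistics import mean, median, mode

# Hypothetical loan amounts: most people have no loan, a few have large ones.
loans = [0, 0, 0, 0, 0, 0, 0, 1000, 1000, 2000]

loan_mean = mean(loans)                # sum of the scores / number of scores
loan_median = median(loans)            # middle value once the scores are sorted
loan_mode = mode(loans)                # most frequent value
loan_range = max(loans) - min(loans)   # largest score minus smallest score

print(loan_mean, loan_median, loan_mode, loan_range)
```

The median and mode are both 0 because most values are 0, while the mean is pulled up to 400 by the three larger loans.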

There are different types of variables, which are discrete and continuous. Discrete variables consist of values which can be counted as whole numbers, for example the number of children, as you can’t have half a child. Continuous variables, on the other hand, can take any value, such as time. For different tables, different variables are used. For example, those which are measured with a nominal scale (e.g. the frequency table for the number of male and female students) can only be discrete data as, for instance, you can’t get half a female.

Overall, my graphs show similarities in different ways. For instance, the graphs which included the cumulative frequency all had a positive correlation, because when one variable increased the second one did too. This means that a pattern is occurring, with the variables moving in the same direction, which shows a relationship between both variables.

About this essay:

If you use part of this page in your own work, you need to provide a citation, as follows:

Essay Sauce, How Variables, Sample Variance, Alpha Level and Chi-Square Affect Data Analysis. Available from:<https://www.essaysauce.com/sample-essays/2017-12-7-1512655064/> [Accessed 14-04-26].

These Sample essays have been submitted to us by students in order to help you with your studies.

* This essay may have been previously published on EssaySauce.com and/or Essay.uk.com at an earlier date than indicated.