How Sample Size Determines Representativeness of Data in Research

Sample size plays a vital role in determining how representative a sample is of the data set from which a researcher draws it. The required sample size depends on the maximum acceptable sampling error or, equivalently, on the accuracy that the researcher desires (i.e., the required accuracy). When the sample is too small to be representative, the chance of a larger error increases. The confidence level that the researcher requires, typically 95% or 99% (Lynn & Elliot, 2000), also determines the sample size. Other things being equal, larger samples result in survey estimates having smaller variance (smaller standard errors). Variance is inversely proportional to sample size, and hence standard errors and the widths of confidence intervals are inversely proportional to the square root of the sample size. For example, doubling a sample size will tend to reduce standard errors by around 29% (the multiplying factor being 1/√2 ≈ 0.71), as discussed by Lynn and Elliot (2000).
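The square-root relationship can be checked with a few lines of arithmetic. The sketch below assumes simple random sampling and the usual formula for the standard error of a mean (standard deviation divided by the square root of n); the standard deviation and the sample sizes are hypothetical values chosen only for illustration.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean under simple random sampling."""
    return sd / math.sqrt(n)


sd = 10.0   # hypothetical population standard deviation
n = 500     # hypothetical original sample size

se_original = standard_error(sd, n)
se_doubled = standard_error(sd, 2 * n)

# Relative reduction in standard error after doubling the sample size.
reduction = 1 - se_doubled / se_original
print(f"Reduction in standard error: {reduction:.1%}")
```

The reduction does not depend on the chosen standard deviation or starting sample size, because both cancel in the ratio and only the factor 1/√2 remains.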

Sample size determination is often an important step in planning statistical research, and it is usually a difficult one. Among the important hurdles to be surmounted, one must obtain an estimate of one or more error variances and specify an effect size of importance (Lenth, 2001). There is a temptation to take shortcuts, but statistical studies (surveys, experiments, observational studies, etc.) are always better when they are carefully planned. Not all sample size problems are the same, nor is sample size equally important in all research. For example, the ethical issues in an opinion poll are very different from those in a medical experiment, and the consequences of an oversized or undersized study also differ. Sample size problems are therefore context-dependent, as discussed by Lenth (2001).

Determining sample size is a common task for researchers who use quantitative methods. However, because many studies rely on surveys and other data collection methods that depend on voluntary participation, response rates are typically well below 100%. Salkind (1997) recommended oversampling: when surveys or questionnaires are mailed out, the sample size should be increased by 40% to 50% to account for lost mail and uncooperative respondents. Fink (1995) stated that oversampling can add costs to the survey but is often necessary. Cochran (1977) noted that a second consequence of non-response is that the variances of estimates increase, because the sample actually obtained is smaller than the target sample. These factors can be allowed for, at least approximately, in selecting the size of the sample (Cochran, 1977). However, many researchers have criticized the use of oversampling to ensure that the minimum sample size is achieved, and suggestions on how to secure the minimum sample size are scarce.
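The oversampling adjustment is simple arithmetic, and a minimal sketch may make it concrete. The function names and the required sample size of 385 are hypothetical; the 40% and 50% figures are the range attributed to Salkind (1997) above. The second function shows a related adjustment, dividing by an anticipated response rate, which is an assumption of this sketch rather than something stated in the sources cited here.

```python
import math


def inflate_by_percent(required_n: int, oversample_pct: int) -> int:
    """Increase the required sample size by a whole-number percentage."""
    # Integer arithmetic before the division avoids floating-point surprises.
    return math.ceil(required_n * (100 + oversample_pct) / 100)


def mailout_for_response_rate(required_n: int, response_rate: float) -> int:
    """Number of questionnaires to send so that the expected number of
    returns reaches required_n (assumed adjustment, not from the sources)."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    return math.ceil(required_n / response_rate)


required_n = 385  # hypothetical minimum sample size

low = inflate_by_percent(required_n, 40)    # increase by 40%
high = inflate_by_percent(required_n, 50)   # increase by 50%
by_rate = mailout_for_response_rate(required_n, 0.65)  # hypothetical 65% rate

print(low, high, by_rate)
```

The two adjustments are not equivalent: adding 40% to the sample corresponds to an anticipated response rate of about 71%, not 60%, so the chosen method should be reported together with the sample size.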

The sample size formulas and procedures used for categorical data are very similar to those used for continuous data, but some variations do exist. Assume that a researcher has set the alpha level a priori at .05, plans to use a proportional variable, has set the level of acceptable error at 5%, and has estimated the standard deviation of the scale as 0.5. Under these assumptions, Cochran's (1977) sample size formula for categorical data applies.
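As a worked illustration of these settings, the sketch below uses the form of Cochran's formula for categorical data that is commonly quoted, n0 = t² · p · q / d². The symbol names are assumptions of this sketch: t is the critical value for the chosen alpha level (1.96 for alpha = .05), p and q = 1 - p are the estimated proportions (p = q = 0.5 gives the standard deviation of 0.5 mentioned above), and d is the acceptable margin of error.

```python
import math


def cochran_categorical(t: float, p: float, d: float) -> float:
    """Unrounded sample size n0 = t^2 * p * q / d^2 for a proportion."""
    q = 1 - p
    return (t ** 2) * p * q / (d ** 2)


# Settings from the paragraph above: alpha = .05 (t = 1.96),
# p = 0.5 (maximum variability), acceptable error d = 0.05.
n0 = cochran_categorical(t=1.96, p=0.5, d=0.05)

# A sample cannot contain a fraction of a respondent, so round up.
required_n = math.ceil(n0)

print(f"n0 = {n0:.2f}, required sample size = {required_n}")
```

Choosing p = 0.5 maximizes p · q, so the result is the most conservative sample size for the given error margin and alpha level.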

The determination of sample size is a common task for many organizational researchers. Inappropriate, inadequate, or excessive sample sizes continue to influence the quality and accuracy of research. Bartlett, Kotrlik and Higgins (2001) described the procedures for determining sample size for continuous and categorical variables using Cochran's (1977) formula. Although it is not unusual for researchers to have different opinions as to how sample size should be calculated, the procedures used in the process should always be reported, allowing readers to make their own judgments as to whether they accept the researcher's assumptions and procedures. In general, a researcher could use the standard factors identified for the sample size determination process. Using an adequate sample along with high-quality data collection efforts will result in more reliable, valid, and generalizable results; it could also result in other resource savings.

Rao, Yoonkyung and Jason (2010) studied the calculation of sample size for a validation study that must meet pre-specified sensitivity and specificity requirements, with the aim of avoiding futility in pharmacogenomic development. A change of platform is taken into account in the sample size calculation through statistical modelling. The proposed formulation for meeting minimal sensitivity and specificity requirements calls for estimation of both measures. Their lower confidence bounds can substitute for the unknown true values in the sample size calculation procedure. However, the confidence level has to be calibrated for appropriate sample sizes to ensure that the probability of a successful validation experiment exceeds a desired level. The study shows the relationship between the underlying sensitivity and the required confidence level in a normal distribution setting. The results can be used as a practical guideline to set the level of confidence adaptively.

The study of practical guidelines on effective sample size by Wan M., Wan Abdul A., Nor Azlida and Norizan M. (2012) set out to determine the right sample size for observational studies in the medical and health sciences. Sample size calculation depends on the type of research and on how it is designed; for example, different formulas are used to calculate the sample size in different types of research. The authors discuss the formulas for a single proportion and for two proportions, and they use an example from medical research to aid understanding of the problem. The application of the proposed formula is illustrated with the case of determining the factors associated with tuberculosis in HIV-infected patients. Ahmad et al. (2011) pointed out that the larger the sample, the more confident we can be that its answers truly reflect the population. However, there are a few guidelines that have to be observed in the particular area of health sciences.
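To make the distinction between the two designs concrete, the sketch below uses the textbook normal-approximation formulas for a single proportion and for comparing two proportions. These are standard forms and are not necessarily the exact expressions used by Wan M. et al. (2012); the function names, the proportions 0.30 and 0.50, and the 80% power are hypothetical choices for illustration.

```python
import math
from statistics import NormalDist


def n_single_proportion(p: float, d: float, confidence: float = 0.95) -> int:
    """Sample size to estimate one proportion p within margin d."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


def n_two_proportions(p1: float, p2: float,
                      alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size per group to detect a difference between p1 and p2
    with a two-sided test at level alpha and the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)


# Hypothetical inputs, not taken from the cited study.
n_one = n_single_proportion(p=0.5, d=0.05)
n_per_group = n_two_proportions(p1=0.30, p2=0.50)

print(n_one, n_per_group)
```

The single-proportion design fixes a margin of error, whereas the two-proportion design fixes a difference to detect and a power, which is why the two calculations take different inputs and cannot be interchanged.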

Delice (2010) used document analysis to examine, in terms of population and sample, 90 qualitative master's theses submitted between 1996 and 2007 to the primary and secondary school science and mathematics education departments and the mathematics education discipline of 10 universities in Turkey. For generalizability and repeatability, identification of sample size is essential (Delice, 2010; Henn, 2006). The purpose is to apply the relationship obtained among the variables to the general population, which is why selecting a sample that is representative of the population is essential. Every study investigates a number of variables simultaneously, each with differing variability. A variable with greater variability will require a larger sample to achieve a certain precision level than a variable with smaller variability. Because using the largest possible sample raises problems of cost and time, the sample size should be chosen based on the variable for which the greatest precision is required.
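The effect of variability on the required sample size can be sketched with the standard formula for estimating a mean, n = (z · s / d)², where s is the estimated standard deviation and d the acceptable margin of error. The formula choice, the variable names and all numeric values below are assumptions made for illustration and do not come from Delice (2010).

```python
import math


def n_for_mean(sd: float, margin: float, z: float = 1.96) -> int:
    """Sample size to estimate a mean within +/- margin (z = 1.96 for 95%)."""
    return math.ceil((z * sd / margin) ** 2)


# Hypothetical study variables: (estimated standard deviation, margin of error)
variables = {
    "test_score": (12.0, 2.0),
    "hours_studied": (4.0, 2.0),
}

required = {name: n_for_mean(sd, margin)
            for name, (sd, margin) in variables.items()}

# The variable with the most demanding requirement drives the study's sample size.
study_n = max(required.values())

print(required, study_n)
```

With the same margin of error, the variable whose standard deviation is three times larger needs roughly nine times as many observations, because the required size grows with the square of the standard deviation.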
