**Sample Size Calculation in Research**

In many studies, researchers are concerned with determining an adequate sample size for their research. In fact, it is not just a formula that gives you the sample size; the choice also depends on the assumptions made and on the researcher's judgment about what sample size should actually be taken.

In this tutorial, however, I will discuss the mathematical formula used to determine the sample size.

**Sample Size Calculation when there are Continuous or Interval Scaled Variables**

Let us consider a case in which a researcher is studying “How satisfied are consumers with the use of XYZ brand’s product ABC?”. The researcher prepares a questionnaire consisting of 9-10 questions related to various parameters of consumer satisfaction. For simplicity, let us assume that all the questions are answered on a Likert scale of 1 to 10, where 1 means “Totally Unsatisfied” and 10 means “Totally Satisfied”. The scale would look like the one shown in the following figure:

For studies involving an interval scaled variable as above, the formula for calculating the sample size is:

$$N = \left(\frac{Z \cdot \sigma}{e_t}\right)^2$$

Now let us understand this formula and its variables.

**N** is the sample size required for the study.

**Z** is the Z-score from the standard normal distribution table for the confidence level chosen by the researcher. For a confidence level of 95%, the Z-score is 1.96 (the two-tailed value that encloses a central probability of 0.95). Similarly, for a 90% confidence level, the Z-score is 1.645, and so on. One thing to remember is:

**“In research where we select a sample from the total population, choosing a confidence level of 100% means taking a census of the population.”**

Therefore, a confidence level of 100% is usually very difficult in practice, if not impossible, because the researcher would have to survey the entire population rather than a sample that represents it. In almost all such research studies, a 90% or 95% confidence level is adequate.
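The Z-scores quoted above do not have to be read from a printed table. As a minimal sketch, assuming a two-tailed interval, base R's `qnorm()` returns them directly; the helper name `z_for_confidence` is made up for this illustration and is not part of the original tutorial.

```r
# Two-tailed Z-score for a given confidence level.
# `z_for_confidence` is a hypothetical helper name used only in this sketch.
z_for_confidence <- function(conf_level) {
  stopifnot(conf_level > 0, conf_level <= 1)
  # Split the leftover probability equally between the two tails.
  qnorm(1 - (1 - conf_level) / 2)
}

round(z_for_confidence(c(0.90, 0.95, 0.99)), 3)
# 1.645 1.960 2.576

# At a 100% confidence level the Z-score is infinite, so no finite sample suffices.
z_for_confidence(1)
# Inf
```

The infinite Z-score at 100% is one way to read the remark that full confidence amounts to a census.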

The **σ** in the above formula is the **Standard Deviation (SD)** of the variable which the researcher is trying to measure in the study (in our example, this variable is the customer’s satisfaction with brand XYZ’s product ABC). It is an unknown quantity in this case (and in most studies of a similar type) because we have not yet drawn the sample or conducted the study, so we have no measure of deviation. Therefore, we can choose one of the following options for estimating the SD:

- If a past study has measured this variable, we can use its SD. It serves the purpose well, even if not exactly.
- We can conduct a pilot study, measure the SD, and use that measured value in the formula above.
- If the range of the variable is known, we can divide that range by the number of standard deviations it spans under a normal distribution. For a normally distributed variable, about 99.7% of values lie within **±3** SD of the mean, i.e. practically all values fall within a span of 6 SD. Therefore, if we know the range of the variable in the study (i.e. satisfaction), then:

SD value for sample calculation = (Range / 6) = (Maximum value of variable – Minimum value of variable) / 6
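The range-based and pilot-study options can be sketched in R as follows. This is only an illustration: `sd_from_range` is a hypothetical helper name, and the pilot scores are made-up numbers, not data from any real study.

```r
# Rough SD guess from the known range of a variable (range / 6 rule).
# `sd_from_range` is a hypothetical helper name used only in this sketch.
sd_from_range <- function(min_value, max_value) {
  stopifnot(max_value > min_value)
  (max_value - min_value) / 6
}

sd_from_range(1, 10)   # 1.5 for a 1-to-10 satisfaction scale

# Alternative: estimate the SD from a pilot study.
# These scores are invented purely for illustration.
pilot_scores <- c(7, 8, 6, 9, 5, 7, 8, 6)
sd(pilot_scores)
```

The range rule tends to be a coarse guess, so a pilot estimate may be preferable when one is available.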

Lastly, **e_t** in the above formula is the **“Tolerance for Error”**, i.e. the level of error that can be tolerated in the study. Its value depends entirely on the decision of either the researcher or the sponsors of the study. Since this variable is a divisor, the lower the tolerance for error, the higher the sample size, and vice-versa. One critical thing to remember when selecting the tolerance for error is that its unit of measurement should be the same as that of **SD**. In other words, if **SD** is calculated from the range of the scale, then **e_t** should also be expressed on that scale; if **SD** is measured as a percentage or probability, **e_t** should also be a percentage or probability. I will elaborate on this in the next tutorial in this series on sample size calculation.
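One way to see why the tolerance for error ends up as a divisor is to start from the usual margin-of-error expression for a sample mean. That expression is a standard assumption brought in here and is not stated in the original text. Solving it for N gives:

$$e_t = Z \cdot \frac{\sigma}{\sqrt{N}} \quad\Longrightarrow\quad \sqrt{N} = \frac{Z \cdot \sigma}{e_t} \quad\Longrightarrow\quad N = \left(\frac{Z \cdot \sigma}{e_t}\right)^2$$

Because Z is a pure number, σ and e_t must share the same unit for N to come out as a plain count. The square also means that halving e_t roughly quadruples the required sample size.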

Now, in our example above, let us assume that:

- Customer satisfaction is measured on a scale of 1 to 10, where 1 is the lowest and 10 the highest satisfaction level. Therefore, range = 10 – 1 = 9, which gives **σ** = 9/6, or **σ** = 1.5.
- The confidence level chosen for the study is 95%, hence Z-score = 1.96.
- The researcher chooses a tolerance for error of **±0.5** (in scale points) for the study. This means that an error of up to ±0.5 from the actual value of the variable ‘satisfaction’ (measured on the 1 to 10 scale) can be tolerated, at the 95% confidence level chosen in the previous point.

Since we now know all three variables required to calculate the sample size:

$$N = \left(\frac{1.96 \times 1.5}{0.5}\right)^2 = (5.88)^2 \approx 34.57 \approx 35$$

Therefore, a sample of approximately 35 respondents would give the researcher an estimate of customer satisfaction, measured on the 1 to 10 point scale, that is within ±0.5 of the true value at a confidence level of 95%.
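The arithmetic of this worked example can be checked with a few lines of R. This is a minimal, non-interactive sketch; the function name `sample_size` and its argument names are chosen for illustration only. It rounds up with `ceiling()`, on the assumption that a fractional respondent should always be rounded to the next whole person.

```r
# Sample size for estimating a mean: N = (Z * SD / e_t)^2.
# `sample_size` and its argument names are hypothetical, for illustration only.
sample_size <- function(z, sd_value, tolerance) {
  stopifnot(z > 0, sd_value > 0, tolerance > 0)
  (z * sd_value / tolerance)^2
}

n_raw <- sample_size(z = 1.96, sd_value = 1.5, tolerance = 0.5)
n_raw            # 34.5744
ceiling(n_raw)   # 35
```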

**Remember:**

- If the confidence level is higher, the required sample size is larger, and vice-versa.
- If the Standard Deviation in the study is higher, the required sample size is larger, and vice-versa.
- If the Error Tolerance Level is lower, the required sample size is larger, and vice-versa.
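These three rules can be illustrated by changing one input at a time from the example values. The sketch below assumes the same formula as above and rounds up to whole respondents; `n_required` is a hypothetical helper name.

```r
# Required sample size, rounded up to a whole respondent.
# `n_required` is a hypothetical helper name used only in this sketch.
n_required <- function(z, sd_value, tolerance) {
  ceiling((z * sd_value / tolerance)^2)
}

n_required(1.96,  1.5, 0.50)   # baseline (95%, SD 1.5, tolerance 0.5): 35
n_required(2.576, 1.5, 0.50)   # higher confidence (99%): 60
n_required(1.96,  3.0, 0.50)   # larger standard deviation: 139
n_required(1.96,  1.5, 0.25)   # tighter error tolerance: 139
```

Doubling the SD and halving the tolerance have the same effect here because only the ratio of the two enters the formula.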

**##### R Tutorial #####**

**My.Function <- function() {**

**# Prompt for the three inputs; readline() returns them as text**

**Z.Score <- readline("Please Enter the Value of Z-Score (e.g. 1.96, for CL 95%): ")**

**SD.Value <- readline("Please Enter the Value of Standard Deviation (e.g. 1.5): ")**

**E.T.Value <- readline("How Much is Your Tolerance Level for Errors (e.g. 0.3): ")**

**# Convert the typed text to numbers**

**Z.Score <- as.numeric(Z.Score)**

**SD.Value <- as.numeric(SD.Value)**

**E.T.Value <- as.numeric(E.T.Value)**

**# Sample size formula: N = (Z x SD / tolerance)^2**

**N.Value <- ((Z.Score * SD.Value) / E.T.Value)^2**

**N.Value <- round(N.Value, digits = 0)**

**cat("Required Sample Size is:", N.Value, "(Approximately)\n")**

**}**

**###### RUNNING ABOVE R FUNCTION ##########**

**> My.Function()**

# will prompt:

**Please Enter the Value of Z-Score (e.g. 1.96, for CL 95%): **1.96

**Please Enter the Value of Standard Deviation (e.g. 1.5): **1.5

**How Much is Your Tolerance Level for Errors (e.g. 0.3): **0.5

## Output is: ###

**Required Sample Size is: 35 (Approximately)**

Good Luck!! Happy R-Learning…..

**Manoj Kumar**