Sample Size Calculation when there are Non-Continuous Variables

So far we have discussed sample size calculations when there are only continuous variables in the study. For example, all the questions in a questionnaire are measured on an interval scale, as when liking for a product is rated from 1 to 5, where 1 means Totally Disliked and 5 means Totally Liked. What if the scales are non-continuous, that is, expressed as a proportion or percentage? When variables are measured on non-continuous scales (e.g. a proportion or percentage, as with dichotomous questions that offer only two answer choices), a variation of the formula is used.

Before we jump to the formula, let us look at the following example:

Suppose we want to learn through a study how many people like our brand’s products, and we assume that one out of every 4 consumers likes them. But why assume that? Because, one, before this study we do not know the exact proportion of consumers (out of the total) who like our products, and two, without assuming this proportion (or percentage) we can’t use the formula for sample size calculation, which is shown below:

$$N = P\,(1 - P)\left(\frac{Z}{e_t}\right)^{2}$$

Here,

N is the sample size to be calculated.

P is the proportion with which something occurs in the study, e.g. in our example above we assumed that one in four likes our products, so P = 1 in 4, or ¼, or 0.25, or 25%.

(1 – P) is the proportion for which what we assumed does not occur, i.e. the proportion of consumers who do not like our products, so (1 – P) will be 1 – 0.25 = 0.75, or 75%, or ¾.

Z is the Z-score from the probability table of the standard normal distribution. For example, for a 95% confidence level, the Z-score is 1.96.

et is the tolerance level for errors in the study. For example, a researcher (or the sponsors) might set the error tolerance at 3%, or at 1 in 20 (5%), etc. P and et must be expressed in the same form: if P is a proportion, et has to be a proportion; if P is a percentage, et has to be a percentage.
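The value 1.96 does not have to be read from a printed table. As a small sketch in R, the two-sided Z-score for a given confidence level can be obtained from the standard normal quantile function `qnorm()`. The helper name `z_for_confidence` is made up here for illustration and is not part of the original code.

```r
# Hypothetical helper (name chosen for illustration only):
# two-sided Z-score for a given confidence level.
z_for_confidence <- function(conf_level) {
  # For a two-sided interval, the error probability is split equally
  # between the two tails of the normal distribution.
  alpha <- 1 - conf_level
  qnorm(1 - alpha / 2)
}

z_for_confidence(0.95)  # about 1.96
z_for_confidence(0.99)  # about 2.58
```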

Therefore, in an example study, suppose a researcher wants to learn what her sample size should be for a study on “How many people like Microsoft Excel™”. She wants a 95% confidence level, initially assumes that 1 in 2 people likes it, and sets an error tolerance of 3%. The sample size can then be calculated as follows:

P = 1 / 2 = 0.5 or 50%, and, (1 – P) = 0.5

Z = 1.96 for 95% confidence level, and, et = 3% or 0.03

$$N = 0.5 \times 0.5 \times \left(\frac{1.96}{0.03}\right)^{2} \approx 1067.11 \approx 1067$$

Remember:

1. The higher the confidence level, the larger the required sample size, and vice versa.

2. The lower the error tolerance, the larger the required sample size, and vice versa.

3. As the value of P increases from 0 to 0.5, the sample size increases; but as P increases from 0.5 to 1, the sample size decreases. This is quite obvious as well. For a brand that almost every consumer likes, even a small sample is good enough, right? The maximum sample size occurs at P = 0.5.
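The three points above can be checked numerically. The following is a minimal sketch, assuming the same formula as in this post; `sample_size` is a hypothetical one-line helper introduced only for this illustration, and it rounds to the nearest whole number as the R code in this post does.

```r
# Hypothetical helper: N = P * (1 - P) * (Z / et)^2, rounded to a whole number.
sample_size <- function(z, p, e) round(p * (1 - p) * (z / e)^2)

# Point 1: a higher confidence level (larger Z) gives a larger sample.
# The Z values correspond roughly to 90%, 95% and 99% confidence.
sample_size(z = c(1.645, 1.96, 2.576), p = 0.5, e = 0.03)

# Point 2: a lower error tolerance gives a larger sample.
sample_size(z = 1.96, p = 0.5, e = c(0.05, 0.03, 0.01))

# Point 3: the sample size peaks at P = 0.5 and falls off on both sides.
sample_size(z = 1.96, p = c(0.1, 0.25, 0.5, 0.75, 0.9), e = 0.03)
```

Because R arithmetic is vectorised, passing a vector for one argument returns one sample size per value, which makes the trend in each point easy to see.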

 

################# R Code ##############

My.Function2 <- function(){

  # readline() only waits for typed input in an interactive R session
  Z.Score <- readline("Please Enter the Value of Z-Score (e.g. 1.96, for CL 95%): ")
  P.Value <- readline("Please Enter the Value of Proportion P (between 0 and 1, e.g. 0.5): ")
  E.T.Value <- readline("How Much is Your Tolerance Level for Errors (for Zero Tolerance, between 1% to 99%, e.g. 0.03): ")

  # Convert the typed text to numbers
  Z.Score <- as.numeric(Z.Score)
  P.Value <- as.numeric(P.Value)
  E.T.Value <- as.numeric(E.T.Value)

  # N = P * (1 - P) * (Z / et)^2
  N.Value <- P.Value * (1 - P.Value) * ((Z.Score/E.T.Value)^2)
  NValue <- round(N.Value, digits = 0)

  # Print the result on one line; cat() separates its arguments with spaces
  cat("Required Sample Size is:", NValue, "(Approximately)\n")

}

####### The output #########

#Running the function:

> My.Function2()
Please Enter the Value of Z-Score (e.g. 1.96, for CL 95%): 1.96

Please Enter the Value of Proportion P (between 0 and 1, e.g. 0.5): 0.5

How Much is Your Tolerance Level for Errors (for Zero Tolerance, between 1% to 99%, e.g. 0.03): 0.03

### Result ###

Required Sample Size is: 1067 (Approximately)

 

######### Modified Code ############

# This code can handle missing (empty) arguments

RSampleSize <- function(Z = NULL, P = NULL, E = NULL){

  # Return NULL if any argument was not supplied
  if(is.null(Z) || is.null(P) || is.null(E)) {
    return(NULL)
  } else {

    Z.Score <- as.numeric(Z)
    P.Value <- as.numeric(P)
    E.T.Value <- as.numeric(E)

    # N = P * (1 - P) * (Z / et)^2
    N.Value <- P.Value * (1 - P.Value) * ((Z.Score/E.T.Value)^2)
    NValue <- round(N.Value, digits = 0)

    # Print the result on one line
    cat("Required Sample Size is:", NValue, "(Approximately)\n")
  }
}
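Both functions above print their result rather than return it, so the number cannot easily be reused in later calculations. One possible alternative, sketched here under the assumption that a plain numeric return value is wanted, checks its inputs and returns the sample size. The name `sample_size_prop` and its argument names are illustrative and not from the original code.

```r
# Hypothetical variant (not the original function): returns the sample size
# as a number instead of printing it, and checks the inputs first.
sample_size_prop <- function(z, p, e) {
  stopifnot(
    is.numeric(z), is.numeric(p), is.numeric(e),
    all(z > 0),            # Z-score must be positive
    all(p >= 0 & p <= 1),  # P is a proportion
    all(e > 0 & e < 1)     # error tolerance as a proportion, not a percentage
  )
  round(p * (1 - p) * (z / e)^2)
}

n <- sample_size_prop(z = 1.96, p = 0.5, e = 0.03)
n  # 1067, matching the worked example
```

Passing the error tolerance as 3 instead of 0.03 stops with an error here, which guards against mixing percentages and proportions.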
