Sampling distributions and confidence intervals: standard error

Sampling distributions

The mean of a sample is a relatively good estimate of the mean of the population. More specifically, the mean of a large sample will be a better estimate of the population mean than the mean of a smaller sample. So, the larger the sample, the better the estimate of the population mean. The histogram of the sample statistics (a mean is an example of a statistic) of our samples (reminder: of our samples, not of our populations) is called the sampling distribution. So plotting the sample means of our samples gives us the sampling distribution of the mean.
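As a rough sketch of how to compute the sampling distribution of the mean by simulation (the exponential population and all sizes below are hypothetical choices for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Hypothetical population: exponential, i.e. clearly not normal
population = rng.exponential(scale=2.0, size=100_000)

# Draw many equally-sized samples and record each sample's mean
sample_size = 50
n_samples = 10_000
sample_means = [rng.choice(population, size=sample_size).mean()
                for _ in range(n_samples)]

# The histogram of the sample means is the (empirical)
# sampling distribution of the mean
plt.hist(sample_means, bins=50)
plt.xlabel("sample mean")
plt.ylabel("frequency")
plt.show()
```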

One point to note is that, by the Central Limit Theorem (covered later), the sampling distribution of the mean will be approximately normally distributed regardless of the shape of the population distribution, provided the samples are reasonably large.

 

Confidence intervals

It was said in the beginning that the mean of a sample is a relatively good estimate of the mean of the population. However, we don’t know how good an estimate it is. That is why we use confidence intervals. Confidence intervals are interval estimates of where the population mean might be, and we attach a probability X to the likelihood that intervals built this way capture the population mean.

So suppose we gave a sample of 5 people a die to throw (side 1 is 1 point, side 2 is 2 points, etc.) and the sample mean score was 3. Since we know that the minimum score is 1 and the maximum is 6, we are 100% confident that the population mean lies between 1 and 6. This is an example of a confidence interval. However, it is too broad to be helpful. The usefulness of a CI increases as the CI narrows. So let us say that for a 95% CI, the interval is 2 to 4. It is important to note that all this means is that if we repeatedly performed random sampling of the population and built a 95% CI around each sample mean, about 95% of those intervals would contain the population mean.

We know that a normal distribution is fully determined by its mean and standard deviation. We also know that the sampling distribution of the mean is (approximately) a normal distribution. Thus, the sampling distribution of the mean can be computed from the population mean and the population standard deviation.

Normally, we don’t have access to the population mean, but we can take samples from the population. Since we do not know the population mean, we don’t know where our sample means stand in relation to it (they might fall above, below or be equal to the population mean). So the question is:

how do we know how close the population mean is to the sample mean?

We know that the sampling distribution is normally distributed and that its mean matches the population mean. The modus operandi to answer the above question goes as follows.

  1. We plot a normal distribution whose mean is known to be a relatively good approximation to the population mean
  2. We use what we know about normal distributions to estimate how far the sample mean is likely to lie from the population mean

See the probabilities of the standard normal distribution (SND) below. We can see that there is roughly a 95% probability (95.4%, to be precise) that the mean will fall in the -2 to 2 z-score range.

[Figure: probabilities in the SND]

Normally, we are interested in a 95% CI. Checking a table of z-scores lets us know that 95% of the scores in a normal distribution fall between -1.96 and 1.96 standard deviations of the mean. This means that we can be 95% confident that the sample mean falls within 1.96 standard deviations (of the sampling distribution) of the population mean. By the same token, we can be 95% confident that the population mean lies within 1.96 of those standard deviations of the sample mean.
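Rather than a printed table, the same critical values can be checked with SciPy; a quick sketch:

```python
from scipy.stats import norm

# Central 95% of the standard normal: cut off 2.5% in each tail
print(norm.ppf(0.025), norm.ppf(0.975))   # approximately -1.96 and 1.96

# Probability mass between -1.96 and 1.96
print(norm.cdf(1.96) - norm.cdf(-1.96))   # ~0.95
```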

Knowing all the above, we can estimate how far a sample mean lies from the population mean knowing:

  1. the sample mean
  2. the standard deviation of the sampling distribution of the mean (this depends on the population SD, which we do not know either, but it can be estimated from the sample)

So how do we estimate this standard deviation?

 

Standard error

We estimate it with the standard error. The standard deviation of the sampling distribution is called the standard error. It measures the degree to which the individual sample means deviate from the mean of the sample means. Before we go on, it is important to note some things. The means of small samples have a high SD (so they deviate a lot from the population mean) while the means of big samples have a low SD (they don’t deviate much from the population mean). Since the standard error is the SD of the sampling distribution, the standard error from smaller samples will be higher than the standard error from bigger samples. So a rule of thumb is that low standard errors are more desirable than high standard errors. The standard error for a particular sample can be estimated as follows: (sample SD)/(square root of sample size).
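A minimal sketch of this calculation with NumPy (the scores are hypothetical); scipy.stats.sem would give the same result directly:

```python
import numpy as np

# Hypothetical sample of scores
scores = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.0, 4.7, 5.2])

sample_sd = scores.std(ddof=1)                      # sample SD (n - 1 denominator)
standard_error = sample_sd / np.sqrt(len(scores))
print(standard_error)
```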

 

Computing the CI

Once we have the standard error, for a 95% CI, we can estimate the interval inside which the population mean might fall as follows: for the lower bound, sample mean – (1.96 * standard error), and for the upper bound, sample mean + (1.96 * standard error). So the 95% CI is a range written as (lower value, upper value), more specifically: (sample mean – (1.96 * standard error), sample mean + (1.96 * standard error)).

For smaller sample sizes, we will get broader CIs, while for bigger samples we will get narrower CIs. The narrower the CI, the more useful it is.
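A sketch of the 95% CI calculation on the same hypothetical scores. Note that for a small sample like this one, a t critical value would strictly be more appropriate than 1.96, which follows the normal approximation used in the text:

```python
import numpy as np

scores = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.0, 4.7, 5.2])  # hypothetical sample

mean = scores.mean()
se = scores.std(ddof=1) / np.sqrt(len(scores))

# 95% CI using the 1.96 normal critical value
ci = (mean - 1.96 * se, mean + 1.96 * se)
print(ci)
```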

 

Z-scores: What and Why

Why

When we want to standardise scores and analyse them using the standard normal distribution, we need to convert our scores into standard ones. Z-scores allow us to compare two distributions by placing them on the same scale. We do this by transforming our scores into z-scores.

 

What

The z-score of a score is obtained as follows: (score – mean) / standard deviation. A z-score is expressed in terms of standard deviations and tells us how many standard deviations below/above the mean our score is. So a z-score of 2.5 means that our score is 2.5 standard deviations above the mean, while a z-score of -1.5 means that our score is 1.5 standard deviations below the mean.
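A one-line version of this transformation in NumPy, on hypothetical exam scores:

```python
import numpy as np

scores = np.array([55, 60, 72, 48, 90, 66])  # hypothetical exam scores

# z-score: how many SDs each score lies above (+) or below (-) the mean
z_scores = (scores - scores.mean()) / scores.std(ddof=1)
print(z_scores)
```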

 

Z-scores and the standard normal distribution

An SND is a probability distribution: every range of scores has a probability associated with it, namely the probability of randomly selecting a score in that range (individual scores of a continuous distribution have probability densities rather than probabilities). So there is a 68% probability of getting a z-score between -1 and 1.

[Figure: probabilities in the SND]

 

Similarly, there is roughly a 95% probability (95.4%) of getting a z-score between -2 and 2.

 

Statistical Inference and the Central Limit Theorem

A population parameter is the value of a factor in the target population, while a sample statistic is any summary measure computed from your collected data, such as a mean, a correlation coefficient, a ratio between means, etc.

There is a way to determine the distribution of any given statistic through computer simulation. An example of a simulation to determine the distribution of the number of heads across 40,000 runs of 100 coin tosses would go as follows:

Step 0: Repeat Steps 1 to 3 40,000 times

Step 1: Take a coin and randomly flip it 100 times

Step 2: Count the number of heads

Step 3: Store the number

The plot would be approximately normally distributed, with a mean of about 50 heads and most counts falling between 40 and 60 heads per 100-toss run.
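A minimal NumPy sketch of this simulation (the binomial draw collapses Steps 1 and 2 into a single call, and the resulting array is the storage of Step 3):

```python
import numpy as np

rng = np.random.default_rng(0)

# Steps 0-3: 40,000 runs of 100 fair coin flips, counting heads each run
head_counts = rng.binomial(n=100, p=0.5, size=40_000)

print(head_counts.mean())  # ~50 heads per run on average

# Fraction of runs whose head count lands between 40 and 60
within = ((head_counts >= 40) & (head_counts <= 60)).mean()
print(within)  # close to 1
```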

The Central Limit Theorem states that if all possible random samples of the same size S are taken from a population with a given mean and a particular standard deviation, the sampling distribution of the mean will have a mean equal to the population mean, and its standard deviation will be the population SD divided by the square root of S. The CLT also states that the sampling distribution will be approximately normally distributed.
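A quick simulation check of both CLT claims, on a hypothetical uniform population:

```python
import numpy as np

rng = np.random.default_rng(1)
population = rng.uniform(0, 10, size=1_000_000)  # hypothetical population
S = 25                                           # sample size

# 20,000 samples of size S; one mean per sample
means = rng.choice(population, size=(20_000, S)).mean(axis=1)

print(means.mean(), population.mean())             # approximately equal
print(means.std(), population.std() / np.sqrt(S))  # approximately equal
```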

Summary of Hypothesis Testing (enhanced)

  • Define null and alternative hypotheses
  • Compute a null distribution (this would show us what would happen if we randomly collected data and the null hypothesis was true)
  • Collect empirical data
  • Calculate the p value of the empirical data
  • Reject or fail to reject the null hypothesis (these steps are sketched in code below)
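A minimal sketch of these steps as a permutation test (the two groups and their scores are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)

# Empirical data: hypothetical scores for two groups
group_a = np.array([5.1, 6.0, 5.8, 6.4, 5.5])
group_b = np.array([4.6, 5.0, 4.8, 5.3, 4.4])
observed_diff = group_a.mean() - group_b.mean()

# Null distribution: if H0 is true, the group labels are interchangeable
pooled = np.concatenate([group_a, group_b])
null_diffs = []
for _ in range(10_000):
    shuffled = rng.permutation(pooled)
    null_diffs.append(shuffled[:5].mean() - shuffled[5:].mean())
null_diffs = np.array(null_diffs)

# Two-tailed p value: how often chance alone produces a difference this big
p_value = (np.abs(null_diffs) >= abs(observed_diff)).mean()
print(p_value)  # reject H0 if below the chosen significance level
```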

Inferential Statistics vs Descriptive Statistics vs EDA

There are two main uses of inferential statistics. The first is to make accurate inferences about a population based on a sample taken from that population. The second is to calculate the probability that some observed difference between groups or conditions might have occurred by chance.

On the other hand, descriptive statistics provides a summary of a sample. The summary comprises measures of central tendency and measures of variability, among others. Unlike inferential statistics, descriptive statistics does not rely on probability theory.

Exploratory data analysis “explores” the data looking for patterns and outliers. Among its goals, it generates hypotheses about the causes of patterns and provides suggestions for further data collection.

Multiple regression

In linear regression we had a single predictor X and a criterion (predicted variable) Y. In multiple regression, we have several predictors (X1, X2, X3, etc.) and a single criterion (predicted variable) Y.

The goal is to investigate the extent to which the predictor variables can predict the criterion variable Y. This goal is divided in 3 sub-goals.

Goal 1

To assess the extent to which the predictor scores are associated with the criterion variable scores. We need to know about the relationship of every predictor to the criterion, so we apply Pearson’s correlation test to each of these relationships. The result is a matrix of correlations described by Pearson’s coefficients. The ideal result is to find significant correlations between the predictor scores and the criterion variable scores. The correlation between the criterion scores and the scores predicted by the full set of predictors (not the sum of the individual correlations) is called multiple R.
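A sketch of such a correlation matrix with pandas (all variables and effect strengths are hypothetical, generated for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Hypothetical predictors X1-X3 and criterion Y
df = pd.DataFrame({"X1": rng.normal(size=100),
                   "X2": rng.normal(size=100),
                   "X3": rng.normal(size=100)})
df["Y"] = 0.6 * df["X1"] + 0.3 * df["X2"] + rng.normal(size=100)

# Matrix of Pearson correlations between every pair of variables
print(df.corr(method="pearson"))
```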

Goal 2

To assess the (statistical) significance of the variance in the criterion variable accounted for by the predictors (i.e. to assess the significance of the predicted variance). In other words, we want to see if there is enough predicted variance compared to unpredicted variance. The squared multiple R, also called R-squared, is used here because it measures the proportion of the criterion’s variance that can be explained by the predictors.

The significance of the predicted variance is then calculated as we have seen before in linear regression and the ANOVA tests: we compute an F ratio of predicted variance to error variance.
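A sketch of this with statsmodels, on the same hypothetical data as above (regenerated here so the snippet stands alone):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
df = pd.DataFrame({"X1": rng.normal(size=100),
                   "X2": rng.normal(size=100),
                   "X3": rng.normal(size=100)})
df["Y"] = 0.6 * df["X1"] + 0.3 * df["X2"] + rng.normal(size=100)

X = sm.add_constant(df[["X1", "X2", "X3"]])  # add an intercept column
model = sm.OLS(df["Y"], X).fit()

print(model.rsquared)  # R-squared: share of Y's variance the predictors explain
print(model.fvalue)    # F ratio of predicted variance to error variance
print(model.f_pvalue)  # significance of that ratio
```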

Goal 3

To assess the variance on the criterion variable produced by individual predictors (i.e. to check the extent of the contribution of each individual predictor to the predicted variance on the criterion variable).

In order to check the contribution of each predictor, we need to put the predictors on the same scale so that their weights can be compared directly. Another way of describing this is saying that we need to standardise the predictor scores.
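A sketch of this standardisation step on the same hypothetical data: z-scoring every variable first makes the fitted weights (beta weights) directly comparable across predictors:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
df = pd.DataFrame({"X1": rng.normal(size=100),
                   "X2": rng.normal(size=100),
                   "X3": rng.normal(size=100)})
df["Y"] = 0.6 * df["X1"] + 0.3 * df["X2"] + rng.normal(size=100)

# Standardise: every variable now has mean 0 and SD 1
z = (df - df.mean()) / df.std(ddof=1)

model = sm.OLS(z["Y"], sm.add_constant(z[["X1", "X2", "X3"]])).fit()
print(model.params)  # standardised (beta) weights, one per predictor
```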

Confidence intervals and effect sizes

Confidence intervals

There are two types of confidence intervals: one relates to the mean of an interval variable, the other to the percentage of a categorical variable (a nominal or an ordinal variable). Here only the former type will be covered.

Confidence intervals for means

The goal is to capture the true value/effect within some interval. When repeatedly taking equally-sized random samples of a population, the mean of those sample means will tend to get closer and closer to the population mean. The standard deviation of these sample means is called the standard error. From the mathematical properties of the normal distribution, we know that about 68% of the sample means will fall within 1 standard error of the population mean. Thus, given a random sample of the population, there is a 68% probability that the population mean will be found within 1 standard error of the sample mean.

And similarly, from the mathematical properties of the normal distribution, we know that about 95% of the sample means will fall within 2 standard errors of the population mean. Thus, given a random sample of the population, there is a 95% probability that the population mean will be found within 2 standard errors of the sample mean.

These ranges, together with the probabilities that the population mean falls inside them, are called confidence intervals. They show the probability of a margin related to the mean of a sample. Confidence intervals can be visualised with error bars. Summary: confidence intervals give us the estimated range of values for a population parameter, a precision estimate (indicated by the width of the confidence interval) and a statistical significance check (if the confidence interval does not cover the null value, the result is significant at the 0.05 level; the null value is the value that represents no effect in the population, such as a difference of 0 or a ratio of 1).

 

Effect sizes

The effect size tells us how strong/large the difference/relationship between the relevant variables is. Even if you find a highly statistically significant difference/relationship between some variables, if the difference/relationship is weak, it is not practically meaningful. Normally, measures of effect size like Pearson’s coefficient take values (ignoring the positive/negative sign) between 0 (no effect) and 1 (strong effect).
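A sketch contrasting effect size with significance, using Pearson’s r on hypothetical data with a deliberately weak relationship:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(4)
x = rng.normal(size=500)             # hypothetical variable
y = 0.15 * x + rng.normal(size=500)  # weakly related variable

r, p = pearsonr(x, y)
print(r)  # effect size: near 0, i.e. a weak relationship
print(p)  # can still be "significant" with a big enough sample
```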

Statistical significance, Statistical power and Type I and Type II errors

  • Statistical significance: As its name suggests, statistical significance refers to the probability that an observed difference in the sample is not due to chance. It forms part of the hypothesis testing method mentioned earlier. Normally, the significance threshold is set at 5% or 1% and is expressed as p < 0.05 or p < 0.01. This means that in order to claim that an event is not due to chance, the probability of observing it under the null hypothesis needs to be below 0.05 or 0.01.
  • Type I error: when a test claims a significant difference in the sample with p = 0.023, we cannot know whether the difference found in the sample reflects a difference that exists in the population the sample was taken from. There is a 2.3% probability that the difference in the sample is due to chance. Whenever it is claimed that a difference exists in the population when it does not, we call that claim a Type I error. Type I errors can be managed by setting appropriate significance levels, symbolised by the character “α” (e.g. α = 0.05 or α = 0.01). So by setting a significance level of 5%, we set the risk of committing a Type I error at 5% or less. The lower the significance level, the lower the risk of a Type I error. On the other hand, when dealing with small sample sizes, it might be a good idea to raise the significance level. It is worth noting that the choice of significance level is a subjective criterion that depends on the goal of the researcher. Type I errors are one of the risks taken when raising the significance level.
  • Type II error: it is the opposite of a Type I error. Whenever it is claimed that a difference does not exist in the population when it does exist, we call that claim a Type II error. Type II errors are one of the risks taken when lowering the significance level. You might find a significant difference in your sample that happens to reflect a real difference in the population with p = 0.034, but since you set α = 0.01, you conclude that the difference does not reflect a real difference in the population.

So by raising the significance level too much, you risk a Type I error and by lowering it too much, you risk a Type II error.

  • Statistical power: it is the ability of a test to detect a significant difference in a sample when it exists in the population. So a statistical power of 0.40 means that there is a 40% probability that the test will find a difference in a sample if it exists in the population. It also means that there is a 60% probability that the test won’t find the difference even though it exists; in other words, a 60% probability of a Type II error. The statistical power of a given test is 1 − β, where “β” (beta) refers to the probability of committing a Type II error. As a rule of thumb, the statistical power of a given test should be at least 0.8, corresponding to at most a 20% probability of committing a Type II error. In order to calculate the statistical power of a test, we need the sample size the test will be applied to, the α value and the effect size (the size of the difference/relationship to be detected).
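A sketch of such a power calculation using statsmodels’ power tools for an independent-samples t test (the effect size of 0.5 is a hypothetical, medium-sized choice):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed for power 0.8 (beta = 0.2)
# at alpha = 0.05 and a hypothetical effect size of 0.5
n = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(n)  # roughly 64 per group

# Power actually achieved with only 30 per group
print(analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=30))
```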