Some terms.

- Research hypothesis: a prediction about relationships between variables.
- Null hypothesis: states that any variation between the variables is due to random variability (also known as unsystematic variation).
- One-tailed hypothesis: a hypothesis that makes a prediction in one direction (hence one tail). For example, the relationship between A and B will be positive (directly proportional). Variability in the data will count as significant only if it falls in the predicted direction.
- Two-tailed hypothesis: a hypothesis that makes a prediction in both directions. Since a two-tailed hypothesis covers two directions, the significance level is split across the two tails, so the threshold in each tail is stricter than for a one-tailed hypothesis.
- Control condition: participants in this condition are not subjected to the experimental manipulation (no IV). Their performance on the tasks can then be compared against that of participants in the experimental condition. The idea is that the only difference between the control condition and the experimental condition is the absence of the IV in the former and its presence in the latter.
- Related design or within-participants design or repeated-measures design: the same participants take part in every condition.
- Unrelated design or between-participants design or independent-measures design: different participants take part in each condition.
- Parametric test: assumes the data are measured on an interval scale (for example, temperatures).
- Non-parametric test: makes no such assumption; suited to ordinal data, where only the order of the values matters (for example, ranked scores).
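To make the one-tailed versus two-tailed distinction concrete, here is a minimal sketch in plain Python. It assumes a hypothetical test statistic `z` drawn from a standard normal distribution (an assumption for illustration only, not anything stated above): the two-tailed p-value is twice the one-tailed one, which is why two-tailed tests are effectively stricter.

```python
import math

def normal_cdf(z):
    # Cumulative distribution function of the standard normal distribution
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

z = 1.8  # hypothetical test statistic (made up for illustration)

# One-tailed: probability of a result this extreme in the predicted direction only
p_one_tailed = 1.0 - normal_cdf(z)

# Two-tailed: the same extremity in either direction, so the probability doubles
p_two_tailed = 2.0 * p_one_tailed
```

With `z = 1.8`, the one-tailed p-value is below 0.05 but the two-tailed one is not, so the same data can be significant under a one-tailed hypothesis yet non-significant under a two-tailed one.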

**Variability, p-value and levels of significance**

Experimental test results

Condition 1 scores: 23, 424, 2422, 522, 5224, 2422. Condition 1 mean: 232 (fake).

Condition 2 scores: 46, 424, 232, 493, 2948. Condition 2 mean: 100 (fake).

How can we know whether the difference in mean scores between the two conditions is due to the IV manipulation? Does the difference between the means indicate *significant* support for the research hypothesis?

We can never be 100% sure that the differences between the means are not due to random variability (unsystematic variation). What statistical tests do is calculate the probability of the differences in the data being due to random variability (unsystematic variation). The idea is that if there is a low probability that the differences in the data are due to random variability, then we have a justification to claim that the differences are likely due to the IV manipulation; in other words, we can reject the null hypothesis.
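One simple way to estimate that probability directly is a permutation test — a sketch of the general idea, not the specific test discussed here. If the null hypothesis were true, the condition labels would be arbitrary, so we repeatedly shuffle the pooled scores and ask how often random relabelling alone produces a difference in means at least as large as the one observed. The groups below are made-up illustration data.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate the probability that the observed difference in means
    could arise from random variability alone (the null hypothesis)."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling of condition membership
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations  # the estimated p-value

# Made-up data: clearly separated groups give a small p-value,
# identical groups give a large one.
p_different = permutation_test([1, 2, 3, 2, 1], [10, 11, 12, 11, 10])
p_similar = permutation_test([1, 2, 3], [3, 2, 1])
```

A small returned value means shuffling almost never reproduces the observed difference, so random variability is an unlikely explanation and the null hypothesis can be rejected.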

How low should the probability be before we can claim that the differences in the data are due to the IV manipulation? We know it can’t be 0%. In the psychological sciences, there is common agreement that the probability value (also called the p-value) should be either below 1% (p < 0.01) or below 5% (p < 0.05). If the p-value is at or above the chosen threshold, we cannot reject the null hypothesis, and thus the variability in the data is not (statistically) significant.
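The decision rule above can be sketched in a few lines; the function name and default threshold here are illustrative choices, with 0.05 as the conventional level.

```python
def reject_null(p_value, alpha=0.05):
    """Reject the null hypothesis only when the p-value falls below
    the chosen significance level (alpha)."""
    return p_value < alpha

decisions = [reject_null(0.003), reject_null(0.03), reject_null(0.07)]
```

Here 0.003 and 0.03 are significant at the 5% level, but only 0.003 would also pass the stricter 1% level (`reject_null(0.03, alpha=0.01)` returns `False`).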