The Framework of Hypothesis Testing
Hypothesis testing is a formal statistical method used to make decisions about a population based on sample data. It's one of the two main branches of statistical inference (the other being estimation).
Think of it like a court trial: the null hypothesis is "innocent until proven guilty." We assume it's true — and only reject it if the evidence (data) is strong enough.
The 7-Step Process of Hypothesis Testing
This structured approach ensures consistency and rigor:
- State the hypotheses — Define \( H_0 \) and \( H_a \).
- Choose the appropriate test statistic — e.g., z, t, chi-square, F.
- Specify the significance level (\( \alpha \)) — Common choices: 0.05 (5%), 0.01 (1%).
- State the decision rule — When will you reject \( H_0 \)?
- Collect data and calculate the test statistic — Use sample data.
- Make the statistical decision — Reject or fail to reject \( H_0 \).
- Make the economic or investment decision — What does this mean for portfolio strategy, risk, etc.?
Null vs. Alternative Hypotheses
- Null Hypothesis (\( H_0 \)): The default assumption. It always contains an equality (=, ≤, or ≥). This is what we test and potentially reject.
- Alternative Hypothesis (\( H_a \)): What we conclude if we reject \( H_0 \). It represents the researcher's belief or claim.
Example: Testing a Fund Manager's Claim
A fund manager claims their average return is at least 8% per year.
We want to test this claim.
- \( H_0: \mu \geq 8\% \) (They're right)
- \( H_a: \mu < 8\% \) (They're overstating)
This is a lower-tail test — we're checking if returns are significantly less than 8%.
Type I and Type II Errors
Because we use a sample, our decision might be wrong. There are two types of errors:
| Decision | If \( H_0 \) is True | If \( H_0 \) is False |
|---|---|---|
| Do Not Reject \( H_0 \) | ✅ Correct Decision | ❌ Type II Error (Fail to reject a false null) |
| Reject \( H_0 \) | ❌ Type I Error (Reject a true null) | ✅ Correct Decision |
- Significance level (\( \alpha \)) = Probability of Type I error.
- Power of the test = 1 – P(Type II error) = Probability of correctly rejecting a false null.
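The meaning of \( \alpha \) can be checked by simulation: if we repeatedly sample from a population where \( H_0 \) is actually true, a test at the 5% level should reject about 5% of the time. A minimal sketch (the population parameters and trial count are illustrative assumptions, not from the text):

```python
import math
import random

# Monte Carlo check that alpha equals the Type I error rate:
# sample from a population where H0 (mu = 0) is TRUE and count
# how often a two-tailed z-test at alpha = 0.05 rejects anyway.
random.seed(42)                      # fixed seed for reproducibility
n, sigma, trials, z_crit = 30, 1.0, 10_000, 1.96

rejections = 0
for _ in range(trials):
    sample = [random.gauss(0.0, sigma) for _ in range(n)]
    z = (sum(sample) / n) / (sigma / math.sqrt(n))   # z test statistic
    if abs(z) > z_crit:              # falls in the rejection region
        rejections += 1

print(rejections / trials)           # close to 0.05
```

The empirical rejection rate hovers near 0.05, matching the chosen significance level.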
Making a Statistical Decision
Once hypotheses are set, we use sample data to decide whether to reject \( H_0 \). There are three main approaches — all lead to the same conclusion.
Test Statistic: The Core Formula
The test statistic measures how far the sample result lies from the hypothesized value, in units of standard error:

\( \text{Test statistic} = \dfrac{\text{Sample statistic} - \text{Hypothesized value}}{\text{Standard error of the sample statistic}} \)

This general form underlies z-tests, t-tests, and others.
Three Approaches to Decision Making
- 1. Critical Value Approach:
- Compare the test statistic to a critical value from a statistical table (z, t, etc.).
- Reject \( H_0 \) if the test statistic falls in the rejection region.
- 2. p-Value Approach:
- The p-value is the smallest significance level at which you can reject \( H_0 \).
- Decision Rule: If p-value < \( \alpha \), reject \( H_0 \).
- Interpretation: A p-value of 0.03 means that, if \( H_0 \) were true, there would be only a 3% chance of observing a result at least this extreme.
- 3. Confidence Interval Approach:
- A \( (1 - \alpha) \times 100\% \) confidence interval gives a range of plausible values for the population parameter.
- If the hypothesized value under \( H_0 \) is outside this interval, reject \( H_0 \).
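The three approaches always agree. A short sketch using a hypothetical two-tailed test of \( H_0: \mu = 8 \) (the sample numbers are illustrative assumptions, not from the text):

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Hypothetical sample: n = 36, mean 9.2, known sigma = 3, alpha = 0.05
n, xbar, mu0, sigma, alpha = 36, 9.2, 8.0, 3.0, 0.05
se = sigma / math.sqrt(n)
z = (xbar - mu0) / se                      # z = 2.4

# 1. Critical value approach: compare |z| to 1.96
reject_cv = abs(z) > 1.96

# 2. p-value approach: two-tailed p-value vs alpha
p = 2 * (1 - norm_cdf(abs(z)))
reject_p = p < alpha

# 3. Confidence interval approach: does the 95% CI contain mu0?
ci = (xbar - 1.96 * se, xbar + 1.96 * se)
reject_ci = not (ci[0] <= mu0 <= ci[1])

print(reject_cv, reject_p, reject_ci)      # True True True
```

All three decisions coincide because they are algebraic rearrangements of the same comparison.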
One-Tailed vs. Two-Tailed Tests
The form of \( H_a \) determines the type of test.
Two-Tailed Test
- Used when we care about any difference from the hypothesized value.
- Example: \( H_0: \mu = 8\% \), \( H_a: \mu \neq 8\% \)
- Rejection Region: Split between both tails (e.g., ±1.96 for \( \alpha = 0.05 \)).
One-Tailed Test
- Used when direction matters.
- Upper Tail: \( H_0: \mu \leq \mu_0 \), \( H_a: \mu > \mu_0 \). Reject if the test statistic exceeds the positive critical value.
- Lower Tail: \( H_0: \mu \geq \mu_0 \), \( H_a: \mu < \mu_0 \). Reject if the test statistic falls below the negative critical value.
Example: One-Tailed z-Test
Test if average return > 6%. Sample: n = 50, \( \bar{x} = 7\% \), \( \sigma = 4\% \), \( \alpha = 0.05 \).
Step 1: \( H_0: \mu \leq 6\% \), \( H_a: \mu > 6\% \)
Step 2: Use z-test (large sample, known σ)
Step 3: Critical value = 1.645 (upper 5%)
Step 4: Calculate the test statistic:
\( z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \dfrac{7\% - 6\%}{4\% / \sqrt{50}} \approx 1.77 \)
Step 5: 1.77 > 1.645 → Reject \( H_0 \)
Conclusion: Evidence suggests average return is greater than 6%.
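The same steps can be reproduced in a few lines of code, using the example's numbers:

```python
import math

# One-tailed z-test: H0: mu <= 6 vs Ha: mu > 6
n, xbar, mu0, sigma = 50, 7.0, 6.0, 4.0

se = sigma / math.sqrt(n)      # standard error of the mean
z = (xbar - mu0) / se          # test statistic
z_crit = 1.645                 # upper 5% critical value (standard normal)

reject = z > z_crit
print(round(z, 2), reject)     # 1.77 True
```

Since 1.77 exceeds the 1.645 cutoff, the code reaches the same conclusion as the worked example.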
Common Parametric Hypothesis Tests
Parametric tests make assumptions about the population distribution (usually normality) and test specific parameters like mean or variance.
Tests Concerning the Mean
- Single Mean:
- z-test: Use if population variance is known OR sample size ≥ 30 (CLT applies).
- t-test: Use if population variance is unknown and n < 30. Degrees of freedom = \( n - 1 \).
- Difference Between Two Means (Independent Samples):
- Pooled t-test: When variances are unknown but assumed equal.
- Approximate t-test (Welch's t): When variances are unknown and assumed unequal.
- Mean of Differences (Paired Comparisons):
- Used for "before vs. after" or matched pairs (e.g., same stock under two strategies).
- Apply a t-test to the differences \( d_i = x_i - y_i \).
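A minimal sketch of the paired-comparisons t-test, applied to hypothetical before/after returns (the data values are made up for illustration):

```python
import math

# Hypothetical returns for the same six assets under two strategies
before = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7]
after  = [5.6, 5.0, 6.4, 5.9, 5.2, 6.1]

d = [a - b for a, b in zip(after, before)]   # differences d_i
n = len(d)
dbar = sum(d) / n                            # mean difference
s_d = math.sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))  # sample std dev
t = dbar / (s_d / math.sqrt(n))              # t statistic, df = n - 1 = 5

print(round(t, 2))
```

The resulting t is compared against a critical value from the t-table with \( n - 1 = 5 \) degrees of freedom (2.571 for a two-tailed test at \( \alpha = 0.05 \)); here it is far larger, so the mean difference is significant.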
Tests Concerning Variance (Measuring Risk)
- Single Variance: Use the chi-square (\( \chi^2 \)) test for a normal population.
\( \chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} \)
Where:
- \( s^2 \) = sample variance
- \( \sigma_0^2 \) = hypothesized population variance
- df = \( n - 1 \)
The \( \chi^2 \) distribution is right-skewed and only takes positive values.
- Equality of Two Variances: Use the F-test.
\( F = \frac{s_1^2}{s_2^2} \)
Where \( s_1^2 \geq s_2^2 \) (larger variance on top).
- df1 = \( n_1 - 1 \), df2 = \( n_2 - 1 \)
- F-distribution is right-skewed.
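Both variance statistics are simple ratios, sketched below with illustrative sample variances (the numbers are assumptions, not from the text):

```python
# Chi-square statistic for a single variance:
# sample variance 0.0036 from n = 25 observations vs hypothesized 0.0025
n, s2, sigma0_2 = 25, 0.0036, 0.0025
chi2 = (n - 1) * s2 / sigma0_2      # df = n - 1 = 24
print(round(chi2, 2))               # 34.56

# F statistic for equality of two variances
# (convention: larger sample variance in the numerator)
s1_2, s2_2 = 0.0049, 0.0036
F = s1_2 / s2_2                     # df1 = n1 - 1, df2 = n2 - 1
print(round(F, 2))                  # 1.36
```

Each statistic is then compared against the appropriate right-skewed table value at the chosen \( \alpha \).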
Test for Correlation
To test if the correlation \( \rho \) between two variables is significantly different from zero:
- \( H_0: \rho = 0 \), \( H_a: \rho \neq 0 \)
- Use a t-test with:
\( t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} \)
Where \( r \) = sample correlation, df = \( n - 2 \)
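The formula translates directly to code; for instance, a sample correlation of 0.5 from 30 observations (hypothetical numbers) gives:

```python
import math

def corr_t_stat(r, n):
    """t statistic for testing H0: rho = 0, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t = corr_t_stat(0.5, 30)
print(round(t, 2))   # ≈ 3.06
```

With 28 degrees of freedom this comfortably exceeds the usual two-tailed critical value of about 2.05 at \( \alpha = 0.05 \), so the correlation is significantly different from zero.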
Nonparametric Tests: When Assumptions Fail
Nonparametric tests make fewer assumptions about the population. They are used when:
- Data is not normally distributed
- Sample size is small
- Data is ordinal or ranked
They are less powerful than parametric tests but more robust.
Common Nonparametric Tests
| Question | Parametric Test | Nonparametric Alternative |
|---|---|---|
| Is a single mean significantly different from a value? | t-test / z-test | Wilcoxon signed-rank test |
| Are means of two independent groups different? | t-test / approx. t-test | Mann-Whitney U test |
| Are paired observations different? | Paired t-test | Wilcoxon signed-rank test, Sign test |
Spearman Rank Correlation Coefficient
When the assumptions for Pearson correlation (normality, linearity) aren't met, use the Spearman rank correlation. It measures monotonic (not necessarily linear) relationships.
Steps:
- Rank the values of each variable separately.
- Calculate the difference \( d_i \) between ranks for each pair.
- Apply the formula:
\( r_s = 1 - \dfrac{6 \sum d_i^2}{n(n^2 - 1)} \)
Interpretation: Same as Pearson — ranges from -1 to +1.
Example: Spearman Correlation
Two analysts rank 5 stocks by attractiveness:
- Analyst A: 1, 2, 3, 4, 5
- Analyst B: 2, 1, 4, 3, 5
Differences in ranks: \( d_i = [-1, 1, -1, 1, 0] \)
\( \sum d_i^2 = 1 + 1 + 1 + 1 + 0 = 4 \)
Strong positive rank correlation — analysts agree on general order.
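The example can be verified with a short function implementing the sum-of-squared-rank-differences formula (valid when there are no tied ranks):

```python
def spearman(rank_x, rank_y):
    """Spearman rank correlation from two lists of ranks (no ties)."""
    n = len(rank_x)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# The two analysts' rankings from the example above
r_s = spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(r_s)   # 0.8
```

With ties present, the usual approach is to assign average ranks and compute a Pearson correlation on the ranks instead.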