
The Framework of Hypothesis Testing

Hypothesis testing is a formal statistical method used to make decisions about a population based on sample data. It's one of the two main branches of statistical inference (the other being estimation).

Think of it like a court trial: the null hypothesis is "innocent until proven guilty." We assume it's true — and only reject it if the evidence (data) is strong enough.

The 7-Step Process of Hypothesis Testing

This structured approach ensures consistency and rigor:

  1. State the hypotheses — Define \( H_0 \) and \( H_a \).
  2. Choose the appropriate test statistic — e.g., z, t, chi-square, F.
  3. Specify the significance level (\( \alpha \)) — Common choices: 0.05 (5%), 0.01 (1%).
  4. State the decision rule — When will you reject \( H_0 \)?
  5. Collect data and calculate the test statistic — Use sample data.
  6. Make the statistical decision — Reject or fail to reject \( H_0 \).
  7. Make the economic or investment decision — What does this mean for portfolio strategy, risk, etc.?

Null vs. Alternative Hypotheses

  • Null Hypothesis (\( H_0 \)): The default assumption. It always contains an equality (=, ≤, or ≥). This is what we test and potentially reject.
  • Alternative Hypothesis (\( H_a \)): What we conclude if we reject \( H_0 \). It represents the researcher's belief or claim.

Example: Testing a Fund Manager's Claim

A fund manager claims their average return is at least 8% per year.

We want to test this claim.

  • \( H_0: \mu \geq 8\% \) (They're right)
  • \( H_a: \mu < 8\% \) (They're overstating)

This is a lower-tail test — we're checking if returns are significantly less than 8%.
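This lower-tail test can be sketched in a few lines of Python. The sample size, sample mean, and known σ below are invented purely for illustration:

```python
from statistics import NormalDist
from math import sqrt

# Hypothetical sample: 36 annual returns averaging 7.2%, with a
# known population standard deviation of 3% (illustrative numbers).
n, x_bar, mu_0, sigma = 36, 7.2, 8.0, 3.0

z = (x_bar - mu_0) / (sigma / sqrt(n))   # test statistic
p_value = NormalDist().cdf(z)            # lower-tail p-value

print(round(z, 2), round(p_value, 4))    # -1.6 0.0548
```

Here the p-value (≈ 0.055) is just above a 5% significance level, so we would fail to reject the manager's claim — barely.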

Type I and Type II Errors

Because we use a sample, our decision might be wrong. There are two types of errors:

Decision                     If \( H_0 \) is True                  If \( H_0 \) is False
Do Not Reject \( H_0 \)      ✅ Correct Decision                    Type II Error (fail to reject a false null)
Reject \( H_0 \)             Type I Error (reject a true null)     ✅ Correct Decision
  • Significance level (\( \alpha \)) = Probability of Type I error.
  • Power of the test = 1 – P(Type II error) = Probability of correctly rejecting a false null.
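The relationship between \( \alpha \) and power can be made concrete with a quick calculation. A minimal Python sketch for an upper-tail z-test; the true mean, σ, and sample size below are illustrative assumptions:

```python
from statistics import NormalDist
from math import sqrt

# Power of an upper-tail z-test: H0: mu <= 6, assumed true mu = 7,
# known sigma = 4, n = 50, alpha = 0.05 (all numbers illustrative).
n, mu_0, mu_true, sigma, alpha = 50, 6.0, 7.0, 4.0, 0.05

se = sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha)   # ~1.645 for alpha = 0.05

# Reject when x_bar exceeds this cutoff; power is the probability
# of that happening when the true mean is mu_true.
x_crit = mu_0 + z_crit * se
power = 1 - NormalDist(mu_true, se).cdf(x_crit)

print(round(power, 3))
```

Even with a true mean a full point above the hypothesized value, power here is only about 0.55 — a reminder that small samples often fail to detect real effects.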

Making a Statistical Decision

Once hypotheses are set, we use sample data to decide whether to reject \( H_0 \). There are three main approaches — all lead to the same conclusion.

Test Statistic: The Core Formula

The test statistic measures how far the sample result is from the hypothesized value, in units of standard error.

\( \text{Test Statistic} = \frac{\text{Sample Statistic} - \text{Hypothesized Value}}{\text{Standard Error of the Statistic}} \)

This formula is used in z-tests, t-tests, and others.

Three Approaches to Decision Making

  • 1. Critical Value Approach:
    • Compare the test statistic to a critical value from a statistical table (z, t, etc.).
    • Reject \( H_0 \) if the test statistic falls in the rejection region.
  • 2. p-Value Approach:
    • The p-value is the smallest significance level at which you can reject \( H_0 \).
    • Decision Rule: If p-value < \( \alpha \), reject \( H_0 \).
    • Interpretation: A p-value of 0.03 means there's only a 3% chance of observing a result at least this extreme if \( H_0 \) were true.
  • 3. Confidence Interval Approach:
    • A \( (1 - \alpha) \times 100\% \) confidence interval gives a range of plausible values for the population parameter.
    • If the hypothesized value under \( H_0 \) is outside this interval, reject \( H_0 \).
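The three approaches can be checked against each other numerically. A Python sketch for a two-tailed z-test, with made-up sample values:

```python
from statistics import NormalDist
from math import sqrt

# Illustrative two-tailed z-test: H0: mu = 8 vs Ha: mu != 8,
# with hypothetical sample values (n = 40, x_bar = 9, known sigma = 2.5).
n, x_bar, mu_0, sigma, alpha = 40, 9.0, 8.0, 2.5, 0.05

se = sigma / sqrt(n)
z = (x_bar - mu_0) / se

# 1. Critical value approach
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96
reject_cv = abs(z) > z_crit

# 2. p-value approach (two-tailed)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
reject_p = p_value < alpha

# 3. Confidence interval approach
ci = (x_bar - z_crit * se, x_bar + z_crit * se)
reject_ci = not (ci[0] <= mu_0 <= ci[1])

print(reject_cv, reject_p, reject_ci)   # all three agree
```

Whichever approach you use, the decision is the same — they are three views of the same calculation.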

One-Tailed vs. Two-Tailed Tests

The form of \( H_a \) determines the type of test.

Two-Tailed Test

  • Used when we care about any difference from the hypothesized value.
  • Example: \( H_0: \mu = 8\% \), \( H_a: \mu \neq 8\% \)
  • Rejection Region: Split between both tails (e.g., ±1.96 for \( \alpha = 0.05 \)).

One-Tailed Test

  • Used when direction matters.
  • Upper Tail: \( H_0: \mu \leq \mu_0 \), \( H_a: \mu > \mu_0 \)
    Reject if test statistic > positive critical value.
  • Lower Tail: \( H_0: \mu \geq \mu_0 \), \( H_a: \mu < \mu_0 \)
    Reject if test statistic < negative critical value.

Example: One-Tailed z-Test

Test if average return > 6%. Sample: n = 50, \( \bar{x} = 7\% \), \( \sigma = 4\% \), \( \alpha = 0.05 \).

Step 1: \( H_0: \mu \leq 6\% \), \( H_a: \mu > 6\% \)

Step 2: Use z-test (large sample, known σ)

Step 3: Critical value = 1.645 (upper 5%)

Step 4: Calculate test statistic:

\( z = \frac{7 - 6}{4 / \sqrt{50}} = \frac{1}{0.566} \approx 1.77 \)

Step 5: 1.77 > 1.645 → Reject \( H_0 \)

Conclusion: Evidence suggests average return is greater than 6%.
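The arithmetic in the steps above can be verified in a few lines of Python:

```python
from statistics import NormalDist
from math import sqrt

# Worked example from the text: n = 50, x_bar = 7%, sigma = 4%, alpha = 0.05.
n, x_bar, mu_0, sigma, alpha = 50, 7.0, 6.0, 4.0, 0.05

z = (x_bar - mu_0) / (sigma / sqrt(n))   # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha) # ~1.645, upper 5% critical value

print(round(z, 2), z > z_crit)           # 1.77 True -> reject H0
```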


Common Parametric Hypothesis Tests

Parametric tests make assumptions about the population distribution (usually normality) and test specific parameters like mean or variance.

Tests Concerning the Mean

  • Single Mean:
    • z-test: Use if the population variance is known, or if the sample is large (n ≥ 30) so the CLT applies.
    • t-test: Use if the population variance is unknown and n < 30 (it is also valid, and slightly more conservative, for larger samples). Degrees of freedom = \( n - 1 \).
  • Difference Between Two Means (Independent Samples):
    • Pooled t-test: When variances are unknown but assumed equal.
    • Approximate t-test (Welch's t): When variances are unknown and assumed unequal.
  • Mean of Differences (Paired Comparisons):
    • Used for "before vs. after" or matched pairs (e.g., same stock under two strategies).
    • Apply a t-test to the differences \( d_i = x_i - y_i \).
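A sketch of the paired comparison in Python, using invented before/after return figures; the resulting t statistic would be compared to a t critical value with \( n - 1 \) degrees of freedom from a table:

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical before/after returns (%) for 6 assets (illustrative data).
before = [4.1, 5.3, 2.8, 6.0, 3.9, 5.1]
after  = [5.0, 5.9, 3.5, 6.4, 4.2, 6.0]

d = [a - b for a, b in zip(after, before)]   # paired differences d_i
d_bar, s_d = mean(d), stdev(d)               # mean and std dev of differences
t = d_bar / (s_d / sqrt(len(d)))             # t statistic, df = n - 1 = 5

print(round(t, 2))
```

With df = 5, the two-tailed 5% critical value from a t table is about 2.571, so a statistic this large would lead us to reject equal means.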

Tests Concerning Variance (Measuring Risk)

  • Single Variance: Use the chi-square (\( \chi^2 \)) test for a normal population.
    \( \chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} \)

    Where:

    • \( s^2 \) = sample variance
    • \( \sigma_0^2 \) = hypothesized population variance
    • df = \( n - 1 \)

    The \( \chi^2 \) distribution is right-skewed and only takes positive values.

  • Equality of Two Variances: Use the F-test.
    \( F = \frac{s_1^2}{s_2^2} \)

    Where \( s_1^2 \geq s_2^2 \) (larger variance on top).

    • df1 = \( n_1 - 1 \), df2 = \( n_2 - 1 \)
    • F-distribution is right-skewed.
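Both statistics are simple ratios built from sample variances. A Python sketch with invented return series (the data and the hypothesized variance are assumptions for illustration; the statistics would then be compared to \( \chi^2 \) and F critical values from tables):

```python
from statistics import variance

# Hypothetical daily returns (%) for two funds (illustrative numbers only).
fund_a = [1.2, -0.5, 0.8, 1.9, -1.1, 0.4, 0.9, -0.2]
fund_b = [0.3, 0.1, -0.2, 0.5, 0.0, 0.2, -0.1, 0.4]

s2_a, s2_b = variance(fund_a), variance(fund_b)   # sample variances
n = len(fund_a)

# Chi-square statistic for H0: sigma^2 = 1.0 (hypothesized), using fund A.
chi2 = (n - 1) * s2_a / 1.0                       # df = n - 1 = 7

# F statistic for equality of variances: larger sample variance on top.
F = max(s2_a, s2_b) / min(s2_a, s2_b)             # df1 = df2 = n - 1 = 7

print(round(chi2, 2), round(F, 2))
```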

Test for Correlation

To test if the correlation \( \rho \) between two variables is significantly different from zero:

  • \( H_0: \rho = 0 \), \( H_a: \rho \neq 0 \)
  • Use a t-test with:
    \( t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} \)

    Where \( r \) = sample correlation, df = \( n - 2 \)
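A quick Python sketch of this test, with an assumed sample correlation:

```python
from math import sqrt

# t statistic for H0: rho = 0, given a hypothetical sample correlation.
r, n = 0.45, 30                          # illustrative values

t = r * sqrt(n - 2) / sqrt(1 - r**2)     # df = n - 2 = 28

print(round(t, 2))
```

With df = 28, the two-tailed 5% critical value from a t table is roughly 2.05, so a statistic near 2.67 would indicate a significant correlation.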


Nonparametric Tests: When Assumptions Fail

Nonparametric tests make fewer assumptions about the population. They are used when:

  • Data is not normally distributed
  • Sample size is small
  • Data is ordinal or ranked

They are less powerful than parametric tests but more robust.

Common Nonparametric Tests

Question                                                 Parametric Test          Nonparametric Alternative
Is a single mean significantly different from a value?   t-test / z-test          Wilcoxon signed-rank test
Are means of two independent groups different?           t-test / approx. t-test  Mann-Whitney U test
Are paired observations different?                       Paired t-test            Wilcoxon signed-rank test, Sign test

Spearman Rank Correlation Coefficient

When the assumptions for Pearson correlation (normality, linearity) aren't met, use the Spearman rank correlation. It measures monotonic (not necessarily linear) relationships.

Steps:

  1. Rank the values of each variable separately.
  2. Calculate the difference \( d_i \) between ranks for each pair.
  3. Apply the formula:
\( r_S = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} \)

Interpretation: Same as Pearson — ranges from -1 to +1.

Example: Spearman Correlation

Two analysts rank 5 stocks by attractiveness:

  • Analyst A: 1, 2, 3, 4, 5
  • Analyst B: 2, 1, 4, 3, 5

Differences in ranks: \( d_i = [-1, 1, -1, 1, 0] \)

\( \sum d_i^2 = 1 + 1 + 1 + 1 + 0 = 4 \)

\( r_S = 1 - \frac{6 \times 4}{5(25 - 1)} = 1 - \frac{24}{120} = 1 - 0.2 = 0.8 \)

Strong positive rank correlation — analysts agree on general order.
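The same computation in Python, confirming the hand calculation above:

```python
# Spearman rank correlation for the two analysts' stock rankings.
rank_a = [1, 2, 3, 4, 5]
rank_b = [2, 1, 4, 3, 5]

n = len(rank_a)
d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))   # sum of squared rank differences
r_s = 1 - 6 * d2 / (n * (n**2 - 1))

print(d2, r_s)   # 4 0.8
```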
