Chapter 9

Tests of Independence

Learn how to test whether two categorical variables are independent using chi-square tests, understand contingency tables, and interpret results for investment decision-making.

1

Introduction to Tests of Independence

Tests of independence are statistical methods used to determine whether two categorical variables are related to each other. In finance and investment analysis, these tests help us understand relationships between different qualitative factors that might affect investment decisions.

Key Concepts

  • Independence: Two variables are independent if knowing the category of one does not change the probability of any category of the other
  • Categorical Variables: Variables that can be divided into distinct categories or groups
  • Association: A relationship between two variables where they tend to occur together in predictable patterns

Important Note

Tests of independence help investors understand whether factors like company size, sector, or credit rating are related to investment outcomes such as default rates or performance categories.

2

Chi-Square Test of Independence

The chi-square test of independence is the most commonly used test for determining whether two categorical variables are independent. It compares observed frequencies with expected frequencies under the assumption of independence.

Hypotheses

  • Null Hypothesis (H₀): The two variables are independent
  • Alternative Hypothesis (H₁): The two variables are not independent (they are associated)

When to Use Chi-Square Test

  • Both variables are categorical
  • Data consists of frequencies or counts
  • Sample size is sufficiently large
  • Expected frequencies in each cell are at least 5

3

Contingency Tables

A contingency table (or cross-tabulation) displays the frequency distribution of observations for two categorical variables. It forms the basis for chi-square tests.

Structure of a Contingency Table

| Variable A \ Variable B | Category B₁ | Category B₂ | Category B₃ | Row Total |
| --- | --- | --- | --- | --- |
| Category A₁ | O₁₁ | O₁₂ | O₁₃ | R₁ |
| Category A₂ | O₂₁ | O₂₂ | O₂₃ | R₂ |
| Column Total | C₁ | C₂ | C₃ | N |

Where:

  • Oᵢⱼ = Observed frequency in cell (i,j)
  • Rᵢ = Row total for row i
  • Cⱼ = Column total for column j
  • N = Grand total (total sample size)

Expected Frequencies

Under the assumption of independence, the expected frequency for each cell is calculated as:

$$E_{ij} = \frac{R_i \times C_j}{N}$$
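
The formula follows from the definition of independence. As a brief sketch, the marginal probabilities are estimated by the sample proportions (the notation $P(A_i)$ and $P(B_j)$ is introduced here only for illustration):

$$P(A_i \cap B_j) = P(A_i)\,P(B_j) \approx \frac{R_i}{N} \cdot \frac{C_j}{N} \quad\Rightarrow\quad E_{ij} = N \cdot \frac{R_i}{N} \cdot \frac{C_j}{N} = \frac{R_i \times C_j}{N}$$
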
Investment Example

Suppose we want to test whether company size (Small, Medium, Large) is independent of investment performance (Poor, Average, Good). We collect data from 300 companies:

| Company Size | Poor Performance | Average Performance | Good Performance | Total |
| --- | --- | --- | --- | --- |
| Small | 45 | 55 | 50 | 150 |
| Medium | 25 | 35 | 40 | 100 |
| Large | 10 | 20 | 20 | 50 |
| Total | 80 | 110 | 110 | 300 |

4

Chi-Square Test Statistic

The chi-square test statistic measures the difference between observed and expected frequencies. Under the null hypothesis of independence, it approximately follows a chi-square distribution, provided the expected frequencies are large enough.

Formula for Chi-Square Statistic

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Where:

  • χ² = Chi-square test statistic
  • Oᵢⱼ = Observed frequency in cell (i,j)
  • Eᵢⱼ = Expected frequency in cell (i,j)
  • r = Number of rows
  • c = Number of columns

Properties of Chi-Square Statistic

  • Always non-negative (χ² ≥ 0)
  • Large values indicate greater deviation from independence
  • Small values suggest the variables might be independent
  • Approximately follows a chi-square distribution with (r-1) × (c-1) degrees of freedom under H₀

Calculating Expected Frequencies

Using our investment example, let's calculate expected frequencies:

For Small companies with Poor performance:

$$E_{11} = \frac{150 \times 80}{300} = 40$$

For Medium companies with Average performance:

$$E_{22} = \frac{100 \times 110}{300} = 36.67$$

Complete expected frequencies table:

| Company Size | Poor | Average | Good |
| --- | --- | --- | --- |
| Small | 40.00 | 55.00 | 55.00 |
| Medium | 26.67 | 36.67 | 36.67 |
| Large | 13.33 | 18.33 | 18.33 |

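
The full table of expected frequencies can be reproduced programmatically. The following is a minimal sketch in plain Python; the function name `expected_frequencies` and the list-of-lists layout are illustrative assumptions, not part of the chapter.

```python
# Observed counts from the company-size example
# (rows: Small, Medium, Large; columns: Poor, Average, Good).
observed = [
    [45, 55, 50],
    [25, 35, 40],
    [10, 20, 20],
]

def expected_frequencies(table):
    """Return E[i][j] = (row total i * column total j) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)
for label, row in zip(["Small", "Medium", "Large"], expected):
    print(label, [round(e, 2) for e in row])
```

Each printed row should match the corresponding row of the expected frequencies table after rounding to two decimals.
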
5

Degrees of Freedom

The degrees of freedom for a chi-square test of independence depend on the size of the contingency table. This determines which chi-square distribution to use for finding critical values and p-values.

Formula for Degrees of Freedom

$$df = (r - 1) \times (c - 1)$$

Where:

  • df = degrees of freedom
  • r = number of rows in the contingency table
  • c = number of columns in the contingency table

Understanding Degrees of Freedom

Degrees of freedom represent the number of values that can vary freely when calculating the test statistic, given the constraint that row and column totals are fixed.

Degrees of Freedom Examples

  • 2×2 table: df = (2-1) × (2-1) = 1
  • 3×3 table: df = (3-1) × (3-1) = 4
  • 3×4 table: df = (3-1) × (4-1) = 6
  • Our investment example (3×3): df = (3-1) × (3-1) = 4
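
One way to see why only (r-1) × (c-1) cells are free: once the row and column totals are fixed, the remaining cells are determined by subtraction. The sketch below illustrates this with the company-size data; the helper `complete_table` is a hypothetical name introduced for this illustration.

```python
def complete_table(free_cells, row_totals, col_totals):
    """Recover a full table from its margins and the top-left (r-1) x (c-1) cells."""
    # The last entry of each of the first r-1 rows is fixed by that row's total.
    rows = [list(cells) + [row_totals[i] - sum(cells)]
            for i, cells in enumerate(free_cells)]
    # The final row is fixed by the column totals.
    last_row = [col_totals[j] - sum(row[j] for row in rows)
                for j in range(len(col_totals))]
    return rows + [last_row]

# Only four cells of the 3x3 company-size table are supplied.
free_cells = [[45, 55], [25, 35]]
table = complete_table(free_cells, [150, 100, 50], [80, 110, 110])
df = len(free_cells) * len(free_cells[0])

print(table)  # [[45, 55, 50], [25, 35, 40], [10, 20, 20]]
print(df)     # 4
```

Supplying four cells plus the margins recovers all nine observed counts, which is what df = 4 expresses for a 3×3 table.
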
Important Consideration

As the degrees of freedom increase, the chi-square distribution becomes more symmetric and approaches a normal distribution. Its mean equals df, so critical values also rise with df: at α = 0.05 the critical value is 3.841 for df = 1 but 9.488 for df = 4.

6

Critical Values and Decision Rule

The decision to reject or fail to reject the null hypothesis is based on comparing the calculated chi-square statistic with the critical value from the chi-square distribution.

Decision Rule

  • If χ² > χ²critical: Reject H₀ (variables are not independent)
  • If χ² ≤ χ²critical: Fail to reject H₀ (variables may be independent)

Common Critical Values

| Degrees of Freedom | α = 0.10 | α = 0.05 | α = 0.01 |
| --- | --- | --- | --- |
| 1 | 2.706 | 3.841 | 6.635 |
| 2 | 4.605 | 5.991 | 9.210 |
| 3 | 6.251 | 7.815 | 11.345 |
| 4 | 7.779 | 9.488 | 13.277 |
| 5 | 9.236 | 11.071 | 15.086 |

P-value Approach

Alternatively, you can calculate the p-value and compare it with the significance level:

  • If p-value < α: Reject H₀
  • If p-value ≥ α: Fail to reject H₀
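
The p-value is the upper-tail area of the chi-square distribution beyond the observed statistic. The sketch below computes it with the standard library only, using a series expansion of the regularized incomplete gamma function; the function name `chi2_sf` is an assumption made for this illustration, and a statistics library would normally be used instead. The statistic 2.70 with df = 4 is the value the chapter obtains for the company-size example.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function:
    # P(a, z) = z^a * e^(-z) / Gamma(a + 1) * sum_{n >= 0} z^n / ((a + 1)...(a + n))
    term, total, n = 1.0, 1.0, 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower_tail = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0)) * total
    return 1.0 - lower_tail

alpha = 0.05
p_value = chi2_sf(2.70, df=4)  # statistic and df from the company-size example
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p-value = {p_value:.3f} -> {decision}")

# Sanity check: the tail area beyond each alpha = 0.05 critical value
# from the table above should be close to 0.05.
for df, critical in [(1, 3.841), (2, 5.991), (3, 7.815), (4, 9.488), (5, 11.071)]:
    print(df, round(chi2_sf(critical, df), 3))
```

Both approaches lead to the same decision: a statistic above the critical value corresponds to a p-value below α, and vice versa.
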
7

Practical Examples

Example 1: Company Size and Performance

Let's complete our investment example by calculating the chi-square statistic:

Step 1: Calculate chi-square statistic

$$\chi^2 = \frac{(45-40)^2}{40} + \frac{(55-55)^2}{55} + \frac{(50-55)^2}{55} + \frac{(25-26.67)^2}{26.67} + \frac{(35-36.67)^2}{36.67} + \frac{(40-36.67)^2}{36.67} + \frac{(10-13.33)^2}{13.33} + \frac{(20-18.33)^2}{18.33} + \frac{(20-18.33)^2}{18.33}$$
$$\chi^2 = 0.625 + 0 + 0.455 + 0.104 + 0.076 + 0.303 + 0.833 + 0.152 + 0.152 = 2.70$$

Step 2: Compare with critical value

With df = 4 and α = 0.05, χ²critical = 9.488

Since 2.70 < 9.488, we fail to reject H₀

Conclusion: There is insufficient evidence to conclude that company size and performance are associated.
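
The two steps above can be sketched in a few lines of plain Python. The function name `chi_square_statistic` is an illustrative assumption; the critical value 9.488 is taken from the table of common critical values rather than computed.

```python
observed = [
    [45, 55, 50],  # Small
    [25, 35, 40],  # Medium
    [10, 20, 20],  # Large
]

def chi_square_statistic(table):
    """Sum (O - E)^2 / E over all cells, with E = row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    return stat

chi2 = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 9.488  # df = 4, alpha = 0.05

print(f"chi-square = {chi2:.2f}, df = {df}")
print("reject H0" if chi2 > critical_value else "fail to reject H0")
```
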

Example 2: Investment Sector and Risk Rating

An investment firm wants to test whether investment sector is independent of risk rating. Data from 200 investments:

| Sector | Low Risk | Medium Risk | High Risk | Total |
| --- | --- | --- | --- | --- |
| Technology | 15 | 25 | 35 | 75 |
| Healthcare | 20 | 30 | 25 | 75 |
| Financial | 25 | 15 | 10 | 50 |
| Total | 60 | 70 | 70 | 200 |

Solution:

1. Calculate expected frequencies for each cell

2. Compute χ² ≈ 16.59 (detailed calculation omitted for brevity)

3. df = (3-1) × (3-1) = 4

4. At α = 0.05, χ²critical = 9.488

5. Since 16.59 > 9.488, reject H₀

Conclusion: At the 5% significance level, there is evidence that investment sector and risk rating are associated (not independent).
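
Because the detailed calculation is omitted in the text, the following sketch shows one way to reproduce it from the table. It is plain Python with illustrative variable names, and it prints each cell's contribution (O - E)² / E alongside the total.

```python
sectors = ["Technology", "Healthcare", "Financial"]
observed = [
    [15, 25, 35],  # Technology: Low, Medium, High risk
    [20, 30, 25],  # Healthcare
    [25, 15, 10],  # Financial
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected counts under independence and per-cell chi-square contributions.
expected = [[r * c / n for c in col_totals] for r in row_totals]
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi2 = sum(sum(row) for row in contributions)

for name, exp_row, con_row in zip(sectors, expected, contributions):
    print(name, [round(e, 2) for e in exp_row], [round(c, 3) for c in con_row])
print(f"chi-square = {chi2:.2f}")
print("reject H0" if chi2 > 9.488 else "fail to reject H0")
```

Inspecting the per-cell contributions shows which sector and risk combinations drive the result; here the Financial / Low Risk cell contributes the most.
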

8

Interpretation of Results

Proper interpretation of chi-square test results is crucial for making informed investment decisions and understanding relationships between categorical variables.

When Variables are Independent

  • The occurrence of one category does not affect the probability of another
  • Observed frequencies are close to expected frequencies
  • Chi-square statistic is small
  • Investment implications: Factors can be considered separately in analysis

When Variables are Associated

  • Certain combinations occur more or less frequently than expected
  • Patterns in the data suggest relationships
  • Chi-square statistic is large
  • Investment implications: Factors should be considered together

Practical Implications for Investors

Investment Applications

  • Portfolio Construction: Understanding which factors are independent helps in diversification
  • Risk Assessment: Associated variables may indicate correlated risks
  • Performance Analysis: Identifying factors that are associated with return outcomes
  • Due Diligence: Testing relationships between company characteristics and outcomes

Effect Size and Practical Significance

Statistical significance doesn't always imply practical importance. Consider:

  • Cramér's V for measuring association strength
  • Residual analysis to identify which cells contribute most to the chi-square statistic
  • Business context and economic significance

Cramér's V is computed as:

$$V = \sqrt{\frac{\chi^2}{N \times \min(r-1, c-1)}}$$

V ranges from 0 (no association) to 1 (perfect association).

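
As a sketch of how Cramér's V and a simple residual analysis might be computed for the company-size example, the code below uses Pearson residuals, (O - E) / √E, whose squares sum to the chi-square statistic. The function names are illustrative assumptions, and Pearson residuals are only one of several residual definitions in use.

```python
import math

observed = [
    [45, 55, 50],  # Small
    [25, 35, 40],  # Medium
    [10, 20, 20],  # Large
]

def pearson_residuals(table):
    """Return (O - E) / sqrt(E) for each cell; squared residuals sum to chi-square."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            res_row.append((o - e) / math.sqrt(e))
        residuals.append(res_row)
    return residuals

def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi2 / (N * min(r - 1, c - 1)))."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

residuals = pearson_residuals(observed)
chi2 = sum(r ** 2 for row in residuals for r in row)
n = sum(sum(row) for row in observed)
v = cramers_v(chi2, n, len(observed), len(observed[0]))

print(f"Cramér's V = {v:.3f}")
for row in residuals:
    print([round(r, 2) for r in row])
```

For this table V is close to 0, which is consistent with failing to reject H₀ in Example 1. Cells with the largest absolute residuals are the ones that depart most from independence.
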
9

Assumptions and Limitations

Chi-square tests have several important assumptions that must be met for valid results. Understanding these limitations is crucial for proper application in financial analysis.

Key Assumptions

  • Independence of Observations: Each observation must be independent of others
  • Expected Frequency Requirement: Expected frequency in each cell should be at least 5
  • Mutually Exclusive Categories: Each observation belongs to exactly one category for each variable
  • Random Sampling: Data should be collected through random sampling
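
The expected frequency requirement can be checked before running the test. The sketch below flags cells whose expected count falls below 5; the function name `small_expected_cells` and the 2×2 table are hypothetical and serve only to show a failing case.

```python
def small_expected_cells(table, minimum=5):
    """Return (row, column, expected) for every cell whose expected count is below minimum."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n
            if e < minimum:
                flagged.append((i, j, e))
    return flagged

# Company-size example from this chapter: every expected count is at least 5.
company_size = [[45, 55, 50], [25, 35, 40], [10, 20, 20]]
print(small_expected_cells(company_size))

# Hypothetical small sample (not from the chapter): two cells fall below 5.
hypothetical = [[3, 7], [4, 6]]
print(small_expected_cells(hypothetical))
```

An empty list means the requirement is met; a non-empty list suggests combining categories or using an exact test, as described in the table that follows.
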

When Assumptions are Violated

| Violation | Consequence | Solution |
| --- | --- | --- |
| Small expected frequencies | Test statistic distribution is inaccurate | Fisher's exact test or combine categories |
| Non-independent observations | Inflated Type I error rate | Use appropriate clustering methods |
| Non-random sampling | Results may not be generalizable | Acknowledge limitations in interpretation |

Limitations in Financial Applications

  • Market Dependencies: Financial data often exhibits temporal dependencies
  • Sample Size Issues: Small samples in specialized investment categories
  • Category Definition: Subjective categorization of continuous variables
  • Time-Varying Relationships: Relationships may change over different market conditions

Best Practices

  • Always check expected frequencies before conducting the test
  • Consider the business context when interpreting results
  • Use complementary analysis methods when possible
  • Be cautious about causal interpretations

10

Chapter Summary

Key Learning Points

  • Tests of Independence: Statistical methods to determine if categorical variables are related
  • Chi-Square Test: Most common test using observed vs. expected frequencies
  • Contingency Tables: Cross-tabulations that organize categorical data
  • Test Statistic: χ² = Σ[(O-E)²/E] follows chi-square distribution
  • Degrees of Freedom: df = (r-1) × (c-1) determines critical values

Investment Applications

  • Testing relationships between company characteristics and performance
  • Analyzing associations between market sectors and risk levels
  • Understanding dependencies in portfolio construction
  • Validating assumptions about factor independence

Critical Formulas

$$E_{ij} = \frac{R_i \times C_j}{N}$$

Expected Frequency

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Chi-Square Test Statistic

$$df = (r - 1) \times (c - 1)$$

Degrees of Freedom

Next Steps

In the next chapter, we'll explore Simple Linear Regression, which allows us to model relationships between continuous variables and make predictions based on linear associations.