Measures of Central Tendency: Finding the Center
Measures of central tendency give us a single value that describes the center or typical value of a dataset. In finance, this helps us understand the average or expected return of an asset.
The Arithmetic Mean
The arithmetic mean is the most common measure of central tendency, calculated by summing all values and dividing by the number of observations.
- A key property is that the sum of the deviations of each observation from the mean is always zero.
- Advantages: It uses all the data and is easy to compute.
- Limitations: It is highly sensitive to extreme values, known as outliers.
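As a quick illustration, here is a minimal Python sketch that computes the arithmetic mean of a few hypothetical monthly returns and verifies the zero-sum-of-deviations property (the return figures are made up for the example):

```python
# Hypothetical monthly returns, for illustration only.
returns = [0.02, -0.01, 0.03, 0.015, -0.005]

# Arithmetic mean: sum of values divided by the count.
mean_return = sum(returns) / len(returns)

# Key property: deviations from the mean sum to zero (up to floating-point rounding).
deviation_sum = sum(r - mean_return for r in returns)

print(f"Arithmetic mean:   {mean_return:.4f}")
print(f"Sum of deviations: {deviation_sum:.1e}")  # effectively zero
```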
The Median
The median is the middle value in a sorted dataset. For an even number of observations, it's the average of the two middle values.
- Its main advantage is that it is not affected by extreme values.
- Limitations include being more time-consuming to calculate for large datasets and not considering the magnitude of all observations.
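The sketch below shows one way to compute the median by hand in Python, handling both the odd and even cases; the sample values are arbitrary, and the built-in statistics.median does the same job:

```python
def median(values):
    """Middle value of a sorted dataset; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                  # odd count: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2    # even count: average the two middle values

print(median([3, 1, 7, 5, 9]))       # 5
print(median([3, 1, 7, 5, 9, 11]))   # 6.0 (average of 5 and 7)
```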
The Mode
The mode is the value that appears most frequently in a dataset.
- A dataset can have one mode (unimodal), two modes (bimodal), or three modes (trimodal); if no value repeats, the dataset has no mode.
- It is particularly useful for nominal data (data that is categorical).
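Python's statistics.multimode makes the unimodal/bimodal distinction easy to see; the datasets below (including the credit-rating labels) are purely illustrative:

```python
import statistics

# Unimodal: 4 appears most often.
print(statistics.multimode([1, 2, 4, 4, 4, 5]))          # [4]

# Bimodal: 2 and 5 tie for the highest frequency.
print(statistics.multimode([2, 2, 3, 5, 5, 7]))          # [2, 5]

# Works for nominal (categorical) data as well.
print(statistics.multimode(["AAA", "BB", "AAA", "B"]))   # ['AAA']
```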
Other Concepts of Mean
Depending on the situation, different types of means are more appropriate.
- Weighted Mean: Assigns different weights (importance) to each observation. This is used extensively in portfolio returns, where each asset has a different weight.
Weighted Mean = Sum of (weight × value) for all observations
- Geometric Mean: Used to compute the average compound return over multiple periods.
Geometric Mean = [(1+R₁)(1+R₂)...(1+Rₙ)]^(1/n) - 1
- Harmonic Mean: Used in strategies like dollar-cost averaging, where an equal dollar amount is invested at regular intervals; it gives the average price paid per share.
Harmonic Mean = n / (Sum of 1/value for all values)
- Trimmed Mean: To handle outliers, this mean is calculated after excluding a small percentage of the highest and lowest values.
- Winsorized Mean: This method adjusts extreme values by replacing them with a certain percentile value, rather than deleting them.
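The following sketch works through the weighted, geometric, and harmonic mean formulas with made-up portfolio weights, period returns, and purchase prices:

```python
# Weighted mean: two-asset portfolio return (weights and returns are illustrative).
weights = [0.6, 0.4]
asset_returns = [0.08, 0.03]
weighted_mean = sum(w * r for w, r in zip(weights, asset_returns))   # 0.06

# Geometric mean: average compound return over three periods.
period_returns = [0.10, -0.05, 0.07]
growth = 1.0
for r in period_returns:
    growth *= (1 + r)
geometric_mean = growth ** (1 / len(period_returns)) - 1

# Harmonic mean: average price paid per share when an equal dollar amount
# is invested at each of three different prices.
prices = [20.0, 25.0, 40.0]
harmonic_mean = len(prices) / sum(1 / p for p in prices)             # ~26.09

print(f"Weighted mean:  {weighted_mean:.4f}")
print(f"Geometric mean: {geometric_mean:.4f}")
print(f"Harmonic mean:  {harmonic_mean:.2f}")
```

For the trimmed and winsorized variants, scipy.stats.trim_mean and scipy.stats.mstats.winsorize provide ready-made implementations if SciPy is available.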
Measures of Location: Quantiles
Quantiles divide a dataset into equal-sized parts, helping us understand the location of a specific data point within the distribution.
- The location of any quantile can be found with the formula: Position = (n+1) × (percentile/100), where 'n' is the number of observations. If the position is not a whole number, interpolate linearly between the two nearest observations (see the sketch after this list).
- Quartiles: Divide the data into 4 equal parts. The Interquartile Range (IQR) is the difference between the third and first quartiles, representing the middle 50% of the data and serving as a measure of dispersion.
- Quintiles and Deciles: Divide data into 5 and 10 equal parts, respectively.
- Percentiles: Divide data into 100 equal parts for very detailed analysis.
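Here is a minimal sketch of the Position = (n+1) × (percentile/100) rule, with linear interpolation when the position falls between two observations; note that statistical packages implement several slightly different interpolation conventions, so results can differ at the margins:

```python
def percentile_location(sorted_values, pct):
    """Locate a percentile using Position = (n + 1) * (pct / 100) with linear interpolation."""
    n = len(sorted_values)
    pos = (n + 1) * pct / 100            # 1-indexed position in the sorted data
    lower = int(pos)                      # observation just below the position
    frac = pos - lower                    # fractional part used for interpolation
    if lower < 1:
        return sorted_values[0]
    if lower >= n:
        return sorted_values[-1]
    return sorted_values[lower - 1] + frac * (sorted_values[lower] - sorted_values[lower - 1])

data = sorted([2, 4, 6, 8, 10, 12, 14, 16])   # illustrative values
q1 = percentile_location(data, 25)            # 4.5
q3 = percentile_location(data, 75)            # 13.5
iqr = q3 - q1                                 # middle 50% of the data

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```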
Measures of Dispersion: Gauging Risk
Dispersion measures the variability of data around the central tendency. In finance, dispersion is a primary indicator of risk.
- Range: The simplest measure, calculated as Maximum Value - Minimum Value. It's easy to compute but ignores the shape of the distribution.
- Mean Absolute Deviation (MAD): The average of the absolute deviations from the mean.
MAD = (Sum of absolute deviations from mean) / n
- Sample Variance and Standard Deviation: These are the most common measures of risk. Sample variance is the sum of the squared deviations from the mean divided by (n-1); dividing by n-1 rather than n corrects for the bias introduced by estimating the mean from the same sample. Standard deviation is the square root of the variance.
Sample Variance = Sum of squared deviations from mean / (n-1)
Sample Standard Deviation = Square root of variance
- Downside Deviation (Target Semi-deviation): A risk measure that focuses only on returns that fall below a minimum acceptable threshold or target. This is often more relevant to investors who are primarily concerned with losses.
Downside Deviation = Square root of [Sum of squared deviations below target / (n-1)]
- Coefficient of Variation (CV): Measures the amount of risk (standard deviation) per unit of mean return. It is a relative measure, useful for comparing the risk of assets with different expected returns. A higher CV indicates higher risk per unit of return.
Coefficient of Variation = Standard Deviation / Mean
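The sketch below computes each dispersion measure for a small set of hypothetical annual returns, with a target of 0% for the downside deviation (all figures are illustrative):

```python
import math

returns = [0.12, 0.05, -0.08, 0.10, 0.03]   # hypothetical annual returns
target = 0.0                                 # minimum acceptable return
n = len(returns)
mean = sum(returns) / n

data_range = max(returns) - min(returns)

mad = sum(abs(r - mean) for r in returns) / n

variance = sum((r - mean) ** 2 for r in returns) / (n - 1)   # sample variance
std_dev = math.sqrt(variance)

# Downside deviation: only deviations below the target enter the sum,
# but the divisor is still the full sample size minus one.
downside_var = sum((r - target) ** 2 for r in returns if r < target) / (n - 1)
downside_dev = math.sqrt(downside_var)

cv = std_dev / mean                          # risk per unit of mean return

print(f"Range: {data_range:.4f}  MAD: {mad:.4f}")
print(f"Std dev: {std_dev:.4f}  Downside dev: {downside_dev:.4f}  CV: {cv:.2f}")
```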
Measures of Shape: Skewness and Kurtosis
These measures describe the shape of a data distribution compared to a perfect "bell curve" (normal distribution).
Skewness: Is the Distribution Symmetrical?
Skewness describes the degree of asymmetry in a distribution.
- Symmetrical (Normal) Distribution: Identical on both sides of the mean. For this distribution, Mean = Median = Mode.
- Positively Skewed (Skewed Right): Has a long tail on the right side, with more outliers in the upper region. In this case, Mean > Median > Mode. The mean is pulled in the direction of the long tail.
- Negatively Skewed (Skewed Left): Has a long tail on the left side, with more outliers in the lower region. In this case, Mean < Median < Mode.
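A small made-up example makes the mean/median relationship concrete: a handful of modest gains plus one large outlier drags the mean above the median and produces positive skewness (SciPy is assumed to be available for the sample skewness calculation):

```python
from statistics import mean, median
from scipy.stats import skew   # assumes SciPy is installed

# Right-skewed hypothetical returns: mostly small gains plus one large outlier.
returns = [0.01, 0.02, 0.02, 0.03, 0.04, 0.25]

print(f"Mean:   {mean(returns):.4f}")    # pulled up toward the long right tail
print(f"Median: {median(returns):.4f}")  # smaller than the mean
print(f"Skew:   {skew(returns):.2f}")    # positive, confirming right skew
```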
Kurtosis: Are the Tails Fat or Thin?
Kurtosis measures how peaked a distribution is and the thickness of its tails compared to a normal distribution.
- A normal distribution has a kurtosis of 3.
- Excess Kurtosis = Sample Kurtosis - 3. This is typically what analysts refer to.
- Leptokurtic (Positive Excess Kurtosis > 0): More peaked with "fatter tails" than a normal distribution. This means there is a higher frequency of extremely large deviations from the mean, indicating higher risk.
- Mesokurtic (Excess Kurtosis = 0): Has the same kurtosis as a normal distribution.
- Platykurtic (Negative Excess Kurtosis < 0): Less peaked with "thinner tails" than a normal distribution.
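The sketch below contrasts a simulated normal sample (roughly mesokurtic) with a Laplace sample, whose excess kurtosis is about 3 and which is therefore clearly leptokurtic; NumPy and SciPy are assumed to be available, and the data are simulated rather than market data:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(42)
normal_sample = rng.normal(size=100_000)    # mesokurtic benchmark
laplace_sample = rng.laplace(size=100_000)  # fat-tailed: theoretical excess kurtosis ~ 3

# fisher=True (the default) returns *excess* kurtosis, i.e. kurtosis minus 3.
print(f"Excess kurtosis, normal sample:  {kurtosis(normal_sample):.2f}")   # near 0
print(f"Excess kurtosis, Laplace sample: {kurtosis(laplace_sample):.2f}")  # clearly positive
```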
Correlation Between Two Variables
Correlation measures the linear relationship between two variables, indicating both the direction and strength of their association.
- Covariance: Measures how two variables move together. Its calculation is the first step towards finding correlation.
Sample Covariance(X,Y) = Sum of [(X deviation from its mean) × (Y deviation from its mean)] / (n-1)
- Correlation Coefficient (r): A standardized measure of the linear association between two variables.
Correlation Coefficient = Covariance(X,Y) / (Standard Deviation(X) × Standard Deviation(Y))
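The sketch below computes the sample covariance and the correlation coefficient by hand for two short hypothetical return series; numpy.corrcoef would give the same correlation:

```python
import math

x = [0.02, 0.05, -0.01, 0.03, 0.04]   # hypothetical returns of asset X
y = [0.01, 0.06, -0.02, 0.02, 0.05]   # hypothetical returns of asset Y

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance: how the two series move together, divided by (n - 1).
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

std_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
std_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))

# Correlation = covariance scaled by the product of the standard deviations.
corr = cov_xy / (std_x * std_y)

print(f"Covariance:  {cov_xy:.6f}")
print(f"Correlation: {corr:.4f}")     # always between -1 and +1
```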
Properties of Correlation
- The correlation coefficient ranges from -1 to +1.
- A coefficient of +1 indicates a perfect positive linear relationship.
- A coefficient of -1 indicates a perfect negative linear relationship.
- A coefficient of 0 means there is no linear relationship.
- The closer the coefficient is to +1 or -1, the stronger the linear relationship.
Limitations of Correlation Analysis
- It only measures linear relationships. Two variables could have a strong non-linear relationship and still have a correlation of zero (see the sketch after this list).
- It can be heavily influenced by outliers.
- Correlation does not imply causation. Just because two variables move together does not mean one causes the other.
- It can lead to spurious correlations, where two variables appear related by chance but have no logical connection.
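The first limitation is easy to demonstrate: in the sketch below, y is completely determined by x (y = x²), yet the linear correlation is essentially zero because the relationship is symmetric around x = 0 (the data are constructed purely for the illustration):

```python
import numpy as np

x = np.linspace(-1, 1, 201)
y = x ** 2                      # perfect non-linear dependence on x

corr = np.corrcoef(x, y)[0, 1]
print(f"Correlation between x and x**2: {corr:.3f}")   # ~0.0 despite perfect dependence
```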