Module 5 Univariate EDA | Readings for MTH107

Understanding the distribution of data is crucial in statistics and data analysis. One of the fundamental concepts in this area is the distinction between left skewed and right skewed distributions. These terms describe the shape of a dataset's distribution and provide insights into the data's characteristics, such as central tendency and variability. This post will delve into the definitions, characteristics, and applications of left skewed and right skewed distributions, helping you gain a comprehensive understanding of these important statistical concepts.


Understanding Skewness

Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. In simpler terms, it indicates the direction and degree of asymmetry in a dataset. There are three types of skewness:

Positive Skewness (Right Skewed): The tail on the right side of the distribution is longer or fatter than the left side.
Negative Skewness (Left Skewed): The tail on the left side of the distribution is longer or fatter than the right side.
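Zero Skewness: A symmetric distribution has zero skewness, with the tails on both sides balancing each other. The converse does not always hold: an asymmetric distribution can also have a skewness of zero.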

Characteristics of Left Skewed Distributions

A left skewed distribution, also known as negatively skewed, has a long left tail. This means that the mass of the distribution is concentrated on the right, and the tail on the left side is longer or fatter. Key characteristics include:

Mean < Median < Mode: In a left skewed distribution, the mean is typically less than the median, which is less than the mode.
Asymmetry: The distribution is not symmetric; it is skewed to the left.
Tail Length: The left tail is longer or fatter than the right tail.

Characteristics of Right Skewed Distributions

A right skewed distribution, also known as positively skewed, has a long right tail. This means that the mass of the distribution is concentrated on the left, and the tail on the right side is longer or fatter. Key characteristics include:

Mode < Median < Mean: In a right skewed distribution, the mode is typically less than the median, which is less than the mean.
Asymmetry: The distribution is not symmetric; it is skewed to the right.
Tail Length: The right tail is longer or fatter than the left tail.
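The ordering of mean and median described for both kinds of skew can be checked on simulated data. The sketch below is only an illustration: the exponential sample, the seed, and the mirroring constant 10 are assumptions chosen for the demonstration, not part of any real dataset.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Right-skewed sample: exponential values have a long right tail.
right = [random.expovariate(1.0) for _ in range(10_000)]

# Left-skewed sample: mirror the same values around 10 to get a long left tail.
left = [10.0 - x for x in right]

for name, data in [("right skewed", right), ("left skewed", left)]:
    mean = statistics.mean(data)
    median = statistics.median(data)
    relation = "mean > median" if mean > median else "mean < median"
    print(f"{name}: mean={mean:.3f}, median={median:.3f} ({relation})")
```

The long tail pulls the mean toward it, while the median stays closer to the bulk of the data.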

Visualizing Skewness

Visualizing data distributions is essential for understanding skewness. Histograms and box plots are commonly used to visualize the shape of a dataset. Here’s how you can interpret these visualizations:

Histograms: A histogram shows the frequency of data points within specific ranges. In a left skewed distribution, the histogram will have a longer tail on the left side. In a right skewed distribution, the histogram will have a longer tail on the right side.
Box Plots: A box plot provides a summary of the dataset, including the median, quartiles, and potential outliers. In a left skewed distribution, the box plot will show a longer whisker on the left side. In a right skewed distribution, the box plot will show a longer whisker on the right side.
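Both views can be approximated without a plotting library. The following is a minimal sketch on a hypothetical right-skewed sample (the data, seed, and bin width are assumptions): it prints a text histogram and the quantities a box plot is drawn from. For simplicity it measures the distance from the median to the minimum and maximum, whereas most box plots cap the whiskers at 1.5 times the interquartile range.

```python
import random
import statistics

random.seed(1)
# Hypothetical right-skewed sample (exponential with mean 2).
data = [random.expovariate(0.5) for _ in range(500)]

# Text histogram: count observations in equal-width bins.
bin_width = 1.0
counts = {}
for x in data:
    b = int(x // bin_width)
    counts[b] = counts.get(b, 0) + 1

for b in range(max(counts) + 1):
    bar = "#" * (counts.get(b, 0) // 5)  # one '#' per 5 observations
    print(f"{b * bin_width:4.0f}-{(b + 1) * bin_width:<4.0f} {bar}")

# Box-plot ingredients: quartiles plus the spread on each side of the median.
q1, med, q3 = statistics.quantiles(data, n=4)
lower_span = med - min(data)
upper_span = max(data) - med
print(f"Q1={q1:.2f}, median={med:.2f}, Q3={q3:.2f}")
print(f"lower span={lower_span:.2f}, upper span={upper_span:.2f}")
```

For this sample the bars thin out toward the right, and the upper span is much longer than the lower span, which is the box-plot signature of right skew.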

Calculating Skewness

Skewness can be calculated in several ways. A common choice for sample data is a bias-adjusted version of Pearson’s moment coefficient of skewness, often called the adjusted Fisher-Pearson standardized moment coefficient:

Skewness = [n / ((n - 1)(n - 2))] * Σ[(x_i - x̄)³ / s³]

Where:

n is the number of observations.
x̄ is the mean of the observations.
s is the sample standard deviation of the observations (computed with n - 1 in the denominator).
x_i represents each individual observation.
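The formula translates directly into a few lines of code. This is a minimal sketch, assuming s is the sample standard deviation with n - 1 in the denominator; the function name sample_skewness and the example lists are illustrative choices.

```python
import math

def sample_skewness(xs):
    """Skewness = n / ((n - 1)(n - 2)) * sum(((x_i - mean) / s) ** 3)."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three observations")
    mean = sum(xs) / n
    # Sample standard deviation (n - 1 in the denominator).
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]
left_tailed = [-x for x in right_tailed]

print(sample_skewness(symmetric))     # approximately 0
print(sample_skewness(right_tailed))  # positive
print(sample_skewness(left_tailed))   # negative, same magnitude
```

Mirroring the data flips the sign of the result but not its size, which matches the interpretation of positive and negative skewness.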

Applications of Skewness

Understanding skewness is crucial in various fields, including finance, economics, and data science. Here are some key applications:

Risk Management: In finance, skewness is used to assess the risk of investment portfolios. A left skewed (negatively skewed) distribution of returns indicates a higher risk of extreme losses, while a right skewed (positively skewed) distribution indicates a greater chance of extreme gains.
Economics: In economics, skewness is used to analyze income distribution. A right skewed distribution of income indicates that a small percentage of the population earns a disproportionately large share of the total income.
Data Science: In data science, skewness is used to preprocess data before applying machine learning algorithms. Understanding the skewness of a dataset can help in selecting appropriate algorithms and preprocessing techniques.

Interpreting Skewness in Real-World Data

Let’s work through an example to illustrate the concepts of left skewed and right skewed distributions. Suppose we have data on the ages of 100 employees in a company. The ages are as follows:

Age	Frequency
20	5
25	10
30	15
35	20
40	25
45	15
50	10

To determine the skewness of this dataset, we can calculate the mean, median, and mode:

Mean: 36.75 years
Median: 37.5 years
Mode: 40 years

Since the mean is less than the median, which is less than the mode, this distribution is left skewed. The ages are concentrated on the right side of the distribution, around 35 to 45 years, and the smaller number of younger employees stretches the tail to the left.

📝 Note: In this example, the left skew is mild. It suggests that the company's workforce leans older, with a few younger employees contributing to the longer left tail.
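The summary statistics for the age table can be checked by expanding the frequency table into individual observations. This is a small sketch using only the table's values; the variable names are illustrative.

```python
import statistics

# Age -> frequency, copied from the table.
ages = {20: 5, 25: 10, 30: 15, 35: 20, 40: 25, 45: 15, 50: 10}

# Expand into one entry per employee (100 values in total).
expanded = [age for age, freq in ages.items() for _ in range(freq)]

mean = statistics.mean(expanded)      # 3675 / 100
median = statistics.median(expanded)  # average of the 50th and 51st values
mode = statistics.mode(expanded)      # most frequent age

print(mean, median, mode)  # 36.75 37.5 40
```

The 50th value falls in the 35 group and the 51st in the 40 group, which is why the median lands between the two.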

Transforming Skewed Data

In many cases, it is desirable to transform skewed data into a more symmetric distribution. This can be achieved using various transformation techniques, such as:

Log Transformation: Applying a logarithmic transformation to the data can reduce right skewness. This is particularly useful for data that spans several orders of magnitude. It requires strictly positive values.
Square Root Transformation: Taking the square root of the data can also reduce right skewness, especially for data with a moderate level of skewness. It requires non-negative values.
Box-Cox Transformation: This is a more general family of power transformations for strictly positive data. The data are raised to a power λ (with the log transformation as the λ = 0 case), and λ is usually chosen by maximum likelihood so that the transformed data are as close to normal as possible. Depending on the value of λ, it can reduce either left or right skewness.
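The effect of the first two transformations can be seen on simulated data. The sketch below assumes a hypothetical log-normal sample, which is strongly right skewed, and uses the simple moment coefficient of skewness (third central moment divided by the cubed standard deviation) as a helper; the function and variable names are illustrative.

```python
import math
import random

def moment_skewness(xs):
    """Simple moment coefficient: m3 / m2 ** 1.5 (no small-sample adjustment)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

random.seed(42)
# Hypothetical right-skewed, strictly positive sample.
raw = [random.lognormvariate(0.0, 1.0) for _ in range(5_000)]

rooted = [math.sqrt(x) for x in raw]  # square root transformation
logged = [math.log(x) for x in raw]   # log transformation

for name, data in [("raw", raw), ("sqrt", rooted), ("log", logged)]:
    print(f"{name:>4}: skewness = {moment_skewness(data):.3f}")
```

On this sample the square root reduces the skewness and the logarithm brings it close to zero, since the log of log-normal data is normally distributed. Real data will rarely respond this cleanly, so it is worth recomputing the skewness after any transformation.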

Conclusion

Understanding left skewed and right skewed distributions is fundamental in statistics and data analysis. These concepts provide insights into the shape and characteristics of a dataset, helping analysts make informed decisions. By visualizing and calculating skewness, and applying appropriate transformations, analysts can gain a deeper understanding of their data and improve the accuracy of their analyses. Whether in finance, economics, or data science, recognizing and interpreting skewness is a crucial skill for any data professional.
