TL;DR

Statistics boils down to three things: describing data, finding patterns, and making predictions from samples
Mean, median, and mode each tell you something different about your data — knowing which to use matters
Standard deviation tells you how spread out your data is (small = consistent, large = all over the place)
P-values and hypothesis testing sound scary, but they're just answering "is this result real or just luck?"

Statistics is one of those courses that everyone has to take and nobody feels ready for. Whether you're a psych major, a business student, a bio student, or just fulfilling a requirement, stats is waiting for you.

The good news: statistics is actually one of the most practical classes you'll ever take. Once you understand the basics, you can evaluate research claims, understand polls, interpret medical studies, and generally call BS on misleading data for the rest of your life.

The bad news: the way it's taught can be mind-numbing. So let's fix that.

Part 1: Describing Data

Before you can do anything fancy with data, you need to describe it. What's the typical value? How spread out is it? What does the distribution look like?

Measures of Center

These tell you what's "typical" in your dataset.

Mean (Average): Add up all the values and divide by how many there are.

Data: 2, 4, 6, 8, 10
Mean = (2 + 4 + 6 + 8 + 10) / 5 = 30/5 = 6

The mean is the most common measure of center, but it has a weakness: outliers pull it up or down. If Jeff Bezos walks into a coffee shop, the average net worth of everyone in that room is suddenly billions — even though nothing changed for anyone else.

Median (Middle Value): Line up all values from smallest to largest and find the one in the middle. With an even number of values, average the two middle ones.

Data: 2, 4, 6, 8, 100
Median = 6 (the middle value)
Mean = 24 (pulled up by the 100)

The median is better than the mean when you have outliers or skewed data. This is why housing reports use "median home price" instead of "average home price" — a few mansions would make the average misleading.

Mode (Most Common Value): The value that appears most often.

Data: 2, 3, 3, 4, 5, 5, 5, 6
Mode = 5 (appears three times)

The mode is most useful for categorical data (what's the most popular major?) and for understanding distributions.
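
If you'd like to check the three examples above yourself, here is a minimal sketch using Python's built-in statistics module. The variable names are just illustrative; the data is the same as in the examples.

```python
import statistics

# Same data as the median and mode examples above
with_outlier = [2, 4, 6, 8, 100]
repeats = [2, 3, 3, 4, 5, 5, 5, 6]

mean_value = statistics.mean(with_outlier)      # 24, pulled up by the 100
median_value = statistics.median(with_outlier)  # 6, unaffected by the 100
mode_value = statistics.mode(repeats)           # 5, appears three times

print(mean_value, median_value, mode_value)
```

Try swapping the 100 for a 10 and notice that the mean moves a lot while the median stays put.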

When to Use Which

Situation	Best Measure
Normal distribution, no outliers	Mean
Skewed data or outliers present	Median
Categorical data	Mode
Income or housing data	Median
Test scores (normal class)	Mean
"Most popular" questions	Mode

Measures of Spread

Knowing the center isn't enough. You also need to know how spread out the data is.

Range

Range = Maximum - Minimum
Data: 3, 5, 7, 9, 11
Range = 11 - 3 = 8

Simple but sensitive to outliers. One extreme value changes the entire range.

Standard Deviation (SD): This is the big one. Standard deviation measures roughly how far a typical value sits from the mean.

Small SD → data is clustered close to the mean (consistent)
Large SD → data is spread out (variable)

Example:

Class A test scores: 78, 80, 79, 81, 82 → SD ≈ 1.4 (very consistent)
Class B test scores: 60, 70, 80, 90, 100 → SD ≈ 14.1 (all over the place)

Both classes have the same mean (80), but Class A is much more consistent.

How to calculate standard deviation:

Find the mean
Subtract the mean from each value (these are "deviations")
Square each deviation
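Find the average of the squared deviations (this is the "variance"). Dividing by the number of values gives the population variance, which is what the class examples above use; for a sample, most courses divide by n - 1 instead.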
Take the square root of the variance (this is the SD)

Your calculator or Excel does this in one step. But understanding the process helps you understand what the number means.
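
The five steps above can be sketched in a few lines of Python. This is a sketch rather than the one true formula: the helper name population_sd is made up for illustration, and it divides by the number of values (the population version), which is what reproduces the 1.4 and 14.1 in the class example. The built-in statistics.stdev divides by n - 1 instead and gives slightly larger numbers.

```python
import math
import statistics

def population_sd(values):
    """Standard deviation following the five steps, dividing by n."""
    mean = sum(values) / len(values)                 # Step 1: find the mean
    deviations = [v - mean for v in values]          # Step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]           # Step 3: square each deviation
    variance = sum(squared) / len(values)            # Step 4: average them (variance)
    return math.sqrt(variance)                       # Step 5: square root

class_a = [78, 80, 79, 81, 82]
class_b = [60, 70, 80, 90, 100]

sd_a = population_sd(class_a)   # about 1.41
sd_b = population_sd(class_b)   # about 14.14

# The sample version (divide by n - 1) comes out a bit larger
sample_sd_a = statistics.stdev(class_a)   # about 1.58
sample_sd_b = statistics.stdev(class_b)   # about 15.81

print(round(sd_a, 2), round(sd_b, 2), round(sample_sd_a, 2), round(sample_sd_b, 2))
```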

The Normal Distribution (Bell Curve)

Many natural phenomena follow a normal distribution — a symmetric, bell-shaped curve where most values cluster around the mean.

The 68-95-99.7 Rule:

About 68% of data falls within 1 SD of the mean
About 95% falls within 2 SDs of the mean
About 99.7% falls within 3 SDs of the mean

So if the average test score is 75 with an SD of 10:

68% of students scored between 65 and 85
95% scored between 55 and 95
99.7% scored between 45 and 105 (a real test capped at 100 only approximately follows the curve)

This rule comes up constantly in statistics. Know it cold.
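
To see where those percentages come from, here is a small sketch using NormalDist from Python's statistics module (available in Python 3.8 and later). It assumes the scores are exactly normal with mean 75 and SD 10, as in the example; the helper name share_within is just for illustration.

```python
from statistics import NormalDist

MEAN, SD = 75, 10
scores = NormalDist(mu=MEAN, sigma=SD)

def share_within(k):
    """Fraction of a normal distribution within k SDs of the mean."""
    return scores.cdf(MEAN + k * SD) - scores.cdf(MEAN - k * SD)

within = {k: share_within(k) for k in (1, 2, 3)}

for k, share in within.items():
    low, high = MEAN - k * SD, MEAN + k * SD
    print(f"within {k} SD ({low} to {high}): {share:.1%}")
```

The results come out at roughly 68%, 95%, and 99.7%, which is why the rule has the name it does.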

Part 2: Probability Basics

Probability is the language of statistics. It ranges from 0 (impossible) to 1 (certain).

Key Concepts

Simple probability: P(event) = favorable outcomes / total outcomes

P(rolling a 3 on a die) = 1/6 ≈ 0.167 or 16.7%

Complement: P(not A) = 1 - P(A)

P(not rolling a 3) = 1 - 1/6 = 5/6 ≈ 83.3%

Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)

For mutually exclusive events (can't happen together), P(A and B) = 0, so the rule simplifies to P(A) + P(B):

P(rolling a 3 or a 5) = 1/6 + 1/6 = 2/6 = 1/3

Multiplication Rule: P(A and B) = P(A) × P(B) for independent events

P(two heads in a row) = 1/2 × 1/2 = 1/4
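
One way to convince yourself these rules work is to count outcomes directly. The sketch below does that with exact fractions; the helper prob is a made-up name that simply counts favorable outcomes over total outcomes, assuming every outcome is equally likely.

```python
from fractions import Fraction
from itertools import product

def prob(event, outcomes):
    """P(event) = favorable outcomes / total outcomes (equally likely outcomes)."""
    outcomes = list(outcomes)
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))

die = range(1, 7)

p_three = prob(lambda roll: roll == 3, die)               # 1/6
p_not_three = 1 - p_three                                  # complement: 5/6
p_three_or_five = prob(lambda roll: roll in (3, 5), die)   # 1/3

# Two coin flips: HH, HT, TH, TT
p_two_heads = prob(lambda flips: flips == ("H", "H"), product("HT", repeat=2))  # 1/4

print(p_three, p_not_three, p_three_or_five, p_two_heads)
```

Counting gives the same answers as the addition and multiplication rules, which is a handy sanity check when you're not sure a rule applies.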

Conditional Probability

P(A|B) = "the probability of A, given that B has happened"

P(A|B) = P(A and B) / P(B)

Example: If 40% of students study and pass, and 60% of students study, what's the probability of passing given that you studied?

P(pass|studied) = 0.40/0.60 = 0.667 or 66.7%
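
The same calculation as a tiny Python sketch. The function name conditional is an illustrative choice, and the inputs are the 40% and 60% from the example.

```python
def conditional(p_a_and_b, p_b):
    """P(A|B) = P(A and B) / P(B). Undefined when P(B) is zero."""
    if p_b == 0:
        raise ValueError("P(B) must be greater than zero")
    return p_a_and_b / p_b

p_study_and_pass = 0.40   # students who studied AND passed
p_study = 0.60            # students who studied

p_pass_given_studied = conditional(p_study_and_pass, p_study)
print(f"{p_pass_given_studied:.1%}")   # 66.7%
```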

Part 3: Hypothesis Testing

This is where most students start sweating. But the concept is actually pretty straightforward once you strip away the jargon.

The Big Idea

Hypothesis testing answers one question: "Is this result real, or could it have happened by chance?"

Example scenario: A new study drug claims to improve test scores. Students who took the drug scored 5 points higher on average than students who didn't. Is that a real effect, or just random variation?

Hypothesis testing gives you a framework to answer this.

The Steps (Simplified)

Step 1: State your hypotheses

Null hypothesis (H₀): Nothing interesting is happening. The drug has no effect. Any difference is due to chance.
Alternative hypothesis (H₁): Something IS happening. The drug does have an effect.

Step 2: Collect data and calculate a test statistic. The test statistic measures how far your observed result is from what you'd expect under the null hypothesis.

Common tests, each with its own test statistic:

z-test — When you know the population standard deviation
t-test — When you don't (this is the one you'll use most often)
chi-square test — For categorical data

Step 3: Find the p-value. The p-value answers: "If the null hypothesis were true (the drug doesn't work), how likely would it be to see results this extreme or more extreme?"

Small p-value (< 0.05) → Results this extreme would be unlikely if the null hypothesis were true → Reject the null hypothesis → "Statistically significant"
Large p-value (≥ 0.05) → Results this extreme are plausible under the null hypothesis → Don't reject the null hypothesis → "Not statistically significant"

Step 4: Make your decision. If p < 0.05 (the most common threshold): reject H₀ and conclude there is evidence of an effect. If p ≥ 0.05: don't reject H₀. You can't conclude there is an effect, but you also haven't shown there is none.
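
Here is a rough sketch of the four steps as a one-sample z-test in Python. All the numbers are hypothetical and chosen only for illustration: a known population mean of 75 and SD of 10, and a sample of 25 students on the drug who averaged 80. The function name two_sided_z_test is also made up. A z-test is used because it needs nothing beyond the standard library; in a real class you would more often run a t-test in your stats software.

```python
import math
from statistics import NormalDist

def two_sided_z_test(sample_mean, null_mean, population_sd, n):
    """Return the z statistic and two-sided p-value for a one-sample z-test."""
    standard_error = population_sd / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error    # Step 2: test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))      # Step 3: p-value
    return z, p_value

# Step 1: H0 says the drug group's true mean is 75; H1 says it is not.
# Hypothetical data, not from a real study.
z, p_value = two_sided_z_test(sample_mean=80, null_mean=75, population_sd=10, n=25)

# Step 4: compare to the 0.05 threshold
decision = "reject H0" if p_value < 0.05 else "fail to reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f}, decision: {decision}")
```

With these made-up numbers, z is 2.5 and the p-value is about 0.012, so the null hypothesis is rejected at the 0.05 level.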

The p-value: What It Is and What It Isn't

What it IS: The probability of getting results at least as extreme as yours if there's truly no effect.

What it ISN'T:

The probability that the null hypothesis is true
The probability that you made an error
A measure of how big or important the effect is

A p-value of 0.03 doesn't mean there's a 3% chance the result is fake. It means: if the drug truly had no effect, you'd see results this extreme or more extreme only 3% of the time.

Type I and Type II Errors

Type I Error (False Positive): You reject the null hypothesis when it's actually true. "You concluded the drug works, but it actually doesn't."

Type II Error (False Negative): You fail to reject the null hypothesis when it's actually false. "You found no evidence that the drug works, but it actually does."

Think of it like a fire alarm:

Type I = false alarm (alarm goes off, no fire)
Type II = missed alarm (fire, but alarm doesn't go off)
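
A quick simulation makes the false-alarm idea concrete. The sketch below runs 2,000 pretend experiments in which the null hypothesis really is true (the data is pure noise) and counts how often a z-test still comes back "significant" at 0.05. The setup, sample size, and names are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

random.seed(42)          # fixed seed so the run is repeatable

ALPHA = 0.05
N_EXPERIMENTS = 2000
SAMPLE_SIZE = 30

def p_value_under_null():
    """Run one experiment where there is truly no effect and return its p-value."""
    sample = [random.gauss(0, 1) for _ in range(SAMPLE_SIZE)]   # true mean is 0
    sample_mean = sum(sample) / SAMPLE_SIZE
    z = sample_mean / (1 / math.sqrt(SAMPLE_SIZE))              # known SD of 1
    return 2 * (1 - NormalDist().cdf(abs(z)))

false_positives = sum(p_value_under_null() < ALPHA for _ in range(N_EXPERIMENTS))
false_positive_rate = false_positives / N_EXPERIMENTS

print(f"Type I error rate: {false_positive_rate:.3f}")   # should land near 0.05
```

The rate hovers around 5%, which is exactly what the 0.05 threshold promises: when nothing is going on, you will still raise a false alarm about 1 time in 20.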

Part 4: Common Test Types (Quick Reference)

One-sample t-test

Use when: Comparing a sample mean to a known population mean
Example: "Do students at our school score differently from the national average?"

Two-sample t-test

Use when: Comparing means between two groups
Example: "Do students who study with music score differently from those who study in silence?"

Paired t-test

Use when: Comparing two measurements from the same group
Example: "Did students' scores improve from the pre-test to the post-test?"
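
To show what a paired t-test actually computes, here is a sketch of the t statistic for five students' pre-test and post-test scores. The scores are invented for illustration. The sketch stops at the t statistic because turning it into a p-value requires the t distribution, which Python's standard library doesn't include; your stats software or a t-table handles that last step.

```python
import math
import statistics

# Hypothetical scores for the same five students
pre_test = [70, 65, 80, 75, 60]
post_test = [74, 70, 82, 80, 66]

# A paired test works on each student's own change
differences = [after - before for before, after in zip(pre_test, post_test)]

n = len(differences)
mean_diff = statistics.mean(differences)     # average improvement: 4.4
sd_diff = statistics.stdev(differences)      # sample SD of the differences
standard_error = sd_diff / math.sqrt(n)

t_statistic = mean_diff / standard_error     # about 6.49
degrees_of_freedom = n - 1

print(f"t = {t_statistic:.2f} with {degrees_of_freedom} degrees of freedom")
```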

Chi-square test

Use when: Testing relationships between categorical variables
Example: "Is there a relationship between gender and major choice?"

Correlation

Use when: Measuring the strength of the relationship between two variables
Values: -1 (perfect negative) to +1 (perfect positive), 0 = no linear relationship
Example: "Is there a relationship between hours studied and exam scores?"

Remember: Correlation does NOT mean causation. Ice cream sales and drowning deaths are correlated (both increase in summer), but ice cream doesn't cause drowning.
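
Here is a sketch of how the correlation coefficient (Pearson's r) is computed, written out by hand so you can see the pieces. The hours and scores are made-up numbers, and pearson_r is just an illustrative name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: how closely two variables follow a straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables move together
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / math.sqrt(spread_x * spread_y)

# Hypothetical data
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [55, 65, 70, 80, 85]

r = pearson_r(hours_studied, exam_scores)
print(f"r = {r:.3f}")   # about 0.993, a strong positive relationship
```

A value this close to +1 says the points fall almost on a straight upward line. It still says nothing about whether studying caused the higher scores.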

Common Mistakes in Stats Class

Confusing Correlation with Causation

Just because two things happen together doesn't mean one causes the other. This is one of the most common stats mistakes in the real world.

Misinterpreting P-values

A p-value of 0.001 doesn't mean the effect is "very real" or "very large." It means the result is very unlikely under the null hypothesis. Effect size and p-value are different things.

Forgetting to Check Assumptions

Every test has assumptions (normality, independence, equal variances, etc.). Violating assumptions can make your results meaningless. Always check before running a test.

Mixing Up SD and Standard Error

Standard deviation describes the spread of your data
Standard error describes the precision of your sample mean (SE = SD / √n, so it shrinks as the sample gets bigger)
They're related but not the same thing
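
A short sketch of the difference, reusing the Class B scores from earlier. It uses the sample SD (the n - 1 version from statistics.stdev), which is the usual choice when estimating a standard error; the variable names are illustrative.

```python
import math
import statistics

class_b = [60, 70, 80, 90, 100]

n = len(class_b)
sd = statistics.stdev(class_b)        # spread of individual scores, about 15.81
standard_error = sd / math.sqrt(n)    # precision of the sample mean, about 7.07

print(f"SD = {sd:.2f}, SE = {standard_error:.2f}")

# With the same spread, a sample of 500 would give a far more precise mean
se_large_sample = sd / math.sqrt(500)
print(f"SE with n = 500: {se_large_sample:.2f}")
```

The SD stays roughly the same no matter how many students you measure, while the standard error keeps shrinking as the sample grows.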

How to Study for Stats

Statistics is weird because it combines math, logic, and interpretation. Here's what works:

Focus on understanding the concepts first, formulas second. If you understand what a p-value means, the formula is just mechanics.
Practice with real data. Abstract problems are harder to learn from than real examples. "Is there a relationship between sleep and GPA?" is more engaging than "test H₀: μ = 0."
Use software early. Your professor will probably use Excel, SPSS, R, or a calculator. Learn the tool alongside the concepts, not separately.
Draw pictures. Sketch distributions, shade areas under curves, visualize what your test is doing. This builds intuition.
Create a formula sheet. Even if you can't use it on the exam, the act of organizing all the formulas helps you see the patterns.
Get help when stuck. Statistics builds on itself. If you don't understand probability, hypothesis testing will be a nightmare. Use tools like Gradily to get step-by-step explanations when you're stuck.

Stats isn't about being a math genius. It's about understanding how to think about data. And that's a skill that'll serve you long after the final exam.

Statistics 101: Concepts Every Student Must Know