Topics covered: Type I/II errors, p-values, one-sample t-test, independent t-test, ANOVA with Tukey HSD, chi-square test, Mann-Whitney U, Kruskal-Wallis, effect sizes
Learning objectives: By the end of this week you will be able to apply hypothesis testing concepts to real datasets, write executable Python code for each technique, and complete both graded assignments independently.
A hypothesis test starts with H0 (null) and H1 (alternative). The p-value is the probability, assuming H0 is true, of observing data at least as extreme as ours. Choose alpha before looking at the data (typically 0.05) and reject H0 when p < alpha. Type I error (false positive: rejecting a true H0) rate = alpha. Type II error (false negative: failing to reject a false H0) rate = beta. Power = 1 - beta, the probability of detecting an effect that really exists. Always report an effect size (Cohen's d for t-tests) alongside the p-value: statistical significance does not imply practical importance.
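The definitions of alpha, beta and power can be checked by simulation. The sketch below repeats a one-sample t-test on many simulated samples: when H0 is true, the share of rejections should land close to alpha; when H0 is false, the share of rejections estimates power. The sample size (45), SD (0.9) and null mean (3.0) match the processing-time example in this section; the true shift of 0.5 SD, the seed and the 2,000 repetitions are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

# Monte Carlo sketch. Assumed settings: n = 45, SD = 0.9, H0 mean = 3.0,
# 2,000 simulated studies, and an alternative shifted by 0.5 SD (Cohen's d = 0.5).
rng = np.random.default_rng(1)
alpha, n, sigma, mu0, n_sims = 0.05, 45, 0.9, 3.0, 2000

def rejection_rate(true_mean):
    """Share of simulated samples in which a two-sided one-sample t-test rejects H0: mean = mu0."""
    samples = rng.normal(loc=true_mean, scale=sigma, size=(n_sims, n))
    _, p_values = stats.ttest_1samp(samples, popmean=mu0, axis=1)  # one test per row
    return np.mean(p_values < alpha)

type_1_rate = rejection_rate(true_mean=mu0)          # H0 true: every rejection is a false positive
power = rejection_rate(true_mean=mu0 + 0.5 * sigma)  # H0 false: every rejection is correct
beta = 1 - power                                      # Type II error rate

print(f'Estimated Type I error rate: {type_1_rate:.3f} (alpha = {alpha})')
print(f'Estimated power at d = 0.5: {power:.3f}, so beta is about {beta:.3f}')
```

The Type I rate should sit close to 0.05 regardless of the sample size; re-running with a larger n raises the estimated power for the same effect size.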
import numpy as np
from scipy import stats
np.random.seed(42)
processing_times = np.random.normal(loc=3.4, scale=0.9, size=45)
# One-sample t-test: H0: mean = 3.0 days (regulatory standard)
t_stat, p_value = stats.ttest_1samp(a=processing_times, popmean=3.0)
print(f'Sample mean: {processing_times.mean():.3f} days')
print(f't-statistic: {t_stat:.4f}')
print(f'p-value: {p_value:.4f}')
decision = 'Reject H0' if p_value < 0.05 else 'Fail to reject H0'
print(f'Decision at alpha=0.05: {decision}')
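# Cohen's d effect size (one-sample): (sample mean - hypothesised mean) / sample SD
cohens_d = (processing_times.mean() - 3.0) / processing_times.std(ddof=1)
abs_d = abs(cohens_d)
# Cohen's conventional benchmarks: 0.2 small, 0.5 medium, 0.8 large (below 0.2 is negligible)
size_label = 'negligible' if abs_d < 0.2 else 'small' if abs_d < 0.5 else 'medium' if abs_d < 0.8 else 'large'
print(f"Cohen's d: {cohens_d:.3f} ({size_label} effect)")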
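The t-statistic, p-value and Cohen's d above can be reproduced by hand, which shows what ttest_1samp computes: t = (x̄ - μ0) / (s / √n) with n - 1 degrees of freedom, and a two-sided p-value taken from the tails of the t distribution. This sketch regenerates the same simulated data (seed 42) so it runs on its own.

```python
import numpy as np
from scipy import stats

# Same simulated data as the one-sample example (seed 42, mean 3.4, SD 0.9, n = 45)
np.random.seed(42)
processing_times = np.random.normal(loc=3.4, scale=0.9, size=45)
mu0 = 3.0  # hypothesised mean under H0

n = processing_times.size
x_bar = processing_times.mean()
s = processing_times.std(ddof=1)              # sample SD (n - 1 denominator)
se = s / np.sqrt(n)                           # standard error of the mean
t_manual = (x_bar - mu0) / se                 # t = (x̄ - μ0) / (s / √n)
df = n - 1
p_manual = 2 * stats.t.sf(abs(t_manual), df)  # two-sided p-value: both tails of t(df)

t_scipy, p_scipy = stats.ttest_1samp(processing_times, popmean=mu0)
print(f't manual = {t_manual:.4f}, scipy = {t_scipy:.4f}')
print(f'p manual = {p_manual:.4f}, scipy = {p_scipy:.4f}')

# For the one-sample case, Cohen's d equals t / sqrt(n)
d = (x_bar - mu0) / s
print(f"Cohen's d = {d:.3f} (t / sqrt(n) = {t_manual / np.sqrt(n):.3f})")
```

The last line makes the link between significance and effect size explicit: for a fixed d, t grows with √n, so a large enough sample will make even a negligible effect "significant".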
An independent samples t-test compares the means of two unrelated groups. Always run Levene's test for equality of variances first, and use equal_var=False (Welch's t-test) if the variances are unequal. One-way ANOVA tests whether the means of 3+ groups are equal. A significant F-test tells you that differences exist but not which pairs differ; follow up with Tukey HSD. The chi-square test of independence checks whether two categorical variables are associated.
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd
import numpy as np
np.random.seed(0)
branch_A = np.random.normal(680, 75, 120)
branch_B = np.random.normal(710, 70, 95)
# Levene's test for equal variances, then Student's t-test (equal_var=True) or Welch's t-test (equal_var=False)
_, lev_p = stats.levene(branch_A, branch_B)
t_stat, p_val = stats.ttest_ind(branch_A, branch_B, equal_var=(lev_p > 0.05))
print(f'Levene p={lev_p:.4f}, t={t_stat:.4f}, p={p_val:.4f}')
# One-way ANOVA
primary = np.random.normal(150000, 40000, 80)
secondary = np.random.normal(280000, 60000, 120)
tertiary = np.random.normal(450000, 90000, 100)
f_stat, p_anova = stats.f_oneway(primary, secondary, tertiary)
print(f'ANOVA F={f_stat:.4f}, p={p_anova:.6f}')
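if p_anova < 0.05:
    # Tukey HSD needs all observations in one array plus one group label per observation
    all_loans = np.concatenate([primary, secondary, tertiary])
    groups = ['Primary'] * len(primary) + ['Secondary'] * len(secondary) + ['Tertiary'] * len(tertiary)
    tukey = pairwise_tukeyhsd(endog=all_loans, groups=groups, alpha=0.05)
    print(tukey)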
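A chi-square test of independence can be run with scipy.stats.chi2_contingency, which returns the statistic, p-value, degrees of freedom and the table of expected counts under independence. The sketch below uses a hypothetical 2 x 3 table of loan outcome by education level (the counts are invented for illustration, not real data) and adds Cramér's V as an effect size, in line with the advice to report one alongside every p-value.

```python
import numpy as np
from scipy import stats

# Hypothetical contingency table (invented counts):
# rows = loan outcome, columns = education level (primary, secondary, tertiary)
observed = np.array([
    [18, 22, 10],   # defaulted
    [62, 98, 90],   # repaid
])

chi2, p_chi, dof, expected = stats.chi2_contingency(observed)
print(f'chi2={chi2:.3f}, dof={dof}, p={p_chi:.4f}')

# The chi-square approximation is unreliable when expected counts are small
# (a common rule of thumb: every expected count should be at least 5)
print('Smallest expected count:', round(expected.min(), 2))

# Cramér's V: sqrt(chi2 / (N * (min(rows, cols) - 1))), ranging from 0 to 1
n_total = observed.sum()
cramers_v = np.sqrt(chi2 / (n_total * (min(observed.shape) - 1)))
print(f"Cramér's V: {cramers_v:.3f}")
```

Because this table has more than one degree of freedom, chi2_contingency applies no continuity correction, so the statistic is simply the sum of (observed - expected)² / expected over all cells.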
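Non-parametric tests make fewer distributional assumptions than their parametric counterparts; in particular, they do not require normality. Use them when the sample is small (roughly n < 30) and normality cannot be assumed, when the data are ordinal, or when the distribution is severely skewed. Mann-Whitney U (the alternative to the independent t-test) tests whether values in one group tend to be larger than values in the other. Kruskal-Wallis (the alternative to one-way ANOVA) tests whether 3+ groups come from the same distribution; it can be read as a comparison of medians only when the group distributions have similar shapes. Check normality first with Shapiro-Wilk.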
from scipy import stats
import numpy as np
np.random.seed(3)
# Right-skewed, non-normal data
group_control = stats.expon(scale=12).rvs(30)
group_treatment = stats.expon(scale=8).rvs(30)
# Shapiro-Wilk normality test
_, p_sw_c = stats.shapiro(group_control)
_, p_sw_t = stats.shapiro(group_treatment)
print(f'Shapiro-Wilk p (control): {p_sw_c:.4f}')
print(f'Shapiro-Wilk p (treatment): {p_sw_t:.4f}')
# p < 0.05 is evidence against normality (it does not prove non-normality); with skewed data like this, use Mann-Whitney
u_stat, p_mw = stats.mannwhitneyu(group_control, group_treatment, alternative='two-sided')
print(f'Mann-Whitney U={u_stat:.1f}, p={p_mw:.4f}')
print('Decision:', 'Reject H0 - groups differ' if p_mw < 0.05 else 'Fail to reject H0')
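Kruskal-Wallis extends the same rank-based idea to three or more groups via scipy.stats.kruskal. A significant H only says that at least one group differs, so a follow-up is needed. This sketch uses pairwise Mann-Whitney U tests with a Bonferroni correction as one simple option (Dunn's test is another common choice). The three exponential groups, their scales and the seed are arbitrary assumptions for illustration.

```python
from itertools import combinations

import numpy as np
from scipy import stats

# Three right-skewed groups (scales chosen arbitrarily for illustration)
rng = np.random.default_rng(7)
groups = {
    'A': rng.exponential(scale=12, size=30),
    'B': rng.exponential(scale=8, size=30),
    'C': rng.exponential(scale=5, size=30),
}

h_stat, p_kw = stats.kruskal(*groups.values())
print(f'Kruskal-Wallis H={h_stat:.3f}, p={p_kw:.4f}')

# Follow-up: pairwise Mann-Whitney U tests, Bonferroni-adjusted
# (multiply each raw p-value by the number of comparisons, capped at 1)
pairs = list(combinations(groups, 2))
if p_kw < 0.05:
    for a, b in pairs:
        _, p = stats.mannwhitneyu(groups[a], groups[b], alternative='two-sided')
        p_adj = min(p * len(pairs), 1.0)
        print(f'{a} vs {b}: raw p={p:.4f}, Bonferroni p={p_adj:.4f}')
```

Bonferroni is conservative: with many groups the number of pairs grows quickly, and the correction makes each individual comparison harder to pass.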
Submit completed notebooks to your GitHub repository before the next session. Feedback will be returned within 48 hours.
Complete an A/B test analysis: state H0 and H1, check normality with Shapiro-Wilk, choose the appropriate test, compute the p-value and effect size, and write a 200-word business decision memo.
Run ANOVA or Kruskal-Wallis on a dataset with 4+ groups, followed by the appropriate post-hoc test. Visualise the results with annotated box plots showing the significant pairwise comparisons.