01 Text normalisation: URLs replaced with "url", phone numbers with "phone", currency symbols with "money" - reduces vocabulary noise
02 PorterStemmer collapses inflected forms (e.g. winning/wins to win) to reduce feature space dimensionality; derived forms such as "winner" are left unchanged by the Porter rules
03 TF-IDF vectorisation with unigrams and bigrams, 5,000 features; bigrams capture multi-word spam patterns such as "call now" and "win cash"
04 Three-model comparison: Multinomial Naive Bayes, Logistic Regression, Linear SVM under identical conditions
05 LinearSVC selected: finds a maximum-margin hyperplane in TF-IDF space; outperformed the other two models (Naive Bayes and Logistic Regression) in this comparison
06 Feature coefficients extracted to reveal the 20 most predictive spam and ham words for interpretability