Table of Contents
Introduction
To the Reader
Updates
Concepts You Should Know
I Regression and Its Generalizations
Regression Basics
Statistics, Data Analysis, Regression
Guessing the Value of a Random Variable
Estimating the Expected Value
The Regression Function
Some Disclaimers
Estimating the Regression Function
The Bias-Variance Tradeoff
The Bias-Variance Tradeoff in Action
Ordinary Least Squares Linear Regression as Smoothing
Linear Smoothers
k-Nearest-Neighbor Regression
Kernel Smoothers
Exercises
The Truth about Linear Regression
Optimal Linear Prediction: Multiple Variables
Collinearity
The Prediction and Its Error
Estimating the Optimal Linear Predictor
Unbiasedness and Variance of Ordinary Least Squares Estimates
Shifting Distributions, Omitted Variables, and Transformations
Changing Slopes
R²: Distraction or Nuisance?
Omitted Variables and Shifting Distributions
Errors in Variables
Transformation
Adding Probabilistic Assumptions
Examine the Residuals
On Significant Coefficients
Linear Regression Is Not the Philosopher's Stone
Exercises
Model Evaluation
What Are Statistical Models For?
Errors, In and Out of Sample
Over-Fitting and Model Selection
Cross-Validation
Data-set Splitting
k-Fold Cross-Validation (CV)
Leave-one-out Cross-Validation
Warnings
Parameter Interpretation
Exercises
Smoothing in Regression
How Much Should We Smooth?
Adapting to Unknown Roughness
Bandwidth Selection by Cross-Validation
Convergence of Kernel Smoothing and Bandwidth Scaling
Summary on Kernel Smoothing
Kernel Regression with Multiple Inputs
Interpreting Smoothers: Plots
Average Predictive Comparisons
Exercises
Simulation
What Do We Mean by ``Simulation''?
How Do We Simulate Stochastic Models?
Chaining Together Random Variables
Random Variable Generation
Built-in Random Number Generators
Transformations
Quantile Method
Rejection Method
The Metropolis Algorithm and Markov Chain Monte Carlo
Generating Uniform Random Numbers
Sampling
Sampling Rows from Data Frames
Multinomials and Multinoullis
Probabilities of Observation
Repeating Simulations
Why Simulate?
Understanding the Model; Monte Carlo
Checking the Model
Sensitivity Analysis
The Method of Simulated Moments
The Method of Moments
Adding in the Simulation
An Example: Moving Average Models and the Stock Market
Exercises
Appendix: Some Design Notes on the Method of Moments Code
The Bootstrap
Stochastic Models, Uncertainty, Sampling Distributions
The Bootstrap Principle
Variances and Standard Errors
Bias Correction
Confidence Intervals
Other Bootstrap Confidence Intervals
Hypothesis Testing
Double bootstrap hypothesis testing
Parametric Bootstrapping Example: Pareto's Law of Wealth Inequality
Non-parametric Bootstrapping
Parametric vs. Nonparametric Bootstrapping
Bootstrapping Regression Models
Re-sampling Points: Parametric Example
Re-sampling Points: Non-parametric Example
Re-sampling Residuals: Example
Bootstrap with Dependent Data
Things Bootstrapping Does Poorly
Further Reading
Exercises
Weighting and Variance
Weighted Least Squares
Heteroskedasticity
Weighted Least Squares as a Solution to Heteroskedasticity
Some Explanations for Weighted Least Squares
Finding the Variance and Weights
Conditional Variance Function Estimation
Iterative Refinement of Mean and Variance: An Example
Real Data Example: Old Heteroskedastic
Re-sampling Residuals with Heteroskedasticity
Local Linear Regression
Advantages and Disadvantages of Locally Linear Regression
Lowess
Exercises
Splines
Smoothing by Directly Penalizing Curve Flexibility
The Meaning of the Splines
Computational Example: Splines for Stock Returns
Confidence Bands for Splines
Basis Functions and Degrees of Freedom
Basis Functions
Degrees of Freedom
Splines in Multiple Dimensions
Smoothing Splines versus Kernel Regression
Further Reading
Exercises
Additive Models
Partial Residuals and Back-fitting for Linear Models
Additive Models
The Curse of Dimensionality
Example: California House Prices Revisited
Closing Modeling Advice
Further Reading
Testing Regression Specifications
Testing Functional Forms
Examples of Testing a Parametric Model
Remarks
Other Nonparametric Regressions
Why Use Parametric Models At All?
Why We Sometimes Want Mis-Specified Parametric Models
More about Hypothesis Testing
Logistic Regression
Modeling Conditional Probabilities
Logistic Regression
Likelihood Function for Logistic Regression
Logistic Regression with More Than Two Classes
Newton's Method for Numerical Optimization
Newton's Method in More than One Dimension
Iteratively Re-Weighted Least Squares
Generalized Linear Models and Generalized Additive Models
Generalized Additive Models
An Example (Including Model Checking)
Exercises
GLMs and GAMs
Generalized Linear Models and Iterative Least Squares
GLMs in General
Examples of GLMs
Vanilla Linear Models
Binomial Regression
Poisson Regression
Uncertainty
Generalized Additive Models
Weather Forecasting in Snoqualmie Falls
Exercises
II Multivariate Data, Distributions, and Latent Structure
Multivariate Distributions
Review of Definitions
Multivariate Gaussians
Linear Algebra and the Covariance Matrix
Conditional Distributions and Least Squares
Projections of Multivariate Gaussians
Computing with Multivariate Gaussians
Inference with Multivariate Distributions
Estimation
Model Comparison
Goodness-of-Fit
Exercises
Density Estimation
Histograms Revisited
``The Fundamental Theorem of Statistics''
Error for Density Estimates
Error Analysis for Histogram Density Estimates
Kernel Density Estimates
Analysis of Kernel Density Estimates
Joint Density Estimates
Categorical and Ordered Variables
Practicalities
Kernel Density Estimation in R: An Economic Example
Conditional Density Estimation
Practicalities and a Second Example
More on the Expected Log-Likelihood Ratio
Simulating from Density Estimates
Simulating from Kernel Density Estimates
Sampling from a Kernel Joint Density Estimate
Sampling from Kernel Conditional Density Estimates
Sampling from Histogram Estimates
Examples of Simulating from Kernel Density Estimates
Exercises
Relative Distributions and Smooth Tests
Smooth Tests of Goodness of Fit
From Continuous CDFs to Uniform Distributions
Testing Uniformity
Neyman's Smooth Test
Choice of Function Basis
Choice of Number of Basis Functions
Application: Combining p-Values
Density Estimation by Series Expansion
Smooth Tests of Non-Uniform Parametric Families
Estimated Parameters
Implementation in R
Some Examples
Conditional Distributions and Calibration
Relative Distributions
Estimating the Relative Distribution
R Implementation and Examples
Example: Conservative versus Liberal Brains
Example: Economic Growth Rates
Adjusting for Covariates
Example: Adjusting Growth Rates
Further Reading
Exercises
Principal Components Analysis
Mathematics of Principal Components
Minimizing Projection Residuals
Maximizing Variance
More Geometry; Back to the Residuals
Statistical Inference, or Not
Example: Cars
Latent Semantic Analysis
Principal Components of the New York Times
PCA for Visualization
PCA Cautions
Exercises
Factor Analysis
From PCA to Factor Analysis
Preserving correlations
The Graphical Model
Observables Are Correlated Through the Factors
Geometry: Approximation by Hyper-planes
Roots of Factor Analysis in Causal Discovery
Estimation
Degrees of Freedom
More unknowns (free parameters) than equations (constraints)
A Clue from Spearman's One-Factor Model
Estimating Factor Loadings and Specific Variances
Maximum Likelihood Estimation
Alternative Approaches
Estimating Factor Scores
The Rotation Problem
Factor Analysis as a Predictive Model
How Many Factors?
R² and Goodness of Fit
Reification, and Alternatives to Factor Models
The Rotation Problem Again
Factors or Mixtures?
The Thomson Sampling Model
Mixture Models
Two Routes to Mixture Models
From Factor Analysis to Mixture Models
From Kernel Density Estimates to Mixture Models
Mixture Models
Geometry
Identifiability
Probabilistic Clustering
Simulation
Estimating Parametric Mixture Models
More about the EM Algorithm
Further Reading on and Applications of EM
Topic Models and Probabilistic LSA
Non-parametric Mixture Modeling
Worked Computational Example
Mixture Models in R
Fitting a Mixture of Gaussians to Real Data
Calibration-checking for the Mixture
Selecting the Number of Components by Cross-Validation
Interpreting the Mixture Components, or Not
Hypothesis Testing for Mixture-Model Selection
Exercises
Graphical Models
Conditional Independence and Factor Models
Directed Acyclic Graph (DAG) Models
Conditional Independence and the Markov Property
Examples of DAG Models and Their Uses
Missing Variables
Non-DAG Graphical Models
Undirected Graphs
Further reading
Directed but Cyclic Graphs
Further Reading
III Causal Inference
Graphical Causal Models
Causation and Counterfactuals
Causal Graphical Models
Calculating the ``effects of causes''
Back to Teeth
Conditional Independence and d-Separation
d-Separation Illustrated
Linear Graphical Models and Path Coefficients
Positive and Negative Associations
Independence and Information
Further Reading
Exercises
Identifying Causal Effects
Causal Effects, Interventions and Experiments
The Special Role of Experiment
Identification and Confounding
Identification Strategies
The Back-Door Criterion: Identification by Conditioning
The Entner Rules
The Front-Door Criterion: Identification by Mechanisms
The Front-Door Criterion and Mechanistic Explanation
Instrumental Variables
Some Invalid Instruments
Critique of Instrumental Variables
Failures of Identification
Summary
Further Reading
Exercises
Estimating Causal Effects
Estimators in the Back- and Front-Door Criteria
Estimating Average Causal Effects
Avoiding Estimating Marginal Distributions
Propensity Scores
Matching and Propensity Scores
Instrumental-Variables Estimates
Uncertainty and Inference
Recommendations
Further Reading
Exercises
Discovering Causal Structure
Testing DAGs
Testing Conditional Independence
Faithfulness and Equivalence
Partial Identification of Effects
Causal Discovery with Known Variables
The PC Algorithm
Causal Discovery with Hidden Variables
Partial identification of effects
On Conditional Independence Tests
Software and Examples
Limitations on Consistency of Causal Discovery
Further Reading
Exercises
IV Dependent Data
Time Series
Time Series, What They Are
Other kinds of time series
Notation
Stationarity
Autocorrelation
The Ergodic Theorem
The World's Simplest Ergodic Theorem
Rate of Convergence
Why Ergodicity Matters
Markov Models
Meaning of the Markov Property
Autoregressive Models
Autoregressions with Covariates
Additive Autoregressions
Example: The lynx
Linear Autoregression
``Unit Roots'' and Stationary Solutions
Conditional Variance
Example: lynx
Regression with Correlated Noise; Generalized Least Squares
Bootstrapping Time Series
Parametric or Model-Based Bootstrap
Block Bootstraps
Sieve Bootstrap
Trends and De-Trending
Forecasting Trends
Seasonal Components
Detrending by Differencing
Further Reading
Exercises
Time Series with Latent Variables
Longitudinal, Spatial and Network Data
Appendices
Programming
Functions
First Example: Pareto Quantiles
Functions Which Call Functions
Sanity-Checking Arguments
Layering Functions and Debugging
More on Debugging
Automating Repetition and Passing Arguments
Avoiding Iteration: Manipulating Objects
ifelse and which
apply and Its Variants
More Complicated Return Values
Re-Writing Your Code: An Extended Example
General Advice on Programming
Comment your code
Use meaningful names
Check whether your program works
Avoid writing the same thing twice
Start from the beginning and break it down
Break your code into many short, meaningful functions
Further Reading
Big O and Little o Notation
χ² and the Likelihood Ratio Test
Proof of the Gauss-Markov Theorem
Constrained and Penalized Optimization
Constrained Optimization
Lagrange Multipliers
Penalized Optimization
Mini-Example: Constrained Linear Regression
Statistical Remark: ``Ridge Regression'' and ``The Lasso''
Rudimentary Graph Theory
Pseudo-code for the SGS Algorithm
Pseudo-code for the PC Algorithm