class: center, middle

## IMSE 586
## Big Data Analytics and Visualization

### More on linear regression
### Instructor: Fred Feng

---

$$Y=\beta_0+\beta_1x_1+\cdots+\beta_px_p+\epsilon$$

$$\text{where }\;\epsilon \sim \text{N}(0, \sigma^2)$$

One of the assumptions is that the errors are [i.i.d.](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables).

--

For some data, the values are *not* independent.

- Time series data (or [longitudinal](https://en.wikipedia.org/wiki/Longitudinal_study) data)

--

Regression is typically used for [cross-sectional data](https://en.wikipedia.org/wiki/Cross-sectional_data).

---

# Nonlinear relationship

--

.center[Ice cream sales vs. outside temperature]

--

.center[.gray[(hypothetical data)]]

---

### Knowing .red[what variables to consider] is crucial.

--

.center[Distance vs. elevation gain for Fred's 25 bike rides]

---

- X: Ice cream sales
- Y: Number of people drowning in swimming pools

Are X and Y correlated?

---

class: middle, center

# .red[Correlation does not equal causation.]

---

# Spurious correlation

https://www.tylervigen.com/spurious-correlations
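
---

# Spurious correlation: a simulated sketch

A minimal sketch of how the ice cream (X) and drowning (Y) example could arise. It assumes, purely for illustration, that outside temperature drives both variables and that X has no effect on Y. All numbers (coefficients, noise levels, sample size, seed) are made up, and the helper names `pearson` and `residuals` are hypothetical. Only the Python standard library is used.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(y, z):
    """Residuals from a least-squares fit of y on z (with intercept)."""
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return [b - (my + slope * (a - mz)) for a, b in zip(z, y)]


rng = random.Random(586)  # fixed seed so the run is reproducible
n = 2000

# Hypothetical data: temperature is the confounder behind both X and Y.
temp = [rng.uniform(10, 35) for _ in range(n)]
sales = [20 * t + rng.gauss(0, 60) for t in temp]     # X: ice cream sales
drown = [0.3 * t + rng.gauss(0, 1.5) for t in temp]   # Y: drownings

# Raw correlation between X and Y.
r_raw = pearson(sales, drown)

# Partial correlation: correlate what is left of X and Y
# after regressing each one on temperature.
r_partial = pearson(residuals(sales, temp), residuals(drown, temp))

print(f"corr(X, Y)               = {r_raw:.2f}")
print(f"corr(X, Y | temperature) = {r_partial:.2f}")
```

Under these assumptions, X and Y should be clearly correlated, while the correlation should shrink toward zero once temperature is accounted for. This is one more reason why knowing what variables to consider matters.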