class: center, middle

## IMSE 586

## Big Data Analytics and Visualization

### More on linear regression
### Instructor: Fred Feng

---

$$Y=\beta_0+\beta_1x_1+\cdots+\beta_px_p+\epsilon$$

$$\text{where }\;\epsilon \sim \text{N}(0, \sigma^2)$$

One of the assumptions is that the errors are [i.i.d.](https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables).

--

For some data, the values are *not* independent.

- Time series data (or [longitudinal](https://en.wikipedia.org/wiki/Longitudinal_study) data)

--

.center[data:image/s3,"s3://crabby-images/dbd7e/dbd7e58220d1696184b775eefb6613a3d04e3f18" alt=":scale 70%"]

--

Regression is typically used for [cross-sectional data](https://en.wikipedia.org/wiki/Cross-sectional_data).

---

# Nonlinear relationship

--

.center[Ice cream sales vs. outside temperature]

--

.center[.gray[(hypothetical data)]
data:image/s3,"s3://crabby-images/5e63f/5e63fe82bf4ef8d508eafb4091a0d0e60f1118b8" alt=":scale 90%"
]

---

### Knowing .red[what variables to consider] is crucial.

--

.center[Distance vs. elevation gain for Fred's 25 bike rides]

data:image/s3,"s3://crabby-images/b0311/b03115f818f5cef350e3e0908636c1bd46badb79" alt=":scale 53%"

--

data:image/s3,"s3://crabby-images/4d0f0/4d0f0999a51cf529ca18143bcf6b56028b989307" alt=":scale 44%"

---

- X: Ice cream sales
- Y: Number of people drowning in swimming pools

Are X and Y correlated?

---

class: middle, center

# .red[Correlation does not equal causation.]

---

# Spurious correlation

https://www.tylervigen.com/spurious-correlations
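
---

# Simulating a confounder

The following minimal sketch shows how ice cream sales (X) and pool drownings (Y) can be correlated even though neither causes the other. It uses hypothetical simulated data in which outside temperature is a confounder that drives both variables. The coefficients, noise levels, sample size, and the helper names `pearson_r` and `residuals` are illustrative assumptions, not real data.

```python
import random

random.seed(586)  # fixed seed so the simulation is reproducible

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def residuals(y, x):
    """Residuals of the simple OLS fit y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]

# Hypothetical data: temperature (deg F) drives BOTH variables.
# Ice cream sales never appear in the equation for drownings.
temp = [random.uniform(50, 100) for _ in range(500)]
ice_cream = [20 * t + random.gauss(0, 150) for t in temp]
drownings = [0.1 * t + random.gauss(0, 1.5) for t in temp]

r_raw = pearson_r(ice_cream, drownings)
# Partial correlation: correlate what is left after removing temperature's linear effect.
r_partial = pearson_r(residuals(ice_cream, temp), residuals(drownings, temp))
print(f"corr(X, Y)        = {r_raw:.2f}")
print(f"corr(X, Y | temp) = {r_partial:.2f}")
```

The raw correlation should come out clearly positive, while the partial correlation should be close to zero once temperature is controlled for. Regressing each variable on the confounder and then correlating the residuals is one standard way to compute a partial correlation.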