class: center, middle

## IMSE 586
## Big Data Analytics and Visualization
### Logistic regression
### Instructor: Fred Feng

---

class: middle, center

# Classification

![:scale 39.6%](images/cat.jpg) ![:scale 40%](images/dog.jpg)

---

# Will a credit card customer default?

```
import pandas as pd

df = pd.read_csv('./data/default.csv')
df.head()
```

.center[![:scale 70%](images/default_dataframe.png)]

---

$$\text{default_binary}= \begin{cases} 1, & \text{if } \text{default = Yes;} \\\ 0, & \text{if } \text{default = No.} \end{cases} $$

--

```
import seaborn.objects as so

# create the 0/1 default indicator defined above
df['default_binary'] = (df['default'] == 'Yes').astype(int)

(
    so.Plot(df, x='balance', y='default_binary')
    .add(so.Dot())
)
```

.center[![:scale 100%](images/default_scatter.png)]

---

# Logistic regression

$$p(x)=\frac{e^{\beta_0+\beta_1x}}{1+e^{\beta_0+\beta_1x}}$$

p(x): the probability of default given that the balance is x.

--
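A minimal sketch of fitting this model in Python (using `statsmodels`; this particular call is an assumption, not necessarily how the plot below was produced):

```
import statsmodels.formula.api as smf

# logistic regression of the 0/1 default indicator on balance
model = smf.logit('default_binary ~ balance', data=df).fit()
model.params   # estimated beta_0 (Intercept) and beta_1 (balance)
```

--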
.center[![:scale 100%](images/default_model.png)]

---

# Logistic regression

$$
\begin{aligned}
p(x)&=\frac{e^{\beta_0+\beta_1x}}{1+e^{\beta_0+\beta_1x}} \\\
\\\
\frac{p(x)}{1-p(x)}&=e^{\beta_0+\beta_1x} \\\
\\\
\ln{\frac{p(x)}{1-p(x)}}&=\beta_0+\beta_1x \\\
\end{aligned}
$$

The log-odds (or logit) is a linear function of x.

---

# Interpretation of the slope parameter

$$\frac{p(x)}{1-p(x)}=e^{\beta_0+\beta_1x}$$

When we increase x by 1, the odds

$$
\small
\begin{aligned}
\frac{p(x+1)}{1-p(x+1)}=e^{\beta_0+\beta_1(x+1)}
&=e^{\beta_1}e^{\beta_0+\beta_1x}
=e^{\beta_1}\frac{p(x)}{1-p(x)}
\end{aligned}
$$

increase by a factor of

$$e^{\beta_1}$$

---

$$\ln{\frac{p(x)}{1-p(x)}}=\beta_0+\beta_1x$$

When we increase x by 1, the log-odds

$$
\small
\begin{aligned}
\ln{\frac{p(x+1)}{1-p(x+1)}}&=\beta_0+\beta_1(x+1) \\\
&=\beta_0+\beta_1x+\beta_1 \\\
&=\ln{\frac{p(x)}{1-p(x)}}+\beta_1
\end{aligned}
$$

increase by

$$\beta_1$$
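
---

# Interpretation in code

A minimal sketch of these two interpretations, assuming the `model` object from the earlier fitting sketch:

```
import numpy as np

# odds multiplier for a one-unit increase in balance: e^(beta_1)
np.exp(model.params['balance'])

# additive change in the log-odds for a one-unit increase in balance: beta_1
model.params['balance']
```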