class: center, middle

## IMSE 586
## Big Data Analytics and Visualization
### Logistic regression
### Instructor: Fred Feng

---

class: middle, center

# Classification

![:scale 39.6%](images/cat.jpg) ![:scale 40%](images/dog.jpg)

---

# Will a credit card customer default?

```
import pandas as pd

df = pd.read_csv('./data/default.csv')
df.head()
```

.center[![:scale 70%](images/default_dataframe.png)]

---

$$\text{default_binary}= \begin{cases} 1, & \text{if } \text{default = Yes;} \\\ 0, & \text{if } \text{default = No.} \end{cases} $$

--

```
import seaborn.objects as so

# create the 0/1 default indicator defined above
df['default_binary'] = (df['default'] == 'Yes').astype(int)

(
    so.Plot(df, x='balance', y='default_binary')
    .add(so.Dot())
)
```

.center[![:scale 100%](images/default_scatter.png)]

---

# Logistic regression

$$p(x)=\frac{e^{\beta_0+\beta_1x}}{1+e^{\beta_0+\beta_1x}}$$

p(x): the probability of default given that the balance is x.

--
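A minimal sketch of fitting this model in Python (using `statsmodels`; this particular call is an assumption, not necessarily how the plot below was produced):

```
import statsmodels.formula.api as smf

# logistic regression of the 0/1 default indicator on balance
model = smf.logit('default_binary ~ balance', data=df).fit()
model.params   # estimated beta_0 (Intercept) and beta_1 (balance)
```

--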
.center[![:scale 100%](images/default_model.png)]

---

# Logistic regression

$$
\begin{aligned}
p(x)&=\frac{e^{\beta_0+\beta_1x}}{1+e^{\beta_0+\beta_1x}} \\\
\\\
\frac{p(x)}{1-p(x)}&=e^{\beta_0+\beta_1x} \\\
\\\
\ln{\frac{p(x)}{1-p(x)}}&=\beta_0+\beta_1x \\\
\end{aligned}
$$

The log-odds (or logit) is a linear function of x.

---

# Interpretation of the slope parameter

$$\frac{p(x)}{1-p(x)}=e^{\beta_0+\beta_1x}$$

When we increase x by 1, the odds

$$
\small
\begin{aligned}
\frac{p(x+1)}{1-p(x+1)}=e^{\beta_0+\beta_1(x+1)}
&=e^{\beta_1}e^{\beta_0+\beta_1x}
=e^{\beta_1}\frac{p(x)}{1-p(x)}
\end{aligned}
$$

increase by a factor of

$$e^{\beta_1}$$

---

$$\ln{\frac{p(x)}{1-p(x)}}=\beta_0+\beta_1x$$

When we increase x by 1, the log-odds

$$
\small
\begin{aligned}
\ln{\frac{p(x+1)}{1-p(x+1)}}&=\beta_0+\beta_1(x+1) \\\
&=\beta_0+\beta_1x+\beta_1 \\\
&=\ln{\frac{p(x)}{1-p(x)}}+\beta_1
\end{aligned}
$$

increase by

$$\beta_1$$
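
---

# Interpretation in code

A minimal sketch of these two interpretations, assuming the `model` object from the earlier fitting sketch:

```
import numpy as np

# odds multiplier for a one-unit increase in balance: e^(beta_1)
np.exp(model.params['balance'])

# additive change in the log-odds for a one-unit increase in balance: beta_1
model.params['balance']
```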