Entropy
In information theory, entropy measures the uncertainty of a random variable. The entropy $H(Y)$ of a random variable $Y$ with $n$ different possible values is defined as:
$$
H(Y) = -\sum_{i=1}^{n} P(y_i)\log_2 P(y_i)
$$
where $P(y_i)$ is the probability that the random variable $Y$ equals $y_i$, one of the $n$ possible values of $Y$. Here $n$ is the number of distinct outcomes; for example, if the possible outcomes are 1, 2, and 3, then $n = 3$, and if those outcomes are equally likely, each has probability $P(y_i) = 1/3$.
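For illustration, here is a minimal Python sketch of this formula; the `entropy` helper name is just illustrative and not from any particular library:

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping zero-probability terms."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Three equally likely outcomes (1, 2, 3), each with probability 1/3:
print(round(entropy([1/3, 1/3, 1/3]), 3))  # log2(3) ≈ 1.585 bits
```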
Example
$X_1$ | $X_2$ | Y |
---|---|---|
T | F | T |
T | T | T |
F | F | F |
F | T | T |
T | F | F |
$Y$ has two values, T and F, so $n = 2$. $Y$ equals T in 3 of the 5 rows and F in the other 2, so $P(Y=T) = 3/5$ and $P(Y=F) = 2/5$. Therefore, we can get:
$$
H(Y) = -\sum_{i=1}^{n} P(y_i)\log_2 P(y_i)
$$
$$
H(Y) = -\frac{3}{5}\log_2\frac{3}{5}-\frac{2}{5}\log_2\frac{2}{5} \approx 0.971
$$
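The same value can be checked numerically; this small snippet simply plugs the two probabilities into the formula:

```python
import math

p_t, p_f = 3/5, 2/5  # P(Y=T) and P(Y=F) from the table
h_y = -(p_t * math.log2(p_t) + p_f * math.log2(p_f))
print(round(h_y, 3))  # 0.971
```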
Conditional entropy
The conditional entropy $H(Y|X=x_j)$ of $Y$ given $X=x_j$ is:
$$
H(Y|X=x_j) = -\sum_{i=1}^{n} P(y_i|X=x_j)\log_2 P(y_i|X=x_j)
$$
After the split, the overall conditional entropy $H(Y|X)$ is the average of these per-value entropies, weighted by the probability of each of the $k$ possible values of $X$:
$$
H(Y|X) = \sum_{j=1}^{k} P(x_j)\,H(Y|X=x_j)
$$
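Below is a minimal Python sketch of how $H(Y|X)$ could be estimated from a list of $(x, y)$ samples; the `entropy` and `conditional_entropy` names are illustrative, not from any library. Applied to the $(X_1, Y)$ pairs of the example that follows, it reproduces the weighted entropy derived step by step below:

```python
import math
from collections import Counter, defaultdict

def entropy(probabilities):
    # H = -sum(p * log2(p)), skipping zero-probability terms
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def conditional_entropy(pairs):
    """H(Y|X) estimated from a list of (x, y) samples."""
    by_x = defaultdict(list)
    for x, y in pairs:
        by_x[x].append(y)
    n = len(pairs)
    total = 0.0
    for ys in by_x.values():
        probs = [c / len(ys) for c in Counter(ys).values()]
        total += (len(ys) / n) * entropy(probs)  # P(x_j) * H(Y|X=x_j)
    return total

# (X1, Y) pairs from the example table below:
pairs = [("T", "T"), ("T", "T"), ("F", "F"), ("F", "T"), ("T", "F")]
print(round(conditional_entropy(pairs), 3))  # 0.951
```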
Example
$X_1$ | $X_2$ | Y |
---|---|---|
T | F | T |
T | T | T |
F | F | F |
F | T | T |
T | F | F |
Suppose we split the data based on the value of $X_1$. $X_1$ has two possible values: T and F. We can compute the conditional entropy for $X_1=T$ and $X_1=F$.
Compute $X_1=T$
$X_1$ | Y |
---|---|
T | T |
T | T |
T | F |
After splitting, the rows with $X_1=T$ contain two values of $Y$: T in 2 of the 3 rows and F in 1. Therefore, $P(Y=T|X_1=T) = 2/3$ and $P(Y=F|X_1=T) = 1/3$.
We have the following entropy when $X_1=T$:
$$
H(Y|X_1=T) = -\sum_{i=1}^{n} P(y_i|X_1=T)\log_2 P(y_i|X_1=T)
$$
$$
H(Y|X_1=T) = -\frac{2}{3}\log_2\frac{2}{3}-\frac{1}{3}\log_2\frac{1}{3} \approx 0.918
$$
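This value can be checked with a couple of lines of Python:

```python
import math

# Y is T in 2 of the 3 rows with X1 = T, and F in the remaining row.
h_t = -(2/3 * math.log2(2/3) + 1/3 * math.log2(1/3))
print(round(h_t, 3))  # 0.918
```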
Compute $X_1=F$
$X_1$ | Y |
---|---|
F | T |
F | F |
After splitting, the rows with $X_1=F$ contain two values of $Y$: T in 1 of the 2 rows and F in the other. Therefore, $P(Y=T|X_1=F) = 1/2$ and $P(Y=F|X_1=F) = 1/2$.
We have the following entropy when $X_1=F$:
$$
H(Y|X_1=F) = -\sum_{i=1}^{n} P(y_i|X_1=F)\log_2 P(y_i|X_1=F)
$$
$$
H(Y|X_1=F) = -\frac{1}{2}\log_2\frac{1}{2}-\frac{1}{2}\log_2\frac{1}{2} = 1
$$
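And the same quick check for the $X_1=F$ branch:

```python
import math

# Y is T in 1 of the 2 rows with X1 = F, and F in the other.
h_f = -(1/2 * math.log2(1/2) + 1/2 * math.log2(1/2))
print(h_f)  # 1.0
```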
Entropy after split
$X_1$ | Y |
---|---|
T | T |
T | T |
T | F |
F | T |
F | F |
From the table, $X_1=T$ in 3 of the 5 rows and $X_1=F$ in the other 2, so $P(X_1=T)=3/5$ and $P(X_1=F)=2/5$. The overall conditional entropy is:
$$
H(Y|X_1) = \frac{3}{5}\times 0.918+\frac{2}{5}\times 1 \approx 0.951
$$
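This weighted average can also be verified directly from the two branch entropies computed above:

```python
# Weight each branch entropy by the probability of that branch of X1.
h_y_given_x1 = 3/5 * 0.918 + 2/5 * 1.0
print(round(h_y_given_x1, 3))  # 0.951
```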
Information Gain
Information gain $I(X,Y)$ is defined as the expected reduction in the entropy of the target variable $Y$ after splitting on variable $X$.
$$
I(X,Y)=H(Y)-H(Y|X)
$$
In the previous example, the information gain from splitting on $X_1$ is:
$$
I(X_1,Y)=H(Y)-H(Y|X_1) \approx 0.971-0.951 = 0.020
$$
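Putting everything together, here is a self-contained Python sketch that computes the information gain directly from the example table; `entropy` and `information_gain` are illustrative helper names, not from any particular library. For comparison it also prints the gain for a split on $X_2$, which turns out to be higher on this data:

```python
import math
from collections import Counter, defaultdict

# (X1, X2, Y) rows from the example table.
rows = [("T", "F", "T"), ("T", "T", "T"), ("F", "F", "F"),
        ("F", "T", "T"), ("T", "F", "F")]

def entropy(values):
    """Empirical Shannon entropy (bits) of a list of labels."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def information_gain(rows, feature_index, target_index=2):
    """I(X, Y) = H(Y) - H(Y|X), estimated from the rows."""
    y = [r[target_index] for r in rows]
    groups = defaultdict(list)
    for r in rows:
        groups[r[feature_index]].append(r[target_index])
    h_cond = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(y) - h_cond

print(round(information_gain(rows, 0), 3))  # split on X1: 0.02
print(round(information_gain(rows, 1), 3))  # split on X2: 0.42
```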