Two Classes
Therefore, one needs to find a latent space such that:
\sigma_y^2(c1)+\sigma_y^2(c2) is minimized
(\mu_y(c1)-\mu_y(c2))^2 is maximized
These two criteria can be combined into a single optimization problem:
minimize\quad\frac{\sigma _y^2(c1)+\sigma_y^2(c2)}{(\mu_y(c1)-\mu_y(c2))^2 }
or
maximize\quad\frac{(\mu_y(c1)-\mu_y(c2))^2 }{\sigma _y^2(c1)+\sigma_y^2(c2)}
Assume that the projected samples in the latent space are \{y_1,y_2,...,y_n\}, where y_i=w^Tx_i.
Hence,
\begin{align*} \sigma_y^2(c1)&=\frac{1}{N_{c1}}\sum_{x_i\in c1}(y_i-\mu_y(c1))^2\\ &=\frac{1}{N_{c1}}\sum_{x_i\in c1}(w^T(x_i-\mu(c1)))^2\\ &=\frac{1}{N_{c1}}\sum_{x_i\in c1}w^T(x_i-\mu(c1))(x_i-\mu(c1))^Tw\\ &=w^T\left(\frac{1}{N_{c1}}\sum_{x_i\in c1}(x_i-\mu(c1))(x_i-\mu(c1))^T\right)w\\ &=w^TS_1w \end{align*}
where
\begin{align*} \mu(c1)&=\frac{1}{N_{c1}}\sum_{x_i\in c1}x_i \end{align*}
and, similarly,
\begin{align*} \sigma_y^2(c2)&=w^TS_2w \end{align*}
It follows that:
\begin{align*} \sigma_y^2(c1)+\sigma_y^2(c2)&=w^T(S_1+S_2)w=w^TS_ww\\ \end{align*}
\begin{align*} (\mu_y(c1)-\mu_y(c2))^2&=w^T(\mu(c1)-\mu(c2))(\mu(c1)-\mu(c2))^Tw=w^TS_bw\\ \end{align*}
Therefore
\begin{align*} maximize\quad\frac{(\mu_y(c1)-\mu_y(c2))^2}{\sigma_y^2(c1)+\sigma_y^2(c2)}&=\frac{w^TS_bw}{w^TS_ww}\\ \end{align*}
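As a sanity check, these identities are easy to verify numerically. Below is a minimal NumPy sketch (synthetic data; all names are illustrative) confirming that, for an arbitrary direction w, the ratio computed from the projected samples matches the quadratic-form ratio w^TS_bw / w^TS_ww:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data (shapes and locations are illustrative).
X1 = rng.normal(loc=0.0, size=(50, 3))   # samples of class c1
X2 = rng.normal(loc=2.0, size=(40, 3))   # samples of class c2
w = rng.normal(size=3)                   # arbitrary projection direction

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = (X1 - mu1).T @ (X1 - mu1) / len(X1)  # within-class scatter of c1
S2 = (X2 - mu2).T @ (X2 - mu2) / len(X2)  # within-class scatter of c2
Sw = S1 + S2
Sb = np.outer(mu1 - mu2, mu1 - mu2)

# Statistics computed directly in the projected (latent) space.
y1, y2 = X1 @ w, X2 @ w
ratio_direct = (y1.mean() - y2.mean())**2 / (y1.var() + y2.var())
ratio_quadratic = (w @ Sb @ w) / (w @ Sw @ w)
assert np.isclose(ratio_direct, ratio_quadratic)
```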
Since the objective is invariant to rescaling of w, we can fix the denominator to 1 and solve the equivalent constrained problem:
max\quad w^TS_bw\quad s.t.\quad w^TS_ww=1
Formulate the Lagrangian:
\begin{align*} \mathcal{L}(w,\lambda)&=w^TS_bw -\lambda(w^TS_ww-1) \end{align*}
Setting \frac{\partial \mathcal{L}(w,\lambda)}{\partial w}=2S_bw-2\lambda S_ww=0
\begin{align*} &\Rightarrow\quad S_bw=\lambda S_ww\\ &\Rightarrow\quad S_w^{-1}S_bw=\lambda w \end{align*}
Therefore, w is the eigenvector corresponding to the largest eigenvalue of S_w^{-1}S_b (assuming S_w is invertible).
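In code, the two-class solution has a convenient closed form: since S_b=(\mu(c1)-\mu(c2))(\mu(c1)-\mu(c2))^T has rank one, S_w^{-1}S_bw is always parallel to S_w^{-1}(\mu(c1)-\mu(c2)), so the leading eigenvector can be computed without an eigendecomposition. A minimal sketch, assuming S_w is invertible (the function name is illustrative):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Two-class Fisher discriminant direction w ∝ Sw^{-1}(mu1 - mu2).

    Because Sb = (mu1 - mu2)(mu1 - mu2)^T has rank one, the leading
    eigenvector of Sw^{-1} Sb is proportional to Sw^{-1}(mu1 - mu2).
    """
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - mu1).T @ (X1 - mu1) / len(X1)
    S2 = (X2 - mu2).T @ (X2 - mu2) / len(X2)
    Sw = S1 + S2
    w = np.linalg.solve(Sw, mu1 - mu2)   # Sw assumed invertible
    return w / np.linalg.norm(w)
```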
Multiple Classes
Multi-class classification problems are more common in machine learning practice. Assume that the original data is reduced to d dimensions, i.e. W=[w_1,w_2,...,w_d].
Hence, within classes:
S_w=\sum_{j=1}^CS_j=\sum_{j=1}^C\frac{1}{N_{c_j}}\sum_{x_i\in c_j}(x_i-\mu(c_j))(x_i-\mu(c_j))^T
Between classes:
S_b=\sum_{j=1}^C(\mu(c_j)-\mu)(\mu(c_j)-\mu)^T
where \mu is the mean of all the class means \mu(c_j).
The corresponding optimization problem is:
max\quad tr[W^TS_bW]\quad s.t.\quad W^TS_wW=I
Formulate the Lagrangian:
\begin{align*} \mathcal{L}(W,\Lambda)&=tr[W^TS_bW] -tr[\Lambda(W^TS_wW-I)] \end{align*}
Setting \frac{\partial \mathcal{L}(W,\Lambda)}{\partial W}=2S_bW-2S_wW\Lambda=0 (taking \Lambda symmetric)
\begin{align*} &\Rightarrow\quad S_bW=S_wW\Lambda\\ &\Rightarrow\quad S_w^{-1}S_bW=W\Lambda \end{align*}
Therefore, the columns of W are the eigenvectors corresponding to the d largest eigenvalues of S_w^{-1}S_b.
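The multi-class procedure translates directly into code. The sketch below follows the unweighted S_b defined above and assumes S_w is invertible (in practice it is often regularized, e.g. S_w+\epsilon I); the function name is illustrative. Note that S_b has rank at most C-1, so at most C-1 eigenvalues are nonzero and one usually chooses d\le C-1.

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection(X, labels, d):
    """Reduce X to d dimensions with the multi-class criterion above."""
    classes = np.unique(labels)
    class_means = np.array([X[labels == c].mean(axis=0) for c in classes])
    mu = class_means.mean(axis=0)            # mean of the class means
    n_features = X.shape[1]
    Sw = np.zeros((n_features, n_features))
    Sb = np.zeros((n_features, n_features))
    for c, mu_c in zip(classes, class_means):
        Xc = X[labels == c]
        Sw += (Xc - mu_c).T @ (Xc - mu_c) / len(Xc)   # per-class scatter S_j
        Sb += np.outer(mu_c - mu, mu_c - mu)
    # Generalized symmetric eigenproblem  Sb W = Sw W Lambda.
    # eigh returns eigenvalues in ascending order, so keep the last d columns.
    eigvals, eigvecs = eigh(Sb, Sw)
    W = eigvecs[:, ::-1][:, :d]
    return X @ W
```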