Two Classes
Therefore, one needs to find a latent space such that:
$\sigma_y^2(c1)+\sigma_y^2(c2)$ is minimized (each class stays compact after projection)
$(\mu_y(c1)-\mu_y(c2))^2$ is maximized (the two class means are pushed far apart)
The two criteria can be combined into a single optimization objective:
\[\text{minimize}\quad\frac{\sigma_y^2(c1)+\sigma_y^2(c2)}{(\mu_y(c1)-\mu_y(c2))^2}\]
or equivalently
\[\text{maximize}\quad\frac{(\mu_y(c1)-\mu_y(c2))^2}{\sigma_y^2(c1)+\sigma_y^2(c2)}\]
Assume that the projected (latent) samples are $\{y_1,y_2,\dots,y_n\}$ with $y_i=w^Tx_i$.
Hence,
\[
\begin{align*}
\sigma_y^2(c1)&=\frac{1}{N_{c1}}\sum_{x_i\in c1}(y_i-\mu_y(c1))^2\\
&=\frac{1}{N_{c1}}\sum_{x_i\in c1}(w^T(x_i-\mu(c1)))^2\\
&=\frac{1}{N_{c1}}\sum_{x_i\in c1}w^T(x_i-\mu(c1))(x_i-\mu(c1))^Tw\\
&=w^T\left[\frac{1}{N_{c1}}\sum_{x_i\in c1}(x_i-\mu(c1))(x_i-\mu(c1))^T\right]w\\
&=w^TS_1w\\
\end{align*}
\]
where
\[
\begin{align*}
\mu(c1)&=\frac{1}{N_{c1}}\sum_{x_i\in c1}x_i\\
\end{align*}
\]
and, analogously for class $c2$,
\[
\begin{align*}
\sigma_y^2(c2)&=w^TS_2w\\
\end{align*}
\]
Summing the two class variances, we obtain:
\[
\begin{align*}
\sigma_y^2(c1)+\sigma_y^2(c2)&=w^T(S_1+S_2)w=w^TS_ww\\
\end{align*}
\]
Similarly, the squared distance between the projected means is:
\[
\begin{align*}
(\mu_y(c1)-\mu_y(c2))^2&=w^T(\mu(c1)-\mu(c2))(\mu(c1)-\mu(c2))^Tw=w^TS_bw\\
\end{align*}
\]
Therefore, the objective can be written as a generalized Rayleigh quotient:
\[
\begin{align*}
\text{maximize}\quad\frac{(\mu_y(c1)-\mu_y(c2))^2}{\sigma_y^2(c1)+\sigma_y^2(c2)}&=\frac{w^TS_bw}{w^TS_ww}\\
\end{align*}
\]
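As a sanity check, the identities $\sigma_y^2(c1)+\sigma_y^2(c2)=w^TS_ww$ and $(\mu_y(c1)-\mu_y(c2))^2=w^TS_bw$ can be verified numerically. A minimal sketch with synthetic Gaussian data (the arrays `X1`, `X2` and the direction `w` are illustrative, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(100, 3))  # class c1 samples, one per row
X2 = rng.normal(2.0, 1.5, size=(80, 3))   # class c2 samples
w = rng.normal(size=3)                    # an arbitrary projection direction

def scatter(X):
    """Within-class scatter: (1/N) * sum (x - mu)(x - mu)^T."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / len(X)

S_w = scatter(X1) + scatter(X2)
diff = X1.mean(axis=0) - X2.mean(axis=0)
S_b = np.outer(diff, diff)

# np.var uses the 1/N estimator by default, matching the definitions above.
print(np.isclose(np.var(X1 @ w) + np.var(X2 @ w), w @ S_w @ w))         # True
print(np.isclose((np.mean(X1 @ w) - np.mean(X2 @ w))**2, w @ S_b @ w))  # True
```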
Since this quotient is invariant to the scale of $w$, we can fix the denominator and solve the equivalent constrained problem:
\[\max_w\quad w^TS_bw\quad \text{s.t.}\quad w^TS_ww=1\]
Formulate the Lagrangian:
\[
\begin{align*}
\mathcal{L}(w,\lambda)&=w^TS_bw -\lambda(w^TS_ww-1)\\
\end{align*}
\]
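Since $S_b$ and $S_w$ are symmetric, the gradient is
\[\frac{\partial \mathcal{L}(w,\lambda)}{\partial w}=2S_bw-2\lambda S_ww.\]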
Setting $\frac{\partial \mathcal{L}(w,\lambda)}{\partial w}=0$:
\[
\begin{align*}
&\Rightarrow\quad S_bw=\lambda S_ww\\
&\Rightarrow\quad S_w^{-1}S_bw=\lambda w
\end{align*}\]
Therefore, the optimal $w$ is the eigenvector corresponding to the largest eigenvalue of $S_w^{-1}S_b$ (assuming $S_w$ is invertible).
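Because $S_bw=(\mu(c1)-\mu(c2))(\mu(c1)-\mu(c2))^Tw$ and $(\mu(c1)-\mu(c2))^Tw$ is a scalar, $S_bw$ always points along $\mu(c1)-\mu(c2)$, so the eigenproblem admits the closed-form solution
\[w\propto S_w^{-1}(\mu(c1)-\mu(c2)).\]
A minimal sketch of this two-class solution, assuming class samples are stacked row-wise in NumPy arrays `X1` and `X2` (illustrative names); `np.linalg.solve` is used instead of forming $S_w^{-1}$ explicitly:

```python
import numpy as np

def lda_direction(X1, X2):
    """Fisher discriminant direction w ~ S_w^{-1} (mu1 - mu2) for two classes."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - mu1).T @ (X1 - mu1) / len(X1)  # within-class scatter of c1
    S2 = (X2 - mu2).T @ (X2 - mu2) / len(X2)  # within-class scatter of c2
    w = np.linalg.solve(S1 + S2, mu1 - mu2)   # assumes S_w = S1 + S2 is invertible
    return w / np.linalg.norm(w)
```

Projecting every sample with $y_i=w^Tx_i$ (i.e. `X @ w`) then yields the one-dimensional representation that maximizes the Fisher criterion.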
Multiple Classes
Multi-class problems are more common in machine learning practice. Assume that the data is projected down to $d$ dimensions, i.e. the projection matrix is $W=[w_1,w_2,\dots,w_d]$.
Hence, within classes:
\[S_w=\sum_{j=1}^CS_j=\sum_{j=1}^C\frac{1}{N_{c_j}}\sum_{x_i\in c_j}(x_i-\mu(c_j))(x_i-\mu(c_j))^T\]
Between classes:
\[S_b=\sum_{j=1}^C(\mu(c_j)-\mu)(\mu(c_j)-\mu)^T\]
where $\mu$ is the overall mean of all samples.
The optimization problem becomes:
\[\max_W\quad \operatorname{tr}[W^TS_bW]\quad \text{s.t.}\quad W^TS_wW=I\]
Formulate the Lagrangian:
\[
\begin{align*}
\mathcal{L}(W,\Lambda)&=\operatorname{tr}[W^TS_bW] -\operatorname{tr}[\Lambda(W^TS_wW-I)]\\
\end{align*}
\]
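Again using the symmetry of $S_b$ and $S_w$ (and taking $\Lambda$ symmetric), the gradient is
\[\frac{\partial \mathcal{L}(W,\Lambda)}{\partial W}=2S_bW-2S_wW\Lambda.\]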
Setting $\frac{\partial \mathcal{L}(W,\Lambda)}{\partial W}=0$:
\[
\begin{align*}
&\Rightarrow\quad S_bW=S_wW\Lambda\\
&\Rightarrow\quad S_w^{-1}S_bW=W\Lambda
\end{align*}\]
Therefore, the columns of $W$ are the eigenvectors corresponding to the $d$ largest eigenvalues of $S_w^{-1}S_b$. Note that $S_b$ is a sum of $C$ rank-one matrices and has rank at most $C-1$, so at most $C-1$ useful directions exist, i.e. $d\le C-1$.
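A minimal sketch of the multi-class case, assuming `X` is an $(N,D)$ data matrix and `y` an $(N,)$ integer label array (illustrative names). `scipy.linalg.eigh` solves the generalized eigenproblem $S_bv=\lambda S_wv$ directly, provided $S_w$ is positive definite, and returns $S_w$-orthonormal eigenvectors, which automatically satisfies the constraint $W^TS_wW=I$:

```python
import numpy as np
from scipy.linalg import eigh

def lda_projection(X, y, d):
    """Columns of the returned W are the top-d discriminant directions."""
    mu = X.mean(axis=0)
    D = X.shape[1]
    S_w = np.zeros((D, D))
    S_b = np.zeros((D, D))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_w += (Xc - mu_c).T @ (Xc - mu_c) / len(Xc)  # within-class scatter
        S_b += np.outer(mu_c - mu, mu_c - mu)         # between-class scatter
    vals, vecs = eigh(S_b, S_w)       # generalized eigenproblem, ascending order
    return vecs[:, ::-1][:, :d]       # top-d eigenvectors as columns

# The reduced representation is X @ lda_projection(X, y, d); at most C - 1
# eigenvalues are nonzero, matching the rank bound above.
```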