# Solution 4 Computational Intelligence Lab 2012

## Problem 1

### 1.1 Write down the log of the likelihood function

${\displaystyle \log(p(\mathbf {X} |\mathbf {\pi } ,\mathbf {\mu } ,\mathbf {\Sigma } ))=\sum _{n=1}^{N}\log \left\{\sum _{k=1}^{K}\pi _{k}{\mathcal {N}}(\mathbf {x} _{n}|\mathbf {\mu } _{k},\mathbf {\Sigma } _{k})\right\}}$ (slides)

### 1.2 Give the log of the likelyhood of this datapoint

Given: ${\displaystyle \mathbf {\Sigma } _{k}=\sigma _{k}^{2}\mathbf {I} }$

${\displaystyle \log(p(\mathbf {x} _{n}|\mathbf {\pi } ,\mathbf {\mu } ,\mathbf {\Sigma } ))=\log \left\{\sum _{k=1}^{K}\pi _{k}{\mathcal {N}}(\mathbf {x} _{n}|\mathbf {\mu } _{k},\mathbf {\Sigma } _{k})\right\}}$ ${\displaystyle =\log \left\{\sum _{k=1}^{K}\pi _{k}{1 \over {\sqrt {(2\pi )^{D}det(\Sigma _{k})}}}exp\left(-{\frac {1}{2}}(\mathbf {x} _{n}-\mathbf {\mu } _{k})\Sigma _{k}^{-1}(\mathbf {x} _{n}-\mathbf {\mu } _{k})\right)\right\}}$ ${\displaystyle =\log \left\{\sum _{k=1}^{K}\pi _{k}{1 \over {\sqrt {(2\pi )^{D}\ \sigma _{k}^{D}}}}exp\left(-{\frac {1}{2\sigma _{k}^{2}}}(\mathbf {x} _{n}-\mathbf {\mu } _{k})\mathbf {I} (\mathbf {x} _{n}-\mathbf {\mu } _{k})\right)\right\}}$

${\displaystyle \mathbf {I} \in \mathbb {R} ^{D\times D}}$

### 1.3 Calculate (...)

${\displaystyle p(\mathbf {x} _{n}|\mathbf {\mu } _{j},\mathbf {\Sigma } _{j})={1 \over {\sqrt {(2\pi )^{D}\ \sigma _{j}^{N}}}}exp\left(-{\frac {1}{2\sigma _{k}^{2}}}(\mathbf {x} _{n}-\mathbf {\mu } _{k})\mathbf {I} (\mathbf {x} _{n}-\mathbf {\mu } _{k})\right)}$

Adding the criterion ${\displaystyle \mu _{j}=x_{n}}$:

${\displaystyle p(\mathbf {x} _{n}|\mathbf {\mu } _{j},\mathbf {\Sigma } _{j})={1 \over {\sqrt {(2\pi )^{D}\ \sigma _{j}^{N}}}}}$

### 1.4 Limit ${\displaystyle \sigma _{j}}$ to 0, discuss the influence of this issue on the maximization of the likelihood function.

${\displaystyle \lim _{\sigma _{j}\mapsto 0}p(\mathbf {x} _{n}|\mathbf {\mu } _{j},\mathbf {\Sigma } _{j})=\lim _{\sigma _{j}\mapsto 0}{1 \over {\sqrt {(2\pi )^{D}\ \sigma _{j}^{D}}}}=\infty }$

The result makes sense since in this case the function is a probability density function!

Discussion: According to http://www.cedar.buffalo.edu/~srihari/CSE574/Chap9/Ch9.2-MixturesofGaussians.pdf (Slide 12) the maximization of log-likelihood is not well posed.

### 1.5 Can this situation occur in the case of a single Gaussian distribution

No, because:

${\displaystyle \log(p(\mathbf {X} |\mathbf {\pi } ,\mathbf {\mu } ,\mathbf {\Sigma } )=\sum _{n=1}^{N}\log \left\{\sum _{k=1}^{1}\pi _{k}{\mathcal {N}}(\mathbf {x} _{n}|\mathbf {\mu } _{k},\mathbf {\Sigma } _{k})\right\}=\sum _{n=1}^{N}\log \left\{{\mathcal {N}}(\mathbf {x} _{n}|\mathbf {\mu } ,\mathbf {\Sigma } )\right\}}$

Here we insert the condition: ${\displaystyle \mathbf {\mu } =\mathbf {x} _{j}}$:

${\displaystyle =\log \left\{{1 \over {\sqrt {(2\pi )^{D}\ \sigma ^{N}}}}\right\}+\log \left\{\sum _{n\neq j}^{N}{1 \over {\sqrt {(2\pi )^{D}\ \sigma ^{N}}}}exp\left(-{\frac {1}{2\sigma ^{2}}}(\mathbf {x} _{n}-\mathbf {\mu } )\mathbf {I} (\mathbf {x} _{n}-\mathbf {\mu } )\right)\right\}}$

And add the condition: ${\displaystyle \lim _{\sigma \mapsto 0}}$

${\displaystyle \lim _{\sigma \mapsto 0}log(p(\mathbf {X} |\mathbf {\pi } ,\mathbf {\mu } ,\mathbf {\Sigma } )=\lim _{\sigma \mapsto 0}\log \left\{{1 \over {\sqrt {(2\pi )^{D}\ \sigma ^{N}}}}\right\}+\log \left\{(N-1){1 \over {\sqrt {(2\pi )^{D}\ \sigma ^{N}}}}exp\left(-{\frac {1}{2\sigma ^{2}}}\right)\right\}=-\infty }$ (for N > 0)

So the probability density is ${\displaystyle -\infty }$ which makes this event impossible.

I put the question in the VIS forum: https://forum.vis.ethz.ch/showthread.php?15350

### 1.6

We can detect if clusters come near any datapoint and reset it.

## Problem 2

### 2.1 Suppose that we have solved a mixture of K Gaussians problem, and have obtained the values of the parameters. For this given solution, how many equivalent solutions do exist?

${\displaystyle K!}$

### 2.2 This problem is known as identifiability. Explain why this problem is irrelevant for the purpose of data clustering.

We are only interested which points can be grouped together.