# Solution 4 Computational Intelligence Lab 2012

## Problem 1

### 1.1 Write down the log of the likelihood function

$\log(p(\mathbf {X} |\mathbf {\pi } ,\mathbf {\mu } ,\mathbf {\Sigma } ))=\sum _{n=1}^{N}\log \left\{\sum _{k=1}^{K}\pi _{k}{\mathcal {N}}(\mathbf {x} _{n}|\mathbf {\mu } _{k},\mathbf {\Sigma } _{k})\right\}$ (slides)

### 1.2 Give the log of the likelihood of this datapoint

Given: $\mathbf {\Sigma } _{k}=\sigma _{k}^{2}\mathbf {I}$

$\log(p(\mathbf {x} _{n}|\mathbf {\pi } ,\mathbf {\mu } ,\mathbf {\Sigma } ))=\log \left\{\sum _{k=1}^{K}\pi _{k}{\mathcal {N}}(\mathbf {x} _{n}|\mathbf {\mu } _{k},\mathbf {\Sigma } _{k})\right\}$ $=\log \left\{\sum _{k=1}^{K}\pi _{k}{1 \over {\sqrt {(2\pi )^{D}\det(\mathbf {\Sigma } _{k})}}}\exp \left(-{\frac {1}{2}}(\mathbf {x} _{n}-\mathbf {\mu } _{k})^{T}\mathbf {\Sigma } _{k}^{-1}(\mathbf {x} _{n}-\mathbf {\mu } _{k})\right)\right\}$ $=\log \left\{\sum _{k=1}^{K}\pi _{k}{1 \over {\sqrt {(2\pi )^{D}\,\sigma _{k}^{2D}}}}\exp \left(-{\frac {1}{2\sigma _{k}^{2}}}(\mathbf {x} _{n}-\mathbf {\mu } _{k})^{T}(\mathbf {x} _{n}-\mathbf {\mu } _{k})\right)\right\}$

since $\det(\sigma _{k}^{2}\mathbf {I} )=\sigma _{k}^{2D}$.

$\mathbf {I} \in \mathbb {R} ^{D\times D}$
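The per-point log-likelihood above can be checked numerically. A minimal sketch, assuming isotropic covariances $\Sigma_k = \sigma_k^2 \mathbf{I}$ with $D = 2$, $K = 2$; all parameter values below are made up for illustration:

```python
import numpy as np

# Made-up mixture parameters for a D = 2, K = 2 isotropic GMM.
D = 2
pi_k    = np.array([0.4, 0.6])               # mixing weights
mu_k    = np.array([[0.0, 0.0], [2.0, 2.0]]) # component means
sigma_k = np.array([1.0, 0.5])               # per-component sigma
x_n     = np.array([1.0, 1.0])               # one data point

# pi_k * N(x_n | mu_k, sigma_k^2 I), using det(sigma^2 I) = sigma^(2D)
comps = pi_k / np.sqrt((2 * np.pi)**D * sigma_k**(2 * D)) \
        * np.exp(-np.sum((x_n - mu_k)**2, axis=1) / (2 * sigma_k**2))
log_p = np.log(np.sum(comps))
print(f"log p(x_n) = {log_p:.4f}")
```

Note the `sigma_k**(2 * D)` term, which implements $\det(\sigma_k^2\mathbf{I}) = \sigma_k^{2D}$.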

### 1.3 Calculate (...)

$p(\mathbf {x} _{n}|\mathbf {\mu } _{j},\mathbf {\Sigma } _{j})={1 \over {\sqrt {(2\pi )^{D}\,\sigma _{j}^{2D}}}}\exp \left(-{\frac {1}{2\sigma _{j}^{2}}}(\mathbf {x} _{n}-\mathbf {\mu } _{j})^{T}(\mathbf {x} _{n}-\mathbf {\mu } _{j})\right)$

Inserting the condition $\mu _{j}=\mathbf {x} _{n}$, the exponential term becomes $1$:

$p(\mathbf {x} _{n}|\mathbf {\mu } _{j},\mathbf {\Sigma } _{j})={1 \over {\sqrt {(2\pi )^{D}\,\sigma _{j}^{2D}}}}$

### 1.4 Take the limit $\sigma _{j}\to 0$ and discuss the influence of this issue on the maximization of the likelihood function.

$\lim _{\sigma _{j}\to 0}p(\mathbf {x} _{n}|\mathbf {\mu } _{j},\mathbf {\Sigma } _{j})=\lim _{\sigma _{j}\to 0}{1 \over {\sqrt {(2\pi )^{D}\,\sigma _{j}^{2D}}}}=\infty$

The divergence is possible because this is a probability *density*, not a probability: densities are not bounded above, so the value at a single point can grow without limit.

Discussion: According to http://www.cedar.buffalo.edu/~srihari/CSE574/Chap9/Ch9.2-MixturesofGaussians.pdf (Slide 12), the maximization of the log-likelihood is not well posed: whenever one component's mean sits exactly on a data point, sending that component's variance to zero drives the likelihood to infinity, so maximum-likelihood estimation (e.g. via EM) can converge to such degenerate "collapsed" solutions.
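The collapse can be illustrated numerically. A minimal 1-D sketch (hypothetical values, not from the slides): place a component's mean exactly on a data point and shrink its standard deviation; the component density at that point, $1/\sqrt{2\pi\sigma^2}$, grows without bound:

```python
import numpy as np

x = 0.0    # a data point
mu = x     # collapsed mean: mu_j = x_n
for sigma in [1.0, 0.1, 0.01, 0.001]:
    # Gaussian density at its own mean in 1-D: 1 / sqrt(2*pi*sigma^2)
    density = 1.0 / np.sqrt(2.0 * np.pi * sigma**2)
    print(f"sigma = {sigma:6.3f}  ->  N(x | mu, sigma^2) = {density:.2f}")
```

Each tenfold shrink of $\sigma$ multiplies the density by ten, so the likelihood contribution of the collapsed component diverges.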

### 1.5 Can this situation occur in the case of a single Gaussian distribution?

No, because:

$\log(p(\mathbf {X} |\mathbf {\pi } ,\mathbf {\mu } ,\mathbf {\Sigma } ))=\sum _{n=1}^{N}\log \left\{\sum _{k=1}^{1}\pi _{k}{\mathcal {N}}(\mathbf {x} _{n}|\mathbf {\mu } _{k},\mathbf {\Sigma } _{k})\right\}=\sum _{n=1}^{N}\log \left\{{\mathcal {N}}(\mathbf {x} _{n}|\mathbf {\mu } ,\mathbf {\Sigma } )\right\}$

Here we insert the condition: $\mathbf {\mu } =\mathbf {x} _{j}$:

$=\log \left\{{1 \over {\sqrt {(2\pi )^{D}\,\sigma ^{2D}}}}\right\}+\sum _{n\neq j}\log \left\{{1 \over {\sqrt {(2\pi )^{D}\,\sigma ^{2D}}}}\exp \left(-{\frac {1}{2\sigma ^{2}}}(\mathbf {x} _{n}-\mathbf {\mu } )^{T}(\mathbf {x} _{n}-\mathbf {\mu } )\right)\right\}$

Now take the limit $\sigma \to 0$:

$\lim _{\sigma \to 0}\log(p(\mathbf {X} |\mathbf {\pi } ,\mathbf {\mu } ,\mathbf {\Sigma } ))=\lim _{\sigma \to 0}\left[\log \left\{{1 \over {\sqrt {(2\pi )^{D}\,\sigma ^{2D}}}}\right\}+\sum _{n\neq j}\left(\log \left\{{1 \over {\sqrt {(2\pi )^{D}\,\sigma ^{2D}}}}\right\}-{\frac {1}{2\sigma ^{2}}}(\mathbf {x} _{n}-\mathbf {\mu } )^{T}(\mathbf {x} _{n}-\mathbf {\mu } )\right)\right]=-\infty$

(for $N>1$, with at least one $\mathbf {x} _{n}\neq \mathbf {\mu }$): the $-{\frac {1}{2\sigma ^{2}}}$ terms of the other points dominate the $-D\log \sigma$ growth of the collapsed term.

So the log-likelihood tends to $-\infty$ (i.e. the likelihood tends to $0$): collapsing onto a single point is penalized rather than rewarded, and this degeneracy cannot occur for a single Gaussian.
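The limit above can be checked numerically. A minimal 1-D sketch with made-up data, fixing $\mu$ on one data point and shrinking $\sigma$: the total log-likelihood still decreases, because the $-(x_n-\mu)^2/(2\sigma^2)$ terms of the other points dominate:

```python
import numpy as np

X = np.array([0.0, 1.0, 2.0])   # toy data; mu sits on the first point
mu = 0.0

def log_lik(sigma):
    # Sum over all points of log N(x_n | mu, sigma^2) in 1-D
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (X - mu)**2 / (2 * sigma**2))

for sigma in [1.0, 0.1, 0.01]:
    print(f"sigma = {sigma:5.2f}  ->  log-likelihood = {log_lik(sigma):.1f}")
```

The collapsed point contributes $+\infty$ in the limit, but each other point contributes $-\infty$ faster, so the sum diverges to $-\infty$.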

I put the question in the VIS forum: https://forum.vis.ethz.ch/showthread.php?15350

### 1.6

We can detect when a component collapses onto a single datapoint (its mean approaches that point and its variance shrinks toward zero) and reset that component, e.g. by re-initializing its mean to a randomly chosen value and restoring a large variance, then continuing the EM iterations.

## Problem 2

### 2.1 Suppose that we have solved a mixture of K Gaussians problem, and have obtained the values of the parameters. For this given solution, how many equivalent solutions do exist?

$K!$ — any permutation of the component labels $(\pi _{k},\mathbf {\mu } _{k},\mathbf {\Sigma } _{k})$ yields exactly the same mixture density and hence the same likelihood.
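This label symmetry can be verified directly. A minimal sketch with made-up 1-D parameters, evaluating the mixture density under all $K! = 6$ permutations of the labels:

```python
import itertools
import numpy as np

# Made-up K = 3 mixture parameters (1-D, for simplicity).
pi  = np.array([0.2, 0.3, 0.5])
mu  = np.array([0.0, 1.0, 2.0])
sig = np.array([1.0, 0.5, 2.0])
x = 0.7

def mixture_density(pi, mu, sig, x):
    # sum_k pi_k * N(x | mu_k, sig_k^2) in 1-D
    return np.sum(pi / np.sqrt(2 * np.pi * sig**2)
                  * np.exp(-(x - mu)**2 / (2 * sig**2)))

# Evaluate under every relabeling of the components.
vals = [mixture_density(pi[list(p)], mu[list(p)], sig[list(p)], x)
        for p in itertools.permutations(range(3))]
assert np.allclose(vals, vals[0])   # all 3! = 6 values agree
```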

### 2.2 This problem is known as identifiability. Explain why this problem is irrelevant for the purpose of data clustering.

For clustering we are only interested in which points are grouped together. Permuting the component labels changes the names of the clusters but not the grouping itself, so all $K!$ equivalent solutions describe the same clustering.