Suggested Solutions: Computational Intelligence Lab FS10


1. Clustering



The idea is to differentiate by :

(we differentiate according to: http://fourier.eng.hmc.edu/e161/lectures/algebra/node7.html )


  • We first sample the cluster index according to the mixture probabilities
  • Then we sample the data point from the selected Gaussian distribution


function x = sample_gmm(pis, mus, sigma)
  t = rand();
  s = 0;
  for i = 1:length(pis)
    s = s + pis(i);                  % cumulative mixture weight
    if t <= s                        % component i selected by inverse CDF
      x = rand_gaussian(mus(i), sigma);
      return;
    end
  end
end


k = find(cumsum(pis) >= rand(1), 1);  % inverse-CDF selection of the component
x = rand_gaussian(mus(k), sigma);
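The two Octave snippets above implement inverse-CDF sampling of the component index. As a cross-check, here is a minimal Python sketch of the same idea; the name sample_gmm, the use of random.Random, and the 1-D setting are assumptions for illustration:

```python
import random

def sample_gmm(pis, mus, sigma, rng=random.Random(0)):
    # Inverse-CDF sampling of the component index, then a draw from
    # the selected Gaussian -- mirrors the Octave loop above.
    t = rng.random()
    s = 0.0
    for pi_k, mu_k in zip(pis, mus):
        s += pi_k                      # cumulative mixture weight
        if t <= s:
            return rng.gauss(mu_k, sigma)
    return rng.gauss(mus[-1], sigma)   # guard against floating-point rounding

x = sample_gmm([0.5, 0.5], [-3.0, 7.0], 0.0)
```

Setting sigma = 0 collapses each component onto its mean, which makes the selection step easy to verify: the sample must equal one of the means.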


Proposal #1

Where is if otherwise .

Proposal #2

Proposal #3

The task is to "[...] not compute the values of all these events, but only give a symbolic expression." Accordingly I'd write


blue +: Gaussian Mixture Model

red x: K-Means

Reason: K-means is more sensitive to outliers. Additionally, k-means can only use points that are hard-assigned to a cluster for centroid estimation: many of the nearby points belong to the other cluster, so the centroids get pushed outwards. A GMM's soft assignments let every point contribute to every mean.

2. Dimensionality Reduction

Singular Value Decomposition



Note: according to the notation used in the exam (last page)


Choose k = 3 because the first three singular values are very strong while the others are very weak.


In U: box out the left 3 columns

In V^T: box out the upper 3 rows

In Σ: box out the upper-left 3×3 block


The approximation error under the Euclidean (spectral) matrix norm for k = 3 is the value of the fourth singular value σ_4 (Eckart-Young).
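This can be checked numerically: the spectral-norm error of the best rank-k approximation equals the (k+1)-th singular value. The matrix size and data below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))               # hypothetical data matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3                                          # keep the 3 strong singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # "boxed-out" rank-3 factors

# Eckart-Young: the spectral-norm error of the best rank-k
# approximation is exactly the (k+1)-th singular value.
err = np.linalg.norm(A - A_k, 2)
```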



⇒ user 5 likes item 3


2nd and 5th items are identical. They have the same column in V^T, indicating that they have the same characteristics (concepts).

The differing values in their 7th component are irrelevant because the 7th singular value is (practically) zero. These values vanish anyway by multiplication with Σ.

Principal Component Analysis


is covariance matrix of , i.e. ( resp. )


(Note: previously it was stated that this is , which is not true, as it is nowhere mentioned that the matrix is square.)

Giving the eigenvalue decomposition of


Eigenvectors are the columns of

Eigenvalues are the diagonal elements of
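The eigendecomposition view above can be sketched with NumPy; the random data matrix and the d × n (columns = samples) convention are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 100))              # d x n data, columns = samples
Xc = X - X.mean(axis=1, keepdims=True)         # center each dimension

C = Xc @ Xc.T / Xc.shape[1]                    # d x d covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)           # symmetric eigendecomposition

order = np.argsort(eigvals)[::-1]              # principal components first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
```

The columns of eigvecs are the eigenvectors (principal directions) and eigvals holds the corresponding eigenvalues, sorted by explained variance.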

3. Role Mining


Fraction of wrongly reconstructed permissions. Low=good, High=bad


High GE: Many wrongly predicted permissions -> bad. We could not predict the permissions well from the subset of permissions we were given.

Low GE: Good


Same formula, but:

The deviation is based on the reconstruction of the original data set.

GE is based on the reconstruction of a new data set, where the reconstruction may only minimize the deviation on a subset of the permissions; the remaining permissions must be predicted.
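As an illustration of "fraction of wrongly reconstructed permissions", here is a small sketch; the user-role matrix U, role-permission matrix Z, and the Boolean-product reconstruction model are hypothetical assumptions:

```python
import numpy as np

# Hypothetical 3-user x 4-permission matrix X and a 2-role decomposition.
X = np.array([[1, 1, 0, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=bool)
U = np.array([[1, 1],
              [1, 0],
              [0, 1]], dtype=bool)              # user-role assignment
Z = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=bool)        # role-permission assignment

X_hat = (U.astype(int) @ Z.astype(int)) > 0     # Boolean matrix product
deviation = np.mean(X_hat != X)                 # fraction of wrong entries
```

Here exactly one of the 12 entries is reconstructed wrongly, so the deviation is 1/12.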


Quote from slides:

X is typically noisy, i.e., it contains permission assignments without an underlying role!

Therefore, noise means having permission assignments without an underlying role, so a GE of 0 should not be possible.





Not containing


4. Deterministic Annealing

a) Correct order of code lines: 7 5 3 1 4 6 2

Note: This seems to be the EM algorithm for deterministic annealing. EM algorithms usually (the ones presented in 2015) perform the E-step first, then the M-step. Hence, I'd expect to see line 3 first, followed by line 1. However, in 2015 this algorithm was not part of the lecture. It would be great if someone found a source confirming either order. (.gregor)

Note: The order was wrong; it is first 3, then 1. See http://www.inf.ethz.ch/personal/alexeygr/slt15/lectures/slt15_lecture03.pdf, pages 15 and 17.

b) Quote by Kenneth Rose (pp. 6-7): Deterministic annealing can be intuitively understood as follows. Instead of making noisy moves on the given energy surface, we "incorporate" the noise into the energy function. We derive a sequence of effective energy functions that are parametrized by the temperature, the level of the noise. These effective cost functions will be very smooth at low β, where even very large barriers can be easily traversed, and as β is increased they become more ragged and converge to the original energy at β = ∞. In fact, at β = 0 the cost function will usually be convex, so that the global minimum of this function is easily obtained. Thus the deterministic annealing approach can be viewed as locating the global minimum at T=0, and tracking the minimum while gradually increasing β, by using conventional convex optimization methods at each temperature.

Note: Whereas the quote is correct, notice that it talks about a β parameter, not T. We have β = 1/T. Therefore, the argument about T is the other way around (i.e. the surface is smoother for large T, that is small β, and more ragged for T → 0, that is β → ∞).

5. Sparse Coding


Proposal #1

Two vectors in an inner product space V are orthogonal if their inner product is zero.

An orthogonal matrix is a square matrix with real entries whose columns (or rows) are orthogonal unit vectors.

Proposal #2

Let , i.e. are column vectors of .

is orthogonal if

  • its column vectors are orthonormal: u_i^T u_j = δ_ij
  • it is a square matrix

The exam requires three properties. I'd say:

  1. Columns are pairwise orthogonal (case i ≠ j: u_i^T u_j = 0)
  2. Columns are unit vectors (case i = j: u_i^T u_i = 1)
  3. Matrix is square


Reasons why each step of

||Ux||^2 = (Ux)^T (Ux) = x^T U^T U x = x^T I x = x^T x = ||x||^2

holds:

  • Definition of the Euclidean norm
  • Property of the transpose: (Ux)^T = x^T U^T
  • U orthogonal: U^T U = I
  • Property of I, definition of the Euclidean norm

We have shown that applying an orthogonal matrix U to a vector x preserves the Euclidean length.
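A quick numerical check of this norm-preservation property; building a random orthogonal matrix via QR decomposition is just a convenient assumption for the sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # random orthogonal matrix
x = rng.standard_normal(5)

# Q^T Q = I  =>  ||Qx||^2 = x^T Q^T Q x = x^T x = ||x||^2
len_Qx = np.linalg.norm(Q @ x)
len_x = np.linalg.norm(x)
```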



CIL FS10 5c.jpg


Coding: transform x into Haar basis

Thresholding: cut off small components


By orthonormality, the squared length of a sum of orthonormal basis vectors equals the number of vectors, so the length of the sum of all 4 normalized basis vectors is sqrt(4) = 2.

This results in

Reason: The signal has to be represented with all 4 basis vectors. If we represented this signal with only 3 basis vectors, the reconstruction error would increase.


[~, cur] = max(abs(U' * res));   % pick the atom most correlated with the residual


res = res - coef * U(:, cur);    % subtract its projection (coef = U(:, cur)' * res)


As the question states , we only need to do 2 iterations.
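The two filled-in lines are the selection and update steps of matching pursuit. A compact Python sketch of the whole loop; the 2-D Haar-like dictionary and the function signature are assumptions:

```python
import numpy as np

def matching_pursuit(x, U, n_iter=2):
    # Greedy matching pursuit over a dictionary U (columns = atoms).
    res = x.astype(float).copy()
    z = np.zeros(U.shape[1])
    for _ in range(n_iter):
        cur = int(np.argmax(np.abs(U.T @ res)))  # most correlated atom
        coef = U[:, cur] @ res                   # projection coefficient
        z[cur] += coef
        res = res - coef * U[:, cur]             # approx + residual = x
    return z, res

# Hypothetical 2-D orthonormal (Haar-like) dictionary
U = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
z, res = matching_pursuit(np.array([3.0, 1.0]), U, n_iter=2)
```

With an orthonormal dictionary in two dimensions, two iterations drive the residual to zero, which matches the answer that 2 iterations suffice.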

Note that in the picture the residuum is drawn in the wrong direction: the residuum must always point towards the real value, since it holds that approximation + residuum = real vector.

CIL FS10 5g.jpg