Suggested Solutions: Computational Intelligence Lab FS17


Disclaimer: I am very unsure about these solutions, but I am adding them anyway so that there is something that can be corrected :D

1 Dimensionality reduction

1.1 PCA on an Ellipse

a)

(since )

b)

for x coordinate:

(since the cosine has mean 0 over the interval from 0 to 2π)


for y coordinate:

(since the sine has mean 0 over the interval from 0 to 2π)

c)

Since the mean is 0, nothing has to be subtracted. So the covariance matrix will be = (the two sums are certainly the same, but I am not sure which value they sum to; let's call it y)

Note:

The final result should be

d)

since we know that a > b, the first principal vector is with eigenvalue

e)

it just keeps the x-coordinate, so

f)

reconstruction error = y coordinate =

Remark: I think the reconstruction error refers to the total error over all points, which is the sum of the unkept eigenvalues. That is
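A hedged write-up of the missing formulas, assuming the usual parameterization of this exercise (points x_n = (a cos θ_n, b sin θ_n)^T with the angles θ_n equally spaced over [0, 2π) and the 1/N convention for the covariance):

\frac{1}{N}\sum_n \cos^2\theta_n = \frac{1}{N}\sum_n \sin^2\theta_n = \frac{1}{2}
\quad\Rightarrow\quad
\Sigma = \begin{pmatrix} a^2/2 & 0 \\ 0 & b^2/2 \end{pmatrix},
\qquad
a > b \;\Rightarrow\; u_1 = (1, 0)^\top,\ \lambda_1 = \frac{a^2}{2}

\text{squared reconstruction error per point} = b^2 \sin^2\theta_n,
\qquad
\text{average} = \frac{b^2}{2} = \lambda_2 .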

1.2 PCA in 3D

a)

Then rewrite this according to the given formula.

b)

The first principal component is or normalized

Its corresponding eigenvalue is

The easiest way to arrive at these values is to realize a priori that the points lie on a line, so the line's direction must be the eigenvector corresponding to the only non-zero eigenvalue. Then you can just multiply the covariance matrix with any direction vector of the line and determine by how much its length is scaled by the linear transformation. If you don't realize this, you can compute the determinant of the matrix with lambda subtracted on the diagonal and set it to zero, but surely nobody would ever do that... Who, me? Noo...
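A minimal NumPy sketch of this reasoning; the points below are hypothetical collinear points (the exam's actual coordinates are not reproduced above), chosen only to show that the covariance matrix has rank one and that its single non-zero eigenvalue/eigenvector pair is the variance along the line and the line direction.

import numpy as np

# Hypothetical points on the line t * (1, 2, 3); any collinear points behave the same way.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [3.0, 6.0, 9.0]])

Xc = X - X.mean(axis=0)               # center the data
C = Xc.T @ Xc / X.shape[0]            # covariance matrix (rank 1 for collinear points)

eigvals, eigvecs = np.linalg.eigh(C)  # eigendecomposition of the symmetric matrix
print(np.round(eigvals, 6))           # exactly one non-zero eigenvalue
print(eigvecs[:, np.argmax(eigvals)]) # its eigenvector points along the line (up to sign)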

c)

normalize eigenvector first and then multiply:

d)

The reconstruction error is 0, since the data can be perfectly reconstructed (rank of covariance matrix is 1).

1.3 SVD for Solving Linear Systems

a)

We can multiply this by , since it is an orthogonal matrix and this does not change the length.

b)


The problem is convex, so any locally optimal solution is a global optimum (and, once restricted to the non-zero singular values below, the minimizer is unique). To minimize, take the derivative and set it to zero:

We can leave out all the dimensions that are set to zero in , since they will be multiplied by 0 on both sides anyway. Let's take as the square (invertible) matrix consisting of the non-zero entries. We now have:


c)

From a we know that we can instead optimize . If we set and , then we get and therefore
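As a sanity check of b) and c), a small NumPy sketch (the random A and b are placeholders, not the exam's system): the SVD-based solution built from the non-zero singular values matches NumPy's least-squares solver.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))              # placeholder over-determined system
b = rng.standard_normal(6)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ b) / s)               # invert only the non-zero singular values

x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_svd, x_lstsq))           # True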

1.4 SVD Properties

a) really not sure what they expect here?

b)

so eigenvalues are

so eigenvalues are , which is the same as before.

c)

A being a square symmetric matrix means that its eigenvectors can be chosen orthogonal (spectral theorem).

2 Optimization, NMF, pLSA and Word Embedding

2.1 Optimization

a)

Proof by contradiction: Assume we have a local minimum that is not the global minimum, i.e. there is some such that , but there is some such that .

We know that our function is convex, so it must hold that .

Let's now choose so that ends up in the ball around , so we know that .

We also know since , that

So we get , which contradicts the convexity inequality stated at the beginning.
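A hedged write-up of the same argument with the missing formulas filled in (writing x* for the local minimum, y for the supposedly better point, and B_ε(x*) for the ball on which x* is minimal):

\exists\, y:\ f(y) < f(x^\star), \qquad f(x^\star) \le f(z)\ \ \forall z \in B_\epsilon(x^\star)

z = (1-\theta)\,x^\star + \theta\, y \in B_\epsilon(x^\star) \quad\text{for small enough } \theta \in (0, 1]

f(z) \le (1-\theta) f(x^\star) + \theta f(y) < (1-\theta) f(x^\star) + \theta f(x^\star) = f(x^\star),

\text{contradicting } f(x^\star) \le f(z).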

b)

As given in the question, we have the smoothness condition:

or equivalently:

And then we have the relation between and :

for

Combined we see that:

In particular, if we choose the point for the generic , we get the following true expression, which we will work towards later:

That line is our goal. Starting from the expression the question asks us to prove, we can show equivalence to the goal and thereby conclude the proof.

Expanding squared two norms

Replacing with and the gradient descent update step

Using linearity of scalar product

Expanding norm again

Using linearity of scalar product

Moving the out

Dividing by , which is positive, so the direction of the inequality doesn't flip

And that's our goal from above. QED
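For reference, the standard chain that this derivation follows, written out with step size η and smoothness constant L (the exact constants in the exam's statement may differ):

f(y) \le f(x) + \nabla f(x)^\top (y - x) + \tfrac{L}{2}\|y - x\|^2 \qquad\text{(smoothness)}

x_{t+1} = x_t - \eta\,\nabla f(x_t) \qquad\text{(gradient step)}

f(x_{t+1}) \le f(x_t) - \eta\,\|\nabla f(x_t)\|^2 + \tfrac{L\eta^2}{2}\,\|\nabla f(x_t)\|^2
= f(x_t) - \eta\Bigl(1 - \tfrac{L\eta}{2}\Bigr)\|\nabla f(x_t)\|^2

\eta \le \tfrac{1}{L} \;\Rightarrow\; f(x_{t+1}) \le f(x_t) - \tfrac{\eta}{2}\,\|\nabla f(x_t)\|^2 .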

c)

if then we know that , so we must have reached the global optimum.

d)

This is similar to c), but since the function is no longer convex, we only know it is a stationary point (e.g. a local optimum), not necessarily the global optimum.

e)

2.2 pLSA and Word Embedding

a)

1) how often does word j occur in document i

2)
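For context, a sketch of the pLSA quantities these answers presumably refer to, with K latent topics z, documents d_i, words w_j and counts x_ij:

x_{ij} = \text{number of occurrences of word } w_j \text{ in document } d_i,
\qquad
p(w_j \mid d_i) = \sum_{z=1}^{K} p(w_j \mid z)\, p(z \mid d_i)

\log p(X) = \sum_{i,j} x_{ij} \,\log \sum_{z=1}^{K} p(w_j \mid z)\, p(z \mid d_i).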

b)

1)

2) with

3) lower bound: update rule:

c)

d)

1)

2) To make sure that words which co-occur more often get a larger weight, while the weight is capped at some point so that very frequent co-occurrences do not dominate.
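Assuming the question refers to the GloVe weighting function, it has exactly this shape: increasing in the co-occurrence count, but capped at 1,

f(x) = \begin{cases} (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}
\qquad\text{(the GloVe paper uses } \alpha = 3/4\text{)}.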

2.3 True/False Questions

a) T b) F c) T d) F e) F f) T

(For b): isn't it true? The Frobenius norm is convex for . The question says it is convex in , not in specifically.)

(For f): False? We don't need step (b)?)

3 Neural networks and clustering

3.1 Neural Networks

a)

I'm not sure about the summation. Here's my solution

b)

thus then by Cauchy-Schwarz

c)

so if we choose and , we get the desired result.

d)

It takes O(n) because we have to differentiate with respect to every weight.

e)

1) extract interesting features from image

2) reduce size of image

f)

Each round you go from nxn to (n-2)x(n-2). So we need (rounded up) (n-3)/2 iterations.
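A small sketch of the shrinkage, assuming 3x3 filters without padding ('valid' convolution), which is what the stated nxn to (n-2)x(n-2) step implies:

import numpy as np
from scipy.signal import convolve2d

out = np.random.default_rng(0).random((7, 7))     # any 7 x 7 input image
kernel = np.ones((3, 3)) / 9.0                    # a 3 x 3 filter

for _ in range(3):
    out = convolve2d(out, kernel, mode='valid')   # each layer: k x k -> (k-2) x (k-2)
    print(out.shape)                              # (5, 5), (3, 3), (1, 1)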

3.2 True/False Questions

a) T b) F c) T d) F e) F

3.3 Clustering

a)

1) compute cluster centers:

2) assign data points:

3) compute cluster centers:

4) assign data points:

From now on, nothing will change anymore.

b)

Intuitive approach; no idea whether this is actually true:

One guess:

c)

mean: covariance: total:

(Remarks:

1. Do we need to distinguish whether the covariance matrix is diagonal?

2. For the , do we still require parameters?)
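For reference, a hedged version of the usual count for a GMM with K components in D dimensions (whether the exam counts the mixing weights as K or K-1, and whether it assumes diagonal covariances, may differ):

\text{means: } K D, \qquad
\text{full covariances: } K\,\frac{D(D+1)}{2} \quad (\text{diagonal: } K D), \qquad
\text{weights } \pi: K - 1 \ \ (\text{they sum to } 1)

\text{total (full covariances): } K D + K\,\frac{D(D+1)}{2} + K - 1.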

3.4 K-means vs. Gaussian mixture model (GMM)

a)

GMM: K-means: clusters are initialized uniformly at random, so

b)

In k-means the clusters are spherical, so the covariances are implicitly (multiples of) the identity. In a GMM the clusters can be ellipsoids, so the covariance matrix does not need to be diagonal (it can be a full matrix).

c)

Two very stretched ellipses next to each other would be difficult for k-means, since it expects spheres. It should not be a problem for a GMM though, since there the shape of the ellipsoid can be learned as well.
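A small scikit-learn sketch of this intuition (the data below is synthetic and only illustrative: two long, thin clusters stretched along x and separated only in y). K-means reliably cuts the data along the long axis instead, while a full-covariance GMM can fit one elongated Gaussian per cluster, although EM can of course still end up in a bad local optimum.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two elongated, parallel clusters: huge spread in x, small spread in y.
X = np.vstack([rng.normal([0, 0], [10.0, 0.5], size=(500, 2)),
               rng.normal([0, 6], [10.0, 0.5], size=(500, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
gmm = GaussianMixture(n_components=2, covariance_type='full',
                      n_init=10, init_params='random', random_state=0).fit(X)

print(km.cluster_centers_)   # the k-means centers differ mostly in x (ellipses cut in half)
print(gmm.means_)            # the GMM means typically differ in y, recovering the two ellipses

Note on the settings: init_params='random' with several restarts is used here only to give EM a fair chance; the default k-means initialization would start the GMM from the bad split that k-means itself finds.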

d)

A single step is cheaper to compute in K-means. In addition to that, K-means needs fewer iterations.

4 Sparse Coding and Dictionary Learning

4.1 1-D Fourier transform

a)

1b 2a

b)

For signal b, since it has a lot of high frequencies and signal a has none.

c)

Because then we would also have to store the permutation, which would probably cost as much as storing the whole signal.
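A minimal NumPy sketch of the compression idea behind a) and b), using a synthetic smooth signal as a stand-in for signal a (the exam's signals are not reproduced here): keep only the Fourier coefficients with the largest magnitude and reconstruct.

import numpy as np

n = 256
t = np.arange(n)
x = np.sin(2 * np.pi * 3 * t / n) + 0.5 * np.cos(2 * np.pi * 5 * t / n)  # smooth, low-frequency signal

X = np.fft.fft(x)
keep = np.argsort(np.abs(X))[-16:]      # indices of the 16 largest-magnitude coefficients
X_compressed = np.zeros_like(X)
X_compressed[keep] = X[keep]

x_hat = np.real(np.fft.ifft(X_compressed))
print(np.max(np.abs(x - x_hat)))        # essentially zero for this smooth signal

Randomly permuting the samples of the same signal spreads the energy over all frequencies, which is the point of c).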

4.2 2-D Fourier transform

a)

Image b, since it has fewer high frequencies than image a (which has a scarf and a tablecloth).

b)

Image a, since image b has a lot of high frequencies in the original image, which will be lost. Image a does not have many high frequencies to begin with (see a)), so denoising will work well.

4.3 Sparse coding with an over-complete dictionary

a)

Draw four vectors from the middle cluster to the outer clusters.

b)

Each cluster center corresponds to one atom, and additionally one center corresponds to no atom at all, i.e. to (0, 0).

4.4 Compressed Sensing

a)

No, there are no zero elements (every coin has a non-zero weight), so the signal is not sparse.

b)

Yes, it is; only the one counterfeit coin will have a non-zero element.

c)

d)

e)

f)

g)

for some constant c