Solution Proposal: Computational Intelligence Lab FS15


1 Dimensionality reduction

1.1 SVD and PCA

  a) False, PCA tries to maximize the variance of the projected data
  b) False, because the principal component is an eigenvector of the covariance matrix, not of the data matrix; also, eigenvalues are not defined for rectangular matrices (see the sketch after this list)
  c) True
  d) False, vectors are rotated as well
  e) True
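
As a sanity check for a) and b), here is a minimal numpy sketch with hypothetical data (not the exam's): the first principal component obtained as an eigenvector of the covariance matrix of the centered data coincides, up to sign, with the first left singular vector of the centered data matrix.

 import numpy as np

 # Hypothetical 2 x N data matrix (one column per observation), only for illustration.
 rng = np.random.default_rng(0)
 X = np.array([[2.0, 0.5], [0.5, 1.0]]) @ rng.normal(size=(2, 100))

 Xc = X - X.mean(axis=1, keepdims=True)             # center the data
 C = Xc @ Xc.T / Xc.shape[1]                        # covariance matrix

 eigvals, eigvecs = np.linalg.eigh(C)               # eigendecomposition of the covariance
 U, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # SVD of the centered data matrix

 # The top eigenvector of C and the first left singular vector of Xc agree up to sign:
 # the principal component is an eigenvector of the covariance matrix, not of the
 # (rectangular) data matrix itself.
 print(np.allclose(abs(eigvecs[:, -1] @ U[:, 0]), 1.0))  # True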

1.2 SVD

a) U and V are orthogonal, so for any x:

Therefore

The last step again uses the orthogonality of U, i.e. multiplying by U does not change the norm of the vector.
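
The norm-preservation argument can be checked numerically; the matrix and vector below are hypothetical, chosen only to illustrate the identity:

 import numpy as np

 rng = np.random.default_rng(1)
 A = rng.normal(size=(4, 3))                        # hypothetical matrix
 U, s, Vt = np.linalg.svd(A, full_matrices=False)
 x = rng.normal(size=3)

 # Multiplying by an orthogonal matrix does not change the 2-norm of a vector ...
 print(np.allclose(np.linalg.norm(Vt @ x), np.linalg.norm(x)))                    # True
 # ... hence ||A x|| = ||U S V^T x|| = ||S V^T x||.
 print(np.allclose(np.linalg.norm(A @ x), np.linalg.norm(np.diag(s) @ Vt @ x)))   # True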

b) For a diagonal matrix S it holds:

, where

, where

Therefore

1.3 PCA

Part 1

with principal axes

and

Part 2

To project a point, draw a line through the point orthogonal to the axis; the projection is where that line crosses the axis.

The axis oriented like / is better than the one oriented like \, since the variance among the projected points is higher.


The mean of Y is [0, -0.75], which is different from the mean of X ([0, -0.5]). This is why the suggested solution above, while elegant, does not work.

<math>\bar Y \bar Y^T = \begin{pmatrix} 100 & 30 \\ 30 & 225.5 \end{pmatrix}</math>

The principal axes are shifted very slightly: new [-0.9752, 0.2211] vs old [-0.9767, 0.2145] and new [-0.2211, -0.9752] vs old [-0.2145, -0.9767].

I think the main point they want to make here is that adding these points does not really change the direction in which the data has its main variance. That is, the principal axes stay essentially the same, as already pointed out in the provided derivation. In my opinion, calculating the covariance matrix for such a data matrix in a 2-hour exam is cumbersome; it should be enough to give a reasonable explanation of why the principal components stay more or less the same.
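
These numbers can be verified from the scatter matrix given above; since the original data points are not reproduced here, the following only checks the eigenvectors of that matrix:

 import numpy as np

 # Scatter matrix of the centered, augmented data as given above.
 S = np.array([[100.0,  30.0],
               [ 30.0, 225.5]])

 eigvals, eigvecs = np.linalg.eigh(S)  # columns of eigvecs are the principal axes
 print(eigvecs)                        # up to sign: [0.9752, -0.2211] and [0.2211, 0.9752]
 print(eigvals)                        # approx. 93.2 and 232.3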

2 Clustering, Mixture Models, NMF

2.1 K-means Clustering

Set the derivative of the objective with respect to each centroid to zero and solve for it; the optimal centroid is the mean of the points assigned to that cluster.
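
A minimal sketch of one K-means iteration, assuming an N x D data layout and the standard assign-then-update scheme (names and layout are assumptions for this sketch, not taken from the exam):

 import numpy as np

 def kmeans_step(X, centroids):
     """One K-means iteration on data X (N x D): assign points, then update centroids."""
     # Assignment step: each point goes to its closest centroid.
     dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)  # N x K
     labels = dists.argmin(axis=1)
     # Update step: each centroid becomes the mean of its assigned points,
     # i.e. the solution of setting the derivative of the objective to zero.
     new_centroids = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                               else centroids[k]
                               for k in range(len(centroids))])
     return labels, new_centroids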

2.2 Mixture Model

Naive update

solve for ...


The problem with a naive update equation is that the update for each parameter depends on all the other parameters (because they all appear inside the same log, they do not separate when taking the derivative). This is why we switch to the EM algorithm.


The EM algorithm maximizes the lower bound given by Jensen's inequality.
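
A sketch of one EM iteration for a Gaussian mixture under simplifying assumptions not taken from the exam (spherical components with fixed unit variance); it shows that each responsibility couples all components in the E-step, while the M-step then updates every parameter in closed form:

 import numpy as np

 def em_step(X, means, weights):
     """One EM iteration for a GMM with unit-variance spherical components (a simplifying assumption)."""
     N, D = X.shape
     # E-step: responsibilities. Each gamma[n, k] involves *all* components through the
     # normalization, which is why the naive joint update above does not decouple.
     log_p = -0.5 * np.sum((X[:, None, :] - means[None, :, :]) ** 2, axis=2)  # N x K
     gamma = weights * np.exp(log_p)
     gamma /= gamma.sum(axis=1, keepdims=True)
     # M-step: with the responsibilities fixed, each parameter has a closed-form update.
     Nk = gamma.sum(axis=0)
     means = (gamma.T @ X) / Nk[:, None]
     weights = Nk / N
     return means, weights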

2.3 Nonnegative Matrix Factorizations

a) False, NMF provides soft clustering
b) False, rank(U) and rank(Z) are strictly smaller than rank(X). Therefore rank(UZ) is also smaller than rank(X) (see the sketch after this list).
c) False
d) True
e) True
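
A quick numerical illustration of b), with a hypothetical nonnegative matrix: whatever factors U and Z an NMF algorithm returns, rank(UZ) is bounded by the inner dimension K, which is chosen smaller than rank(X).

 import numpy as np

 rng = np.random.default_rng(2)
 X = rng.random((6, 5)) @ rng.random((5, 8))  # hypothetical nonnegative matrix of rank 5
 K = 3                                        # inner dimension of the factorization
 U = rng.random((6, K))                       # stand-ins for the NMF factors
 Z = rng.random((K, 8))

 # rank(UZ) <= min(rank(U), rank(Z)) <= K < rank(X)
 print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(U @ Z))  # 5 3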

3 Sparse Coding and Dictionary Learning

3.1 Sparse Coding

Part 1

a) False, not convex
b) False, the change of norm is a relaxation of the constraints and does not yield the same solution
c) False, it hopefully yields a sparser solution, but not necessarily
d) False, for an orthonormal basis m(B)=0 (see the sketch after this list)
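
A small sketch of the coherence claim in d); m(B) is taken here to mean the mutual coherence, i.e. the largest absolute inner product between distinct normalized atoms, and the dictionaries below are hypothetical:

 import numpy as np

 def coherence(B):
     """Mutual coherence m(B): largest |<b_i, b_j>| over distinct, normalized columns of B."""
     Bn = B / np.linalg.norm(B, axis=0)
     G = np.abs(Bn.T @ Bn)
     np.fill_diagonal(G, 0.0)
     return G.max()

 # For an orthonormal basis all off-diagonal inner products vanish, so m(B) = 0.
 Q, _ = np.linalg.qr(np.random.default_rng(3).normal(size=(8, 8)))
 print(coherence(Q))                               # ~ 0

 # An overcomplete dictionary generally has m(B) > 0.
 B = np.hstack([np.eye(4), np.full((4, 1), 0.5)])
 print(coherence(B))                               # 0.5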

Part 2

Alternative proposal


Alternative proposal

Part 3


Take the parts with the two high peaks in the top signal and, in the bottom signal, the part with the single high peak.


Fourier is better for the top signal.

Wavelets are better for the bottom signal.


The two peaks correspond to the two frequencies of the signal with the highest amplitudes.
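
The trade-off can be illustrated with hypothetical stand-in signals (the exam's actual signals are not reproduced here): a sum of two sinusoids concentrates almost all of its Fourier energy in a few coefficients, while a single localized peak spreads it across many, which is why a time-localized (wavelet-like) basis suits the bottom signal better.

 import numpy as np

 n = 256
 t = np.arange(n) / n
 # Hypothetical stand-ins for the two exam signals:
 top = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)  # two dominant frequencies
 bottom = np.zeros(n)
 bottom[100:104] = 1.0                                               # one localized peak

 for name, x in [("top", top), ("bottom", bottom)]:
     c = np.abs(np.fft.rfft(x))
     # Fraction of the total Fourier magnitude captured by the 5 largest coefficients:
     frac = np.sort(c)[::-1][:5].sum() / c.sum()
     print(name, round(float(frac), 3))  # close to 1.0 for "top", much lower for "bottom"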

4 Optimization and Robust PCA

4.1 Lagrange Duality

a)

b)

c)

4.2 Convex Optimization

a) False
b) False (it lies in a space of one dimension higher than the function's domain; see https://en.wikipedia.org/wiki/Epigraph_(mathematics))
c) False, proof by contradiction  
Contradicts the definition of convex functions.
Note that 


4.3 Gradient Descent for Ridge Regression

a)


b)


Alternative proposal

The above proposal seems to subtract a scalar from a vector. I think the correct step is


(only consider one row of the data matrix and the corresponding entry of the target vector at a time)
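
A hedged sketch of the two update rules discussed above; the names X, y, w, lam and eta are assumptions made for this sketch (the exam's notation is not preserved here), and constant factors of 2 in the gradient are absorbed into the step size.

 import numpy as np

 def ridge_gradient_step(w, X, y, lam, eta):
     """Full-batch gradient step for ||Xw - y||^2 + lam * ||w||^2 (factors of 2 absorbed into eta)."""
     grad = X.T @ (X @ w - y) + lam * w
     return w - eta * grad

 def ridge_single_row_step(w, x_i, y_i, lam, eta):
     """Update using one row x_i of the data matrix and the matching target entry y_i."""
     grad = (x_i @ w - y_i) * x_i + lam * w
     return w - eta * grad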

4.4 ADMM

a)


b)


c)


4.5 RPCA for Collaborative Filtering