# Proposed Solution: Information Retrieval FS18

If you disagree with an answer, please say so by appending your reasoning to it.

## I Boolean Retrieval

### Standard Inverted Index

1. C
2. A
3. B
4. C (I think B, since $b \in \{1,2,3\}$) (+1)
5. B
6. C
7. B
8. C
9. C
10. A
11. D

1. 1 2 3 4 5 6
2. None
3. 4 6
4. 1 2 3 4 5 6
5. 4
6. 1 2 3 5
7. 1 2 3 5
8. 1 5 6

## II Pre-processing and term vocabulary

1. A
2. C (Felix: I propose D. "Some 8 and some 16" is not necessarily fixed length, as opposed to C, which I understand as fixed length. Additionally, only code points 0 to 127 can be stored in 8 bits, because of the leading zero.)
3. D (B? cf is high for terms that appear many times in only one document, so df is more robust.)
4. D (I think C)
5. C
6. C
7. A
8. C (changed from C to B, since the index described in A is called a "k-gram index", not a "bi-word index") (Elwin: I disagree, the index is called bi-word index, see Slide 143 in "Term Vocabulary". Since A and B are correct, C should be chosen)
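Felix's comment on question 2 reads as if the encoding in question is UTF-8 (the question text is not reproduced here). Under that assumption, a short Python sketch with the built-in encoder shows both of his points: a single byte has the form `0xxxxxxx`, so only code points 0 to 127 fit in it, and larger code points take 2 to 4 bytes, i.e. the encoding is variable length. The helper name `utf8_length` is hypothetical.

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses to encode a single code point."""
    return len(chr(code_point).encode("utf-8"))

# Single-byte UTF-8 sequences look like 0xxxxxxx, leaving 7 usable bits,
# so only 2**7 = 128 code points (0..127) fit in one byte.
for cp in (0x41, 0x7F, 0x80, 0xE9, 0x20AC, 0x1F600):
    first_byte = chr(cp).encode("utf-8")[0]
    print(f"U+{cp:04X}: {utf8_length(cp)} byte(s), first byte {first_byte:08b}")
```

This prints 1 byte for U+0041 and U+007F, 2 bytes for U+0080 and U+00E9, 3 bytes for U+20AC and 4 bytes for U+1F600; the leading bits of the first byte (`0`, `110`, `1110`, `11110`) announce the sequence length.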

1. 0
2. 2/3
3. 3/14
4. 1
5. 1/2
6. 1
7. 0
8. 0

1. 1
2. 1
3. 7
4. 0
5. 1
6. 10
7. 5
8. 10

1. C
2. A
3. D
4. D
5. B

## V Ranked Retrieval

1. 311
2. 0
3. 705
4. 320
5. 200
6. 30
7. 6 (2100), 5 (390), 1 (320)
8. 15*40 + 3*30 = 690
9. 0*30 + 11*30 = 330
10. 11*40 + 70*30 = 2540
11. 6 (2540)
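Items 8 to 10 each write a score as a sum of two products. Assuming these are per-term weight products of a document and a query (the exact weighting scheme is not stated in this solution), the arithmetic can be rechecked with a minimal Python sketch. `score` is a hypothetical helper, and the factor pairs are copied verbatim from items 8 to 10.

```python
def score(doc_weights, query_weights):
    """Sum of per-term products (assumed scoring scheme, see lead-in)."""
    return sum(d * q for d, q in zip(doc_weights, query_weights))

# Factor pairs exactly as written in items 8-10 above.
item8 = score([15, 3], [40, 30])
item9 = score([0, 11], [30, 30])
item10 = score([11, 70], [40, 30])

print(item8, item9, item10)           # 690 330 2540
print(max(item8, item9, item10))      # 2540
```

The largest of the three sums is 2540, the score item 11 reports for document 6.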

1. A
2. B

## VII Probabilistic Retrieval

1. B
2. C: $P(\text{Spy} \mid \text{Battle}) = \frac{P(\text{Battle} \mid \text{Spy})}{P(\text{Battle})} \cdot P(\text{Spy}) = \frac{90\%}{60\%} \cdot 55\% = 82.5\%$
3. A
4. D
5. A, C
6. C (should it be B? See discussion.)
7. A
8. B
9. C
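The Bayes' rule computation in answer 2 can be reproduced with a minimal Python sketch using the three probabilities given there; `posterior` is a hypothetical helper name.

```python
def posterior(likelihood: float, evidence: float, prior: float) -> float:
    """Bayes' rule: P(A|B) = P(B|A) / P(B) * P(A)."""
    return likelihood / evidence * prior

# P(Battle|Spy) = 90%, P(Battle) = 60%, P(Spy) = 55%
p_spy_given_battle = posterior(likelihood=0.90, evidence=0.60, prior=0.55)
print(f"P(Spy|Battle) = {p_spy_given_battle:.1%}")  # P(Spy|Battle) = 82.5%
```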

1. A
2. A
3. C
4. B
5. A
6. B
7. C
8. A (81.8 %)
9. A
10. C
11. C
12. C
13. A