Proposed solution Information Retrieval FS18: Difference between revisions

From VISki
Revision as of 15 August 2019, 15:34

If you disagree with the solution, please state so by appending to it with your reasoning.

I Boolean Retrieval

Standard Inverted Index

  1. C
  2. A
  3. B
  4. C (I think B: b ∈ {1,2,3}) (+1)
  5. B
  6. C
  7. B
  8. C
  9. C
  10. A
  11. D

Boolean Queries

  1. 1 2 3 4 5 6
  2. None
  3. 4 6
  4. 1 2 3 4 5 6
  5. 4
  6. 1 2 3 5
  7. 1 2 3 5
  8. 1 5 6

II Pre-processing and term vocabulary

  1. A
  2. C (Felix: I propose D. "Some 8 and some 16" is not necessarily fixed length, whereas I understand C as fixed length. Additionally, only code points 0 to 127 can be stored in 8 bits because of the leading zero.)
  3. D
  4. D (I think C)
  5. C
  6. C
  7. A
  8. C (changed from C to B, since the index described in A is called a "k-gram index", not a "bi-word index") (Elwin: I disagree, the index is called bi-word index, see Slide 143 in "Term Vocabulary". Since A and B are correct, C should be chosen)
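
Felix's remark in answer 2 about the leading zero can be checked with a short sketch. It assumes the question concerns UTF-8, which the answer key does not state explicitly; the helper name `utf8_length` and the sample code points are chosen here for illustration only.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes used to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


# Sample code points around the single-byte boundary (illustrative choices).
for code_point in (0x41, 0x7F, 0x80, 0xE9, 0x20AC):
    ch = chr(code_point)
    print(f"U+{code_point:04X}: {utf8_length(ch)} byte(s)")
```

Under that assumption, code points up to U+007F take one byte, and everything above takes two or more, so the encoding is variable length.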

III Tolerant Retrieval

Jaccard Coefficient

  1. 0
  2. 2/3
  3. 3/14
  4. 1
  5. 1/2
  6. 1
  7. 0
  8. 0
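
The Jaccard answers above can be re-checked with a minimal sketch like the following. It assumes the coefficient is |A ∩ B| / |A ∪ B| over two sets; the function names `jaccard` and `kgrams` are hypothetical, and the example terms are not the exam's inputs. Using k-grams without boundary markers is also an assumption.

```python
from fractions import Fraction


def jaccard(a, b) -> Fraction:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # The coefficient is undefined for two empty sets.
        raise ValueError("Jaccard coefficient is undefined for two empty sets")
    return Fraction(len(set_a & set_b), len(union))


def kgrams(term: str, k: int = 2) -> set:
    """Set of k-grams of a term, without boundary markers (an assumption)."""
    return {term[i:i + k] for i in range(len(term) - k + 1)}


print(jaccard({1, 2, 3}, {2, 3}))                  # 2/3
print(jaccard(kgrams("bord"), kgrams("boardroom")))  # 2/9
```

Returning a `Fraction` keeps results in the same form as the answers listed above (for example 2/3 rather than 0.666...).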

Levenshtein Distance

  1. 1
  2. 1
  3. 7
  4. 0
  5. 1
  6. 10
  7. 5
  8. 10
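
To verify the distances above, one could use a standard dynamic-programming sketch such as the one below. It assumes unit cost for insertion, deletion, and substitution; the exam may define costs differently. The function name `levenshtein` and the test strings are illustrative, not taken from the exam.

```python
def levenshtein(s: str, t: str) -> int:
    """Edit distance between s and t with unit-cost insert, delete, substitute."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, char_s in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty string
        for j, char_t in enumerate(t, start=1):
            substitution = prev[j - 1] + (0 if char_s == char_t else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, which is enough when just the distance (and not the edit sequence) is needed.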

IV Index Compression

Standard Inverted Index

  1. C
  2. A
  3. D
  4. D
  5. B

V Ranked Retrieval

  1. 311
  2. 0
  3. 705
  4. 320
  5. 200
  6. 30
  7. 6 (2100), 5 (390), 1 (320)
  8. 15*40 + 3*30 = 690
  9. 0*30 + 11*30 = 330
  10. 11*40 + 70*30 = 2540
  11. 6 (2540)

VI Scoring

  1. A
  2. B

VII Probabilistic Retrieval

  1. B
  2. C:
  3. A
  4. D
  5. A, C
  6. C - (should be B? see discussion)
  7. A
  8. B
  9. C

VIII Evaluation

  1. A
  2. A
  3. C
  4. B
  5. A
  6. B
  7. C
  8. A (81.8 %)
  9. A
  10. C
  11. C
  12. C
  13. A