Proposed Solution Information Retrieval FS18

Current revision as of 16 August 2019, 09:59

If you disagree with the solution, please state so by appending to it with your reasoning.

You might also want to check out the VIS Community Solutions: https://exams.vis.ethz.ch/exams/tq71maod.pdf.

I Boolean Retrieval

Standard Inverted Index

  1. C
  2. A
  3. B
  4. C (I think B: since ) (+1) (I think C: 'expanding' is only in documents 1, 2, 3 and 5. However, each of these documents also contains either 'visible' or 'finite') (I think C as well)
  5. B
  6. C
  7. B
  8. C
  9. C
  10. A
  11. D

Boolean Queries

  1. 1 2 3 4 5 6
  2. None
  3. 4 6
  4. 1 2 3 4 5 6
  5. 4
  6. 1 2 3 5
  7. 1 2 3 5
  8. 1 5 6
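The exam's documents and queries are not reproduced on this page, so the following is only a minimal sketch of how such Boolean queries can be evaluated against an inverted index. The three-document corpus is made up for illustration (it merely reuses the terms 'expanding', 'visible' and 'finite' mentioned above), and the helper names `build_index`, `intersect`, `union` and `negate` are hypothetical.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the sorted list of IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def intersect(p1, p2):
    """AND: linear merge of two sorted postings lists."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out

def union(p1, p2):
    """OR: all document IDs that appear in either postings list."""
    return sorted(set(p1) | set(p2))

def negate(p, all_ids):
    """NOT: all document IDs that do not appear in the postings list."""
    return sorted(set(all_ids) - set(p))

# Hypothetical toy corpus, NOT the exam's documents.
docs = {
    1: "expanding visible universe",
    2: "finite expanding space",
    3: "visible light",
}
index = build_index(docs)
all_ids = sorted(docs)

# expanding AND NOT (visible OR finite)
either = union(index["visible"], index["finite"])
result = intersect(index["expanding"], negate(either, all_ids))
print(result)  # -> [] (every 'expanding' document also has 'visible' or 'finite')
```

An empty result corresponds to an answer of "None" in the list above.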

II Pre-processing and term vocabulary

  1. A
  2. C (Felix: I propose D. "Some 8 and some 16" is not necessarily fixed length, as opposed to C, which I understand as fixed length. Additionally, one can only store code points 0 to 127 in 8 bits because of the leading zero.)
  3. D (B? cf is high for terms that appear many times in only one document => df is more robust)
  4. D (I think C)
  5. C
  6. C
  7. A
  8. C (changed from C to B, since the index described in A is called a "k-gram index", not a "bi-word index") (Elwin: I disagree, the index is called bi-word index, see Slide 143 in "Term Vocabulary". Since A and B are correct, C should be chosen)
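The objection to answer 3 above rests on the difference between collection frequency (cf, total number of occurrences of a term) and document frequency (df, number of documents containing the term). A minimal sketch of that difference, using a made-up corpus and a hypothetical helper `cf_and_df`:

```python
from collections import Counter

def cf_and_df(docs):
    """Return (collection frequency, document frequency) counters for a corpus."""
    cf = Counter()
    df = Counter()
    for text in docs:
        tokens = text.lower().split()
        cf.update(tokens)       # every occurrence counts
        df.update(set(tokens))  # each document counts at most once per term
    return cf, df

# Hypothetical corpus: 'try' is frequent overall but confined to one document.
docs = [
    "try try try try insurance",
    "insurance policy",
    "insurance claim",
]
cf, df = cf_and_df(docs)
print(cf["try"], df["try"])              # -> 4 1
print(cf["insurance"], df["insurance"])  # -> 3 3
```

Here 'try' has the higher cf although it occurs in a single document, which is the situation the comment describes.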

III Tolerant Retrieval

Jaccard Coefficient

  1. 0
  2. 2/3
  3. 3/14
  4. 1
  5. 1/2
  6. 1
  7. 0
  8. 0
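For checking such values, the Jaccard coefficient of two sets A and B is |A ∩ B| / |A ∪ B|. The sketch below is an illustration only: the exam's inputs are not reproduced here, so the example words are made up, and whether the sets are terms or k-grams is an assumption (a bigram helper `kgrams` is included because this section is about tolerant retrieval).

```python
from fractions import Fraction

def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| as an exact fraction."""
    a, b = set(a), set(b)
    if not a and not b:
        # Assumption: two empty sets are treated as identical.
        return Fraction(1)
    return Fraction(len(a & b), len(a | b))

def kgrams(word, k=2):
    """Set of character k-grams of a word (no boundary markers)."""
    return {word[i:i + k] for i in range(len(word) - k + 1)}

# Hypothetical inputs, not taken from the exam.
print(jaccard({"a", "b"}, {"a", "b", "c"}))        # -> 2/3
print(jaccard(kgrams("night"), kgrams("nacht")))   # -> 1/7
```

Using `Fraction` keeps results in the same form as the answers above (for example 2/3 rather than 0.666...).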

Levenshtein Distance

  1. 1
  2. 1
  3. 7
  4. 0
  5. 1
  6. 10
  7. 5
  8. 10
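Distances like these can be verified with the usual dynamic-programming recurrence. The following is a minimal sketch assuming unit costs for insertion, deletion and substitution; the example strings are made up, since the exam's word pairs are not listed on this page.

```python
def levenshtein(s, t):
    """Edit distance between s and t with unit-cost insert, delete, substitute."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(
                prev[j] + 1,         # delete cs
                curr[j - 1] + 1,     # insert ct
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]

# Hypothetical inputs, not taken from the exam.
print(levenshtein("kitten", "sitting"))  # -> 3
print(levenshtein("cat", "cut"))         # -> 1
```

Only two rows of the table are kept in memory, which is enough when just the distance (and not the edit script) is needed.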

IV Index Compression

Standard Inverted Index

  1. C
  2. A
  3. D
  4. D
  5. B

V Ranked Retrieval

  1. 311
  2. 0
  3. 705
  4. 320
  5. 200
  6. 30
  7. 6 (2100), 5 (390), 1 (320)
  8. 15*40 + 3*30 = 690
  9. 0*30 + 11*30 = 330
  10. 11*40 + 70*30 = 2540
  11. 6 (2540)
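The arithmetic in answers 8 to 10 is a sum of products. As a quick check, the sketch below recomputes the three totals exactly as written above; what the factors stand for is not stated on this page, so the helper name `weighted_sum` and the labels in the dictionary are purely illustrative.

```python
def weighted_sum(pairs):
    """Sum of factor * weight over a list of (factor, weight) pairs."""
    return sum(factor * weight for factor, weight in pairs)

# Factor/weight pairs copied from answers 8, 9 and 10 above.
answers = {
    8: [(15, 40), (3, 30)],
    9: [(0, 30), (11, 30)],
    10: [(11, 40), (70, 30)],
}

totals = {number: weighted_sum(pairs) for number, pairs in answers.items()}
print(totals)  # -> {8: 690, 9: 330, 10: 2540}
```

The largest of the three totals, 2540, is the score quoted in answer 11.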

VI Scoring

  1. A
  2. B

VII Probabilistic Retrieval

  1. B
  2. C:
  3. A
  4. D
  5. A, C
  6. C - (should be B? see discussion)
  7. A
  8. B
  9. C

VIII Evaluation

  1. A
  2. A
  3. C
  4. B
  5. A
  6. B
  7. C
  8. A (81.8 %)
  9. A
  10. C
  11. C
  12. C
  13. A