  • MC Test
                                            MC 160                        MC 500
    System                                  Single    Multiple  All       Single    Multiple  All
    Trischler (2016) Parallel-Hierarchical  79.46     70.30     74.58     74.26     68.29     71.00
    Wang (2015)                             84.22     67.85     75.27     72.05     67.94     69.94
    Sachan (2015)                           --        --        --        67.65     67.99     67.83
    Smith (2015) Final + RTE                78.79     70.31     74.27     69.12     63.34     65.96
    Narasimhan (2015) Model 3               82.36     65.23     73.23     68.38     59.90     63.75
    Richardson (2013) SWD + RTE             76.78     62.50     69.16     68.01     59.45     63.33
    Yin (2016) HABCNN-TE                    63.3      62.9      63.1      54.2      51.7      52.9


  • Todai Exam
    System                     English   Japanese  Japanese History  Math IA   Math IIB  Physics   World History
    (max. score)               /200      /150      /100              /100      /100      /100      /100
    Fujita (2013) Task-takers  52        62        56                57        41        39        58
    Student Average            88.3      72.2      45.6              52.0      47.6      42.0      46.6

  • DeepMind CNN
    System                              Validation  Test
    Dhingra (2016) GA Reader Ensemble   76.4        77.4
    Sordoni (2016) Best Ensemble        74.5        75.7
    Kadlec (2016) Ensemble †            73.9        75.4
    Cui (2016) AoA Reader Single        73.1        74.4
    Trischler (2016) EpiReader          73.4        74.0
    Dhingra (2016) GA Reader Single     73.0        73.8
    Weissenborn (2016) QANN 4 hops      --          73.7
    Sordoni (2016) Single               72.6        73.3
    Chen (2016) Neural Net              72.4        72.4
    Kobayashi (2016) Full + w2v-init    71.2        72.9
    Kadlec (2016) AS Reader             68.6        69.9
    Hill (2016) Best                    66.2        69.4
    Tian (2016) L2R                     --          65.8
    Hermann (2015) Attentive            61.6        61.8
    Hermann (2015) Impatient            61.8        63.8
    Tian (2016) Deep LSTM               --          57.0
    †Result of other models as reported in Sordoni (2016)

  • DeepMind Daily Mail
    System                              Validation  Test
    Dhingra (2016) GA Reader Ensemble   79.1        78.1
    Kadlec (2016) Greedy Ensemble       78.7        77.7
    Weissenborn (2016) QANN 8 hops      --          77.2
    Chen (2016) Neural Net              76.9        75.8
    Dhingra (2016) GA Reader Single     76.7        75.7
    Kadlec (2016) Single                75.0        73.9
    Hermann (2015) Attentive            70.5        69.0
    Hermann (2015) Impatient            69.0        68.0
    Tian (2016) L2R                     --          67.3
    Tian (2016) Deep LSTM               --          62.2

  • Facebook Children's Book Test
    System                                      Named Entities  Common Nouns  Verbs   Prepositions
    Cui (2016) AoA Reader Single                72.0            69.4          --      --
    Trischler (2016) EpiReader Ensemble         71.8            70.6          --      --
    Sordoni (2016) Ensemble                     72.0            71.0          --      --
    Dhingra (2016) GA Reader Ensemble           71.9            69.4          --      --
    Weissenborn (2016) QANN 8 hops              70.6            --            --      --
    Kadlec (2016) Ensemble †                    70.6            68.9          --      --
    Trischler (2016) EpiReader Single           69.7            67.4          --      --
    Sordoni (2016) Best Single                  68.6            69.2          --      --
    Dhingra (2016) GA Reader Single             69.0            63.9          --      --
    Kadlec (2016) Single                        68.6            63.4          --      --
    Hill (2016) MemNNs Best                     66.6            63.0          69.0    70.3
    Hill (2016) Contextual LSTM                 43.6            58.2          80.5    80.6
    Hill (2016) Embedding (window + position)   40.2            50.6          73.6    67.0
    Hill (2016) Kneser-Ney + Cache              43.9            57.7          77.2    67.9
    †Result of other models as reported in Sordoni (2016)

  • Stanford Question Answering Dataset
    System                                  Dev F1    Test F1
    Human Performance                       90.5      86.8
    Rajpurkar (2016) Logistic Regression    51.0      51.0

  • Ai2 Science Exams (Aristo)
    No systematic, peer-reviewed results have been reported on this data set yet.