-
| System | MC 160 Single | MC 160 Multiple | MC 160 All | MC 500 Single | MC 500 Multiple | MC 500 All |
|---|---|---|---|---|---|---|
| Trischler (2016) Parallel-Hierarchical | 79.46 | 70.30 | 74.58 | 74.26 | 68.29 | 71.00 |
| Wang (2015) | 84.22 | 67.85 | 75.27 | 72.05 | 67.94 | 69.94 |
| Sachan (2015) | -- | -- | -- | 67.65 | 67.99 | 67.83 |
| Smith (2015) Final + RTE | 78.79 | 70.31 | 74.27 | 69.12 | 63.34 | 65.96 |
| Narasimhan (2015) Model 3 | 82.36 | 65.23 | 73.23 | 68.38 | 59.90 | 63.75 |
| Richardson (2013) SWD + RTE | 76.78 | 62.50 | 69.16 | 68.01 | 59.45 | 63.33 |
| Yin (2016) HABCNN-TE | 63.3 | 62.9 | 63.1 | 54.2 | 51.7 | 52.9 |
-
| System | English (/200) | Japanese (/150) | Japanese History (/100) | Math IA (/100) | Math IIB (/100) | Physics (/100) | World History (/100) |
|---|---|---|---|---|---|---|---|
| Fujita (2013) Task-takers | 52 | 62 | 56 | 57 | 41 | 39 | 58 |
| Student Average | 88.3 | 72.2 | 45.6 | 52.0 | 47.6 | 42.0 | 46.6 |
-
| System | Validation | Test |
|---|---|---|
| Dhingra (2016) GA Reader Ensemble | 76.4 | 77.4 |
| Sordoni (2016) Best Ensemble | 74.5 | 75.7 |
| Kadlec (2016) Ensemble † | 73.9 | 75.4 |
| Cui (2016) AoA Reader Single | 73.1 | 74.4 |
| Trischler (2016) EpiReader | 73.4 | 74.0 |
| Dhingra (2016) GA Reader Single | 73.0 | 73.8 |
| Weissenborn (2016) QANN 4 hops | -- | 73.7 |
| Sordoni (2016) Single | 72.6 | 73.3 |
| Chen (2016) Neural Net | 72.4 | 72.4 |
| Kobayashi (2016) Full + w2v-init | 71.2 | 72.9 |
| Kadlec (2016) AS Reader | 68.6 | 69.9 |
| Hill (2016) Best | 66.2 | 69.4 |
| Tian (2016) L2R | -- | 65.8 |
| Hermann (2015) Attentive | 61.6 | 61.8 |
| Hermann (2015) Impatient | 61.8 | 63.8 |
| Tian (2016) Deep LSTM | -- | 57.0 |

† Result of other models as reported in Sordoni (2016).
-
| System | Validation | Test |
|---|---|---|
| Dhingra (2016) GA Reader Ensemble | 79.1 | 78.1 |
| Kadlec (2016) Greedy Ensemble | 78.7 | 77.7 |
| Weissenborn (2016) QANN 8 hops | -- | 77.2 |
| Chen (2016) Neural Net | 76.9 | 75.8 |
| Dhingra (2016) GA Reader Single | 76.7 | 75.7 |
| Kadlec (2016) Single | 75.0 | 73.9 |
| Hermann (2015) Attentive | 70.5 | 69.0 |
| Hermann (2015) Impatient | 69.0 | 68.0 |
| Tian (2016) L2R | -- | 67.3 |
| Tian (2016) Deep LSTM | -- | 62.2 |
-
| System | Named Entities | Common Nouns | Verbs | Prepositions |
|---|---|---|---|---|
| Cui (2016) AoA Reader Single | 72.0 | 69.4 | -- | -- |
| Trischler (2016) EpiReader Ensemble | 71.8 | 70.6 | -- | -- |
| Sordoni (2016) Ensemble | 72.0 | 71.0 | -- | -- |
| Dhingra (2016) GA Reader Ensemble | 71.9 | 69.4 | -- | -- |
| Weissenborn (2016) QANN 8 hops | 70.6 | -- | -- | -- |
| Kadlec (2016) Ensemble † | 70.6 | 68.9 | -- | -- |
| Trischler (2016) EpiReader Single | 69.7 | 67.4 | -- | -- |
| Sordoni (2016) Best Single | 68.6 | 69.2 | -- | -- |
| Dhingra (2016) GA Reader Single | 69.0 | 63.9 | -- | -- |
| Kadlec (2016) Single | 68.6 | 63.4 | -- | -- |
| Hill (2016) MemNNs Best | 66.6 | 63.0 | 69.0 | 70.3 |
| Hill (2016) Contextual LSTM | 43.6 | 58.2 | 80.5 | 80.6 |
| Hill (2016) Embedding (window + position) | 40.2 | 50.6 | 73.6 | 67.0 |
| Hill (2016) Kneser-Ney + Cache | 43.9 | 57.7 | 77.2 | 67.9 |

† Result of other models as reported in Sordoni (2016).
-
| System | Dev F1 | Test F1 |
|---|---|---|
| Human Performance | 90.5 | 86.8 |
| Rajpurkar (2016) Logistic Regression | 51.0 | 51.0 |
-
No systematic, peer-reviewed results on this data set yet.