An experimental study measuring human annotator categorization agreement on commonsense sentences

Authors
Santos, Henrique
Kejriwal, Mayank
Mulvehill, Alice
Forbush, Gretchen
McGuinness, Deborah L.
Issue Date
2021-06-18
Keywords
Machine Common Sense (MCS); Multi-modal Open World Grounded Learning and Inference (MOWGLI)
Abstract
Developing agents capable of commonsense reasoning is an important goal in Artificial Intelligence (AI) research. Because commonsense is broadly defined, a computational theory that can formally categorize the various kinds of commonsense knowledge is critical for enabling fundamental research in this area. In a recent book, Gordon and Hobbs described such a categorization, which they argued to be reasonably complete. However, the theory's reliability has not been independently evaluated through human annotator judgments. This paper describes such an experimental study, in which annotations were elicited across a subset of eight foundational categories proposed in the original Gordon-Hobbs theory. To avoid bias, annotations were elicited on 200 sentences from a commonsense benchmark dataset developed independently by an external organization. The results show that, while humans agree on relatively concrete categories like time and space, they disagree on more abstract concepts. The implications of these findings are briefly discussed.
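The abstract does not state which agreement statistic was used. As a hedged illustration only, the Python sketch below shows one common way to quantify pairwise agreement on a single Gordon-Hobbs category (here, "time") using Cohen's kappa over invented annotator labels; the paper's raw annotations are not reproduced, and with more than two annotators a multi-rater statistic such as Fleiss' kappa would typically be used instead.

    # Illustrative only: hypothetical binary labels from two annotators,
    # indicating whether each of ten sentences involves the "time" category.
    # Neither the labels nor the choice of Cohen's kappa comes from the paper.
    from sklearn.metrics import cohen_kappa_score

    annotator_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
    annotator_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 0]

    # Cohen's kappa corrects raw percent agreement for chance agreement.
    kappa = cohen_kappa_score(annotator_a, annotator_b)
    print(f"Cohen's kappa for the 'time' category: {kappa:.2f}")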
Publisher
Experimental Results
Relationships
https://tw.rpi.edu/project/machine-common-sense-mcs-multi-modal-open-world-grounded-learning-and-inference-mowgli