Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems

Authors
Huang, Lifu
May, Jonathan
Pan, Xiaoman
Ji, Heng
Ren, Xiang
Han, Jiawei
Zhao, Lin
Hendler, James A.
Issue Date
2017
Terms of Use
Attribution-NonCommercial-NoDerivs 3.0 United States
Full Citation
Lifu Huang, Jonathan May, Xiaoman Pan, Heng Ji, Xiang Ren, Jiawei Han, Lin Zhao, and James A. Hendler. Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems. Big Data, Mar 2017, 19-31. http://doi.org/10.1089/big.2017.0012
Abstract
Automatically recognizing and typing entities in natural language text without prior knowledge (e.g., predefined entity types) is a major challenge in processing such data. Most existing entity typing systems are limited to certain domains, genres, and languages. In this article, we propose a novel unsupervised entity typing framework that combines symbolic and distributional semantics. We start by learning three types of representations for each entity mention: a general semantic representation, a specific context representation, and a knowledge representation derived from knowledge bases. We then develop a novel joint hierarchical clustering and linking algorithm that types all mentions using these representations. The framework does not rely on any annotated data, predefined typing schema, or handcrafted features; therefore, it can be quickly adapted to a new domain, genre, and/or language. Experiments on two genres (news and discussion forums) show performance comparable to state-of-the-art supervised typing systems trained on large amounts of labeled data. Results on various languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of our framework.
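The abstract does not spell out how the three representations are combined or clustered. As a rough illustration only, the following Python sketch shows one way per-mention representations could be merged and grouped with plain average-linkage agglomerative clustering. The concatenation scheme, the cosine distance, the 0.3 merge threshold, the helper names (`combine`, `average_linkage_clusters`), and the toy two-dimensional vectors are all hypothetical. This is not the authors' joint hierarchical clustering and linking algorithm, and it omits knowledge-base linking and type naming entirely.

```python
import math
from itertools import combinations


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine(general, context, knowledge):
    """Hypothetical: concatenate the three per-mention representations after
    normalizing each one, so that no single view dominates by scale."""
    return l2_normalize(general) + l2_normalize(context) + l2_normalize(knowledge)


def cosine_distance(a, b):
    """1 - cosine similarity; treats a zero vector as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def average_linkage_clusters(vectors, threshold):
    """Agglomerative clustering with average linkage: repeatedly merge the two
    closest clusters until the closest pair is farther apart than `threshold`.
    Returns a sorted list of clusters, each a sorted list of mention indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (a, b), best = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break  # no remaining pair is close enough to merge
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return sorted(clusters)


# Toy, made-up (general, context, knowledge) vectors for four mentions.
mentions = {
    "Paris": ([1.0, 0.0], [1.0, 0.0], [1.0, 0.0]),
    "Berlin": ([0.9, 0.1], [1.0, 0.1], [1.0, 0.0]),
    "Obama": ([0.0, 1.0], [0.0, 1.0], [0.0, 1.0]),
    "Merkel": ([0.1, 0.9], [0.0, 1.0], [0.1, 1.0]),
}
names = list(mentions)
vecs = [combine(*mentions[n]) for n in names]
clusters = average_linkage_clusters(vecs, threshold=0.3)
labeled = [[names[i] for i in c] for c in clusters]
print(labeled)  # [['Paris', 'Berlin'], ['Obama', 'Merkel']]
```

Cutting the dendrogram at a single distance threshold is the simplest choice for a sketch; a real system would also have to assign each resulting cluster a type label, which is where the knowledge-base linking described in the abstract would come in.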
Publisher
Mary Ann Liebert, Inc.