Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems

Authors
Huang, Lifu
May, Jonathan
Pan, Xiaoman
Ji, Heng
Ren, Xiang
Han, Jiawei
Zhao, Lin
Hendler, James A.
Issue Date
2017
Type
Article
Abstract
Automatically recognizing and typing entities in natural language without prior knowledge (e.g., predefined entity types) is a major challenge in processing natural language data. Most existing entity typing systems are limited to certain domains, genres, and languages. In this article, we propose a novel unsupervised entity-typing framework that combines symbolic and distributional semantics. We start by learning three types of representations for each entity mention: a general semantic representation, a specific context representation, and a knowledge representation based on knowledge bases. We then develop a novel joint hierarchical clustering and linking algorithm to type all mentions using these representations. This framework does not rely on any annotated data, predefined typing schema, or handcrafted features; therefore, it can be quickly adapted to a new domain, genre, and/or language. Experiments on two genres (news and discussion forum) show performance comparable to that of state-of-the-art supervised typing systems trained on a large amount of labeled data. Results on various languages (English, Chinese, Japanese, Hausa, and Yoruba) and domains (general and biomedical) demonstrate the portability of our framework.
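The clustering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes the three representations of a mention are simply concatenated, substitutes plain average-linkage agglomerative clustering with cosine distance, and omits the knowledge-base linking step entirely. The function names (`combine`, `cluster_mentions`), the toy vectors, and the threshold are all hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def combine(general, context, knowledge):
    """Assumption: the three representations are concatenated into one vector."""
    return list(general) + list(context) + list(knowledge)


def cluster_mentions(vectors, threshold):
    """Average-linkage agglomerative clustering, stopped at a distance threshold."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        if best[0] > threshold:
            break  # the closest pair is too far apart, so stop merging
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Toy mentions with made-up two-dimensional representations (illustrative only).
mentions = {
    "Obama": combine([1.0, 0.0], [0.9, 0.1], [1.0, 0.0]),
    "Merkel": combine([0.9, 0.1], [1.0, 0.0], [1.0, 0.0]),
    "Paris": combine([0.0, 1.0], [0.1, 0.9], [0.0, 1.0]),
    "Lagos": combine([0.1, 0.9], [0.0, 1.0], [0.0, 1.0]),
}
names = list(mentions)
clusters = cluster_mentions([mentions[n] for n in names], threshold=0.5)
grouped = [sorted(names[i] for i in c) for c in clusters]
print(grouped)
```

With these toy vectors the person-like and place-like mentions fall into separate clusters; in the framework described above, each cluster would additionally be linked to a knowledge base to obtain a type label, which this sketch does not attempt.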
Full Citation
Lifu Huang, Jonathan May, Xiaoman Pan, Heng Ji, Xiang Ren, Jiawei Han, Lin Zhao, and James A. Hendler. Liberal Entity Extraction: Rapid Construction of Fine-Grained Entity Typing Systems. Big Data. Mar 2017. 19-31. http://doi.org/10.1089/big.2017.0012
Publisher
Mary Ann Liebert, Inc.