Effective Data Distillation for Tabular Datasets

Authors
Kang, Inwon
Ram, Parikshit
Zhou, Yi
Samulowitz, Horst
Seneviratne, Oshani
Issue Date
2024-02-24
Abstract
Data distillation is a technique for reducing a large dataset into a smaller one that can be used to train a model performing comparably to a model trained on the full dataset. Past work has examined this approach for image datasets, focusing on neural networks (NNs) as target models. However, tabular datasets pose new challenges not seen in images: a sample in a tabular dataset is a one-dimensional vector, unlike the two- (or three-) dimensional pixel grid of an image, and non-NN models such as XGBoost can often outperform NN-based models. Our contribution in this work is two-fold: 1) we show that data distillation methods designed for images do not translate directly to tabular data; 2) we propose a new distillation method that consistently outperforms the baseline for multiple different models, including non-NN models such as XGBoost.
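To make the idea in the abstract concrete, the following is a minimal sketch of one generic distillation baseline (not the authors' proposed method): reducing a tabular training set to per-class k-means centroids and training XGBoost on the reduced set. The synthetic dataset, the helper name distill_by_kmeans, and all hyperparameters are illustrative assumptions.

# A minimal, hypothetical data-distillation baseline for tabular data.
# Requires scikit-learn and xgboost; this is NOT the paper's method.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

def distill_by_kmeans(X, y, per_class=50, seed=0):
    """Reduce (X, y) to per-class k-means centroids, labeled by their class."""
    Xs, ys = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        k = min(per_class, len(Xc))
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(Xc)
        Xs.append(km.cluster_centers_)
        ys.append(np.full(k, c))
    return np.vstack(Xs), np.concatenate(ys)

# Synthetic stand-in for a tabular dataset (each sample is a 1-D feature vector).
X, y = make_classification(n_samples=20_000, n_features=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Distill ~20k training samples down to 100 synthetic samples (50 per class).
X_small, y_small = distill_by_kmeans(X_tr, y_tr, per_class=50)

# Compare a non-NN model trained on the full vs. the distilled training set.
full = XGBClassifier(n_estimators=200).fit(X_tr, y_tr)
small = XGBClassifier(n_estimators=200).fit(X_small, y_small)
print("full-data accuracy:     ", full.score(X_te, y_te))
print("distilled-data accuracy:", small.score(X_te, y_te))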
Full Citation
Kang, Inwon, et al. "Effective Data Distillation for Tabular Datasets (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 38. No. 21. 2024.
Publisher
AAAI
Journal
Proceedings of the AAAI Conference on Artificial Intelligence
Volume
38
Issue
21