From Cells to Sentences: An End-to-End Framework for Table Understanding
Abstract
Real-world tables are messy: column headers are inconsistent, cells contain errors or missing values, and crucial information is scattered across multiple tables and documents. These issues cause even state-of-the-art language models to fail at seemingly simple questions. We present a robust framework for table understanding that explicitly handles these challenges through three coordinated mechanisms: structure-aware encoders that learn invariance to common corruptions, trainable slots that compress evidence into a fixed-size representation, and grounding modules that align each slot to supporting text passages. Unlike prior work that treats tables as flat text or relies on clean schemas, our approach maintains strong performance even under schema corruption and structural perturbations. Across eight benchmarks spanning question answering, fact verification, and text generation, we achieve the best results among methods that use no external tools on five tasks, and we remain competitive with systems that rely on much larger models or SQL executors. Under schema corruption and row/column permutations, our method degrades by less than 1.5 points while baselines drop 6-22 points, confirming that explicit denoising and grounding are essential for robust table understanding.
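The abstract does not specify how the trainable slots are realized; one plausible sketch is cross-attention from a fixed number of learned slot queries to a variable-length encoded token sequence, which yields a fixed-size summary regardless of table length. The function name, weight matrices, and dimensions below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slot_compress(token_states, slot_queries, Wq, Wk, Wv):
    """Cross-attention from fixed slots to a variable-length sequence.

    token_states: (n_tokens, d) encoder outputs for the linearized table
    slot_queries: (n_slots, d)  learned slot embeddings (random here)
    Returns a (n_slots, d) summary whose size is independent of n_tokens.
    """
    d = slot_queries.shape[-1]
    q = slot_queries @ Wq
    k = token_states @ Wk
    v = token_states @ Wv
    attn = softmax(q @ k.T / np.sqrt(d), axis=-1)  # (n_slots, n_tokens)
    return attn @ v

rng = np.random.default_rng(0)
d, n_slots = 16, 4
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
slots = rng.normal(size=(n_slots, d))
# Tables of very different lengths compress to the same fixed shape.
for n_tokens in (7, 123):
    out = slot_compress(rng.normal(size=(n_tokens, d)), slots, Wq, Wk, Wv)
    print(out.shape)
```

In this reading, the fixed slot count is what lets downstream modules (e.g. the grounding step) operate on a bounded representation even for very large tables.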