Authors: Asaf Gendler, Sharon Fogel, Inbal Lavi, Yair Kittenplon, Alona Golts, Shahar Tsiper
Affiliation: AWS AI Labs
Description: We utilize a Deformable Detr architecture, and train it to detect objects that belong to one of three possible classes: word, line and paragraph. Since Deformable Detr only predicts bounding boxes while HierText is evaluated as an instance segmentation task, we additionally incorporate a segmentation head which is designed to predict a local segmentation mask within each predicted bounding box. We train the model on HierText training set only. We associate the words and lines to their respective lines and paragraphs by using a geometric intersection condition.