method: keba 2023-04-01
Authors: LGS
Description: Our model performs text detection and recognition simultaneously. Using a Transformer-based architecture with an encoder that extracts both location and semantic information for individual characters, we represent each word as a sequence of learnable features. A simple head is then applied independently to each of the 96 characters to recognize the text.
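
The per-character head described above could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes "96" refers to the number of character classes (the description does not say whether it is a class count or a number of character slots), and it uses a hypothetical linear head, a hypothetical feature size of 8, and random values standing in for the encoder's learned features.

```python
import random

NUM_CLASSES = 96  # assumption: "96" read as the number of character classes
FEATURE_DIM = 8   # hypothetical feature size, chosen only for illustration


def make_head(feature_dim, num_classes, seed=0):
    """Build a hypothetical linear head: one weight row and one bias per class."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(feature_dim)]
               for _ in range(num_classes)]
    biases = [0.0] * num_classes
    return weights, biases


def classify_character(feature, head):
    """Score one character feature against every class and return the best class id."""
    weights, biases = head
    logits = [sum(w * x for w, x in zip(row, feature)) + b
              for row, b in zip(weights, biases)]
    return max(range(len(logits)), key=logits.__getitem__)


def recognize_word(char_features, head):
    """Apply the same head to each character feature independently of the others."""
    return [classify_character(feature, head) for feature in char_features]


# Random vectors stand in for the features of a five-character word.
head = make_head(FEATURE_DIM, NUM_CLASSES)
rng = random.Random(1)
word_features = [[rng.uniform(-1.0, 1.0) for _ in range(FEATURE_DIM)]
                 for _ in range(5)]
class_ids = recognize_word(word_features, head)
```

Because each character feature is classified on its own, the prediction for one position does not change when the other positions are removed; any dependence between characters would have to come from the encoder features themselves.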