Method: Clova DEER

Date: 2023-04-01

Authors: Song Kayeon, Taeho Kil, Donghyun Kim, Sukmin Seo

Affiliation: Naver Cloud

Description: Our model passes the input image through a CNN and a deformable transformer encoder to extract multi-scale visual features. An independent segmentation head then extracts words, lines, and paragraphs, and a deformable transformer decoder produces the text recognition results. A single model therefore performs both layout detection (task 1) and OCR (task 2) simultaneously.

Ranking Table

Word
Date        Method      PQ      Fscore  Precision  Recall  Tightness
2023-04-01  Clova DEER  0.6070  0.7695  0.7791     0.7602  0.7889
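
The word-level scores above can be cross-checked against each other. The sketch below assumes that Fscore is the harmonic mean of Precision and Recall and that PQ is the product of Fscore and Tightness, as panoptic quality is commonly defined. The page itself does not state these formulas, and the function names are illustrative only.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of Fscore)."""
    return 2 * precision * recall / (precision + recall)


def panoptic_quality(fscore: float, tightness: float) -> float:
    """Assumed relation: PQ = Fscore * Tightness."""
    return fscore * tightness


# Values copied from the Clova DEER row of the word-level ranking table.
precision, recall, tightness = 0.7791, 0.7602, 0.7889

fscore = f_score(precision, recall)
pq = panoptic_quality(fscore, tightness)

print(f"Fscore recomputed from Precision and Recall: {fscore:.4f}")
print(f"PQ recomputed from Fscore and Tightness:     {pq:.4f}")
```

Because the table values are rounded to four decimals, the recomputed numbers should be expected to match the reported Fscore (0.7695) and PQ (0.6070) only to about three decimal places.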

Ranking Graphic