Method: Clova DEER
Date: 2023-04-01
Authors: Song Kayeon, Taeho Kil, Donghyun Kim, Sukmin Seo
Affiliation: Naver Cloud
Description: Our model passes each image through a CNN and a deformable transformer encoder to extract multi-scale visual features. An independent segmentation head then extracts words, lines, and paragraphs, while a deformable transformer decoder produces the text recognition results. In summary, a single model performs both layout detection (task 1) and OCR (task 2) simultaneously.
Word-level results

| Date | Method | PQ | F-score | Precision | Recall | Tightness |
|---|---|---|---|---|---|---|
| 2023-04-01 | Clova DEER | 0.6070 | 0.7695 | 0.7791 | 0.7602 | 0.7889 |
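The word-level columns are internally consistent, which a short calculation can confirm. The sketch below assumes two standard definitions that the table itself does not spell out: the F-score is the harmonic mean of precision and recall, and PQ follows the panoptic-quality decomposition PQ = F-score × Tightness (tightness being the mean IoU of matched pairs). The helper names `f_score` and `panoptic_quality` are hypothetical, chosen for illustration only.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def panoptic_quality(f: float, tightness: float) -> float:
    """Assumed PQ decomposition: detection quality (F-score) times tightness."""
    return f * tightness


# Word-level values reported for Clova DEER in the table.
precision, recall, tightness = 0.7791, 0.7602, 0.7889

f = f_score(precision, recall)
pq = panoptic_quality(f, tightness)

print(f"F-score: {f:.4f}")  # 0.7695
print(f"PQ:      {pq:.4f}")
```

Because the published precision, recall, and tightness are already rounded to four decimals, the recomputed PQ can differ from the reported 0.6070 in the last digit.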