Authors: Sehwan Joo, Wonho Song
Affiliation: Upstage AI
Description: We use the DB architecture with the decoder replaced by SegFormer and U-Net. In addition, we use Swin Transformer and EfficientNet. Finally, we ensemble these models.
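The description does not say how the models are ensembled, so the following is only a minimal sketch of one common option for segmentation-based detectors: averaging per-pixel text probability maps and thresholding the result. The function names `average_maps` and `binarize`, the equal weighting, and the 0.5 cutoff are assumptions made for illustration, not details of the submission. The approximate binarization step follows the formula in the DB paper, 1 / (1 + exp(-k(P - T))), with k = 50 as reported there.

```python
import math


def differentiable_binarization(prob, thresh, k=50.0):
    """Approximate binary map as in the DB paper: 1 / (1 + exp(-k * (P - T))).

    `prob` and `thresh` are 2D lists of equal shape with values in [0, 1].
    """
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(prob_row, thresh_row)]
        for prob_row, thresh_row in zip(prob, thresh)
    ]


def average_maps(maps):
    """Pixel-wise mean of equally sized 2D probability maps.

    Assumption: equal-weight averaging; the submission's actual rule is not stated.
    """
    n = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [
        [sum(m[r][c] for m in maps) / n for c in range(cols)]
        for r in range(rows)
    ]


def binarize(prob, cutoff=0.5):
    """Hard-threshold a probability map (the cutoff value is an assumption)."""
    return [[1 if p >= cutoff else 0 for p in row] for row in prob]


# Two hypothetical models, each predicting a 1x2 probability map.
model_a = [[0.9, 0.2]]
model_b = [[0.7, 0.4]]
ensembled = average_maps([model_a, model_b])
mask = binarize(ensembled)
```

In this sketch a pixel is kept as text only when the mean probability across models reaches the cutoff, so a confident prediction from one model can be outvoted by the others.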
M. Liao, Z. Wan, C. Yao, K. Chen, and X. Bai. Real-time scene text detection with differentiable binarization. In AAAI Conf. on Artificial Intelligence, pages 11474–11481, 2020.
Z. Li, W. Wang, E. Xie, Z. Yu, A. Anandkumar, J. M. Alvarez, T. Lu, and P. Luo. Panoptic SegFormer. arXiv preprint arXiv:2109.03814, 2021.