Authors: Yanbing Dong (from MetaSota.ai)
Description: The training annotations provide a polygon for each text instance, and regular text instances (horizontal or vertical) are much easier to recognize than irregular ones. We therefore transform all text instances in the training datasets into horizontal or vertical ones and train a CRNN with CTC loss on the transformed instances. We also train DeepLab v3 on the training images to obtain a text mask, then apply template matching on the training images to get the approximate polygon of each text instance, and finally extract the horizontal or vertical image for recognition.
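A CRNN trained with CTC loss emits one score vector per horizontal frame of the rectified image, and these frames must be collapsed into a string. The description does not say which decoding is used, so the following is only a minimal sketch of greedy (best-path) CTC decoding under stated assumptions: the function name `ctc_greedy_decode`, the convention that label 0 is the CTC blank, and the toy alphabet are all illustrative and not taken from the submission.

```python
from typing import List, Sequence

BLANK = 0  # assumption: label 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Greedy CTC decoding: take the best label per frame, merge consecutive
    repeats, then drop blanks. Label i (i >= 1) maps to alphabet[i - 1]."""
    # Best-scoring label index for each frame.
    best_path: List[int] = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]

    chars: List[str] = []
    previous = BLANK
    for label in best_path:
        # Emit a character only when it is not blank and differs from the
        # label of the previous frame (repeats without a blank are merged).
        if label != BLANK and label != previous:
            chars.append(alphabet[label - 1])
        previous = label
    return "".join(chars)


if __name__ == "__main__":
    # Toy scores over {blank, 'a', 'b'} for seven frames.
    scores = [
        [0.1, 0.8, 0.1],    # a
        [0.2, 0.7, 0.1],    # a (repeat, merged)
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.1, 0.8],    # b
        [0.1, 0.2, 0.7],    # b (repeat, merged)
        [0.8, 0.1, 0.1],    # blank
        [0.1, 0.1, 0.8],    # b (new character after a blank)
    ]
    print(ctc_greedy_decode(scores, "ab"))  # prints "abb"
```

The blank between the two runs of `b` is what lets the decoder output a doubled letter; without it, consecutive identical labels are merged into one character.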
Shi, B., Bai, X., & Yao, C. (2017). An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 2298-2304.
Chen, L., Papandreou, G., Schroff, F., & Adam, H. (2017). Rethinking Atrous Convolution for Semantic Image Segmentation. CoRR, abs/1706.05587.