Authors: Liu Rong, Xu Chengshen, Huang Xiao, Li Lin
Description: In Task One, we used the end-to-end detection algorithm YOLOv3 to localize text accurately as a set of rectangular regions. To sort those regions as the task requires, we compare their vertical extents: when the IoU of the vertical line segments of two rectangular regions is larger than 0.7, the regions are treated as lying at the same vertical height.
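The row-grouping rule can be sketched as follows. This is a minimal illustration, not the authors' code: the box layout (x_min, y_min, x_max, y_max), the function names, and the choice to compare each box against the first box of a row are all assumptions; only the 0.7 threshold comes from the description.

```python
def vertical_iou(box_a, box_b):
    """IoU of the vertical extents [y_min, y_max] of two boxes.

    Boxes are assumed to be (x_min, y_min, x_max, y_max) tuples.
    """
    top = max(box_a[1], box_b[1])
    bottom = min(box_a[3], box_b[3])
    inter = max(0.0, bottom - top)
    union = (box_a[3] - box_a[1]) + (box_b[3] - box_b[1]) - inter
    return inter / union if union > 0 else 0.0


def sort_reading_order(boxes, threshold=0.7):
    """Group boxes into rows by vertical IoU, then order them.

    Rows are ordered top to bottom, boxes within a row left to right.
    """
    rows = []
    for box in sorted(boxes, key=lambda b: b[1]):
        for row in rows:
            # Assumption: a box joins a row if it overlaps the row's first box.
            if vertical_iou(row[0], box) > threshold:
                row.append(box)
                break
        else:
            rows.append([box])
    return [b for row in rows for b in sorted(row, key=lambda b: b[0])]
```

For example, two boxes whose vertical extents are 10 to 30 and 11 to 31 have a vertical IoU of 19/21 and therefore fall in the same row, so they are ordered by their left edge.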
In this task, we use the end-to-end recognition algorithm CRNN to recognize the text in the images of the detected regions. To obtain more training data, we apply rotation, shearing, ZCA whitening, and similar augmentations, expanding the number of images to about 10 times the original. If the width/height ratio of an image is larger than 12, we scale the image so that its width is 480. If the ratio is smaller than 12, we scale the image so that its height is 40. We then zero-pad every image to the same size, 40*480 (height*width).
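The resizing rule amounts to fitting each image inside a 40*480 canvas, since 480/40 = 12. The sketch below only computes the target size and the amount of zero padding. It assumes the aspect ratio is preserved, that padding goes on the right and bottom, and that a ratio of exactly 12 is handled like the smaller-ratio case; none of these details is stated in the description, and the function name is hypothetical.

```python
TARGET_H, TARGET_W = 40, 480  # canvas size from the description


def resize_plan(width, height):
    """Return (new_w, new_h, pad_right, pad_bottom) for one image.

    Assumption: scaling keeps the aspect ratio, and zero padding is
    added on the right and bottom to reach TARGET_W x TARGET_H.
    """
    if width / height > TARGET_W / TARGET_H:
        # Wide image: width is the limiting side.
        new_w = TARGET_W
        new_h = max(1, round(height * TARGET_W / width))
    else:
        # Tall (or exactly 12:1) image: height is the limiting side.
        new_h = TARGET_H
        new_w = max(1, round(width * TARGET_H / height))
    return new_w, new_h, TARGET_W - new_w, TARGET_H - new_h
```

A 960*40 crop (ratio 24) would be scaled to 480*20 and padded with 20 rows of zeros, while a 200*20 crop (ratio 10) would be scaled to 400*40 and padded with 80 columns of zeros.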
The CRNN network consists of eight convolutional layers, two LSTM layers, and a fully connected layer. Each convolutional layer is followed by batch normalization and a pooling operation.