method: LCT_OCR (Institute of Information Engineering, Chinese Academy of Sciences), 2019-04-30

Authors: Yujia Li, Guangzhi Zhou, Hongchao Gao (李郁佳, 周广治, 高红超)

Description: The architecture consists of three main components: an encoder network, a multi-perspective hierarchical attention network, and a transcription layer. We first use a base convolutional neural network to extract multi-perspective visual representations of the text image. We then design a hierarchical attention network that obtains a comprehensive text representation by fully capturing these multi-perspective visual representations. The network consists of three attention blocks; in each block, a local visual-representation encoder module and a decoder module are integrated as an ensemble. Finally, we concatenate the resulting fixed-size sequences, which form the input to the transcription layer.
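The hierarchical attention idea above (several attention blocks pooling shared visual features, with the pooled outputs concatenated into one fixed-size representation) can be sketched in plain Python. This is a minimal illustration under assumed shapes, not the authors' implementation: the feature vectors, the per-block query vectors, and the function names (`attention_pool`, `hierarchical_attention`) are all hypothetical stand-ins for the CNN features and the learned encoder/decoder modules.

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention_pool(features, query):
    # One attention block (toy version): score each feature vector against a
    # query, normalize with softmax, and return the weighted sum of features.
    scores = [sum(f * q for f, q in zip(feat, query)) / math.sqrt(len(query))
              for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * feat[d] for w, feat in zip(weights, features))
            for d in range(dim)]

def hierarchical_attention(features, queries):
    # One block per query (three blocks in the described architecture);
    # the pooled outputs are concatenated into a single fixed-size vector,
    # which would then feed the transcription layer.
    out = []
    for q in queries:
        out.extend(attention_pool(features, q))
    return out

# Toy usage: 2 feature vectors of dim 2, 3 "perspective" queries ->
# concatenated output of length 3 * 2 = 6.
feats = [[1.0, 0.0], [0.0, 1.0]]
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fixed_size = hierarchical_attention(feats, queries)
```

In the real system the queries would come from learned decoder modules and the features from the CNN encoder; the point of the sketch is only the block-wise attention pooling followed by concatenation.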
Other datasets: we used the training and validation datasets from the ICDAR19 competition and its other challenges (ArT, LSVT), as well as our own synthetic datasets.
