method: TH-DL2017-07-01

Authors: Jiaming Guo, Guangxiang Bin, Liangrui Peng

Description: A deep CNN similar to GoogLeNet is used, but with fewer inception layers to improve computational efficiency. For image pre-processing, the shorter edge of each image is resized to 224 pixels while preserving the original aspect ratio. Before the final fully connected layer, average pooling maps the spatial dimensions of the feature map to a fixed size. During training, the batch size is set to 1: gradients are accumulated over a preset number of iterations (e.g., 32), and their mean is used to update the network weights.
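The three steps in the description (shorter-edge resizing, pooling to a fixed spatial size, and batch-size-1 training with gradient accumulation) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names are hypothetical. The rounding of resized dimensions, the PyTorch-style bin boundaries in `adaptive_avg_pool2d`, and the toy one-weight regression model standing in for the CNN are illustrative choices.

```python
# Hypothetical sketch of the pre-processing, pooling and training steps.

def ceil_div(a, b):
    """Integer ceiling of a / b for positive integers."""
    return -(-a // b)


def resize_shorter_edge(width, height, target=224):
    """Scale (width, height) so the shorter edge equals `target`, keeping aspect ratio.

    Rounding to the nearest integer is an assumption.
    """
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)


def adaptive_avg_pool2d(fmap, out_h, out_w):
    """Average-pool a 2-D feature map (list of rows) to a fixed out_h x out_w grid.

    Output bin i covers input rows floor(i*H/out_h) .. ceil((i+1)*H/out_h) - 1
    (assumed PyTorch-style boundaries); columns are binned the same way.
    """
    in_h, in_w = len(fmap), len(fmap[0])
    pooled = []
    for i in range(out_h):
        r0, r1 = (i * in_h) // out_h, ceil_div((i + 1) * in_h, out_h)
        row = []
        for j in range(out_w):
            c0, c1 = (j * in_w) // out_w, ceil_div((j + 1) * in_w, out_w)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


def train_with_accumulation(samples, lr=0.5, accum_steps=32, epochs=30):
    """Batch size 1 with gradient accumulation on a toy model y = w * x.

    Each iteration processes one sample; after `accum_steps` iterations the
    mean of the accumulated gradients is applied to the weight.
    """
    w = 0.0
    grad_sum, count = 0.0, 0
    for _ in range(epochs):
        for x, y in samples:
            grad_sum += 2.0 * (w * x - y) * x  # d/dw of squared error (w*x - y)^2
            count += 1
            if count == accum_steps:
                w -= lr * grad_sum / accum_steps  # update with the mean gradient
                grad_sum, count = 0.0, 0
    return w  # gradients from a trailing partial group are discarded here


if __name__ == "__main__":
    print(resize_shorter_edge(640, 480))  # (299, 224)
    fmap = [[4 * r + c + 1 for c in range(4)] for r in range(4)]  # values 1..16
    print(adaptive_avg_pool2d(fmap, 2, 2))  # [[3.5, 5.5], [11.5, 13.5]]
    data = [(i / 32, 3.0 * i / 32) for i in range(1, 33)]
    print(round(train_with_accumulation(data), 3))  # 3.0
```

Pooling with `out_h = out_w = 1` reduces to global average pooling, which yields a fixed-length vector for any input size. For a loss averaged over samples and a model without batch-dependent layers such as batch normalization, the mean of 32 single-sample gradients equals the gradient of one batch of 32. One plausible motivation, not stated in the description, is that aspect-preserving resizing gives images of different sizes, which batch size 1 can handle without padding.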

Confusion Matrix (rows: ground-truth script, GT; columns: detected script)

Detection
Arabic Latin Chinese Japanese Korean Bangla Symbols Mixed None
GTArabic445852974134413200
Latin9155680710587391850741200
Chinese746742454128121050700
Japanese251279149739584691464500
Korean518403615270472163313500
Bangla232811053452132100
Symbols99143151282235177600
Mixed 0 0 0 0 0 0 0 0 0
None 0 0 0 0 0 0 0 0 0
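In this matrix each row is a ground-truth (GT) script and each column a detected script, so diagonal entries count correct identifications. A minimal sketch of deriving per-script recall and overall accuracy from such a matrix follows. The function names and the 3-class matrix in the example are hypothetical and are not taken from the results above. Rows that sum to zero, like the Mixed and None rows here, have no ground-truth samples and are skipped when computing recall.

```python
# Hypothetical helpers for reading a confusion matrix (rows = GT, columns = detection).

def per_class_recall(matrix, labels):
    """Recall for each GT class: diagonal count divided by the row sum.

    matrix[i][j] counts samples of ground-truth class i detected as class j.
    Classes with no ground-truth samples (all-zero rows) are omitted.
    """
    recalls = {}
    for i, (label, row) in enumerate(zip(labels, matrix)):
        support = sum(row)
        if support > 0:
            recalls[label] = row[i] / support
    return recalls


def overall_accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    labels = ["A", "B", "C"]  # hypothetical classes
    toy = [[8, 1, 1],
           [2, 6, 2],
           [0, 0, 0]]  # class C has no ground-truth samples
    print(per_class_recall(toy, labels))  # {'A': 0.8, 'B': 0.6}
    print(overall_accuracy(toy))  # 0.7
```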