method: TH-DL2017-07-01
Authors: Jiaming Guo, Guangxiang Bin, Liangrui Peng
Description: A deep CNN similar to GoogLeNet is used, with fewer inception layers for computational efficiency. For image pre-processing, the shorter edge is resized to 224 while preserving the aspect ratio of the original image. Average pooling is used to reduce the spatial dimensions of the feature map to a fixed size before the final fully connected layer. During training, the batch size for each iteration is set to 1; the mean of the gradients over a preset number of iterations (e.g. 32) is calculated and used to update the network weights.
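The pre-processing and update rule described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names `shorter_edge_resize` and `train_with_accumulation` are hypothetical, the rounding of the longer edge is an assumption, and a one-variable linear model stands in for the CNN so that the gradient averaging is easy to follow.

```python
def shorter_edge_resize(width, height, target=224):
    """Return (new_width, new_height) with the shorter edge scaled to `target`.

    The aspect ratio is preserved; rounding the longer edge to the nearest
    integer is an assumption, since the description does not specify it.
    """
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)


def train_with_accumulation(samples, accumulate=32, lr=0.5, epochs=2000):
    """Fit y = w * x + b with batch size 1 and averaged gradient updates.

    Each sample is processed on its own (batch size 1). Gradients are summed
    over `accumulate` iterations, and their mean is used for one weight update.
    Gradients left over when training stops are discarded in this sketch.
    """
    w, b = 0.0, 0.0
    grad_w, grad_b, count = 0.0, 0.0, 0
    for _ in range(epochs):
        for x, y in samples:
            err = (w * x + b) - y      # forward pass on a single sample
            grad_w += 2.0 * err * x    # gradient of err**2 with respect to w
            grad_b += 2.0 * err        # gradient of err**2 with respect to b
            count += 1
            if count == accumulate:
                # Update with the mean gradient, then reset the accumulators.
                w -= lr * grad_w / count
                b -= lr * grad_b / count
                grad_w, grad_b, count = 0.0, 0.0, 0
    return w, b


# Toy data drawn from y = 3x + 1, used only to exercise the update rule.
toy_samples = [(i / 32, 3.0 * i / 32 + 1.0) for i in range(32)]
fitted_w, fitted_b = train_with_accumulation(toy_samples)
```

Processing one image per iteration lets each input keep its own size after the shorter-edge resize, while averaging the gradients over several iterations gives an update comparable to that of a larger mini-batch.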
Confusion Matrix
| GT (rows) / Detection (columns) | Arabic | Latin | Chinese | Japanese | Korean | Bangla | Symbols | Mixed | None |
|---|---|---|---|---|---|---|---|---|---|
| Arabic | 4458 | 529 | 7 | 41 | 34 | 41 | 32 | 0 | 0 |
| Latin | 915 | 56807 | 105 | 873 | 918 | 507 | 412 | 0 | 0 |
| Chinese | 74 | 674 | 2454 | 1281 | 210 | 50 | 7 | 0 | 0 |
| Japanese | 251 | 2791 | 497 | 3958 | 469 | 146 | 45 | 0 | 0 |
| Korean | 518 | 4036 | 152 | 704 | 7216 | 331 | 35 | 0 | 0 |
| Bangla | 23 | 281 | 10 | 53 | 45 | 2132 | 1 | 0 | 0 |
| Symbols | 99 | 1431 | 5 | 128 | 22 | 35 | 1776 | 0 | 0 |
| Mixed | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| None | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
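
Summary figures can be derived from the confusion matrix with a short script. The sketch below assumes that rows are ground-truth (GT) classes and columns are detected classes, as the table labels suggest; the helper names `overall_accuracy` and `per_class_recall` are hypothetical and are not part of any evaluation toolkit.

```python
LABELS = ["Arabic", "Latin", "Chinese", "Japanese", "Korean",
          "Bangla", "Symbols", "Mixed", "None"]

# Rows are ground truth, columns are detections, both in LABELS order.
CONFUSION = [
    [4458, 529, 7, 41, 34, 41, 32, 0, 0],
    [915, 56807, 105, 873, 918, 507, 412, 0, 0],
    [74, 674, 2454, 1281, 210, 50, 7, 0, 0],
    [251, 2791, 497, 3958, 469, 146, 45, 0, 0],
    [518, 4036, 152, 704, 7216, 331, 35, 0, 0],
    [23, 281, 10, 53, 45, 2132, 1, 0, 0],
    [99, 1431, 5, 128, 22, 35, 1776, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]


def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (correctly identified scripts)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_recall(matrix, labels):
    """Recall per ground-truth class; None for classes with no samples."""
    recall = {}
    for i, label in enumerate(labels):
        row_total = sum(matrix[i])
        recall[label] = matrix[i][i] / row_total if row_total else None
    return recall


accuracy = overall_accuracy(CONFUSION)
recall = per_class_recall(CONFUSION, LABELS)
```

On these numbers, 78801 of 97619 samples lie on the diagonal, an overall accuracy of roughly 80.7%. The Mixed and None rows are empty, so their recall is reported as `None` rather than dividing by zero.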