Authors: Yuliang Liu, Canjie Luo, Lianwen Jin, Sheng Zhang, Zhaohai Li, Lele Xie, Zenghui Sun
Description: Two models are trained separately: one for text detection and one for script classification. The two models are combined to produce the final results. After the detection results are generated, a classification model with 8 classes (including background) is used to discard detected boxes that are classified as background with very high confidence. A 7-class model is then used to yield the final results. Because the model easily confuses Chinese and Japanese, and because only a few images contain both Chinese and Japanese scripts at the same time, a statistical averaging method is used to adjust the Chinese and Japanese classification results.
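
The post-detection steps described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the description does not specify the background confidence threshold or the exact form of the statistical averaging, so the value 0.95 and the per-image mean-probability vote below are assumptions. The function names, the class label strings, and the `probs8`/`probs7` dictionary layout are hypothetical and stand in for the outputs of the 8-class and 7-class models.

```python
from statistics import mean

BACKGROUND = "Background"            # hypothetical label for the extra class of the 8-class model
CONFUSABLE = ("Chinese", "Japanese")  # the two scripts the description says are easily confused


def drop_background(boxes, bg_threshold=0.95):
    """Keep a box unless the 8-class model scores it as background
    at or above bg_threshold (the threshold value is an assumption)."""
    return [b for b in boxes if b["probs8"].get(BACKGROUND, 0.0) < bg_threshold]


def classify_scripts(boxes):
    """Label each box with the highest-probability class of the 7-class model."""
    return [max(b["probs7"], key=b["probs7"].get) for b in boxes]


def resolve_chinese_japanese(boxes, labels):
    """Assumed reading of the 'statistical average' step: within one image,
    average the Chinese and Japanese probabilities over all boxes labelled
    as either script, then give all of those boxes the script with the
    higher mean. Boxes with other labels are left unchanged."""
    idx = [i for i, lab in enumerate(labels) if lab in CONFUSABLE]
    if not idx:
        return list(labels)
    avg = {s: mean(boxes[i]["probs7"].get(s, 0.0) for i in idx) for s in CONFUSABLE}
    winner = max(CONFUSABLE, key=avg.get)
    return [winner if i in idx else lab for i, lab in enumerate(labels)]


def classify_image(boxes, bg_threshold=0.95):
    """Run the three steps on the detected boxes of a single image and
    return (box, script_label) pairs for the boxes that survive filtering."""
    kept = drop_background(boxes, bg_threshold)
    labels = resolve_chinese_japanese(kept, classify_scripts(kept))
    return list(zip(kept, labels))
```

Here each box is assumed to be a dictionary holding the two models' class probabilities for that detection. The per-image vote relies on the observation in the description that few images contain both Chinese and Japanese; an image that does contain both would be mislabelled by this simple version.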