Authors: Wenhao He, Chang Zhong
Description: First, train a CNN to judge whether an image patch contains text. Second, slide a window over the image and apply the trained CNN to each window to generate a confidence map. Third, use this confidence map to obtain proposal regions. Finally, use the popular MSER (maximally stable extremal regions) method to localize text within the proposal regions.
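
The sliding-window and proposal steps can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `mean_intensity_score` is a hypothetical stand-in for the trained CNN, the window size, stride, and threshold are made-up values, and a single bounding box around all high-confidence windows is just one possible way to turn the confidence map into a proposal. The MSER step is left out; in practice it could be run inside the returned box with an existing implementation such as OpenCV's `cv2.MSER_create`.

```python
from typing import Callable, List, Optional, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values


def mean_intensity_score(patch: Image) -> float:
    """Hypothetical stand-in for the trained CNN: mean pixel value of the patch."""
    total = sum(sum(row) for row in patch)
    count = sum(len(row) for row in patch)
    return total / count


def confidence_map(image: Image, window: int, stride: int,
                   score_patch: Callable[[Image], float]) -> List[List[float]]:
    """Slide a square window over the image and score every patch."""
    height, width = len(image), len(image[0])
    conf = []
    for top in range(0, height - window + 1, stride):
        row_scores = []
        for left in range(0, width - window + 1, stride):
            patch = [row[left:left + window] for row in image[top:top + window]]
            row_scores.append(score_patch(patch))
        conf.append(row_scores)
    return conf


def proposal_box(conf: List[List[float]], window: int, stride: int,
                 threshold: float) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) in pixels covering every window whose
    score is at least `threshold`; bottom/right are exclusive. None if no hit."""
    hits = [(r, c) for r, row in enumerate(conf)
            for c, score in enumerate(row) if score >= threshold]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows) * stride, min(cols) * stride,
            max(rows) * stride + window, max(cols) * stride + window)


if __name__ == "__main__":
    # Toy 8x8 image: a bright 4x4 block stands in for a text region.
    image = [[0.0] * 8 for _ in range(8)]
    for y in range(2, 6):
        for x in range(4, 8):
            image[y][x] = 1.0
    conf = confidence_map(image, window=2, stride=2,
                          score_patch=mean_intensity_score)
    print(proposal_box(conf, window=2, stride=2, threshold=0.5))  # (2, 4, 6, 8)
```

Passing the scorer in as `score_patch` keeps the sketch independent of any particular network; a real classifier would replace `mean_intensity_score` without changing the other two functions.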