method: TWA
Date: 2022-03-15
Authors: Zan-Xia Jin, Mike Zheng Shou, Fang Zhou, Satoshi Tsutsui, Jingyan Qin, Xu-Cheng Yin
Affiliation: University of Science and Technology Beijing, National University of Singapore
Description: We propose an OCR Token-Word Contrastive (TWC) learning task, which pre-trains word representations by augmenting OCR tokens using the Levenshtein distance between the OCR tokens and words in a dictionary.
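
As an illustration of the Levenshtein-based matching mentioned in the description, the following is a minimal Python sketch of how an OCR token might be paired with nearby dictionary words. It is not the authors' implementation: the function names `levenshtein` and `nearest_words`, the lowercasing step, and the `max_distance` threshold are all assumptions made for this example, and how such pairs feed into the contrastive objective is not specified here.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    # Keep the shorter string as `b` so each DP row is as small as possible.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def nearest_words(token, dictionary, max_distance=2):
    """Return (word, distance) pairs within max_distance, closest first.

    Hypothetical helper: the threshold and case folding are assumptions,
    not details taken from the method description.
    """
    scored = [(w, levenshtein(token.lower(), w.lower())) for w in dictionary]
    close = [(w, d) for w, d in scored if d <= max_distance]
    return sorted(close, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    # A noisy OCR token and a toy dictionary (illustrative data only).
    print(nearest_words("c0ffee", ["coffee", "toffee", "tea"]))
```

In a setup like this, the returned words could serve as candidate positives for a noisy OCR token, while more distant dictionary words could serve as negatives; that pairing is an assumption of this sketch.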