Description: 1. We train an OCR model to improve recognition of blurred images and handwriting, and we use LayoutReader to reorder the detected bounding boxes into reading order.
2. We pretrain a discrete 2D-position embedding model with question generation and span masking objectives, then fine-tune it to predict the start and end positions of the answer span for each question.
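
The two ideas in step 2 can be illustrated with a minimal sketch. It is not the submission's actual code: the function names `discretize_box` and `best_span`, the bucket count of 1000, and the maximum answer length of 30 tokens are all assumptions chosen for illustration. The first function shows one common way to turn pixel coordinates into discrete 2D position indices that an embedding table could look up; the second shows one common way to pick an answer span from per-token start and end scores.

```python
from typing import Optional, Sequence, Tuple


def discretize_box(
    box: Tuple[float, float, float, float],
    page_width: float,
    page_height: float,
    num_buckets: int = 1000,  # assumed bucket count, not stated in the source
) -> Tuple[int, int, int, int]:
    """Map a pixel-space box (x0, y0, x1, y1) to integer bucket indices."""
    x0, y0, x1, y1 = box

    def bucket(value: float, extent: float) -> int:
        # Clamp to the page, then scale into [0, num_buckets - 1].
        value = min(max(value, 0.0), extent)
        return min(int(value / extent * num_buckets), num_buckets - 1)

    return (
        bucket(x0, page_width),
        bucket(y0, page_height),
        bucket(x1, page_width),
        bucket(y1, page_height),
    )


def best_span(
    start_logits: Sequence[float],
    end_logits: Sequence[float],
    max_answer_len: int = 30,  # assumed limit, not stated in the source
) -> Tuple[int, int, float]:
    """Return (start, end, score) maximizing start + end score with start <= end."""
    if not start_logits or len(start_logits) != len(end_logits):
        raise ValueError("start_logits and end_logits must be non-empty and equal length")

    best: Optional[Tuple[int, int, float]] = None
    for s, s_score in enumerate(start_logits):
        # Only consider end positions within max_answer_len tokens of the start.
        last = min(s + max_answer_len, len(end_logits))
        for e in range(s, last):
            score = s_score + end_logits[e]
            if best is None or score > best[2]:
                best = (s, e, score)
    assert best is not None  # guaranteed because the inputs are non-empty
    return best


if __name__ == "__main__":
    print(discretize_box((0, 0, 500, 1000), page_width=1000, page_height=1000))
    print(best_span([0.1, 2.0, 0.3], [0.0, 0.5, 3.0]))
```

In this sketch the exhaustive search over (start, end) pairs is quadratic in the worst case, which is usually acceptable because `max_answer_len` bounds the inner loop.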