Task 1 - E2E Complex Entity Linking
Method: Pre-trained model based fullpipe pair extraction (opti_v3, no inf_aug)
Date: 2023-03-16
Authors: Zening Lin, Teng Li, Wenhui Liao, Jiapeng Wang, Songxuan Lai, Lianwen Jin
Affiliation: South China University of Technology; Huawei Cloud
Description:
Model
1. Take segment-level OCR results as input, then use XYCut and a pre-trained-model-based NER model to extract entities.
2. Use an entity-level, pre-trained-model-based RE model to extract key-value pairs.
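For readers unfamiliar with XYCut, the reading-order step can be sketched as a generic recursive XY-cut over OCR segment boxes. This is a minimal illustration under assumptions, not the submitted implementation: the names `xy_cut` and `_split`, the `(x0, y0, x1, y1)` box format, and the fallback sort are all hypothetical, and the optimizations mentioned under Details are not reproduced.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # assumed format: (x0, y0, x1, y1)


def _split(boxes: List[Box], axis: int) -> List[List[Box]]:
    """Group boxes whose projections on `axis` (0 = x, 1 = y) overlap.

    A new group starts wherever the projection profile has a gap.
    """
    ordered = sorted(boxes, key=lambda b: b[axis])
    groups = [[ordered[0]]]
    reach = ordered[0][axis + 2]  # furthest extent covered so far
    for box in ordered[1:]:
        if box[axis] >= reach:    # gap found: start a new group
            groups.append([box])
        else:
            groups[-1].append(box)
        reach = max(reach, box[axis + 2])
    return groups


def xy_cut(boxes: List[Box]) -> List[Box]:
    """Return boxes in reading order using a plain recursive XY-cut."""
    if len(boxes) <= 1:
        return list(boxes)
    # Try horizontal cuts (gaps along y) first, then vertical cuts (gaps along x).
    for axis in (1, 0):
        groups = _split(boxes, axis)
        if len(groups) > 1:
            return [b for group in groups for b in xy_cut(group)]
    # No cut is possible: fall back to top-to-bottom, left-to-right order.
    return sorted(boxes, key=lambda b: (b[1], b[0]))


if __name__ == "__main__":
    header = (0, 0, 100, 10)
    left_top, left_bottom = (0, 20, 40, 35), (0, 37, 40, 55)
    right_top, right_bottom = (60, 20, 100, 39), (60, 41, 100, 55)
    shuffled = [right_bottom, left_bottom, header, right_top, left_top]
    for box in xy_cut(shuffled):
        print(box)
```

In this toy layout the header is cut off first, then the two columns are separated by a vertical cut, so the left column is read before the right one.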
Details
1. All strings are converted to half-width before sending to the NER model.
2. Spaces generated by the tokenizer are discarded using a string comparison algorithm in the post-processing step.
3. Box position jittering is applied when training the RE model.
4. For nested-key sorting, we use several rule-based methods to determine the order.
5. The XYCut algorithm is optimized to handle the ordering of lines inside an entity.
6. Rules are added for keys that contain a colon.
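Details 1 to 3 can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function names (`to_half_width`, `drop_spurious_spaces`, `jitter_box`), the `max_shift` parameter, and the pixel-level integer jitter are hypothetical, and the greedy character comparison merely stands in for the unspecified string comparison algorithm. It also assumes the decoded text differs from the original only by extra spaces.

```python
import random
from typing import Tuple

Box = Tuple[int, int, int, int]  # assumed format: (x0, y0, x1, y1)


def to_half_width(text: str) -> str:
    """Map full-width ASCII variants (U+FF01..U+FF5E) and U+3000 to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:               # ideographic space
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:   # full-width '!' .. '~'
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)


def drop_spurious_spaces(decoded: str, original: str) -> str:
    """Drop spaces in `decoded` that have no counterpart in `original`.

    Greedy left-to-right comparison; assumes `decoded` equals `original`
    apart from spaces inserted by the tokenizer.
    """
    out = []
    j = 0  # position in `original`
    for ch in decoded:
        if j < len(original) and ch == original[j]:
            out.append(ch)
            j += 1
        elif ch == " ":
            continue                     # space not present in the source text
        else:
            out.append(ch)               # unexpected mismatch: keep the character
    return "".join(out)


def jitter_box(box: Box, max_shift: int = 2, rng=random) -> Box:
    """Shift each coordinate by a random integer in [-max_shift, max_shift]."""
    x0, y0, x1, y1 = (c + rng.randint(-max_shift, max_shift) for c in box)
    # Keep the box well-formed even if the jitter crosses the corners.
    return (min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


if __name__ == "__main__":
    print(to_half_width("\uff21\uff22\uff23\uff1a\uff11\uff12\uff13"))
    print(drop_spurious_spaces("No . : 123", "No.: 123"))
    print(jitter_box((10, 10, 50, 20), max_shift=2, rng=random.Random(0)))
```

Jitter of this kind would be applied only during RE training, as stated in detail 3; at inference time the boxes are used unchanged.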