method: IE (2021-04-19)

Authors: liming

Description: An effective method for IE

Authors: Applica.ai research team

Affiliation: Applica.ai

Description: Following the same evaluation rules as other submissions, OCR mismatch errors are excluded from this submission.

Additionally, we manually fixed annotation discrepancies in the "total" entity in the test set.

Note:
1. We submitted the best solution out of 100 fine-tuned models.
2. In this task there is an annotation discrepancy in the "total" entity that causes unfair comparison between models (in the train/test sets, the "total" entity was randomly prefixed by "RM"). Number of errors in the top solutions caused by this kind of annotation error:

Applica.ai Lambert 2.0 + Excluding OCR Errors + Fixing total entity = 0
LayoutLM 2.0 (single model) = 3 (example: 275)
Applica.ai Lambert 2.0 + Excluding OCR Mismatch = 8 (example: 77)
Tencent Youtu = 8 (example: 120)
VIE = 0
HIK_OCR_Exclude_ocr_mismatch = 0
LayoutLM + Excluding OCR Mismatch = 9 (example: 121)
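
The "RM" discrepancy described in the note can be illustrated with a short sketch. This is not the competition's evaluation script, nor the fix applied by any of the teams listed; `normalize_total` and `totals_match` are hypothetical helper names, and the assumption is simply that a "total" value may or may not begin with the currency prefix "RM".

```python
import re

# Hypothetical pattern: an optional "RM" currency prefix at the start of the value.
_RM_PREFIX = re.compile(r"^\s*RM\s*")


def normalize_total(value: str) -> str:
    """Strip a leading "RM" prefix and surrounding whitespace (illustrative only)."""
    return _RM_PREFIX.sub("", value).strip()


def totals_match(predicted: str, annotated: str) -> bool:
    """Compare two "total" strings while ignoring the inconsistent "RM" prefix."""
    return normalize_total(predicted) == normalize_total(annotated)


# A prediction without the prefix is compared against an annotation with it.
print(totals_match("9.00", "RM9.00"))
print(totals_match("9.00", "RM 19.00"))
```

With exact string matching, the first pair would count as an error even though the amount is the same; normalizing both sides removes that source of disagreement.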

Authors: Applica.ai Research Team

Affiliation: Applica.ai

Description: The TILT neural network architecture simultaneously learns layout information, visual features, and textual semantics. Contrary to previous approaches, we rely on an encoder-decoder architecture. We submitted results from a single model. Following the same evaluation procedure as others, we removed OCR mismatches and fixed discrepancies in total amounts randomly prefixed by "RM".

Ranking Table

Date | Method | Recall | Precision | Hmean
2021-04-19 | IE | 97.05% | 99.56% | 98.29%
2021-01-02 | Applica.ai Lambert 2.0 + Excluding OCR Errors + Fixing total entity | 96.83% | 99.56% | 98.17%
2021-02-16 | Applica.ai TILT + Excluding OCR Errors + Fixing total entity | 96.83% | 99.41% | 98.10%
2020-12-24 | LayoutLM 2.0 (single model) | 96.61% | 99.04% | 97.81%
2021-03-31 | Linklogis_BigData | 96.47% | 99.04% | 97.74%
2021-01-01 | Applica.ai Lambert 2.0 + Excluding OCR Mismatch | 96.40% | 99.11% | 97.74%
2020-12-07 | Tencent Youtu | 96.47% | 98.89% | 97.67%
2020-12-28 | IE method | 96.33% | 98.53% | 97.41%
2020-05-07 | HIK_OCR_Exclude_ocr_mismatch | 96.33% | 98.38% | 97.34%
2020-04-18 | LayoutLM + Excluding OCR Mismatch | 96.04% | 98.16% | 97.09%
2020-11-09 | admintest | 96.33% | 96.33% | 96.33%
2020-04-15 | PICK-PAPCIC & XZMU | 95.46% | 96.79% | 96.12%
2020-04-16 | LayoutLM | 96.04% | 96.04% | 96.04%
2020-03-26 | Applica.ai roberta-base-2D | 95.39% | 95.80% | 95.60%
2020-06-05 | great | 94.24% | 94.24% | 94.24%
2019-08-14 | PATech_AICenter | 94.02% | 94.02% | 94.02%
2021-02-21 | RoBERTa-base finetuned on business documents | 92.80% | 93.27% | 93.03%
2021-02-21 | RoBERTa-base | 92.22% | 92.55% | 92.39%
2020-05-23 | GIE | 91.21% | 93.43% | 92.31%
2020-07-07 | Taikang Insurance Group Research Institute | 91.79% | 91.99% | 91.89%
2019-08-05 | PATECH_CHENGDU_OCR_V2 | 91.21% | 91.21% | 91.21%
2020-02-20 | Character & Word BiLSTM Encoder | 90.85% | 90.85% | 90.85%
2019-05-05 | Ping An Property & Casualty Insurance Company | 90.49% | 90.49% | 90.49%
2019-04-29 | Enetity detection | 89.70% | 89.70% | 89.70%
2019-05-04 | H&H Lab | 89.63% | 89.63% | 89.63%
2019-05-02 | CLOVA OCR | 89.05% | 89.05% | 89.05%
2021-03-02 | layoutLM with ocr gt | 85.81% | 87.38% | 86.59%
2020-12-29 | coldog | 86.17% | 86.17% | 86.17%
2019-09-23 | ASTRI-CCT-MSA | 85.45% | 85.45% | 85.45%
2019-05-05 | GraphLayout | 85.09% | 85.09% | 85.09%
2021-03-02 | Qubitrics | 82.06% | 86.75% | 84.34%
2020-06-15 | End-to-end learning with PGN | 83.86% | 83.86% | 83.86%
2019-05-04 | HeReceipt-withoutRM | 83.00% | 83.24% | 83.12%
2020-06-17 | Graph Neural Net with Bert Embeddings | 82.78% | 82.78% | 82.78%
2019-05-06 | BOE_IOT_AIBD | 82.71% | 82.71% | 82.71%
2019-05-05 | PATECH_CHENGDU_OCR | 81.70% | 82.29% | 82.00%
2020-05-28 | SROIE LSTM - Axel Alejandro Ramos García | 81.99% | 81.99% | 81.99%
2020-04-28 | BERT-MRC | 81.05% | 81.05% | 81.05%
2021-03-09 | Character Level BiLSTM | 79.25% | 79.25% | 79.25%
2020-05-29 | Cool Method Remix | 79.03% | 79.03% | 79.03%
2019-04-30 | NER with spaCy model | 78.96% | 79.02% | 78.99%
2020-12-28 | Custom Named Entity Recognition | 77.59% | 77.59% | 77.59%
2019-05-05 | CITlab Argus Information Extraction (positional & line features, enhanced gt) | 77.38% | 77.38% | 77.38%
2021-01-02 | lstm deep | 77.38% | 77.38% | 77.38%
2021-01-02 | lstm standard method trained 100 epochs constant learning rate | 76.15% | 76.15% | 76.15%
2019-04-28 | A Simple Method for Key Information Extraction as Character-wise Classification with LSTM | 75.58% | 75.58% | 75.58%
2019-04-30 | Bi-directional LSTM-CNNs-CRF (version2) | 74.86% | 74.86% | 74.86%
2019-05-05 | Location-aware BERT model for Text Information Extraction | 74.42% | 74.42% | 74.42%
2020-05-23 | test | 73.63% | 73.63% | 73.63%
2021-04-07 | Token level multi modal bilstm | 70.03% | 70.03% | 70.03%
2021-04-07 | Token level bert embed + bilstm | 66.43% | 66.43% | 66.43%
2019-04-30 | BERT with Multi-task Confidence Prediction | 66.14% | 66.14% | 66.14%
2021-04-08 | bert word embeding + gcn | 64.27% | 64.27% | 64.27%
2019-05-02 | With receipt framing | 63.04% | 63.54% | 63.29%
2019-05-05 | IFLYTEK-textNLP_v2 | 61.24% | 61.24% | 61.24%
2019-05-05 | SituTech_OCR | 59.01% | 62.38% | 60.64%
2019-04-30 | Key Information Extraction from Scanned Receipts | 28.75% | 36.31% | 32.09%
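
The Hmean column is the harmonic mean of recall and precision. The sketch below shows that arithmetic for reference; the function name `hmean` is an illustrative choice, not part of any official evaluation code. Because the table shows recall and precision rounded to two decimals, recomputing Hmean from the displayed figures may differ from the listed value in the last digit for some rows.

```python
def hmean(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision, both given in percent."""
    if recall + precision == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * recall * precision / (recall + precision)


# Recall and precision of the top row of the ranking table.
print(f"{hmean(97.05, 99.56):.2f}")  # prints 98.29
```

When recall and precision are equal, as in many rows of the table, the harmonic mean equals that same value.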
