method: Baseline+Ensemble+Pseudo+Post-Processing 2023-05-16

Authors: UIT@AICLUB_TAB

Affiliation: UIT - University of Information Technology - VNUHCM

Email: 22520121@gm.uit.edu.vn

Description: Our approach builds on the baseline checkpoint with several improvements. We trained or used the following models:
1. RoBERTa base trained from scratch on synthetic data for 30 epochs using FGM and the Lion optimizer, then trained on the annotated data.
2. Our own RoBERTa model (checkpoint), trained with the Lion optimizer.
3. RoBERTa base (checkpoint).
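
As a rough illustration of the Lion optimizer named in models 1 and 2, the sketch below applies one Lion update step to plain Python lists. It follows the published Lion update rule (sign of an interpolated momentum, with decoupled weight decay). The function name `lion_step` and the hyperparameter defaults are illustrative assumptions, not the settings used in this submission.

```python
from typing import List, Tuple


def _sign(x: float) -> float:
    # Returns -1.0, 0.0 or 1.0 depending on the sign of x.
    return float((x > 0) - (x < 0))


def lion_step(
    params: List[float],
    grads: List[float],
    momentum: List[float],
    lr: float = 1e-4,          # assumed value, for illustration only
    beta1: float = 0.9,
    beta2: float = 0.99,
    weight_decay: float = 0.0,
) -> Tuple[List[float], List[float]]:
    """Apply one Lion update; returns (new_params, new_momentum)."""
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Interpolate momentum and gradient, then keep only the sign.
        c = beta1 * m + (1.0 - beta1) * g
        new_params.append(p - lr * (_sign(c) + weight_decay * p))
        # The momentum itself is tracked with a second, slower coefficient.
        new_momentum.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momentum
```

Because only the sign of the update is used, every parameter moves by the same magnitude `lr` per step (before weight decay), which is why Lion is typically run with a smaller learning rate than Adam-style optimizers.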

We then ensemble the models by taking the union of the words that any model marks as one of the 55 field types, followed by post-processing.
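
The union step can be sketched as follows. This is a minimal sketch that assumes each model's output is represented as a set of `(word_index, field_type)` pairs; that representation, the function name `union_ensemble`, and the field type names in the usage example are hypothetical. The post-processing step is not shown because its rules are not described here.

```python
from typing import Iterable, Set, Tuple

# Assumed representation: (index of the word in the document, field type).
Prediction = Tuple[int, str]


def union_ensemble(model_predictions: Iterable[Set[Prediction]]) -> Set[Prediction]:
    """Keep every (word, field type) pair predicted by at least one model."""
    merged: Set[Prediction] = set()
    for predictions in model_predictions:
        merged |= predictions
    return merged


# Usage with hypothetical field type names:
model_a = {(0, "vendor_name"), (7, "amount_due")}
model_b = {(7, "amount_due"), (12, "date_issue")}
combined = union_ensemble([model_a, model_b])
```

A union favors recall: a word is kept if any single model tags it, so false positives from individual models carry over and have to be handled afterwards.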
Next, we used the ensembled model to predict labels for the unlabeled data, which gave us pseudo-labeled data. We used this data to pre-train the three models and then trained them on the annotated data.
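
The pseudo-labeling loop can be outlined roughly as below. This is a sketch under stated assumptions: `ensemble_predict` and `train_fn` are hypothetical callables standing in for the ensemble inference and the training routine, and no confidence filtering is shown because none is described here.

```python
from typing import Any, Callable, List, Sequence, Tuple

Example = Tuple[Any, Any]  # (document, labels)


def make_pseudo_data(
    unlabeled_docs: Sequence[Any],
    ensemble_predict: Callable[[Any], Any],  # hypothetical ensemble inference
) -> List[Example]:
    """Label each unlabeled document with the ensemble's prediction."""
    return [(doc, ensemble_predict(doc)) for doc in unlabeled_docs]


def pretrain_then_finetune(
    models: Sequence[Any],
    pseudo_data: Sequence[Example],
    annotated_data: Sequence[Example],
    train_fn: Callable[[Any, Sequence[Example]], Any],  # hypothetical trainer
) -> List[Any]:
    """Train each model on pseudo data first, then on annotated data."""
    trained = []
    for model in models:
        model = train_fn(model, pseudo_data)     # stage 1: pseudo-labeled data
        model = train_fn(model, annotated_data)  # stage 2: annotated data
        trained.append(model)
    return trained
```

Training on the annotated data last means the final weights are fitted to the human labels rather than to the noisier pseudo labels.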

Pipeline: https://ibb.co/4MWcXgb