Method: YOLOv8X+Grid
Date: 2023-05-24
Authors: Jakub Straka
Affiliation: University of West Bohemia, Department of Cybernetics
Description: The line item recognition (LIR) task can be solved in many different ways. We chose to approach it as object detection, treating each field in the document as an object. To group fields into line items, we added a line_item class whose detections indicate all fields that should be grouped together. We used YOLOv8X as the detection model. It is based on a convolutional neural network, and its advantages include its speed and small size. We also incorporated methods from Chargrid [1].
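The grouping step can be illustrated as a post-processing pass over the detector output. The description does not say how field detections are matched to line_item detections, so the sketch below assumes one simple rule: each field is assigned to the line_item box that covers the largest fraction of its area, provided that fraction reaches a hypothetical min_coverage threshold (0.5 here). The Box type, the group_fields function and the field labels are illustrative assumptions, not part of the submission.

```python
from dataclasses import dataclass


@dataclass
class Box:
    # Axis-aligned box in (x0, y0, x1, y1) coordinates with a class label.
    x0: float
    y0: float
    x1: float
    y1: float
    label: str


def area(b: Box) -> float:
    return max(0.0, b.x1 - b.x0) * max(0.0, b.y1 - b.y0)


def intersection(a: Box, b: Box) -> float:
    # Area of the overlap of two boxes (0 if they do not overlap).
    w = max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))
    h = max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0))
    return w * h


def group_fields(detections, min_coverage=0.5):
    """Assign each field to the line_item box covering most of its area.

    Assumed matching rule (hypothetical): coverage = overlap / field area;
    fields whose best coverage is below min_coverage stay unassigned.
    """
    items = [d for d in detections if d.label == "line_item"]
    fields = [d for d in detections if d.label != "line_item"]
    groups = {i: [] for i in range(len(items))}
    unassigned = []
    for f in fields:
        fa = area(f)
        best, best_cov = None, 0.0
        for i, item in enumerate(items):
            cov = intersection(f, item) / fa if fa > 0 else 0.0
            if cov > best_cov:
                best, best_cov = i, cov
        if best is not None and best_cov >= min_coverage:
            groups[best].append(f)
        else:
            unassigned.append(f)
    return groups, unassigned


# Usage with made-up detections: two line_item rows and four fields.
dets = [
    Box(0, 0, 100, 10, "line_item"),
    Box(0, 12, 100, 22, "line_item"),
    Box(2, 1, 40, 9, "description"),
    Box(60, 1, 98, 9, "amount"),
    Box(2, 13, 40, 21, "description"),
    Box(60, 30, 98, 40, "amount"),  # lies outside both line_item boxes
]
groups, unassigned = group_fields(dets)
for i, members in groups.items():
    print(i, [m.label for m in members])
print("unassigned:", [m.label for m in unassigned])
```

A coverage ratio (overlap divided by field area) is used instead of IoU because a field is usually much smaller than the line_item box that contains it, which would keep IoU low even for a perfect match.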
1. Anoop Raveendra Katti, Christian Reisswig, Cordula Guder, Sebastian Brarda, Steffen Bickel, Johannes Höhne, and Jean Baptiste Faddoul. Chargrid: Towards understanding 2D documents. arXiv preprint arXiv:1809.08799, 2018.