method: YOLOv8X+Grid (2023-05-08)

Authors: Jakub Straka

Affiliation: University of West Bohemia, Department of Cybernetics

Description: The KILE (Key Information Localization and Extraction) task can be solved in many different ways. We chose to approach it as object detection, treating each field in the document as an object. We used YOLOv8X as the detection model. The model is based on a convolutional neural network, and its advantages include speed and small size. We also incorporated methods used in [1].
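To make the object-detection framing concrete, the sketch below shows one possible way to turn an annotated field into a training label and to rasterize characters into a grid in the spirit of [1]. It is an illustration under assumptions, not the submitted implementation: the function names, the pixel box layout (left, top, right, bottom), the vocabulary mapping, and the grid resolution are all hypothetical. The label line follows the common YOLO text format (class index, then box center and size normalized by page dimensions). How the grid was actually combined with the detector is not stated in the description.

```python
def field_to_yolo_label(class_id, box, page_width, page_height):
    """Convert one field box to a YOLO-style label line (hypothetical helper).

    box is (left, top, right, bottom) in pixels; the output values are
    normalized to [0, 1] by the page dimensions.
    """
    left, top, right, bottom = box
    x_center = (left + right) / 2 / page_width
    y_center = (top + bottom) / 2 / page_height
    width = (right - left) / page_width
    height = (bottom - top) / page_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def build_chargrid(chars, grid_height, grid_width, vocab):
    """Rasterize characters into a 2D grid of integer indices (assumed encoding).

    chars is an iterable of (character, (left, top, right, bottom)) with boxes
    already scaled to grid cells. Index 0 marks background; characters that
    are missing from vocab are also written as 0 in this sketch.
    """
    grid = [[0] * grid_width for _ in range(grid_height)]
    for ch, (left, top, right, bottom) in chars:
        index = vocab.get(ch.lower(), 0)
        # Clip the box to the grid so out-of-range boxes do not raise.
        for y in range(max(top, 0), min(bottom, grid_height)):
            for x in range(max(left, 0), min(right, grid_width)):
                grid[y][x] = index
    return grid


if __name__ == "__main__":
    # A field of (hypothetical) class 3 on a 1000 x 2000 pixel page.
    print(field_to_yolo_label(3, (100, 200, 300, 400), 1000, 2000))
    # A single character "A" covering two cells of a 2 x 3 grid.
    print(build_chargrid([("A", (0, 0, 2, 1))], 2, 3, {"a": 1}))
```

A grid built this way could, for example, be stacked with the page image as extra input channels, but that design choice is an assumption rather than something the description confirms.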

1. Anoop Raveendra Katti, Christian Reisswig, Cordula Guder, Sebastian Brarda, Steffen Bickel, Johannes Höhne, and Jean Baptiste Faddoul. Chargrid: Towards understanding 2D documents. arXiv preprint arXiv:1809.08799, 2018.