Overview - ICDAR 2017 Challenge on Text Extraction from Biomedical Literature Figures
Figures are ubiquitous in biomedical literature, and they represent important biomedical knowledge. The sheer volume of biomedical publications has made it necessary to develop computational approaches for accessing figures. Consequently, during the last few years, figure classification, retrieval and mining have garnered significant attention in the biomedical research communities. Since text frequently appears in figures, semantic analysis of such text may assist the task of mining information from figures. Little research, however, has specifically explored automated extraction of text from biomedical figures or the semantic analysis of that text.
Unlike images in the open domain, biomedical figures present unique challenges. For example, biomedical figures typically have complex layouts, small font sizes, short text, domain-specific text, complex symbols and irregular text arrangement. Figure quality also varies across publishers. Consequently, conventional OCR technologies and systems, which are typically trained on open-domain images, do not work well on biomedical figures. To better leverage biomedical figures in future research and analysis, and to make them more searchable and computable, we propose Semantic Interpretation of Biomedical Figure Mining to address various challenges related to semantic biomedical figure mining.
The Semantic Interpretation of Biomedical Figure Mining Challenge is being conducted to assess how well text detection, recognition, mining and even NLP algorithms can detect and recognize text appearing in biomedical literature figures. This ICDAR2017 Competition (the ICDAR2017 DeTEXT Competition) focuses on extracting, that is, detecting and recognizing, text from biomedical literature figures.
Challenge News
- 04/01/2017: DeTEXT training datasets available
Important Dates
- April 1: Datasets available.
- June 10: Testing set release.
- June 30: Submission of results deadline.
- November 10-15: Results presentation.