Document Visual Question Answering

Minesh Mathew received his B.Tech. degree in Computer Science from NIT Warangal, India, in 2009. He joined the International Institute of Information Technology, Hyderabad (IIIT-H), India, in 2013 as a Master's student. He is currently pursuing his Ph.D. under the supervision of Prof. C. V. Jawahar. He is a recipient of the TCS PhD Fellowship for Computer Science. His work has primarily focused on text recognition for Indian scripts and Arabic in document images and scene images.

Rubèn P. Tito received his B.Sc. and M.Sc. degrees from the Universitat Autònoma de Barcelona in 2016 and 2018, respectively. The same year, he joined the Computer Vision Center (CVC) as an intern, and he is now pursuing his Ph.D. there on “Single Shot Text Retrieval” under the supervision of Dr. Marçal Rusiñol and Dr. Ernest Valveny. His main research interests include text recognition, word spotting, multi-modal embeddings and Visual Question Answering.

Manuel Carbonell is an Industrial PhD candidate at the Computer Vision Center in Barcelona, in collaboration with the IT company omni:us, supervised by Dr. Mauricio Villegas, Dr. Alicia Fornés and Dr. Josep Lladós. He received his B.Sc. in Mathematics and his M.Sc. in Data Science from the Autonomous University of Barcelona in 2014 and 2016, respectively. His research focuses on neural models for information extraction and understanding in semi-structured document images.

Lluís Gómez i Bigordà is a TECNIOspring Research Fellow (H2020 Marie Skłodowska-Curie actions of the European Union) at the Computer Vision Center (CVC), Universitat Autònoma de Barcelona (UAB). He received his PhD in Computer Science from the UAB in 2016. As a member of the Robust Reading research team at the CVC and of the document analysis community, he has contributed several papers to the field and has collaborated with a variety of research groups. Together with other prominent research groups, he co-organized the ICDAR Robust Reading Competition in its 2013, 2015, and 2017 editions. He served as an area chair of the International Conference on Document Analysis and Recognition (ICDAR 2017), as a chair and organizer of the International Workshop on Camera Based Document Analysis and Recognition (CBDAR 2017) and the International Workshop on Robust Reading (IWRR 2018), and as a member of the Program Committee of CBDAR 2015, IWRR 2014, IWRR 2016, and DAS 2018. In 2016 he co-organized a tutorial on "Scene-Text Localization, Recognition, and Understanding" at the International Workshop on Document Analysis Systems (DAS 2016).

Marçal Rusiñol is an Associate Researcher at the Computer Vision Center within the Intelligent Reading Systems research group, where he is the PI of several competitive research and technology transfer projects. He joined the Computer Vision Center in 2004 and obtained his Ph.D. there in 2009 under the supervision of Dr. Josep Lladós. He has been a Teaching Assistant and an Adjunct Lecturer at the Computer Sciences Department of the Universitat Autònoma de Barcelona since 2005. He held two postdoctoral Marie Curie fellowships, at ITESOFT and at the L3i Lab of the Université de La Rochelle (France), respectively. He has co-authored over 70 publications in refereed journals and conferences, with more than 1,000 citations and an h-index of 18. His main research interests include Computer Vision, Machine Learning, Data Science, Reading Systems, Information Retrieval, Digital Humanities and Performance Evaluation.

Josep Lladós is an Associate Professor at the Computer Sciences Department of the Universitat Autònoma de Barcelona and a staff researcher at the Computer Vision Center, which he has also directed since January 2009. He holds the Knowledge Transfer Chair of the UAB Research Park and Santander Bank. He is the head of the Pattern Recognition and Document Analysis Group (2009SGR-00418). His current research fields are document analysis, structural and syntactic pattern recognition, and computer vision. He has led a number of Computer Vision R&D projects and has published more than 200 papers in national and international conferences and journals.

C. V. Jawahar is the Amazon Chair Professor at IIIT Hyderabad, India, where he leads a group focusing on computer vision, machine learning, document analysis and multimedia systems. He has been working on a set of problems at the intersection of vision, language and text. In the past, he has served as a chair for previous editions of ACCV, WACV, IJCAI, ICDAR and ICVGIP. He is currently an area editor of CVIU and an associate editor of IEEE PAMI. He is also a program co-chair for ACCV 2018.

Dimosthenis Karatzas is an associate professor at the Universitat Autònoma de Barcelona and associate director of the Computer Vision Centre (CVC) in Barcelona, Spain. At the CVC he leads the vision and language research line, working at the intersection of computer vision and text analysis. He has co-authored over 100 publications in refereed journals and conferences and has an h-index of 23. He received the 2013 IAPR/ICDAR Young Investigator Award and a Google Faculty Research Award in 2017. D. Karatzas has served in various roles at major conferences in his field (ICDAR, DAS, CBDAR, ICPR, ICFHR), including co-chairing IWRR 2014/16/18 and CBDAR 2015/17. He is a lead organiser of the Robust Reading Competitions series and chairs Technical Committee 11 on Reading Systems of the International Association for Pattern Recognition (IAPR). He was a founding member and a member of the executive committee of the UK Chapter of the SPIE, and he is currently a member of the IAPR Education Committee and a member of the IEEE and the IAPR. He is one of the founders of the Library Living Lab, an open participatory innovation space in a public library.

R. Manmatha joined the College of Information and Computer Sciences in 1997 as a Postdoctoral Research Associate, was a Research Assistant Professor from 1998 to 2006 and a Research Associate Professor from 2006 to 2016, and has been an Adjunct Professor since 2016. In 2006 he co-founded SnapTell, a mobile image search company that was acquired by Amazon in 2009. Since 2013 he has been a Principal Scientist with A9/Amazon.

Important Dates

ICDAR 2021 edition

10 November 2020: Release of new subset of questions for Task 1 (tentative)

31 March 2021: Deadline for Competition submissions

5-10 September 2021: Results presentation at ICDAR


CVPR 2020 edition

16 March 2020: Training set v0.1 available

19 March 2020: Text transcriptions for Train_v0.1 documents available

20 April 2020: Test set available

15 May 2020 (23:59 PST): Submission of results

16-18 June 2020: CVPR workshop