ICFHR 2020 Competition on Offline Recognition and Spotting of Handwritten Mathematical Expressions

Acronym: OffRaSHME (Offline Recognition and Spotting of Handwritten Mathematical Expressions)

Introduction

Handwritten Mathematical Expression Recognition (HMER) has wide potential applications in areas such as education, office automation and conference systems, and has therefore attracted steadily increasing attention from the community in recent years. HMER is also an important sub-problem in document analysis and recognition research. The challenges of HMER lie mainly in the complicated two-dimensional (2D) structures and spatial relations contained in Handwritten Mathematical Expressions (HMEs). Many technical problems in HMER still require further research.

The six previous Competitions on Recognition of HMEs (CROHME) were organized at ICDAR 2011, 2013 & 2019, and ICFHR 2012, 2014 & 2016 to push forward research on HMER, and achieved significant success. However, all previous CROHME competitions except the one held at ICDAR 2019 released only online HME data (i.e., stroke data with traces) and focused on the online HMER problem. In ICDAR 2019 CROHME, the organizers expanded the set of inputs to include both online and offline handwritten formulas. However, those offline HMEs were generated from online traces and may differ from real HMEs written on paper. In real scenarios, offline HMER may face problems such as blurring, noise, missing strokes and complex backgrounds, and hence is much more challenging and requires further investigation.

On the other hand, in computer-assisted scoring, which has potential significance in education, handwritten expression spotting can be used to perform automated scoring by retrieving the input formula (the standard answer) from examination papers. The underlying problem of this task is handwritten formula matching, which is essentially a structural pattern matching problem. In addition to computer-assisted scoring, the retrieval of input formulas from the web or a database is also a problem worth studying. These kinds of tasks have not been addressed in previous CROHME competitions.

To this end, we will collect a dataset of offline handwritten mathematical expressions (HMEs) by scanning papers from real scenarios that contain expressions, complex layouts (e.g., examination papers), and possibly noise. Offline HMEs of this kind can be much more challenging and leave considerable room for research and improvement. We will also ask people to write, on paper, expressions that have the same classes as the HMEs used in the ICDAR 2019 CROHME competition, which will support comparative research.

To support research on handwritten mathematical expression spotting, we will collect a set of documents that contain expressions and label the location of each expression, with each expression annotated at symbol level. This dataset can also facilitate research on expression detection. Datasets of this kind are new to the ICFHR and ICDAR communities.

As a result, the newly collected datasets have some novel features and can serve as an extension to previous datasets. The novel additions are summarized as follows.

(1) The newly collected dataset will be the first dataset of offline HMEs collected by writing on documents of different materials or by scanning documents that contain expressions. A competition on such data will be very interesting.

(2) Moreover, we will label the offline dataset at symbol level. Unlike the offline HMEs used in the ICDAR 2019 CROHME competition, which were generated from online ones and could therefore be easily annotated at symbol level, directly annotating offline HMEs without stroke information will be non-trivial.

(3) Using the new dataset, two new tasks will be organized in addition to the offline HMER task. One is the recognition of scene HMEs captured from real scenarios. The other is the expression spotting (ES) problem, which is inherently a structural pattern matching problem. The aim of the ES problem is to search for a recognized/input expression in the images, which can be applied in education for computer-assisted scoring and formula retrieval.
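To make the ES task concrete, the following is a minimal Python sketch of spotting a query expression among expressions already recognized in a page image. It is an illustration only and not the competition's method or evaluation protocol: the names Region, edit_distance and spot, the (x, y, width, height) box layout and the max_distance threshold are all hypothetical. It also compares flat LaTeX token sequences with edit distance, which is a simplification of true structural matching on 2D expression layouts.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A hypothetical recognized expression on a page."""
    box: tuple    # assumed layout: (x, y, width, height) in pixels
    tokens: list  # recognized LaTeX tokens, e.g. ["x", "^", "2"]


def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def spot(query, regions, max_distance=0):
    """Return boxes of regions whose tokens are within max_distance of the query."""
    return [r.box for r in regions
            if edit_distance(query, r.tokens) <= max_distance]


# Toy page with two recognized expressions (made-up data).
page = [
    Region(box=(10, 20, 80, 30), tokens=["x", "^", "2"]),
    Region(box=(10, 90, 80, 30), tokens=["x", "^", "3"]),
]
exact_hits = spot(["x", "^", "2"], page)
loose_hits = spot(["x", "^", "2"], page, max_distance=1)
```

With max_distance=0 only exact token matches are returned; a larger value tolerates recognition errors at the cost of more false matches.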

In this competition, we aim to evaluate what performance state-of-the-art approaches can achieve on the two new tasks. We believe that the competition will open new perspectives while being complementary to existing work.

We expect strong interest and participation from both industry and academic researchers in this competition. We also expect this extension of the competition to attract participants to OffRaSHME from the ICFHR and ICDAR communities, as well as from the larger computer vision community.