NLPTEA 2016 Shared Task: Chinese Grammatical Error Diagnosis

Co-organizers:

Lung-Hao Lee, National Taiwan Normal University
Gaoqi Rao, Beijing Language and Culture University
Liang-Chih Yu, Yuan Ze University
Endong Xun, Beijing Language and Culture University
Baolin Zhang, Beijing Language and Culture University
Li-Ping Chang, National Taiwan Normal University

Introduction:

This paper presents the NLP-TEA 2016 shared task for Chinese grammatical error diagnosis, which seeks to identify grammatical error types and their range of occurrence within sentences written by learners of Chinese as a foreign language. We describe the task definition, data preparation, performance metrics, and evaluation results. Of the 15 teams registered for this shared task, 9 developed systems and submitted a total of 36 runs. We expect this evaluation campaign to lead to the development of more advanced NLP techniques for educational applications, especially for Chinese error detection. All data sets with gold standards and scoring scripts are made publicly available to researchers.

Overview Paper:

Lung-Hao Lee, Gaoqi Rao, Liang-Chih Yu, Endong Xun, Baolin Zhang, and Li-Ping Chang (2016). Overview of the NLP-TEA 2016 Shared Task for Chinese Grammatical Error Diagnosis. Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA'16), Osaka, Japan, 12 December, 2016, pp. 40-48.

Data Release:

The data sets with gold standard annotation and the evaluation tool can be downloaded. Please comply with the following agreements.

Agreements

The undersigned party has been authorized to use the NLPTEA 2016 CGED Datasets for research purposes. The undersigned party agrees to abide by the following conditions on the use of these data sets:

  1. The NLPTEA 2016 CGED Datasets can only be used in academic research and cannot be used in profit-generating or commercial activities.

  2. The undersigned party will not transfer all or any part of the NLPTEA 2016 CGED Datasets to any third party.

  3. The undersigned party will indicate its use of the NLPTEA 2016 CGED Datasets and acknowledge them in any papers or reports presenting results of academic research based on the NLPTEA 2016 CGED Datasets.

    Please cite the following paper as the reference when using the datasets:

    Lung-Hao Lee, Gaoqi Rao, Liang-Chih Yu, Endong Xun, Baolin Zhang, and Li-Ping Chang (2016). Overview of the NLP-TEA 2016 Shared Task for Chinese Grammatical Error Diagnosis. Proceedings of the 3rd Workshop on Natural Language Processing Techniques for Educational Applications (NLPTEA'16), Osaka, Japan, 12 December, 2016, pp. 40-48.

  4. The undersigned party alone bears the legal responsibility for any possible infringement of copyrights or intellectual property rights that may arise in the process of using the NLPTEA 2016 CGED Datasets for profit-making