SIGHAN 2013 Bake-off: Chinese Spelling Check Task

Co-organizers:

Shih-Hung Wu, Chaoyang University of Technology
Chao-Lin Liu, National Chengchi University
Lung-Hao Lee, National Taiwan (Normal) University

Introduction:

At SIGHAN Bake-off 2013, we organize the Chinese Spelling Check task, which provides an evaluation platform for developing and implementing automatic Chinese spelling checkers. Two subtasks, error detection and error correction, are designed to evaluate the complete functionality of a spelling checker. The first subtask focuses on error detection: given a complete sentence, the checker should detect whether the input contains errors and point out the locations of the incorrect characters. The second subtask addresses the quality of error correction: in addition to indicating the error locations, the checker should suggest the correct characters. The hope is that, through such evaluation campaigns, more advanced Chinese spelling check techniques will emerge.
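To make the difference between the two subtasks concrete, the following is a minimal Python sketch, not the official evaluation tool or data format. It assumes that every spelling error is a single-character substitution, so an erroneous sentence and its corrected version have the same length, and that character positions are counted from 1. The function names and the example sentence are hypothetical and are not taken from the released datasets.

```python
def find_errors(sentence, reference):
    """Compare a sentence with its corrected version, character by character.

    Returns a list of (position, wrong_char, correct_char) tuples.
    Positions are 1-based (an assumption made for this sketch).
    """
    if len(sentence) != len(reference):
        # Assumption: errors are substitutions only, so lengths must match.
        raise ValueError("sentence and reference must have the same length")
    return [
        (pos, wrong, correct)
        for pos, (wrong, correct) in enumerate(zip(sentence, reference), start=1)
        if wrong != correct
    ]


def detection_answer(errors):
    """Subtask 1 (error detection): only the error locations."""
    return [pos for pos, _, _ in errors]


def correction_answer(errors):
    """Subtask 2 (error correction): locations plus the suggested characters."""
    return [(pos, correct) for pos, _, correct in errors]


# Hypothetical example: the last character is a homophone error.
errors = find_errors("我今天很高性", "我今天很高興")
print(detection_answer(errors))   # error locations only
print(correction_answer(errors))  # locations with suggested characters
```

A sentence without errors yields an empty list for both subtasks, which corresponds to the checker reporting that the input contains no errors.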

Overview Paper:

Shih-Hung Wu, Chao-Lin Liu, and Lung-Hao Lee (2013). Chinese Spelling Check Evaluation at SIGHAN Bake-off 2013. Proceedings of the 7th SIGHAN Workshop on Chinese Language Processing (SIGHAN'13), Nagoya, Japan, 14 October, 2013, pp. 35-42.

Data Release:

The data sets with gold standard annotation and the evaluation tool can be downloaded. Please comply with the following agreements.

Agreements

The undersigned party has been authorized to use the SIGHAN 2013 CSC Datasets for research purposes. The undersigned party agrees to abide by the following conditions on the use of these data sets:

  1. The SIGHAN 2013 CSC Datasets can only be used in academic research and cannot be used in profit-generating or commercial activities.

  2. The undersigned party will not transfer all or any part of the SIGHAN 2013 CSC Datasets to any third party.

  3. The undersigned party will indicate the use of the SIGHAN 2013 CSC Datasets and acknowledge them in any papers or reports presenting results of academic research based on the SIGHAN 2013 CSC Datasets.

    Please cite the following papers as references when using the datasets:

    [1] Shih-Hung Wu, Chao-Lin Liu, and Lung-Hao Lee (2013). Chinese Spelling Check Evaluation at SIGHAN Bake-off 2013. Proceedings of the 7th SIGHAN Workshop on Chinese Language Processing (SIGHAN'13), Nagoya, Japan, 14 October, 2013, pp. 35-42.
    [2] Chao-Lin Liu, Min-Hua Lai, Kan-Wen Tien, Yi-Hsuan Chuang, Shih-Hung Wu, and Chia-Ying Lee (2011). Visually and Phonologically Similar Characters in Incorrect Chinese Words: Analyses, Identification, and Applications. ACM Transactions on Asian Language Information Processing, 10(2), 10:1-39.

  4. The undersigned party alone bears the legal responsibility for any possible infringement of copyrights or intellectual property rights that may arise in the process of using the SIGHAN 2013 CSC Datasets for profit-making