Notes on SHINRA2020-ML Evaluation Report
This note is intended to help participants understand the SHINRA2020-ML Evaluation Report.
* The evaluation of SHINRA2020-ML is conducted to measure the performance of systems on multi-label classification, based on the test data, a portion of the target data.
* The performance is measured using the micro average F1 measure, i.e., the harmonic mean of micro-averaged precision and micro-averaged recall.
* Each page in the test data is expected to be classified correctly into one or more of the ENE (ver.8.0) categories. If no label is predicted for a page, the page is considered to be assigned the label ‘IGNORED’ (ENE_id:9). See SHINRA2020-ML CFP (Task Description) for further details on evaluation.
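The measure described above can be sketched in a few lines of Python. This is a minimal illustration, not the official scorer: the function name `micro_prf`, the dictionary-of-sets input layout, and the placeholder labels are assumptions made for the example. Only the ‘IGNORED’ fallback (ENE_id:9) and the definition of F1 as the harmonic mean of micro-averaged precision and recall come from this note. The report shows values on a 0-100 scale, so the sketch multiplies by 100 when printing.

```python
from typing import Dict, Set, Tuple

IGNORED_ID = "9"  # ENE_id of 'IGNORED', assigned when no label is predicted


def micro_prf(gold: Dict[str, Set[str]],
              pred: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over all (page, label) pairs."""
    tp = fp = fn = 0
    for page_id, gold_labels in gold.items():
        # A missing or empty prediction counts as the single label 'IGNORED'.
        predicted = pred.get(page_id) or {IGNORED_ID}
        tp += len(gold_labels & predicted)   # labels predicted correctly
        fp += len(predicted - gold_labels)   # labels predicted but not in gold
        fn += len(gold_labels - predicted)   # gold labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical pages and placeholder labels, for illustration only.
gold = {"p1": {"A"}, "p2": {"B", "C"}, "p3": {"D"}}
pred = {"p1": {"A"}, "p2": {"B"}, "p3": set()}  # p3 has no predicted label

p, r, f = micro_prf(gold, pred)
print(f"{100 * p:.2f} {100 * r:.2f} {100 * f:.2f}")  # 66.67 50.00 57.14
```

In this example the page without a prediction contributes one false positive (the ‘IGNORED’ label) and one false negative (the missed gold label), which is why an empty prediction lowers both precision and recall.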
The SHINRA2020-ML Evaluation Report consists of three parts:
- Part 1: Evaluation Results Overview
- Part 2: Evaluation Results by Target Language
- Part 3: Evaluation Results by Submission
The report is provided as a Google Sheets document. Each of the above parts consists of one or more spreadsheets.
NOTICE:
- ‘Micro average F1’, ‘micro-averaged precision’, and ‘micro-averaged recall’ are hereinafter referred to as ‘F1’, ‘precision’, and ‘recall’, respectively.
Part 1: Evaluation Results Overview
Part 1 consists of a single spreadsheet.
Spreadsheet name: ‘ALL F1’
Each line of the table corresponds to a submission*1.
Note
- *1: Run results of each participant group are submitted by method. If more than one version of a method is submitted, the final version is used for evaluation.
See SHINRA2020-ML: Results Submission for further details of submission.
Group ID | Method | Late Submission | ar*2 | bg*2 | … | zh*2 |
---|---|---|---|---|---|---|
LIAT | ML-BERT | | 70.00 | 71.00 | … | 70.00 |
LIAT | ML-BERT | Y*3 | 73.00 | 74.00 | … | 73.00 |
column | explanation | example |
---|---|---|
Group ID | The group ID used for task registration on the NTCIR-15 site. | LIAT |
Method | The ID that distinguishes the methods used for the runs of the participant group. | ML-BERT |
Late Submission | Late submission (marked ‘Y’*4 if applicable) | Y*3 |
ar | F1 for Arabic (ar*2) in this run | 73.00 |
… | ||
zh | F1 for Chinese (zh*2) in this run | 73.00 |
*2: two-letter lowercase language codes specified in [ISO639-1].
*3: 2020/09/02 in the revised versions.
*4: Year/Month/Day (JST) in the revised versions.
Part 2: Evaluation Results by Target Language
Part 2 consists of multiple spreadsheets. Each spreadsheet corresponds to a target language.
Spreadsheet name: ‘Lang:[language code*3]’
ex. Lang:en
Each line of the table corresponds to a submission, restricted to the results for that target language.
Note:
- *3: two-letter lowercase language codes specified in [ISO639-1].
Group ID | Method | Late Submission | Precision | Recall | F1 |
---|---|---|---|---|---|
LIAT | ML-BERT | | 70.00 | 70.00 | 70.00 |
LIAT | ML-BERT | Y | 75.00 | 75.00 | 75.00 |
column | explanation | example |
---|---|---|
Group ID | The group ID used for task registration on the NTCIR-15 site. | LIAT |
Method | The ID that distinguishes the methods used for the runs of the participant group. | ML-BERT |
Late Submission | Late submission (marked ‘Y’ if applicable) | Y |
Precision | Precision (for the target language in the submission) | 70.00 |
Recall | Recall (for the target language in the submission) | 70.00 |
F1 | F1 (for the target language in the submission) | 70.00 |
Part 3: Evaluation Results by Submission
Part 3 consists of multiple spreadsheets.
Each spreadsheet corresponds to a submission grouped by method.
Spreadsheet name
The spreadsheet name takes one of the following forms, depending on the submission type:
(a) Regular submission: ‘System:[Group ID]_[Method ID]’
ex. System:LIAT_ML_BERT
(b) Late submission: ‘System:[Group ID]_[Method ID]_late_submission’
ex. System:LIAT_ML_BERT_late_submission
- Group ID: The group ID used for task registration on the NTCIR-15 site.
ex. ‘LIAT’
- Method ID: The ID that distinguishes the methods used for the runs of the participant group.
ex. ‘ML-BERT’
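The naming rule above can be expressed as a small helper, which may be handy when looking up a sheet programmatically. This is only a sketch: the function `sheet_name` and its parameters are hypothetical, and the IDs are joined exactly as passed in. The note's own examples write the method as ‘ML_BERT’ in the sheet name and ‘ML-BERT’ in the tables, so whether hyphens are rewritten is not specified here and the sketch does not attempt it.

```python
def sheet_name(group_id: str, method_id: str, late: bool = False) -> str:
    """Build a Part 3 spreadsheet name from a group ID and a method ID."""
    name = f"System:{group_id}_{method_id}"
    # Late submissions carry a fixed suffix after the method ID.
    return f"{name}_late_submission" if late else name


print(sheet_name("LIAT", "ML_BERT"))             # System:LIAT_ML_BERT
print(sheet_name("LIAT", "ML_BERT", late=True))  # System:LIAT_ML_BERT_late_submission
```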
Each line of the table corresponds to a target language in the submission.
ISO639-1 | Language | Precision | Recall | F1 |
---|---|---|---|---|
ar | Arabic | 70.00 | 70.00 | 70.00 |
bg | Bulgarian | 75.00 | 75.00 | 75.00 |
column | explanation | example |
---|---|---|
ISO639-1 | language code*4 of the target language. | ar |
Language | English name corresponding to the language code | Arabic |
Precision | Precision (for the language in the submission) | 70.00 |
Recall | Recall (for the language in the submission) | 70.00 |
F1 | F1 (for the language in the submission) | 70.00 |
*4: two-letter lowercase language codes specified in [ISO639-1].
References
[ISO639-1] ISO 639-1:2002, Codes for the representation of names of languages — Part 1: Alpha-2 code, https://www.iso.org/standard/22109.html.