Notes on SHINRA2020-ML Evaluation Report

This note is intended to help participants understand the SHINRA2020-ML Evaluation Report.

* The SHINRA2020-ML evaluation measures system performance on multi-label classification using the test data, which is a portion of the target data.

* The performance is measured using the micro average F1 measure, i.e., the harmonic mean of micro-averaged precision and micro-averaged recall.

* Each page in the test data should be classified into one or more of the ENE (ver. 8.0) categories. If no label is predicted for a page, the page is treated as having been assigned the label ‘IGNORED’ (ENE_id: 9). See the SHINRA2020-ML CFP (Task Description) for further details on evaluation.
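The scoring described above can be illustrated with a minimal sketch. It is not the official scorer: the function name `micro_prf`, the dictionary-based input format, and the label strings "A", "B", "C" are assumptions made for illustration. Only the micro-averaging and the fallback to ‘IGNORED’ (ENE_id: 9) come from this note.

```python
from typing import Dict, Set, Tuple

# ENE_id of the 'IGNORED' label, as stated in this note.
IGNORED = "9"


def micro_prf(
    gold: Dict[str, Set[str]], pred: Dict[str, Set[str]]
) -> Tuple[float, float, float]:
    """Return micro-averaged (precision, recall, F1) over all pages.

    gold and pred map a page ID to its set of label IDs (hypothetical format).
    """
    tp = fp = fn = 0
    for page_id, gold_labels in gold.items():
        # A page with no predicted label is treated as predicted 'IGNORED'.
        pred_labels = pred.get(page_id) or {IGNORED}
        tp += len(gold_labels & pred_labels)   # correct labels
        fp += len(pred_labels - gold_labels)   # predicted but not in gold
        fn += len(gold_labels - pred_labels)   # in gold but not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of micro-averaged precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Placeholder labels "A", "B", "C" are not real ENE IDs.
gold = {"p1": {"A", "B"}, "p2": {"C"}, "p3": {IGNORED}}
pred = {"p1": {"A"}, "p2": {"B"}, "p3": set()}
precision, recall, f1 = micro_prf(gold, pred)
```

In this toy input, page p3 has no predicted label, so it counts as ‘IGNORED’ and matches its gold label. Counts are pooled over all pages before precision and recall are computed, which is what distinguishes micro-averaging from averaging per-page or per-category scores.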

The SHINRA2020-ML Evaluation Report consists of three parts:

  • Part 1: Evaluation Results Overview
  • Part 2: Evaluation Results by Target Language
  • Part 3: Evaluation Results by Submission

The report is provided as a Google Sheets document. Each of the above parts consists of one or more spreadsheets.

NOTICE:

  • ‘Micro average F1’, ‘micro-averaged precision’, and ‘micro-averaged recall’ are hereinafter referred to as ‘F1’, ‘precision’, and ‘recall’, respectively.

 

Part 1: Evaluation Results Overview

Part 1 consists of a single spreadsheet.

Spreadsheet name: ‘ALL F1’

Each row of the table corresponds to a submission*1.

Note

  • *1: Run results of each participant group are submitted by method. If more than one version of a method is submitted, the final version is used for evaluation.
    See SHINRA2020-ML: Results Submission for further details of submission.

Sample

| Group ID | Method | Late Submission | ar*2 | bg*2 | zh*2 |
|---|---|---|---|---|---|
| LIAT | ML-BERT | | 70.00 | 71.00 | 70.00 |
| LIAT | ML-BERT | Y*3 | 73.00 | 74.00 | 73.00 |

Format

| column | explanation | example |
|---|---|---|
| Group ID | The group ID used for the task registration on the NTCIR-15 site. | LIAT |
| Method | The ID that distinguishes between the methods used for the runs of the participant group. | ML-BERT |
| Late Submission | Late submission (marked ‘Y’*4 if applicable) | Y*3 |
| ar | F1 for Arabic (ar*2) in this run | 73.00 |
| zh | F1 for Chinese (zh*2) in this run | 73.00 |

*2: Two-letter lowercase language codes specified in [ISO639-1].
*3: Given as ‘2020/09/02’ in the revised versions.
*4: Given as a date in Year/Month/Day format (JST) in the revised versions.

Part 2: Evaluation Results by Target Language

Part 2 consists of multiple spreadsheets. Each spreadsheet corresponds to a target language.

Spreadsheet name: ‘Lang:[language code*3]’
ex. Lang:en

Each row of the table corresponds to a submission, with the information limited to one target language.

Note:

  • *3: two-letter lowercase language codes specified in [ISO639-1].

Sample

| Group ID | Method | Late Submission | Precision | Recall | F1 |
|---|---|---|---|---|---|
| LIAT | ML-BERT | | 70.00 | 70.00 | 70.00 |
| LIAT | ML-BERT | Y | 75.00 | 75.00 | 75.00 |

Format

| column | explanation | example |
|---|---|---|
| Group ID | The group ID used for the task registration on the NTCIR-15 site. | LIAT |
| Method | The ID that distinguishes between the methods used for the runs of the participant group. | ML-BERT |
| Late Submission | Late submission (marked ‘Y’ if applicable) | Y |
| Precision | Precision (for the target language in the submission) | 70.00 |
| Recall | Recall (for the target language in the submission) | 70.00 |
| F1 | F1 (for the target language in the submission) | 70.00 |

Part 3: Evaluation Results by Submission

Part 3 consists of multiple spreadsheets.
Each spreadsheet corresponds to a submission, grouped by method.

Spreadsheet name

The spreadsheet name takes one of the following forms, depending on the submission type:

(a) Regular submission: ‘System:[Group ID]_[Method ID]’
ex. System:LIAT_ML_BERT

(b) Late submission: ‘System:[Group ID]_[Method ID]_late_submission’
ex. System:LIAT_ML_BERT_late_submission

  • Group ID: The group ID used for the task registration on the NTCIR-15 site.
    ex. ‘LIAT’
  • Method ID: The ID that distinguishes between the methods used for the runs of the participant group.
    ex. ‘ML-BERT’
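
The naming pattern above can be sketched as a small helper, for example when looking up a sheet programmatically. The function name `sheet_name` and its parameters are hypothetical. The sketch joins the IDs exactly as given and does not guess how a hyphen in a method ID (such as ‘ML-BERT’) is written in the actual sheet name; the examples in this note show ‘ML_BERT’ there.

```python
def sheet_name(group_id: str, method_id: str, late: bool = False) -> str:
    """Build a Part 3 spreadsheet name from its components.

    Follows the two patterns in this note:
      regular: System:[Group ID]_[Method ID]
      late:    System:[Group ID]_[Method ID]_late_submission
    """
    name = f"System:{group_id}_{method_id}"
    if late:
        name += "_late_submission"
    return name


# Method ID written as it appears in the note's sheet-name examples.
regular = sheet_name("LIAT", "ML_BERT")
late = sheet_name("LIAT", "ML_BERT", late=True)
```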

Each row of the table corresponds to a target language in the submission.

Sample

| ISO639-1 | Language | Precision | Recall | F1 |
|---|---|---|---|---|
| ar | Arabic | 70.00 | 70.00 | 70.00 |
| bg | Bulgarian | 75.00 | 75.00 | 75.00 |

Format

| column | explanation | example |
|---|---|---|
| ISO639-1 | Language code*4 of the target language | ar |
| Language | English name corresponding to the language code | Arabic |
| Precision | Precision (for the language in the submission) | 70.00 |
| Recall | Recall (for the language in the submission) | 70.00 |
| F1 | F1 (for the language in the submission) | 70.00 |

*4: two-letter lowercase language codes specified in [ISO639-1].
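
Because F1 is the harmonic mean of precision and recall, each row of a per-language table can be sanity-checked from its Precision and Recall columns. The sketch below is only an illustration: the helper names and the tolerance of 0.02 are assumptions, chosen to allow for the two-decimal rounding of the reported percentages.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def row_is_consistent(
    precision: float, recall: float, f1: float, tol: float = 0.02
) -> bool:
    """Check that a reported F1 matches its precision and recall.

    tol is an assumed allowance for rounding in the reported values.
    """
    return abs(f1_from(precision, recall) - f1) <= tol


# Values taken from the sample rows (percentages).
ok_ar = row_is_consistent(70.00, 70.00, 70.00)
ok_bg = row_is_consistent(75.00, 75.00, 75.00)
```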

References

[ISO639-1] ISO 639-1:2002, Codes for the representation of names of languages — Part 1: Alpha-2 code, https://www.iso.org/standard/22109.html.