Notes on SHINRA2020-ML Evaluation Report

This note is intended to help participants understand the SHINRA2020-ML Evaluation Report.

* The evaluation of SHINRA2020-ML is conducted to measure the performance of systems on multi-label classification, based on the test data, a portion of the target data.

* The performance is measured using the micro average F1 measure, i.e., the harmonic mean of micro-averaged precision and micro-averaged recall.

* Each page in the test data is expected to be classified correctly into one or more of the ENE (ver.8.0) categories. If no label is predicted for a page, the page is considered to be assigned the label ‘IGNORED’ (ENE_id:9). See the SHINRA2020-ML CFP (Task Description) for further details on evaluation.
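
The scoring described above can be illustrated with a minimal sketch. It is not the official scorer: the function name `micro_prf`, the dictionary-based input format, and the placeholder labels "A" to "D" are assumptions made for illustration only. The only value taken from this note is ENE_id 9 for ‘IGNORED’.

```python
from typing import Dict, Set, Tuple

# ENE_id of the 'IGNORED' label, as stated in this note.
IGNORED = "9"


def micro_prf(
    gold: Dict[str, Set[str]], pred: Dict[str, Set[str]]
) -> Tuple[float, float, float]:
    """Return micro-averaged (precision, recall, F1) over all pages in `gold`.

    Hypothetical helper: `gold` and `pred` map a page ID to a set of label IDs.
    """
    tp = fp = fn = 0
    for page_id, gold_labels in gold.items():
        # A page with no predicted label is treated as labeled 'IGNORED'.
        predicted = pred.get(page_id) or {IGNORED}
        tp += len(gold_labels & predicted)   # labels predicted correctly
        fp += len(predicted - gold_labels)   # labels predicted but not in gold
        fn += len(gold_labels - predicted)   # gold labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of micro-averaged precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Placeholder labels "A".."D" are illustrative, not real ENE IDs.
gold = {"p1": {"A", "B"}, "p2": {"C"}, "p3": {IGNORED}}
pred = {"p1": {"A"}, "p2": {"C", "D"}}  # no prediction for p3
print(micro_prf(gold, pred))
```

In this example the counts are pooled over all pages before precision and recall are computed, which is what distinguishes micro averaging from averaging per-page or per-category scores.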

The SHINRA2020-ML Evaluation Report consists of three parts:

Part 1: Evaluation Results Overview
Part 2: Evaluation Results by Target Language
Part 3: Evaluation Results by Submission

The report is provided as a Google Sheets document. Each of the above parts consists of one or more spreadsheets.

NOTICE:

‘Micro average F1’, ‘micro-averaged precision’, and ‘micro-averaged recall’ are hereinafter referred to as ‘F1’, ‘precision’, and ‘recall’, respectively.

Part 1: Evaluation Results Overview

Part 1 consists of a single spreadsheet.

Spreadsheet name: ‘ALL F1’

Each line of the table corresponds to a submission^*1.

Note:

^*1: Each participant group submits its run results by method. If more than one version is submitted for a method, the final version is used for evaluation.
See SHINRA2020-ML: Results Submission for further details on submission.

**Sample**
Group ID	Method	Late Submission	ar^*2	bg^*2	…	zh^*2
LIAT	ML-BERT		70.00	71.00	…	70.00
LIAT	ML-BERT	Y^*3	73.00	74.00	…	73.00

**Format**
column	explanation	example
Group ID	The group ID used for the task registration on the NTCIR-15 site.	LIAT
Method	The ID that distinguishes the methods used for the runs of the participant group.	ML-BERT
Late Submission	Marked ‘Y’^*4 if the run is a late submission; empty otherwise.	Y^*3
ar	F1 for Arabic (ar^*2) in this run	73.00
…
zh	F1 for Chinese (zh^*2) in this run	73.00

^*2: Two-letter lowercase language code specified in [ISO639-1].
^*3: Given as ‘2020/09/02’ in the revised versions.
^*4: Given as a date in Year/Month/Day (JST) format in the revised versions.

Part 2: Evaluation Results by Target Language

Part 2 consists of multiple spreadsheets. Each spreadsheet corresponds to a target language.

Spreadsheet name: ‘Lang:[language code^*2]’
ex. Lang:en

Each line of the table corresponds to a submission, with the information limited to the target language of the spreadsheet.

Note:

^*2: Two-letter lowercase language code specified in [ISO639-1].

**Sample**
Group ID	Method	Late Submission	Precision	Recall	F1
LIAT	ML-BERT		70.00	70.00	70.00
LIAT	ML-BERT	Y	75.00	75.00	75.00

**Format**
column	explanation	example
Group ID	The group ID used for the task registration on the NTCIR-15 site.	LIAT
Method	The ID that distinguishes the methods used for the runs of the participant group.	ML-BERT
Late Submission	Marked ‘Y’ if the run is a late submission; empty otherwise.	Y
Precision	Precision (for the target language in the submission)	70.00
Recall	Recall (for the target language in the submission)	70.00
F1	F1 (for the target language in the submission)	70.00
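
As an illustration of the format above, the following sketch reads a per-language table that has been exported as tab-separated text and checks that each F1 value agrees with the harmonic mean of its precision and recall. The export step, the constant `SAMPLE_TSV`, and the helper names `read_language_sheet` and `f1_from` are assumptions for illustration; the rows are the sample rows shown above.

```python
import csv
import io

# The sample rows from this section, as tab-separated text (assumed export format).
SAMPLE_TSV = (
    "Group ID\tMethod\tLate Submission\tPrecision\tRecall\tF1\n"
    "LIAT\tML-BERT\t\t70.00\t70.00\t70.00\n"
    "LIAT\tML-BERT\tY\t75.00\t75.00\t75.00\n"
)


def read_language_sheet(text):
    """Parse a 'Lang:[language code]' table into a list of dictionaries."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text), delimiter="\t"):
        rows.append({
            "group_id": rec["Group ID"],
            "method": rec["Method"],
            # Any non-empty mark counts as a late submission.
            "late": rec["Late Submission"].strip() != "",
            "precision": float(rec["Precision"]),
            "recall": float(rec["Recall"]),
            "f1": float(rec["F1"]),
        })
    return rows


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


for row in read_language_sheet(SAMPLE_TSV):
    # Allow a small tolerance because reported values are rounded to two decimals.
    consistent = abs(f1_from(row["precision"], row["recall"]) - row["f1"]) < 0.01
    print(row["group_id"], row["method"], "late" if row["late"] else "regular", consistent)
```

Treating any non-empty value in the Late Submission column as a late submission is a design choice of this sketch, made so that the same code works whether the column holds ‘Y’ or another mark.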

Part 3: Evaluation Results by Submission

Part 3 consists of multiple spreadsheets.
Each spreadsheet corresponds to a submission grouped by method.

Spreadsheet name

The spreadsheet name is given as either of the following depending on the submission type:

(a) Regular submission: ‘System:[Group ID]_[Method ID]’
ex. System:LIAT_ML_BERT

(b) Late submission: ‘System:[Group ID]_[Method ID]_late_submission’
ex. System:LIAT_ML_BERT_late_submission

Group ID: The group ID used for the task registration on the NTCIR-15 site.
ex. ‘LIAT’
Method ID: The ID that distinguishes the methods used for the runs of the participant group.
ex. ‘ML-BERT’
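
The naming rule above can be sketched as a small helper. The function name `system_sheet_name` is hypothetical. The method ID is passed in exactly as it should appear in the spreadsheet name (‘ML_BERT’ in the examples above); this note does not state whether or how a method ID such as ‘ML-BERT’ is converted, so the sketch performs no conversion.

```python
def system_sheet_name(group_id: str, method_id: str, late: bool = False) -> str:
    """Build a Part 3 spreadsheet name from the pattern given in this note."""
    name = f"System:{group_id}_{method_id}"
    # Late submissions carry the '_late_submission' suffix.
    return name + "_late_submission" if late else name


print(system_sheet_name("LIAT", "ML_BERT"))
print(system_sheet_name("LIAT", "ML_BERT", late=True))
```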

Each line of the table corresponds to a target language in the submission.

**Sample**
ISO639-1	Language	Precision	Recall	F1
ar	Arabic	70.00	70.00	70.00
bg	Bulgarian	75.00	75.00	75.00

**Format**
column	explanation	example
ISO639-1	Language code^*2 of the target language	ar
Language	English name corresponding to the language code	Arabic
Precision	Precision (for the language in the submission)	70.00
Recall	Recall (for the language in the submission)	70.00
F1	F1 (for the language in the submission)	70.00

^*2: Two-letter lowercase language code specified in [ISO639-1].

References

[ISO639-1] ISO 639-1:2002, Codes for the representation of names of languages — Part 1: Alpha-2 code, https://www.iso.org/standard/22109.html.