SHINRA2020-ML: FAQ

References
- Is there any useful explanatory material on SHINRA2020-ML besides the information on your website?
Account
- Can I download the SHINRA2020-ML task data using my leaderboard account?
Schedule
- The participant paper deadline is given as 'Oct 10, 2020' in the email from the task organizers, while the NTCIR-15 site (Important Dates) says 'Sep 20, 2020'. Which is correct?
Target Languages
- Do I have to classify all 30 languages to be a participant?
Training Data
- How should I handle the labels from different records that share the same pageid?
Submission
- Can I submit the results after the deadline?
- Some portion of the target data overlaps with the training data. Should we submit the run results for the entire target data?
Evaluation
- Do the official SHINRA2020-ML evaluation and SHINRA2020-ML leaderboard evaluation mean the same thing?
Leaderboard

References

Is there any useful explanatory material on SHINRA2020-ML besides the information on your website?

The slides and video presented at the SHINRA2020 interim report meeting (31 July 2020) are available below:
[slide]
[video] (language: Japanese)

Account

Can I download the SHINRA2020-ML task data using my leaderboard account?

No. To download the SHINRA2020-ML task data, you need to create a SHINRA account, which is separate from the leaderboard account, and sign in with that SHINRA account. For the leaderboard account, see the Leaderboard section of this FAQ.

Schedule

The participant paper deadline is given as 'Oct 10, 2020' in the email from the task organizers, while the NTCIR-15 site (Important Dates) says 'Sep 20, 2020'. Which is correct?

'Oct 10, 2020' is correct. The SHINRA2020-ML participant paper deadline was extended along with the postponement of the SHINRA2020-ML registration and result submission deadline.

Target Languages

Do I have to classify all 30 languages to be a participant?

No. You are expected to select one or more target languages. See the CFP for further details.

Training Data

How should I handle the labels from different records that share the same pageid?

There are cases where multiple records in the training data share the same pageid, which means that the page is linked from multiple Japanese pages.

{"pageid": 57330, "title": "Circulatory system", "ja_pageid": 108191, "ja_title": "循環器", "_stamp": "AUTO.TOHOKU.201906", "ENEs": [{"ENE_id": "0", "ENE_name": "Concept"}]}
{"pageid": 57330, "title": "Circulatory system", "ja_pageid": 569307, "ja_title": "循環系", "_stamp": "HAND.AIP.201910", "ENEs": [{"ENE_id": "1.10.5.1", "ENE_name": "Animal_Part"}]}

In such cases, please collect the ENE_ids of the page from all the records with the same pageid in the training data.
If you would like to see the data relevant to this issue, please refer to the following posts on the SHINRA2020-ML Slack:
https://shinra2020-ml.slack.com/archives/CQ3RLNQ0N/p1596700262035600
https://shinra2020-ml.slack.com/archives/CQ3RLNQ0N/p1596700306035700
We apologize for the inconvenience.
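
As an illustration of collecting labels across records, the following is a minimal sketch, not an official tool. It assumes the training data is stored as one JSON object per line, as in the example records above; the function name `merge_ene_ids` and the trimmed sample records are hypothetical.

```python
import json
from collections import defaultdict


def merge_ene_ids(lines):
    """Collect the set of ENE_ids for each pageid across all records.

    Assumption: each non-empty line is one JSON record with the fields
    "pageid" and "ENEs", as in the example records in this FAQ.
    """
    labels = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for ene in record["ENEs"]:
            labels[record["pageid"]].add(ene["ENE_id"])
    return dict(labels)


# The two example records, trimmed to the fields used here.
sample = [
    '{"pageid": 57330, "ja_pageid": 108191, "ENEs": [{"ENE_id": "0", "ENE_name": "Concept"}]}',
    '{"pageid": 57330, "ja_pageid": 569307, "ENEs": [{"ENE_id": "1.10.5.1", "ENE_name": "Animal_Part"}]}',
]
merged = merge_ene_ids(sample)
print(sorted(merged[57330]))  # ['0', '1.10.5.1']
```

Using a set per pageid means a label that appears in several records is kept only once.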

Submission

Can I submit the results after the deadline?

The official results should be submitted by Aug 31 (Timezone: Baker Island (USA), UTC-12).
If you miss the deadline, you can still submit your results. Please note the following.

- The results submitted after the deadline are treated as unofficial. You will get unofficial scores instead of the official evaluation results.
- We will publish the unofficial evaluation results, which will be clearly distinguished from the official ones.
- The results will be used for building the knowledge base.
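
To see what the UTC-12 deadline means in other time zones, here is a small sketch. It rests on two assumptions that this FAQ does not state explicitly: the year is 2020, and "by Aug 31" means up to the last second of that day in UTC-12. The name `is_official` is hypothetical and is not part of any organizer tooling.

```python
from datetime import datetime, timedelta, timezone

AOE = timezone(timedelta(hours=-12))  # Baker Island (USA), UTC-12
JST = timezone(timedelta(hours=9))    # Japan Standard Time, UTC+9

# Assumption: the deadline is the last second of 31 August 2020 in UTC-12.
deadline = datetime(2020, 8, 31, 23, 59, 59, tzinfo=AOE)


def is_official(submitted_at):
    """Return True if a timezone-aware timestamp is on or before the deadline."""
    return submitted_at <= deadline


print(deadline.astimezone(timezone.utc).isoformat())  # 2020-09-01T11:59:59+00:00
print(deadline.astimezone(JST).isoformat())           # 2020-09-01T20:59:59+09:00
```

Under these assumptions, a submission made at 20:00 JST on 1 September would still count as official, while one made at 21:00 JST would not.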

Some portion of the target data overlaps with the training data. Should we submit the run results for the entire target data?

Yes. Please submit the outputs for the entire target data, though the portion of the target data corresponding to the training data is not used for the evaluation. See SHINRA2020-ML: Results Submission for further details on submission.

Evaluation

Do the official SHINRA2020-ML evaluation and SHINRA2020-ML leaderboard evaluation mean the same thing?

No. The official evaluation of the SHINRA2020-ML task is independent of the evaluation of the leaderboard.
The former is based on the entire target data, as described in the SHINRA2020-ML CFP (Task Description), while the latter is based on a portion of the SHINRA2020-ML target data as described in the leaderboard site.

Leaderboard

I have registered for the SHINRA2020-ML task through the NTCIR-15 site. Can I submit to the SHINRA2020-ML leaderboard? (leaderboard account)

Please sign up for a RIKEN-AIP-NLP Projects Leaderboard account to submit to the leaderboard.

- The RIKEN-AIP-NLP Projects leaderboard account gives access to any of the RIKEN-AIP-NLP Projects Leaderboards, including the SHINRA2020-ML leaderboard.
- RIKEN-AIP-NLP Projects leaderboard accounts are associated with the Slack accounts for the corresponding tasks (e.g., the SHINRA2020-ML Slack account). See Signup for details.
- Each Slack account can be assigned at most one RIKEN-AIP-NLP Projects leaderboard account.

For information about creating the leaderboard account, please refer to Signup: SHINRA2020-ML.

Are the evaluation results of the SHINRA2020-ML task independent of those of the leaderboard?

Yes.

Which metric is used for the leaderboard evaluation?

The micro-averaged F1 measure is used. See Task description in the SHINRA2020-ML CFP for further details.
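
For readers unfamiliar with the metric, the following is a minimal sketch of micro-averaged F1 over multi-label predictions. It is not the official scorer: the function name `micro_f1`, the dictionary layout (pageid mapped to a set of ENE_ids), and the choice to ignore predictions for pages without gold labels are all assumptions made for illustration.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over all (pageid, ENE_id) pairs.

    gold, predicted: dicts mapping pageid -> set of ENE_id strings.
    Assumption: only pages present in `gold` are scored.
    """
    tp = fp = fn = 0
    for pageid, gold_labels in gold.items():
        pred_labels = predicted.get(pageid, set())
        tp += len(gold_labels & pred_labels)  # labels predicted correctly
        fp += len(pred_labels - gold_labels)  # predicted but not in gold
        fn += len(gold_labels - pred_labels)  # in gold but not predicted
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up example: 2 true positives, 1 false positive, 1 false negative.
gold = {1: {"0"}, 2: {"1.10.5.1", "0"}}
pred = {1: {"0"}, 2: {"1.10.5.1", "1.1"}}
score = micro_f1(gold, pred)
```

In this made-up example, precision and recall are both 2/3, so the micro-averaged F1 is also 2/3. Micro averaging pools the counts over all pages and labels before computing precision and recall, so frequent categories weigh more than rare ones.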

Does the 2020-ML leaderboard have a limit on the number of submissions?

Yes. The total number of submissions by a group across all the target languages is limited to a maximum of 5 per day. The submission counter is reset to zero at 0:00 AM (JST).
See 2020-ML leaderboard: Important dates and Rules for further details.
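
Because the counter resets at midnight JST, two submissions made on the same UTC date can fall on different leaderboard days. The sketch below illustrates this; it is not the leaderboard's implementation, and the names `jst_day` and `remaining_today` are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))
DAILY_LIMIT = 5  # maximum submissions per group per day, as stated above


def jst_day(ts):
    """Return the JST calendar date of a timezone-aware timestamp."""
    return ts.astimezone(JST).date()


def remaining_today(submission_times, now):
    """Number of submissions still allowed on the JST day containing `now`."""
    counts = Counter(jst_day(t) for t in submission_times)
    return max(0, DAILY_LIMIT - counts[jst_day(now)])


times = [
    datetime(2020, 8, 1, 14, 30, tzinfo=timezone.utc),  # 23:30 JST on 1 Aug
    datetime(2020, 8, 1, 15, 30, tzinfo=timezone.utc),  # 00:30 JST on 2 Aug
]
now = datetime(2020, 8, 1, 16, 0, tzinfo=timezone.utc)  # 01:00 JST on 2 Aug
print(remaining_today(times, now))  # 4
```

Only the second submission counts toward the JST day containing `now`, so four submissions remain.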

Are you planning to raise the upper limit on leaderboard submissions?

We have no plans to do so at this time, but we may reconsider the matter.

What is the proper submission format?

Please refer to SHINRA2020-ML: Data Format for the format of the JSON files to submit.
The JSON files should be compressed into a zip file. See 2020-ML leaderboard: Specification for further details.
【NOTICE】The json files contained in the example submission are in the old format. We will revise them soon.
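
As a rough illustration of the compression step only, the sketch below packs a list of JSON files into one zip archive using Python's standard `zipfile` module. The function name `zip_json_files`, the example file names, and the choice to place files at the root of the archive are assumptions; the required file names and archive layout are defined in 2020-ML leaderboard: Specification, not here.

```python
import zipfile
from pathlib import Path


def zip_json_files(json_paths, zip_path):
    """Compress the given JSON files into a single zip archive.

    Assumption: files are stored at the archive root under their own names.
    Check the leaderboard specification for the layout actually required.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in map(Path, json_paths):
            archive.write(path, arcname=path.name)
    return zip_path


# Hypothetical usage; the file names below are placeholders:
# zip_json_files(["en.json", "fr.json"], "submission.zip")
```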