“SHINRA 2019-JP Task” is a shared task, held in 2019, on structuring the information in Japanese Wikipedia. It is part of the SHINRA project, which constructs a resource through collaborative contribution (Resource by Collaborative Contribution, RbCC).
SHINRA 2019-JP Task
The “SHINRA 2019-JP Task” aims to create structured information from Japanese Wikipedia. The structuring framework is based on the named entity ontology “Extended Named Entity (ENE)”, which has around 200 categories. We have classified all Japanese Wikipedia articles into ENE categories. For SHINRA 2019-JP we selected 35 of these categories, and the task is to extract the values of the attributes defined for each ENE category. The outputs of the participating systems are made available for further research, such as ensemble learning.
- Task Detail
- Community Information
- Committee Members
For the 2019-JP Task, we chose 35 ENE categories as structuring targets.
For each category, we provide:
- HTML files of all articles classified into the category, and plain-text files created by removing the HTML tags from those HTML files.
- Manually created annotation samples as training data for supervised learning approaches (about 150 to 900 files per category).
- Other related data.
Participants use these datasets to structure the articles by automatically annotating either the HTML data or the plain-text data.
The following rules apply to participation:
- Participants may choose any of the categories to work on.
- Participants may submit incomplete results.
- Participants must submit a brief overview of their method and present it at the final meeting (except for anonymous participants).
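The plain-text files are described as the HTML files with their tags removed. As a rough illustration of that kind of conversion, the sketch below strips tags with Python's standard `html.parser` module. It is only a minimal sketch under assumptions: `TagStripper` and `html_to_plain_text` are hypothetical names, and the organizers' actual procedure (for example, how it treats whitespace, scripts, or entities) is not specified here and may differ.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: keeps only the text content of an HTML document.

    Assumption: this is an illustrative tag stripper, not the procedure the
    SHINRA organizers used to build the plain-text files.
    """

    def __init__(self) -> None:
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self._chunks: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep visible text only; drop script and style contents
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self) -> str:
        return "".join(self._chunks)


def html_to_plain_text(html: str) -> str:
    """Return the text of `html` with all tags removed."""
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()
```

For example, `html_to_plain_text("<p>東京都は<b>日本</b>の首都</p>")` yields `"東京都は日本の首都"`. Because tag removal shifts character positions, a system working on the plain-text files and one working on the HTML files would refer to the same value at different offsets.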
The target categories are divided into two groups, JP-5 and JP-30, according to their size and specification.
JP-5
Categories: Person, City, Company, Airport, Chemical-Compound
Specification: Some of these categories contain too many articles to distribute together, so the data are packaged separately for each category. These are the same categories as in the 2018-JP Task.
JP-30
“Location” categories: Bay, Continental_Region, Country, Domestic_Region, GPE_Other, Geological_Region_Other, Island, Lake, Location_Other, Mountain, Province, River, Sea, Spa
“Organization” categories: Cabinet, Company_Group, Ethnic_Group_Other, Family, Government, International_Organization, Military, Nationality, Nonprofit_Organization, Organization_Other, Political_Organization_Other, Political_Party, Show_Organization, Sports_Federation, Sports_League, Sports_Team
Specification: 30 comparatively small categories that belong to the abstract category “Location” or “Organization”. The data are packaged separately for each abstract category.
Definition of Categories
Each category and each attribute complies with ENE Definition 8.0.0.
Details of the data are described here.
The entire dataset can be downloaded here.
Data registration: here (expired).
Changes from the SHINRA 2018-JP Task
The 2018-JP Task was held as an attribute-value extraction task. However, when the string of an attribute value appears more than once in the target article, an extraction task cannot tell which of those occurrences is the correct value.
Therefore, for the 2019-JP Task we changed the attribute-value extraction task into an attribute-value annotation task, in which the correct occurrence of each value is marked in the article.
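The ambiguity that motivated this change can be shown with a toy example. In the sketch below, a string-level answer such as "Osaka" matches two places in the text, while an annotation that records character offsets identifies exactly one of them. It is a minimal sketch under assumptions: the `Annotation` record, its `start`/`end` offset fields, the `birthplace` attribute name, and the sample sentence are all hypothetical and do not reflect the official SHINRA submission format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation record: one attribute value marked by offsets."""
    attribute: str
    start: int  # inclusive character offset into the plain text
    end: int    # exclusive character offset

    def surface(self, text: str) -> str:
        """The string covered by this annotation."""
        return text[self.start:self.end]


def occurrences(text: str, value: str) -> list[tuple[int, int]]:
    """Every (start, end) span where `value` appears in `text`."""
    spans = []
    pos = text.find(value)
    while pos != -1:
        spans.append((pos, pos + len(value)))
        pos = text.find(value, pos + 1)  # continue searching after this hit
    return spans
```

With `text = "Born in Osaka, he later worked in Osaka and Tokyo."`, `occurrences(text, "Osaka")` returns `[(8, 13), (34, 39)]`, so an extracted string alone cannot say which mention is the birthplace, whereas `Annotation("birthplace", 8, 13)` points to a single mention.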
Adding JP-30 categories
We added the JP-30 categories described above as new targets.
- 2019/04/19 16:00 ~ 18:00 @RIKEN AIP
- 2019/07/12 16:00 ~ 18:00 @RIKEN AIP
- Submission Deadline
- 2019/10/23 15:00 ~ 18:00 @RIKEN AIP
- Mailing list (Google Groups)
- For announcements from the committee. Anyone can join.
- Slack workspace
- For interactive communication between the committee and participants. Anyone can join.
- Committee chair: Satoshi Sekine
- Committee (names listed in Japanese only):