SHINRA2019-JP: Structuring Task

“SHINRA 2019-JP Task” is a shared task on structuring the information in Japanese Wikipedia. The SHINRA project, under which this task was held in 2019, aims to construct a resource by collaborative contribution (RbCC).

SHINRA 2019 JP Task:

The “SHINRA2019-JP task” aims to create structured information from Japanese Wikipedia. The structuring framework is based on the named entity ontology “Extended Named Entity (ENE)”, which has around 200 categories. We have classified the entire Japanese Wikipedia into ENE categories. For SHINRA2019-JP we selected 35 of those categories, and the task is to extract the values of the attributes defined for each ENE category. The outputs of the participants' systems are available for further research, such as ensemble learning.

Task Information

    Task Detail

    For the 2019-JP Task, we chose 35 ENE categories as structuring targets.

    For each category, we provide:

    • HTML files and plain-text files (the HTML files with the tags removed) for all articles classified into the target categories.
    • Manually created annotation samples to serve as training data for supervised learning approaches (about 150 to 900 files per category).
    • Other related data.

    Participants use these datasets to attempt structuring by machine annotation of either the HTML or the plain-text data.

    In addition, the following terms apply to participation:

    • Participants may attempt any categories they choose.
    • Participants may submit incomplete results.
    • Participants must submit a brief overview of their method and present it at the final meeting (anonymous participants are exempt).

    Target Category

    We divide the target categories into two groups, JP-5 and JP-30, according to their size and characteristics.


    JP-5 categories: Person, City, Company, Airport, Chemical-Compound

    Specification: Some of these categories contain too many articles to distribute as a single package, so the data is packaged separately for each category. These are the same categories as in the 2018-JP Task.


    JP-30 “Location” categories: Bay, Continental_Region, Country, Domestic_Region, GPE_Other, Geological_Region_Other, Island, Lake, Location_Other, Mountain, Province, River, Sea, Spa

    JP-30 “Organization” categories: Cabinet, Company_Group, Ethnic_Group_Other, Family, Government, International_Organization, Military, Nationality, Nonprofit_Organization, Organization_Other, Political_Organization_Other, Political_Party, Show_Organization, Sports_Federation, Sports_League, Sports_Team

    Specification: 30 (comparatively) small categories that belong to the abstract category “Location” or “Organization.” We provide the data packaged per abstract category.

    Definition of Categories

    Each category and each attribute complies with ENE Definition 8.0.0.

    Data Information

    Data details are available here.

    You can download the entire dataset from here.

    Data Registration

    Data registration is available from here (expired).

    Changes from SHINRA 2018-JP Task

    We held the 2018-JP Task as an attribute-value extraction task. However, when the target article contains several expressions that are identical to an attribute value, it is difficult in an extraction task to determine which occurrence is the correct value.

    Therefore, for the 2019-JP Task we changed the task from attribute-value extraction to attribute-value annotation.
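The difference between the two task styles can be illustrated with a minimal sketch. It is not the official SHINRA data format: the `Mention` record, the function names, the attribute name `location`, and the sample sentence are all hypothetical and chosen only for illustration. Extraction-style output keeps just the value string, while annotation-style output records where each occurrence sits in the text, so identical strings remain distinguishable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One annotated occurrence of an attribute value (hypothetical format)."""
    attribute: str
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    text: str


def extract_values(text, candidates):
    """Extraction-style output: attribute -> value string, with no position."""
    return {attr: value for attr, value in candidates.items() if value in text}


def annotate_mentions(text, candidates):
    """Annotation-style output: every occurrence of each value, with offsets."""
    mentions = []
    for attr, value in candidates.items():
        pos = text.find(value)
        while pos != -1:
            mentions.append(Mention(attr, pos, pos + len(value), value))
            pos = text.find(value, pos + len(value))
    return sorted(mentions, key=lambda m: m.start)


# Hypothetical article text in which the same string appears several times.
article = "Narita Airport is in Narita. Narita is a city in Chiba."
candidates = {"location": "Narita"}

extracted = extract_values(article, candidates)
annotated = annotate_mentions(article, candidates)

print(extracted)
for mention in annotated:
    print(mention.attribute, mention.start, mention.end, mention.text)
```

In this sketch the extraction result cannot say which "Narita" was meant, whereas each annotated mention carries its own offsets, so an annotator or a system can mark only the occurrences that really express the attribute.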

    Annotation Image Sample

    Adding JP-30 categories

    We added the JP-30 categories described above as new targets.







    Committee members

    • Committee chair: Satoshi Sekine
    • Committee members (names and affiliations given in Japanese):

    相澤彰子 (NII), 浅原正幸 (国研), 荒牧英治 (奈良先端大), 安藤まや (LC), 市瀬龍太郎 (NII), 宇佐美佑 (合同会社宇佐美), 荻野孝野 (JSA), 加藤恒昭 (東大), 菊井玄一郎 (岡山県立大), 黒橋禎夫 (京大), 古宮嘉那子 (茨城大), 榊剛史 (ホットリンク), 貞光九月 (フューチャーアーキテクト), 佐藤敏紀 (LINE), 進藤裕之 (奈良先端大), 新納浩幸 (茨城大), 鈴木久美 (MS), 須藤克仁 (奈良先端大), 高村大也 (AIRC), 徳永健伸 (東工大), 中野幹生 (HRI), 西田豊明 (京大), 林良彦 (早稲田大), 東中竜一郎 (NTT), 福本文代 (山梨大), 松井邦夫 (金沢工大), 宮尾祐介 (NII), 村上浩司 (楽天), 山田育矢 (Studio Ousia), 横野光 (富士通研)