SHINRA2018-JP: Structuring Task

“SHINRA2018-JP Task” is a shared task on structuring the information in Japanese Wikipedia. The SHINRA project aims to construct a resource by collaborative contribution (RbCC); this edition was held in 2018.

The “SHINRA2018-JP Task” aims to create structured information from Japanese Wikipedia. The structuring framework is based on the named entity ontology “Extended Named Entity (ENE)”, which has around 200 categories. We have categorized all of Japanese Wikipedia into ENE categories. For SHINRA2018-JP we selected 5 of these categories, and the task is to extract the values of the attributes defined for each ENE category. The outputs of the participating systems are made available for further research, such as ensemble learning.
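As one illustration of how the participant outputs could feed ensemble learning, the sketch below combines several systems by simple voting. It is only a minimal example under stated assumptions: the (page ID, attribute, value) triple representation, the function name `vote`, and the `min_votes` threshold are hypothetical and are not the official SHINRA output format or ensemble method.

```python
from collections import Counter


def vote(system_outputs, min_votes=2):
    """Keep every triple that at least `min_votes` systems agree on.

    `system_outputs` is a list with one collection per system; each
    collection holds (page_id, attribute, value) triples. This triple
    layout is an assumption made for this sketch.
    """
    counts = Counter()
    for triples in system_outputs:
        # set() ensures a system votes at most once per triple
        counts.update(set(triples))
    return {triple for triple, n in counts.items() if n >= min_votes}
```

A higher `min_votes` favors precision over recall; a weighted vote based on each system's accuracy on the annotated samples would be a natural refinement.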

For the 2018-JP Task, we chose 5 ENE categories as structuring targets. For these categories, we provide:

  • A list of the pages included in each category.
  • HTML files, Wikidump and Cirrus Dump.
  • Manually created annotation samples, usable as training data for a supervised learning approach (about 600 files per category).
We provide these datasets so that participants can try structuring by automatically extracting values from HTML/Wikidump/CirrusDump.
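To make "automatic value extraction from HTML" concrete, here is a deliberately naive baseline sketch. It reads header/value table rows (as found in infobox-style tables) and maps header labels to attribute names. The class and function names, the label-to-attribute mapping, and the attribute names themselves are hypothetical; they are not taken from the ENE definition or from the distributed data format.

```python
from html.parser import HTMLParser


class TableRowCollector(HTMLParser):
    """Collect (header, value) text pairs from <th>/<td> cells of each <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # "th" or "td" while inside a cell, else None
        self._header = ""
        self._value = ""

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._header, self._value = "", ""
        elif tag in ("th", "td"):
            self._cell = tag

    def handle_data(self, data):
        # Text inside nested tags (e.g. links) is still attributed to the cell.
        if self._cell == "th":
            self._header += data
        elif self._cell == "td":
            self._value += data

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._cell = None
        elif tag == "tr":
            header, value = self._header.strip(), self._value.strip()
            if header and value:
                self.rows.append((header, value))


def extract_attributes(html, label_map):
    """Return (attribute, value) pairs for rows whose header is in label_map.

    `label_map` (page label -> attribute name) is a hypothetical,
    hand-written mapping used only for this illustration.
    """
    parser = TableRowCollector()
    parser.feed(html)
    parser.close()
    return [(label_map[h], v) for h, v in parser.rows if h in label_map]
```

A rule-based baseline like this only covers values that appear in tables; values mentioned in running text would need a learned extractor trained on the annotation samples.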

    Target Categories

    • Person: the largest category, with about 300 thousand articles.
    • City: a large category, with about 50 thousand articles.
    • Compound: a large category, with about 30 thousand articles.
    • Chemical-Compound: a comparatively small category, with 6 thousand articles.
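
    Committee Chair: Satoshi Sekine

    Committee Members (AIP affiliates): Kentaro Inui (Tohoku University), Tomoya Iwakura (Fujitsu Collaboration), Manabu Okumura (Tokyo Institute of Technology), Kyoko Ohara (Keio University), Daisuke Kawahara (Kyoto University), Yasutomo Kimura (Otaru University of Commerce), Akio Kobayashi (AIP), Hayato Kobayashi (Yahoo!), Masatoshi Suzuki (Tohoku University), Yukino Baba (Kyoto University), Koji Matsuda (Tohoku University), Masaharu Yoshioka (Hokkaido University), Yohei Oseki (Waseda University)

    Committee Members: Akiko Aizawa (NII), Masayuki Asahara (National Institute for Japanese Language and Linguistics), Eiji Aramaki (Nara Institute of Science and Technology), Maya Ando (LC), Ryutaro Ichise (NII), Yu Usami (Usami LLC), Takano Ogino (JSA), Tsuneaki Kato (University of Tokyo), Genichiro Kikui (Okayama Prefectural University), Sadao Kurohashi (Kyoto University), Kanako Komiya (Ibaraki University), Takeshi Sakaki (Hottolink), Kugatsu Sadamitsu (Future Architect), Toshinori Sato (LINE), Hiroyuki Shindo (Nara Institute of Science and Technology), Hiroyuki Shinnou (Ibaraki University), Hisami Suzuki (MS), Katsuhito Sudoh (Nara Institute of Science and Technology), Hiroya Takamura (AIRC), Takenobu Tokunaga (Tokyo Institute of Technology), Mikio Nakano (HRI), Toyoaki Nishida (Kyoto University), Yoshihiko Hayashi (Waseda University), Ryuichiro Higashinaka (NTT), Fumiyo Fukumoto (University of Yamanashi), Kunio Matsui (Kanazawa Institute of Technology), Yusuke Miyao (NII), Koji Murakami (Rakuten), Ikuya Yamada (Studio Ousia), Hikaru Yokono (Fujitsu Laboratories)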