Project SHINRA

“Project SHINRA” aims to build a structured knowledge base (KB) that combines Wikipedia with the Extended Named Entity definitions through a “Resource by Collaborative Contribution” scheme.

Structuring Wikipedia and RbCC

Wikipedia, which is created by the crowd, contains a very large number of entries with up-to-date information, which makes it a great knowledge base. However, most of that information is written for people to read, not for machines to process. The SHINRA project aims to structure the information in Wikipedia so that machines can process it.

We host shared tasks to build a structured KB by categorizing Wikipedia entities according to the Extended Named Entity definitions, which include more than 200 categories, and by extracting the attributes defined for each category. The outputs of the participating systems will be unified, and the final results will be distributed as a structured KB. We call this scheme “Resource by Collaborative Contribution (RbCC)”, and we are asking many collaborators to participate in the tasks.
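
The text above does not say how the outputs of the participating systems are unified. As a minimal sketch of one possible approach, and purely as an assumption for illustration, the following Python code merges category predictions from several systems by majority vote. The function name `unify_by_majority`, the page titles, and the category labels are all hypothetical.

```python
from collections import Counter


def unify_by_majority(system_outputs):
    """Merge per-system predictions (page title -> category) by majority vote.

    Assumption: each system assigns at most one category per page.
    This is an illustrative scheme, not the project's documented method.
    """
    votes = {}
    for output in system_outputs:
        for page, category in output.items():
            votes.setdefault(page, Counter())[category] += 1

    # Keep the most frequently predicted category for every page.
    return {page: counter.most_common(1)[0][0] for page, counter in votes.items()}


# Toy predictions from three hypothetical participating systems.
system_a = {"Tokyo": "City", "Toyota": "Company"}
system_b = {"Tokyo": "City", "Toyota": "Person"}
system_c = {"Tokyo": "City", "Toyota": "Company", "Narita Airport": "Airport"}

unified = unify_by_majority([system_a, system_b, system_c])
```

A page predicted by only one system is kept as-is here; a real pipeline might instead weight systems by their measured accuracy or require a minimum number of votes.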

Shared-tasks

The task is to categorize Wikipedia entities in 30 languages. The training data consists of hand-categorized Japanese Wikipedia entities and the language links from them to each language's Wikipedia. For example, 316K entities in German Wikipedia are linked from the 920K hand-categorized Japanese Wikipedia entities. Participants are expected to categorize the remaining 1.7M German entities. This task will be run as one of the NTCIR-15 shared tasks.
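
The way training data arises from language links can be sketched as follows. This is a minimal illustration under stated assumptions, not the format of the distributed data: the dictionaries, the function names `project_labels` and `split_target_pages`, and the sample titles and categories are all hypothetical, and each page is assumed to carry a single category.

```python
def project_labels(ja_labels, langlinks):
    """Carry hand-assigned categories from Japanese pages to linked target pages.

    ja_labels: Japanese page title -> category (hypothetical layout)
    langlinks: Japanese page title -> target-language page title
    """
    projected = {}
    for ja_title, category in ja_labels.items():
        target_title = langlinks.get(ja_title)
        if target_title is not None:
            projected[target_title] = category
    return projected


def split_target_pages(target_titles, projected):
    """Separate target pages into labeled training data and pages left to classify."""
    training = {t: projected[t] for t in target_titles if t in projected}
    remaining = [t for t in target_titles if t not in projected]
    return training, remaining


# Toy example: three labeled Japanese pages, two of which link to German pages.
ja_labels = {"東京": "City", "トヨタ自動車": "Company", "成田国際空港": "Airport"}
langlinks = {"東京": "Tokio", "トヨタ自動車": "Toyota"}
german_pages = ["Tokio", "Toyota", "Zugspitze"]

projected = project_labels(ja_labels, langlinks)
training, remaining = split_target_pages(german_pages, projected)
```

In this toy setting, `training` plays the role of the 316K linked German entities and `remaining` the role of the entities that participants are asked to categorize.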

The 30 languages are English, Spanish, French, German, Chinese, Russian, Portuguese, Italian, Arabic, Indonesian, Turkish, Dutch, Polish, Persian, Swedish, Vietnamese, Korean, Hebrew, Romanian, Norwegian, Czech, Ukrainian, Hindi, Finnish, Hungarian, Danish, Thai, Catalan, Greek, and Bulgarian. (These are the Wikipedias with the largest numbers of users.)

The task is to structure Japanese Wikipedia entities. The categories include those used at SHINRA2019 (JP-5 and JP-30), as well as 47 new categories under facilities and events (JP-47).

The task was to structure Japanese Wikipedia entities. The categories included those used at SHINRA2018 (JP-5), as well as 30 new categories under location and organization (JP-30); 2 of these categories were not used because they contained too few entities. There were 11 participants in this task.

The task was to structure Japanese Wikipedia entities in 5 categories: person, city, company, airport, and chemical compound. There were 8 participants in this task.

SHINRA Data Download

All SHINRA data can be downloaded.

SHINRA Document Download

All SHINRA documents (slides) can be downloaded.

Extended Named Entity Definition

The definition comprises 219 categories covering Name, Numerical value, and Time. Examples include person, company, city, airport, and chemical compound. Each category has its own attribute definitions, with 10 to 30 attributes per category.