CALL FOR TASK PARTICIPATION
Data release: January 2020
Registration & Result submission deadline: July 31, 2020
NTCIR-15 Conference: December 2020
SHINRA is a resource creation project, started in 2017, that aims to structure the knowledge in Wikipedia. SHINRA2020-ML is the first text classification shared task in the SHINRA project. It tackles the challenge of classifying Wikipedia entities in 30 languages into fine-grained categories. The task is conducted as one of the NTCIR-15 tasks.
[Video] (approx.11 min):
Introduction of SHINRA2020-ML task
(categorization of 30-language Wikipedia into ENE)
The task is to classify Wikipedia pages in 30 languages (*1) into 219 categories, using categorized Japanese Wikipedia pages and the interlanguage links to the corresponding pages in the target languages. The categories are defined in Extended Named Entity (ENE) (ver.8.0), a four-layer ontology for classifying names, time, and numbers.
Participants are expected to select one or more target languages. For each language, they use the Wikipedia pages linked from the categorized Japanese pages as training data, and run their system to classify the remaining pages, which are not linked from the Japanese pages. Please see the TASK DESCRIPTION below for further details.
After the task is over, we (including the participants) will combine the results from all participants (i.e., by ensemble learning) and publish the combined results. This scheme is called “Resource by Collaborative Contribution (RbCC)”. We welcome many participants who are willing to contribute.
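The split described above can be sketched as follows. This is a minimal illustration, not the official data pipeline: the function name, the in-memory dictionaries, and the page identifiers such as "ja:1" and "en:CBS" are all hypothetical, and the actual data formats are described on the SHINRA2020-ML data pages.

```python
from typing import Dict, List, Set, Tuple


def split_training_and_target(
    ja_labels: Dict[str, Set[str]],
    langlinks: Dict[str, str],
    target_pages: List[str],
) -> Tuple[Dict[str, Set[str]], List[str]]:
    """Project Japanese ENE labels onto target-language pages.

    ja_labels:    Japanese page id -> set of ENE ids (hypothetical layout)
    langlinks:    Japanese page id -> linked page id in the target language
    target_pages: all page ids in the target-language dump
    """
    training: Dict[str, Set[str]] = {}
    for ja_id, ene_ids in ja_labels.items():
        target_id = langlinks.get(ja_id)
        if target_id is not None:
            # A linked page inherits the categories of its Japanese counterpart.
            training[target_id] = set(ene_ids)
    # Pages without a link from a categorized Japanese page remain to be classified.
    to_classify = [page for page in target_pages if page not in training]
    return training, to_classify


# Illustrative data only; "1.1" is used here purely as a placeholder ENE id.
ja_labels = {"ja:1": {"1.8.1"}, "ja:2": {"1.1"}}
langlinks = {"ja:1": "en:CBS"}  # "ja:2" has no English counterpart
target_pages = ["en:CBS", "en:Tokyo", "en:Linux"]

training, to_classify = split_training_and_target(ja_labels, langlinks, target_pages)
print(training)
print(to_classify)
```

In this sketch, pages reached through an interlanguage link become labeled training examples, and every other page in the dump is left for the system to classify.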
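As an illustration of how results from several systems might be combined, the sketch below uses simple majority voting per page and category. This is only one possible ensemble method, assumed here for illustration. The organizers' actual combination procedure is not specified in this text, and the function name, the `min_share` parameter, and the page ids are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def majority_vote(
    submissions: List[Dict[str, Set[str]]],
    min_share: float = 0.5,
) -> Dict[str, Set[str]]:
    """Keep an ENE id for a page if more than `min_share` of the systems predicted it."""
    votes: Dict[str, Counter] = {}
    for submission in submissions:
        for page_id, ene_ids in submission.items():
            # Each system contributes at most one vote per (page, category) pair.
            votes.setdefault(page_id, Counter()).update(ene_ids)
    threshold = min_share * len(submissions)
    return {
        page_id: {ene_id for ene_id, count in counter.items() if count > threshold}
        for page_id, counter in votes.items()
    }


# Three hypothetical system outputs: page id -> predicted ENE ids.
system_a = {"p1": {"1.8.1", "1.4.6"}}
system_b = {"p1": {"1.8.1"}}
system_c = {"p1": {"1.8.1"}, "p2": {"9"}}

combined = majority_vote([system_a, system_b, system_c])
print(combined)
```

With three systems and the default threshold, a category needs at least two votes to survive, so a label proposed by a single system is dropped.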
(*1): The 30 languages are English, Spanish, French, German, Chinese, Russian, Portuguese, Italian, Arabic, Indonesian, Turkish, Dutch, Polish, Persian, Swedish, Vietnamese, Korean, Hebrew, Romanian, Norwegian, Czech, Ukrainian, Hindi, Finnish, Hungarian, Danish, Thai, Catalan, Greek, Bulgarian.
January 2020: Data release
July 31, 2020: Registration & Result submission deadline
August 20, 2020: Evaluation results due back to participants
December 2020: NTCIR-15 Conference (NII, Tokyo)
HOW TO PARTICIPATE
Another challenge of the task is multi-label classification. The target pages are expected to be classified into one or more of the categories in the four-layer ENE taxonomy (ver.8.0). Please see here for details about Extended Named Entity (ENE).
For example, if the Wikipedia page titled ‘CBS’ is to be classified into “Company” (ENE_id:1.4.6.2) and “Channel” (ENE_id:1.8.1), systems are expected to estimate all of these categories.
Please note that pages should be classified into the lowest (bottommost) candidate categories in the ENE taxonomy (ver.8.0) hierarchy. In the previous example, estimating “Juridical_Person” (ENE_id:1.4.6), “Organization” (ENE_id:1.4), or “Name” (ENE_id:1) instead of “Company” (ENE_id:1.4.6.2) is judged as incorrect.
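The bottommost-category rule can be illustrated with a small check on dotted ENE ids. The sketch assumes that a dotted id encodes the path from the root, as the ids quoted above suggest (1, 1.4, 1.4.6). The helper names and the result strings are hypothetical and are not part of the official evaluation tool.

```python
def is_ancestor(candidate: str, ene_id: str) -> bool:
    """Return True if `candidate` is a strict ancestor of `ene_id` in the dotted hierarchy."""
    # Appending "." prevents false matches such as "1.4" against "1.40".
    return ene_id.startswith(candidate + ".")


def judge_prediction(predicted: str, gold: str) -> str:
    """Judge one predicted ENE id against one gold id under the bottommost-category rule."""
    if predicted == gold:
        return "correct"
    if is_ancestor(predicted, gold):
        # An upper-layer category is not accepted in place of the bottommost one.
        return "incorrect (too coarse)"
    return "incorrect"


print(judge_prediction("1.8.1", "1.8.1"))
print(judge_prediction("1.8", "1.8.1"))
print(judge_prediction("1", "1.8.1"))
```

Only an exact match with the gold category counts. A parent or grandparent id is judged incorrect even though it lies on the same branch.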
- Disambiguation pages, redirects, and pages that do not belong to the Wikipedia main space should be classified into “IGNORED” (ENE_id:9).
- If the format of a target Wikipedia page is invalid (for example, no namespace is specified), please just ignore the data (e.g., '_id': "AVQXnGmF62ewIKYZMTMQ").
Participants are provided with the training data and the target data, which are available at the SHINRA2020-ML: Data Download page (Minimum Datasets).
Note that the target data is given as a Wikipedia dump, which includes the pages to be used as test data for evaluation.
Use of external information
If you use external information for the task, be sure to indicate which resources you used in the system description report.
Submission of the result
Participants should submit the outputs for the entire target data.
As for the submission format, please check the SHINRA2020-ML: Data Formats page (Submission Format). Note that the ENE_ids in the submission are evaluated regardless of the corresponding scores.
We will evaluate the performance of systems on multi-label classification using the micro average F1 measure, i.e., the harmonic mean of micro-averaged precision and micro-averaged recall.
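The micro-averaged F1 measure can be sketched as follows: true positives, false positives, and false negatives are pooled over all pages and categories before precision and recall are computed. This is a minimal illustration, not the official scorer. The function name, the dictionary layout, and the page ids are hypothetical, and the ENE ids are used only as example labels.

```python
from typing import Dict, Set


def micro_f1(gold: Dict[str, Set[str]], predicted: Dict[str, Set[str]]) -> float:
    """Micro-averaged F1 for multi-label output (page id -> set of ENE ids)."""
    tp = fp = fn = 0
    for page_id, gold_ids in gold.items():
        # A page missing from the predictions counts as an empty prediction.
        pred_ids = predicted.get(page_id, set())
        tp += len(gold_ids & pred_ids)   # predicted and correct
        fp += len(pred_ids - gold_ids)   # predicted but not in gold
        fn += len(gold_ids - pred_ids)   # in gold but not predicted
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of micro-averaged precision and recall.
    return 2 * precision * recall / (precision + recall)


gold = {"p1": {"1.8.1", "1.1"}, "p2": {"9"}}
predicted = {"p1": {"1.8.1"}, "p2": {"9", "1.8.1"}}

score = micro_f1(gold, predicted)
print(round(score, 4))
```

In this example there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the micro-averaged F1 is 2/3 as well.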
Satoshi Sekine (RIKEN AIP, Japan)
Masako Nomoto (RIKEN AIP, Japan)
Asuka Sumida (RIKEN AIP, Japan)
Kouta Nakayama (University of Tsukuba / RIKEN AIP, Japan)
Koji Matsuda (Tohoku University / RIKEN AIP, Japan)
Jiewen Wu (A*STAR, Singapore)
Christophe Gravier (Université de Lyon, France)
Hsin-Hsi Chen (National Taiwan University, Taiwan)
Haizhou Li (National University of Singapore, Singapore)
Virach Sornlertlamvanich (Thammasat University, Thailand / Musashino University, Japan)
Massimo Poesio (Queen Mary University of London, UK)
Rafael Muñoz Guillena (Universitat d’Alacant, Spain)
Min Zhang (Soochow University, China)
Wenliang Chen (Soochow University, China)
Johan Bos (University of Groningen, Netherlands)
Gerhard Weikum (DFKI, Germany)
Asif Ekbal (IIT Patna, India)
Gjergji Kasneci (Tübingen University, Germany)
Vasudeva Varma (IIIT Hyderabad, India)
Asanee Kasetsart (Kasetsart University, Thailand)
Pierpaolo Basile (Università degli Studi di Bari Aldo Moro, Italy)
David Nadeau (Innodata, Canada)
Murat Can Ganiz (Marmara University, Turkey)
Adrian Iftene (“Alexandru Ioan Cuza” University, Romania)
Tommi A Pirinen (Universität Hamburg, Germany)
Tru Cao (Ho Chi Minh City University of Technology, Vietnam)
Petya Osenova (Sofia University “St. Kl. Ohridski”, Bulgaria)
Le Hong Phuong (Vietnam National University, Hanoi, Vietnam)
Nguyen Thi Minh Huyen (Vietnam National University, Hanoi, Vietnam)
Nicolas Heist (Universität Mannheim, Germany)
Zdenek Zabokrtsky (Charles University, Czech Republic)
Tim Finin (University of Maryland, USA)
Su Jian (A*STAR, Singapore)
Manar Alkhatib (The British University in Dubai, United Arab Emirates)
Key-Sun Choi (Korea Advanced Institute of Science and Technology, Korea)
Nigel Collier (University of Cambridge, UK)
Ikuya Yamada (Studio Ousia, Japan)
Kentaro Inui (Tohoku University, Japan)
Tomoya Iwakura (Fujitsu, Japan)
Mehrnoush Shamsfard (Shahid Beheshti University, Iran)
Galia Angelova (Bulgarian Academy of Sciences, Bulgaria)
Yusuke Miyao (The University of Tokyo, Japan)
Kiril Simov (Bulgarian Academy of Sciences, Bulgaria)
Yukino Baba (University of Tsukuba, Japan)
Masaharu Yoshioka (Hokkaido University, Japan)
Heng Ji (University of Illinois at Urbana-Champaign, USA)
Miloslav Konopik (University of West Bohemia, Czech Republic)
Steven Skiena (Stony Brook University, USA)
Catherine Legg (Deakin University, Australia)
- Email to the organizers:
- Slack among the participants and organizers: