Lindicle

As the Web becomes ever more enmeshed with our daily lives, there is a growing desire for direct access to knowledge distributed on the Web. Linked Data enables the extension of the Web with a global data/knowledge space based on open standards - the Web of Data. The DBpedia project provides a semantic representation of Wikipedia in which multiple language labels are attached to the individual concepts, and has become the nucleus for the Web of Data.

However, the coverage of cross-lingual interlinks is still low, for two reasons. First, links still constitute less than 5% of the total number of triples available as Linked Data. Second, the number of articles is highly unbalanced across languages. Taking DBpedia as an example, its source Wikipedia has only 382 thousand Chinese articles, which is about 33% of the number of French articles and less than 10% of the number of English articles.

Cross-lingual interlinking consists of discovering links between objects across knowledge bases in different languages. It can not only advance the internationalization of linked data and the global sharing of knowledge across languages on the Web, but also facilitate cross-lingual language processing tasks such as cross-lingual information retrieval and machine translation.

The goal of this project is to develop technology to interlink data and match ontologies in a cross-lingual environment by exploiting large-scale heterogeneous wiki knowledge bases in different languages (such as dbpedia.org, dbpedia.fr or Hudong Baike). The challenges we face are as follows: 1) What are the key factors that can be used to detect links among cross-lingual resources? 2) How can the gap between two different languages be bridged so that cross-lingual links are found accurately? 3) How can the existing inter-language links in Wikipedia be used to enhance linking across heterogeneous sources in different languages?

The project will build a gateway towards an international LOD (Linked Open Data) spanning different languages. By combining cross-lingual ontology matching, knowledge extraction and machine learning, the contributions of the project will be:

To discover cross-lingual links across multiple heterogeneous wiki resources, and build cross-lingual knowledge bases.
To develop effective cross-lingual data linking and ontology matching algorithms by making use of the cross-lingual knowledge bases.
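
As an illustration of how existing inter-language links might be exploited to discover new ones, the sketch below projects the neighbourhood of each unlinked source entity into the target language through already known links, then compares it with the neighbourhoods of target entities. This is only a minimal sketch under stated assumptions, not the project's algorithm: the function names, the Jaccard measure, the threshold and the toy data are all hypothetical choices made for the example.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_links(
    source: Dict[str, Set[str]],
    target: Dict[str, Set[str]],
    seeds: Dict[str, str],
    threshold: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Propose cross-lingual links between two knowledge bases.

    source and target map each entity to the set of entities it links
    to inside its own knowledge base. seeds maps source entities to
    target entities already known to be equivalent (a hypothetical
    stand-in for existing inter-language links).
    """
    results = []
    for s, s_neighbours in source.items():
        if s in seeds:
            continue  # already linked, nothing to discover
        # Project the source neighbourhood into the target language.
        projected = {seeds[n] for n in s_neighbours if n in seeds}
        best, best_score = None, 0.0
        for t, t_neighbours in target.items():
            score = jaccard(projected, t_neighbours)
            if score > best_score:
                best, best_score = t, score
        if best is not None and best_score >= threshold:
            results.append((s, best, best_score))
    return results


# Toy data (hypothetical): two unlinked cities and four seed links.
english = {
    "en:Berlin": {"en:Germany", "en:Europe"},
    "en:Tokyo": {"en:Japan", "en:Asia"},
}
french = {
    "fr:Berlin": {"fr:Allemagne", "fr:Europe"},
    "fr:Tokyo": {"fr:Japon", "fr:Asie"},
}
seeds = {
    "en:Germany": "fr:Allemagne",
    "en:Europe": "fr:Europe",
    "en:Japan": "fr:Japon",
    "en:Asia": "fr:Asie",
}
links = candidate_links(english, french, seeds)
```

In this toy setting each city is matched to its counterpart because their projected neighbourhoods coincide. A realistic system would presumably combine such structural evidence with label, attribute and translation-based features, and would avoid the exhaustive pairwise comparison used here.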

The solution will be evaluated on cross-lingual linked data covering news, movies and wiki knowledge. The technology developed in the project will be used to build cross-lingual linked data among news, movie and wiki knowledge bases, and to provide services for multifaceted cross-lingual semantic search and cross-lingual similar-document finding.

Joint international blanc project between ANR [1] and NSFC [2]


ANR-12-IS02-002-01	NSFC-61261130588