Authors: Zinaida Apanovich, Alexander Marchuk
Keywords: Linked Open Data, SPARQL, ontology alignment, identity resolution, self-citation network
Abstract
This paper describes approaches to the vocabulary normalization and identity
resolution problems that arise when LOD datasets are used to populate
scholarly knowledge bases. We propose new heuristics that use additional
information extracted from full-text data sources. The first heuristic uses
the full track record of a person, and the second uses self-citation
networks. The dataset of the Open Archive of the Russian Academy of Sciences
and several bibliographic datasets are used as test examples.
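The abstract does not spell out how the self-citation heuristic works. As a rough illustration only, the following minimal sketch assumes a hypothetical rule: two author records with the same normalized name are merged when a publication listing one of them cites a publication listing the other. The data layout, the record ids, and the function `self_citation_clusters` are hypothetical and are not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical input: each publication lists its author records as
# (record_id, normalized_name) pairs and the ids of publications it cites.
publications = {
    "p1": {"authors": [("r1", "apanovich z")], "cites": []},
    "p2": {"authors": [("r2", "apanovich z"), ("r3", "marchuk a")], "cites": ["p1"]},
    "p3": {"authors": [("r4", "apanovich z")], "cites": []},
}


def self_citation_clusters(publications):
    """Group same-name author records linked by a citation (assumed heuristic)."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving; registers unseen records.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for pub in publications.values():
        for rec, _ in pub["authors"]:
            find(rec)  # make sure every record appears, even if never merged
        for cited_id in pub["cites"]:
            cited = publications.get(cited_id)
            if cited is None:
                continue  # cited work is outside the dataset
            # Assumed self-citation evidence: identical normalized names on
            # the citing and the cited publication.
            for rec_a, name_a in pub["authors"]:
                for rec_b, name_b in cited["authors"]:
                    if name_a == name_b:
                        union(rec_a, rec_b)

    clusters = defaultdict(set)
    for rec in parent:
        clusters[find(rec)].add(rec)
    return sorted(sorted(c) for c in clusters.values())


print(self_citation_clusters(publications))
# [['r1', 'r2'], ['r3'], ['r4']]
```

In this toy data, `r4` stays separate: it shares the name but has no citation link, so the rule alone does not merge it. That is a conservative choice, and in practice it would be combined with other evidence such as the track-record heuristic. Conversely, two different people with the same name who cite each other would be wrongly merged under this simplified rule.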
Zinaida Apanovich
1) A.P. Ershov Institute of Informatics Systems, Siberian Branch of the Russian
Academy of Sciences
6, Acad. Lavrentjev pr., Novosibirsk 630090, Russia
Phone: +7 383 3308652
E-mail:
2) Novosibirsk State University
630090, Novosibirsk-90, 2 Pirogova Str.
Alexander Marchuk
1) A.P. Ershov Institute of Informatics Systems, Siberian Branch of the Russian
Academy of Sciences
6, Acad. Lavrentjev pr., Novosibirsk 630090, Russia
Phone: +7 383 3308652
E-mail:
2) Novosibirsk State University
630090, Novosibirsk-90, 2 Pirogova Str.