Show simple item record

dc.contributor.author
Schulz, Christian
dc.contributor.author
Mazloumian, Amin
dc.contributor.author
Petersen, Alexander M.
dc.contributor.author
Penner, Orion
dc.contributor.author
Helbing, Dirk
dc.date.accessioned
2019-04-05T14:56:26Z
dc.date.available
2017-06-11T13:30:28Z
dc.date.available
2019-04-05T14:56:26Z
dc.date.issued
2014
dc.identifier.issn
2193-1127
dc.identifier.other
10.1140/epjds/s13688-014-0011-3
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/91888
dc.identifier.doi
10.3929/ethz-b-000091888
dc.description.abstract
We present a novel algorithm and validation method for disambiguating author names in very large bibliographic data sets and apply it to the full Web of Science (WoS) citation index. Our algorithm relies only upon the author and citation graphs available for the whole period covered by the WoS. A pair-wise publication similarity metric, which is based on common co-authors, self-citations, shared references and citations, is established to perform a two-step agglomerative clustering that first connects individual papers and then merges similar clusters. This parameterized model is optimized using an h-index based recall measure, favoring the correct assignment of well-cited publications, and a name-initials-based precision using WoS metadata and cross-referenced Google Scholar profiles. Despite the use of limited metadata, we reach a recall of 87% and a precision of 88% with a preference for researchers with high h-index values. 47 million articles of WoS can be disambiguated on a single machine in less than a day. We develop an h-index distribution model, confirming that the prediction is in excellent agreement with the empirical data, and yielding insight into the utility of the h-index in real academic ranking scenarios.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
SpringerOpen
dc.rights.uri
http://creativecommons.org/licenses/by/2.0/
dc.subject
Name disambiguation
en_US
dc.subject
Citation analysis
en_US
dc.subject
Clustering
en_US
dc.subject
h-index
en_US
dc.subject
Science of science
en_US
dc.title
Exploiting citation networks for large-scale author name disambiguation
en_US
dc.type
Journal Article
dc.rights.license
Creative Commons Attribution 2.0 Generic
ethz.journal.title
EPJ Data Science
ethz.journal.volume
3
en_US
ethz.journal.abbreviated
EPJ Data Sci.
ethz.pages.start
11
en_US
ethz.size
14 p.
en_US
ethz.version.deposit
publishedVersion
en_US
ethz.publication.place
Heidelberg
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02045 - Dep. Geistes-, Sozial- u. Staatswiss. / Dep. of Humanities, Social and Pol.Sc.::03784 - Helbing, Dirk / Helbing, Dirk
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02045 - Dep. Geistes-, Sozial- u. Staatswiss. / Dep. of Humanities, Social and Pol.Sc.::03784 - Helbing, Dirk / Helbing, Dirk
ethz.date.deposited
2017-06-11T13:30:35Z
ethz.source
ECIT
ethz.identifier.importid
imp5936527d2e87536286
ethz.ecitpid
pub:144582
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2017-07-15T05:37:57Z
ethz.rosetta.lastUpdated
2024-02-02T07:35:16Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Exploiting%20citation%20networks%20for%20large-scale%20author%20name%20disambiguation&rft.jtitle=EPJ%20Data%20Science&rft.date=2014&rft.volume=3&rft.spage=11&rft.issn=2193-1127&rft.au=Schulz,%20Christian&Mazloumian,%20Amin&Petersen,%20Alexander%20M.&Penner,%20Orion&Helbing,%20Dirk&rft.genre=article&rft_id=info:doi/10.1140/epjds/s13688-014-0011-3&
