Show simple item record

dc.contributor.author: Pascual, Damian
dc.contributor.author: Luck, Sandro
dc.contributor.author: Wattenhofer, Roger
dc.contributor.editor: Demner-Fushman, Dina
dc.contributor.editor: Bretonnel Cohen, Kevin
dc.contributor.editor: Ananiadou, Sophia
dc.contributor.editor: Tsujii, Junichi
dc.date.accessioned: 2021-07-30T12:46:03Z
dc.date.available: 2021-07-26T13:10:44Z
dc.date.available: 2021-07-30T12:46:03Z
dc.date.issued: 2021
dc.identifier.isbn: 978-1-954085-40-4
dc.identifier.other: 10.18653/v1/2021.bionlp-1.6
dc.identifier.uri: http://hdl.handle.net/20.500.11850/497651
dc.identifier.doi: 10.3929/ethz-b-000497651
dc.description.abstract: Automatic ICD coding is the task of assigning codes from the International Classification of Diseases (ICD) to medical notes. These codes describe the state of the patient and have multiple applications, e.g., computer-assisted diagnosis or epidemiological studies. ICD coding is a challenging task due to the complexity and length of medical notes. Unlike the general trend in language processing, no transformer model has been reported to reach high performance on this task. Here, we investigate in detail ICD coding using PubMedBERT, a state-of-the-art transformer model for biomedical language understanding. We find that the difficulty of fine-tuning the model on long pieces of text is the main limitation of BERT-based models for ICD coding. We run extensive experiments and show that, despite the gap with the current state of the art, pretrained transformers can reach competitive performance using relatively small portions of text. We point to better methods for aggregating information from long texts as the main need for improving BERT-based ICD coding.
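
The abstract frames ICD coding as assigning several ICD codes to one medical note, i.e., multi-label classification, with the 512-token input limit of BERT-style models as the main obstacle on long notes. The sketch below is not the authors' implementation; it is a minimal illustration, with assumed values for the Hugging Face model id, the number of ICD codes, and the 0.5 decision threshold, of how a PubMedBERT checkpoint can be set up for multi-label ICD prediction on a truncated note.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed checkpoint id for PubMedBERT on the Hugging Face hub.
MODEL_ID = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"
NUM_ICD_CODES = 50  # assumed label-set size (e.g., a "top-50 codes" setting)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_ID,
    num_labels=NUM_ICD_CODES,
    problem_type="multi_label_classification",  # sigmoid outputs, BCE loss per code
)

# The note is truncated to the 512-token limit discussed in the paper;
# anything beyond that limit is simply not seen by the model.
note = "Patient admitted with acute exacerbation of congestive heart failure ..."
inputs = tokenizer(note, truncation=True, max_length=512, return_tensors="pt")

# The classification head is randomly initialized here and would normally be
# fine-tuned on labeled notes; this forward pass only shows the prediction rule.
with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, NUM_ICD_CODES)
probs = torch.sigmoid(logits)              # one independent probability per ICD code
predicted = (probs > 0.5).nonzero(as_tuple=True)[1].tolist()
print(predicted)                           # indices of the codes assigned to the note

Handling notes longer than 512 tokens, for example by splitting them into chunks and aggregating the chunk-level predictions, is exactly the open problem the abstract points to.
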
dc.format: application/pdf
dc.language.iso: en
dc.publisher: Association for Computational Linguistics
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.title: Towards BERT-based Automatic ICD Coding: Limitations and Opportunities
dc.type: Conference Paper
dc.rights.license: Creative Commons Attribution 4.0 International
ethz.book.title: Proceedings of the 20th Workshop on Biomedical Language Processing
ethz.pages.start: 54
ethz.pages.end: 63
ethz.size: 10 p.
ethz.version.deposit: publishedVersion
ethz.event: 20th Biomedical Natural Language Processing Workshop (BioNLP 2021)
ethz.event.location: Online
ethz.event.date: June 11, 2021
ethz.publication.place: Stroudsburg, PA
ethz.publication.status: published
ethz.leitzahl: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02640 - Inst. f. Technische Informatik und Komm. / Computer Eng. and Networks Lab.::03604 - Wattenhofer, Roger / Wattenhofer, Roger
ethz.leitzahl.certified: ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02640 - Inst. f. Technische Informatik und Komm. / Computer Eng. and Networks Lab.::03604 - Wattenhofer, Roger / Wattenhofer, Roger
ethz.date.deposited: 2021-07-26T13:10:49Z
ethz.source: FORM
ethz.eth: yes
ethz.availability: Open access
ethz.rosetta.installDate: 2021-07-30T12:46:09Z
ethz.rosetta.lastUpdated: 2022-03-29T10:50:25Z
ethz.rosetta.versionExported: true