Show simple item record

dc.contributor.author: Wu, Wentao
dc.contributor.author: Zhang, Ce
dc.date.accessioned: 2021-08-02T11:48:57Z
dc.date.available: 2021-07-27T02:50:14Z
dc.date.available: 2021-08-02T11:48:57Z
dc.date.issued: 2021-06
dc.identifier.isbn: 978-1-4503-8486-5 [en_US]
dc.identifier.other: 10.1145/3462462.3468878 [en_US]
dc.identifier.uri: http://hdl.handle.net/20.500.11850/497737
dc.description.abstract: Recent advances in machine learning (ML) systems have made it significantly easier to train ML models given a training set. However, our understanding of the behavior of the model training process has not improved at the same pace. Consequently, a number of key questions remain: How can we systematically assign importance or value to training data with respect to the utility of the trained models, be it accuracy, fairness, or robustness? How does noise in the training data, whether injected by noisy data acquisition processes or by adversarial parties, affect the trained models? How can we find the right data to clean and label in order to improve the utility of the trained models? Just as we are beginning to understand these important questions for ML models in isolation, we have to face the reality that most real-world ML applications are far more complex than a single ML model. In this article, an extended abstract for an invited talk at the DEEM workshop, we discuss our current efforts in revisiting these questions for an end-to-end ML pipeline, which consists of a noise model for the data and a feature extraction pipeline, followed by the training of an ML model. In our opinion, this poses a unique challenge for the joint analysis of data processing and learning. Although we describe some of our recent results towards understanding this interesting problem, this article is more of a "confession" of our technical struggles and a "cry for help" to our data management community. [en_US]
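
As a concrete illustration of the setting described in the abstract (and not the authors' actual method, which reasons over semirings and Codd's tables), the following is a minimal, hypothetical Python sketch: it wires feature extraction and model training into a single scikit-learn pipeline and assigns each training example a naive leave-one-out importance score with respect to validation accuracy. The dataset and all names are illustrative assumptions.

# Hypothetical sketch of an end-to-end ML pipeline and a naive notion of
# "data value": how much does dropping one training example change the
# validation accuracy of the whole pipeline? Assumes scikit-learn is installed.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

def make_model():
    # Feature extraction stages followed by model training, as one pipeline.
    return make_pipeline(StandardScaler(),
                         SelectKBest(k=2),
                         LogisticRegression(max_iter=200))

base_acc = make_model().fit(X_train, y_train).score(X_val, y_val)

# Leave-one-out importance: positive scores mean the example helped accuracy.
importance = np.empty(len(X_train))
for i in range(len(X_train)):
    mask = np.arange(len(X_train)) != i
    acc = make_model().fit(X_train[mask], y_train[mask]).score(X_val, y_val)
    importance[i] = base_acc - acc

print("Most valuable training examples:", np.argsort(importance)[-5:])
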
dc.language.iso: en [en_US]
dc.publisher: Association for Computing Machinery [en_US]
dc.title: Towards understanding end-to-end learning in the context of data: Machine learning dancing over semirings & Codd's table [en_US]
dc.type: Conference Paper
dc.date.published: 2021-06-20
ethz.book.title: Proceedings of the 5th Workshop on Data Management for End-To-End Machine Learning (DEEM '21) [en_US]
ethz.pages.start: 1 [en_US]
ethz.size: 4 p. [en_US]
ethz.event: 5th Workshop on Data Management for End-To-End Machine Learning (DEEM 2021) [en_US]
ethz.event.location: Online [en_US]
ethz.event.date: June 20-25, 2021 [en_US]
ethz.notes: Extended abstract. [en_US]
ethz.identifier.scopus:
ethz.publication.place: New York, NY [en_US]
ethz.publication.status: published [en_US]
ethz.date.deposited: 2021-07-27T02:50:21Z
ethz.source: SCOPUS
ethz.eth: yes [en_US]
ethz.availability: Metadata only [en_US]
ethz.rosetta.installDate: 2021-08-02T11:49:03Z
ethz.rosetta.lastUpdated: 2022-03-29T10:52:20Z
ethz.rosetta.versionExported: true
ethz.COinS: ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Towards%20understanding%20end-to-end%20learning%20in%20the%20context%20of%20data:%20Machine%20learning%20dancing%20over%20semirings%20&%20Codd's%20table&rft.date=2021-06&rft.spage=1&rft.au=Wu,%20Wentao&Zhang,%20Ce&rft.isbn=978-1-4503-8486-5&rft.genre=proceeding&rft_id=info:doi/10.1145/3462462.3468878&rft.btitle=Proceedings%20of%20the%205th%20Workshop%20on%20Data%20Management%20for%20End-To-End%20Machine%20Learning%20%20(DEEM%20'21)

Files in this item

There are no files associated with this item.

Publication type: Conference Paper
