dc.contributor.author
Akhadov, Sabir
dc.contributor.supervisor
Müller, Ingo
dc.contributor.supervisor
Alonso, Gustavo
dc.date.accessioned
2018-05-09T11:36:28Z
dc.date.available
2018-05-09T09:46:02Z
dc.date.available
2018-05-09T11:36:28Z
dc.date.issued
2017-10
dc.identifier.uri
http://hdl.handle.net/20.500.11850/263341
dc.identifier.doi
10.3929/ethz-b-000263341
dc.description.abstract
Data analytics has become the driving force for many industries and scientific research. More and more decisions are made based on statistical analysis of large datasets and machine learning. Big data processing frameworks, such as Apache Spark, provide an easy-to-use out-of-the-box solution, scalable to large machine clusters. Python is the most widespread programming language in the data science field due to its simplicity and the abundance of analytical tools developed for it. Many Spark users would prefer its Python frontend in their daily work. Multiple studies indicate, however, that there is a wide gap between Spark’s performance and the best handwritten code. With this thesis we bring the performance of functional data-flow programs closer to bare-metal speeds and show that it is possible to write high-performance code productively.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
ETH Zurich
en_US
dc.rights.uri
http://rightsstatements.org/page/InC-NC/1.0/
dc.title
PySpark at Bare-Metal Speed
en_US
dc.type
Master Thesis
dc.rights.license
In Copyright - Non-Commercial Use Permitted
dc.date.published
2018-05-09
ethz.size
51 p.
en_US
ethz.publication.place
Zurich
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02663 - Institut für Computing Platforms / Institute for Computing Platforms::03506 - Alonso, Gustavo / Alonso, Gustavo
en_US
ethz.date.deposited
2018-05-09T09:46:03Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2018-05-09T11:36:32Z
ethz.rosetta.lastUpdated
2018-11-06T22:37:02Z
ethz.rosetta.exportRequired
true
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=PySpark%20at%20Bare-Metal%20Speed&rft.date=2017-10&rft.au=Akhadov,%20Sabir&rft.genre=unknown&rft.btitle=PySpark%20at%20Bare-Metal%20Speed