Automated Detection of GDPR Violations in Cookie Notices Using Machine Learning
dc.contributor.author
Bouhoula, Ahmed
dc.contributor.supervisor
Kubicek, Karel
dc.contributor.supervisor
Zac, Amit
dc.contributor.supervisor
Cotrini, Carlos
dc.contributor.supervisor
Basin, David
dc.date.accessioned
2022-10-13T09:03:50Z
dc.date.available
2022-10-13T08:39:04Z
dc.date.available
2022-10-13T09:03:50Z
dc.date.issued
2022-09
dc.identifier.uri
http://hdl.handle.net/20.500.11850/575741
dc.identifier.doi
10.3929/ethz-b-000575741
dc.description.abstract
Privacy regulations such as the General Data Protection Regulation (GDPR) require websites to inform EU-based users about the collection of their data and to request their consent to the use of non-essential cookies. This led to the global adoption of cookie notices. Several studies have shown that websites’ implementations of cookie notices tend to violate these regulations. However, most of these studies focused on a limited subset of websites, detected only simple violations using prescribed patterns, or restricted their analysis to the first layer of cookie notices. This master’s thesis addresses these limitations. Our method automatically navigates through cookie notices using several heuristics, extracts their text, observes declared processing purposes and available consent options with Natural Language Processing, and analyzes websites’ cookies. We find that 47% of websites are highly likely to collect users’ data despite negative consent, and that around 61% of cookie notices do not offer users the option to opt out of consent.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
ETH Zurich, Department of Computer Science
en_US
dc.rights.uri
http://rightsstatements.org/page/InC-NC/1.0/
dc.title
Automated Detection of GDPR Violations in Cookie Notices Using Machine Learning
en_US
dc.type
Master Thesis
dc.rights.license
In Copyright - Non-Commercial Use Permitted
dc.date.published
2022-10-13
ethz.size
49 p.
en_US
ethz.publication.place
Zurich
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02660 - Institut für Informationssicherheit / Institute of Information Security::03634 - Basin, David / Basin, David
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02150 - Dep. Informatik / Dep. of Computer Science::02660 - Institut für Informationssicherheit / Institute of Information Security::03634 - Basin, David / Basin, David
en_US
ethz.date.deposited
2022-10-13T08:39:04Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2022-10-13T09:03:51Z
ethz.rosetta.lastUpdated
2023-02-07T07:06:28Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Automated%20Detection%20of%20GDPR%20Violations%20in%20Cookie%20Notices%20Using%20Machine%20Learning&rft.date=2022-09&rft.au=Bouhoula,%20Ahmed&rft.genre=unknown&rft.btitle=Automated%20Detection%20of%20GDPR%20Violations%20in%20Cookie%20Notices%20Using%20Machine%20Learning