Designing a generic web forms crawler to enable legal compliance analysis of authentication sections
OPEN ACCESS
Date
2022-01-17
Publication Type
Master Thesis
ETH Bibliography
yes
Abstract
Users deserve security and privacy when using web services, but these properties conflict with the financial interests of website owners: keeping websites secure takes work, and exploiting sensitive data generates revenue at the expense of the user’s privacy. Countries have therefore introduced regulations to redress this imbalance. Notably, the European Union’s General Data Protection Regulation (GDPR) specifies that data may be collected and processed, including shared with third parties, only with the informed and specific consent of the user. Automated, large-scale detection of violations and security flaws is difficult because website authentication mechanisms do not behave in a standardized way.
We developed a web crawler that detects and submits web forms, mainly registration forms. The crawler enables novel privacy and security research on a larger scale than was previously possible. It is fully automated: it navigates the site to find the required form, fills it in, avoids bot detection mechanisms, submits the form, and validates that the submission succeeded. In 17 days, we crawled over 600,000 domains with the goal of creating new user accounts. The crawler detected a sign-up form on 22% of all reachable websites, with a 6.4% registration success rate. We also received at least one email from 2.3% of all crawled pages. These results significantly surpass both the prior version of this project and the best widely used published tool.
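To make the form-detection step of the abstract more concrete, the following is a minimal sketch of how a crawler might flag a sign-up form in a fetched page. It is not the thesis's method: the keyword list `SIGNUP_HINTS`, the class `FormCollector`, and the functions `looks_like_signup` and `find_signup_forms` are hypothetical names chosen for illustration. The heuristic (a password field plus either an email field or a registration keyword) is an assumption, and it uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Hypothetical keyword list; the abstract does not state the real heuristics.
SIGNUP_HINTS = ("sign up", "signup", "register", "create account", "join")


class FormCollector(HTMLParser):
    """Collect, for each <form>, its action, input types, and visible text."""

    def __init__(self):
        super().__init__()
        self.forms = []       # forms that have been fully parsed
        self._current = None  # form currently being parsed, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                "action": attrs.get("action") or "",
                "types": [],
                "text": "",
            }
        elif tag == "input" and self._current is not None:
            # An <input> without a type attribute defaults to "text".
            self._current["types"].append((attrs.get("type") or "text").lower())

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += " " + data.strip().lower()

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def looks_like_signup(form):
    """Assumed heuristic: password field plus email field or a keyword hint."""
    has_password = "password" in form["types"]
    has_email = "email" in form["types"]
    haystack = form["text"] + " " + form["action"].lower()
    has_hint = any(hint in haystack for hint in SIGNUP_HINTS)
    return has_password and (has_email or has_hint)


def find_signup_forms(html):
    """Return the collected forms in `html` that look like sign-up forms."""
    parser = FormCollector()
    parser.feed(html)
    parser.close()
    return [form for form in parser.forms if looks_like_signup(form)]
```

A static check like this would only cover forms present in the initial HTML. Forms rendered by JavaScript, or reachable only after navigation, would need a browser-driven crawler of the kind the abstract describes.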
Publication status
published
Contributors
Examiner: Basin, David
Examiner: Kubicek, Karel
Publisher
ETH Zurich
Organisational unit
03634 - Basin, David / Basin, David