Designing a generic web forms crawler to enable legal compliance analysis of authentication sections


Author / Producer

Date

2022-01-17

Publication Type

Master Thesis

ETH Bibliography

yes

Citations

Altmetric

Data

Abstract

While users deserve security and privacy when using web services, these properties are at odds with the financial interests of website owners both in terms of work required to keep websites secure and revenues generated by exploiting sensitive data resulting in a violation of the user’s privacy. Countries, therefore, introduced regulations to balance the inequity. Namely, European Union’s General Data Protection Regulation (GDPR) specifies that any data collection and processing can only be done with the informed and specific consent of the user, including sharing of the said data with 3rd parties. Automated and large-scale detection of violations and security flaws is difficult because of the non-standardized behavior of website authentication mechanisms. We developed a web crawler for detecting and submitting mainly registration web forms. This crawler enables novel privacy and security research on a larger scale than was previously possible. The completely automated crawler can navigate the site to find the required form, fill the form, avoid bot detection mechanisms, submit the form, and validate the submission success. In 17 days, we crawled over 600,000 domains intending to create new user accounts. Our automated crawler detected a sign-up form on 22% of all the reachable websites with a 6.4% registration success rate. We have also received at least one email from 2.3% of all crawled pages. This significantly surpasses the prior version of this project and the best widely-used published tool.

Publication status

published

External links

Editor

Contributors

Examiner: Basin, David
Examiner: Kubicek, Karel

Book title

Journal / series

Volume

Pages / Article No.

Publisher

ETH Zurich

Event

Edition / version

Methods

Software

Geographic location

Date collected

Date created

Subject

Organisational unit

03634 - Basin, David / Basin, David check_circle

Notes

Funding

Related publications and datasets