SUPClust: Active Learning at the Boundaries
(2024) 5th Workshop on practical ML for limited/low resource settings

Active learning is a machine learning paradigm designed to optimize model performance in a setting where labeled data is expensive to acquire. In this work, we propose a novel active learning method called SUPClust that seeks to identify points at the decision boundary between classes. By targeting these points, SUPClust aims to gather information that is most informative for refining the model's prediction of complex decision regions. ...

Conference Paper
Bridging Diversity and Uncertainty in Active Learning with Self-Supervised Pre-Training
(2024) 5th Workshop on practical ML for limited/low resource settings

This study addresses the integration of diversity-based and uncertainty-based sampling strategies in active learning, particularly within the context of self-supervised pre-trained models. We introduce a straightforward heuristic called TCM that mitigates the cold start problem while maintaining strong performance across various data levels. By initially applying TypiClust for diversity sampling and subsequently transitioning to uncertainty ...

Conference Paper
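The two-phase schedule this abstract describes — diversity sampling while the labeled pool is small, uncertainty sampling afterwards — can be sketched generically. This is not the authors' implementation: `knn_density` is a simplified stand-in for TypiClust's typicality score (it skips TypiClust's clustering step), and smallest-margin scoring stands in for the uncertainty phase; all names and the `switch_at` threshold are illustrative.

```python
import numpy as np

def knn_density(X, k=10):
    # Typicality proxy: inverse mean distance to the k nearest neighbours.
    # Dense (typical) points get high scores, outliers get low scores.
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    knn = np.sort(d, axis=1)[:, 1:k + 1]  # drop self-distance at column 0
    return 1.0 / (knn.mean(axis=1) + 1e-12)

def select_batch(X, probs, labeled, budget, switch_at):
    # TCM-style heuristic: pick typical points while few labels exist
    # (cold start), then switch to smallest-margin uncertainty sampling.
    pool = np.array([i for i in range(len(X)) if i not in labeled])
    if len(labeled) < switch_at:
        scores = knn_density(X)[pool]             # high typicality first
    else:
        top2 = np.sort(probs[pool], axis=1)[:, -2:]
        scores = -(top2[:, 1] - top2[:, 0])       # small margin = uncertain
    return pool[np.argsort(scores)[-budget:]]     # best `budget` indices
```

A query loop would call `select_batch` once per round, label the returned indices, and refit the model before the next round.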
A Fair and Resilient Decentralized Clock Network for Transaction Ordering
(2024) Leibniz International Proceedings in Informatics (LIPIcs) ~ 27th International Conference on Principles of Distributed Systems (OPODIS 2023)

Traditional blockchain design gives miners or validators full control over transaction ordering, i.e., they can freely choose which transactions to include or exclude, as well as in which order. While not an issue initially, the emergence of decentralized finance has introduced new transaction order dependencies allowing parties in control of the ordering to make a profit by front-running others' transactions. In this work, we present the ...

Conference Paper
Fault-Tolerant Distributed Directories
(2024) Leibniz International Proceedings in Informatics (LIPIcs) ~ 3rd Symposium on Algorithmic Foundations of Dynamic Networks (SAND 2024)

Many fundamental distributed computing problems require coordinated access to a shared resource. A distributed directory is an overlay data structure on an asynchronous graph G that helps to access a shared token t. The directory supports three basic operations: publish, to initialize the directory; lookup, to read the contents of the token; and move, to get exclusive update access to the token. There are known directory schemes that ...

Conference Paper
DeFi Lending During The Merge
(2023) Leibniz International Proceedings in Informatics (LIPIcs) ~ 5th Conference on Advances in Financial Technologies (AFT 2023)

Lending protocols in decentralized finance enable the permissionless exchange of capital from lenders to borrowers without relying on a trusted third party for clearing or market-making. Interest rates are set by the supply and demand of capital according to a pre-defined function. In the lead-up to The Merge, the Ethereum blockchain's transition from proof-of-work (PoW) to proof-of-stake (PoS), a fraction of the Ethereum ecosystem announced ...

Conference Paper
Self-Supervised Contrastive Learning with Adversarial Perturbations for Defending Word Substitution-based Attacks
(2022) Findings of the Association for Computational Linguistics: NAACL 2022

In this paper, we present an approach to improve the robustness of BERT language models against word substitution-based adversarial attacks by leveraging adversarial perturbations for self-supervised contrastive learning. We create a word-level adversarial attack generating hard positives on-the-fly as adversarial examples during contrastive learning. In contrast to previous works, our method improves model robustness without using any ...

Conference Paper
On Isotropy Calibration of Transformers
(2022) Proceedings of the Third Workshop on Insights from Negative Results in NLP

Different studies of the embedding space of transformer models suggest that the distribution of contextual representations is highly anisotropic: the embeddings are distributed in a narrow cone. Meanwhile, static word representations (e.g., Word2Vec or GloVe) have been shown to benefit from isotropic spaces. Therefore, previous work has developed methods to calibrate the embedding space of transformers in order to ensure isotropy. However, ...

Conference Paper
TempCaps: A Capsule Network-based Embedding Model for Temporal Knowledge Graph Completion
(2022) Proceedings of the Sixth Workshop on Structured Prediction for NLP

Temporal knowledge graphs store the dynamics of entities and relations during a time period. However, typical temporal knowledge graphs often suffer from incomplete dynamics with missing facts in real-world scenarios. Hence, modeling temporal knowledge graphs to complete the missing facts is important. In this paper, we tackle the temporal knowledge graph completion task by proposing TempCaps, which is a Capsule network-based embedding ...

Conference Paper
Word2Course: Creating Interactive Courses from as Little as a Keyword
(2022) CSEDU: Proceedings of the 14th International Conference on Computer Supported Education

In this work, we introduce a novel pipeline that enables the generation of multiple-choice questions and exercises from as little as a topic keyword. This provides users the possibility to start with a study objective in mind and then automatically generate personalized learning material. The main contributions of this project are a scraper that can extract relevant information from websites, a novel distractor generation method that ...

Conference Paper
Beyond prompting: Making Pre-trained Language Models Better Zero-shot Learners by Clustering Representations
(2022) Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Recent work has demonstrated that pre-trained language models (PLMs) are zero-shot learners. However, most existing zero-shot methods involve heavy human engineering or complicated self-training pipelines, hindering their application to new situations. In this work, we show that zero-shot text classification can be improved simply by clustering texts in the embedding spaces of PLMs. Specifically, we fit the unlabeled texts with a Bayesian ...

Conference Paper
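The abstract truncates at "a Bayesian ...", so the exact model is not stated here; a minimal sketch of the general idea — cluster unlabeled texts in an embedding space, then map clusters to class names — might look as follows, assuming a Bayesian Gaussian mixture (scikit-learn's `BayesianGaussianMixture`) as the clustering model. The random blobs stand in for PLM sentence embeddings, and the cluster-to-class mapping via nearest class-name embedding is one simple illustrative choice, not necessarily the paper's.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Stand-ins for PLM embeddings of unlabeled texts: two well-separated blobs.
emb = np.vstack([rng.normal(0.0, 1.0, (40, 16)),
                 rng.normal(4.0, 1.0, (40, 16))])
# Stand-ins for embeddings of the class names (e.g. label prompts).
class_emb = np.vstack([np.zeros(16), np.full(16, 4.0)])

# Fit the unlabeled embeddings with a Bayesian Gaussian mixture.
gmm = BayesianGaussianMixture(n_components=2, random_state=0).fit(emb)
clusters = gmm.predict(emb)

# Turn unsupervised clusters into zero-shot predictions by assigning each
# cluster to the class whose name embedding is nearest its center.
mapping = {c: int(np.argmin(((gmm.means_[c] - class_emb) ** 2).sum(-1)))
           for c in range(2)}
preds = np.array([mapping[c] for c in clusters])
```

No labeled examples are used anywhere: the only supervision signal is the embedding of the class names themselves.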