Natural language access point to digital metal-organic polyhedra chemistry in The World Avatar


Date

2025

Publication Type

Journal Article

ETH Bibliography

yes

Abstract

Metal-organic polyhedra (MOPs) are discrete, porous metal-organic assemblies known for their wide-ranging applications in separation, drug delivery, and catalysis. As part of The World Avatar (TWA) project, a universal and interoperable knowledge model, we have previously systematized known MOPs and expanded the explorable MOP space with novel targets. Although these data are available via a complex query language, a more user-friendly interface is desirable to enhance accessibility. To address a similar challenge in other chemistry domains, the natural language question-answering system "Marie" has been developed; however, its scalability is limited due to its reliance on supervised fine-tuning, which hinders its adaptability to new knowledge domains. In this article, we introduce an enhanced database of MOPs and a first-of-its-kind question-answering system tailored for MOP chemistry. By augmenting TWA's MOP database with geometry data, we enable the visualization of not just empirically verified MOP structures but also machine-predicted ones. In addition, we renovated Marie's semantic parser to adopt in-context few-shot learning, allowing seamless interaction with TWA's extensive MOP repository. These advancements significantly improve the accessibility and versatility of TWA, marking an important step toward accelerating and automating the development of reticular materials with the aid of digital assistants.

Publication status

published

Volume

6

Publisher

Cambridge University Press

Subject

dynamic knowledge graphs; metal-organic polyhedra; question-answering systems; retrieval-augmented generation
