Xia Li
Last Name: Li
First Name: Xia
ORCID
Organisational unit: 03817 - Stampanoni, Marco F.M.
Search Results
Publications 1 - 10 of 11
- Diffusion Schrödinger bridge models for high-quality MR-to-CT synthesis for proton treatment planning
Item type: Journal Article
Medical Physics | Li, Muheng; Li, Xia; Safai, Sairos; et al. (2025)
Background: In recent advancements in proton therapy, magnetic resonance (MR)-based treatment planning is gaining momentum due to its excellent soft tissue contrast and high potential to minimize extra radiation exposure compared to traditional computed tomography (CT)-based methods. This transition underscores the critical need for accurate MR-to-CT image synthesis, which is essential for precise proton dose calculations. Purpose: This study aims to introduce and evaluate the diffusion Schrödinger bridge models (DSBM), an innovative approach for high-quality and efficient MR-to-CT synthesis, in order to improve both the quality and speed of synthetic CT (sCT) image generation. Methods: The DSBM learns the nonlinear diffusion processes between MR and CT data distributions. Unlike traditional diffusion models (DMs), which start synthesis from a Gaussian distribution, DSBM starts from the prior distribution, enabling more direct and efficient synthesis. The model was trained on 46 head-and-neck (HN) MR-CT pairs and 77 brain tumor MR-CT pairs, with 8 and 10 scans used for testing, respectively. Comprehensive evaluations were conducted at both the image and dosimetric levels, using metrics such as mean absolute error (MAE), Dice score, voxel-wise proton dose differences, gamma pass rates of clinical plans, and typical dose indices. Results: For the HN dataset, DSBM achieved a lower MAE of 72.42 ± 9.78 Hounsfield units (HU) compared to 77.72 ± 9.11 HU with the best baseline approach, and a higher Dice score for bone of 83.32 ± 3.25% compared to 82.55 ± 3.62%, indicating superior anatomical accuracy. Dosimetric evaluations showed a 1%/1 mm gamma pass rate of 95.85 ± 2.99%, surpassing the 95.25 ± 3.09% achieved by the baseline model.
For the brain tumor dataset, DSBM outperformed the baseline with an MAE of 91.73 ± 6.86 HU compared to 103.25 ± 9.58 HU, and a Dice score for bone of 82.85 ± 3.88% compared to 81.27 ± 4.59%. DSBM also demonstrated a higher 1%/1 mm gamma pass rate of 97.93 ± 1.82%, confirming its robustness across different anatomical regions. Notably, DSBM achieved these results with very few neural function evaluation steps, significantly improving computational efficiency compared to standard DMs. Conclusions: The DSBM demonstrates superior performance over traditional image synthesis methods in MR-based proton treatment planning. Its ability to generate high-quality sCT images with enhanced speed and accuracy highlights its potential as a valuable and efficient tool in various radiotherapy clinical scenarios.
- Neural Clustering based Visual Representation Learning
Item type: Conference Paper
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | Chen, Guikun; Li, Xia; Yang, Yi; et al. (2024)
We investigate a fundamental aspect of machine vision: the measurement of features, by revisiting clustering, one of the most classic approaches in machine learning and data analysis. Existing visual feature extractors, including ConvNets, ViTs, and MLPs, represent an image as rectangular regions. Though prevalent, such a grid-style paradigm is built upon engineering practice and lacks explicit modeling of data distribution. In this work, we propose feature extraction with clustering (FEC), a conceptually elegant yet surprisingly ad-hoc interpretable neural clustering framework, which views feature extraction as a process of selecting representatives from data and thus automatically captures the underlying data distribution. Given an image, FEC alternates between grouping pixels into individual clusters to abstract representatives and updating the deep features of pixels with current representatives. Such an iterative working mechanism is implemented in the form of several neural layers and the final representatives can be used for downstream tasks. The cluster assignments across layers, which can be viewed and inspected by humans, make the forward process of FEC fully transparent and empower it with promising ad-hoc interpretability. Extensive experiments on various visual recognition models and tasks verify the effectiveness, generality, and interpretability of FEC. We expect this work will provoke a rethink of the current de facto grid-style paradigm.
- MiLNet: Multiplex Interactive Learning Network for RGB-T Semantic Segmentation
Item type: Journal Article
IEEE Transactions on Image Processing | Liu, Jinfu; Liu, Hong; Li, Xia; et al. (2025)
Semantic segmentation methods enhance robust and reliable understanding under adverse illumination conditions by integrating complementary information from visible and thermal infrared (RGB-T) images. Existing methods primarily focus on designing various feature fusion modules between different modalities, overlooking that feature learning is the critical aspect of scene understanding. In this paper, we propose a novel module-free Multiplex Interactive Learning Network (MiLNet) for RGB-T semantic segmentation, which adeptly integrates multi-model, multi-modal, and multi-level feature learning, fully exploiting the potential of multiplex feature interaction. Specifically, robust knowledge is transferred from the vision foundation model to our task-specific model to enhance its segmentation performance. In the task-specific model, an asymmetric simulated learning strategy is introduced to facilitate mutual learning of geometric and semantic information between high- and low-level features across modalities. Additionally, an inverse hierarchical fusion strategy based on feature learning pairs is adopted and further refined using multilabel and multiscale supervision. Experimental results on the MFNet and PST900 datasets demonstrate that MiLNet outperforms state-of-the-art methods in terms of mIoU. As a limitation, the model's performance under few-sample conditions could be improved further. The code and results of our method are available at https://github.com/Jinfu-pku/MiLNet.
- A proof-of-concept study of direct magnetic resonance imaging-based proton dose calculation for brain tumors via neural networks with Monte Carlo-comparable accuracy
Item type: Journal Article
Physics and Imaging in Radiation Oncology | Li, Muheng; Winterhalter, Carla; Li, Xia; et al. (2025)
Background and purpose: Proton therapy currently relies on computed tomography (CT) imaging despite magnetic resonance imaging's (MRI) superior soft-tissue contrast. While synthetic CTs can be generated from magnetic resonance (MR) images, this introduces additional complexity. We present a deep learning-based dose engine enabling direct proton dose calculation from MR images to streamline workflows while maintaining Monte Carlo (MC)-level accuracy. Materials and methods: Using paired MR-CT scans from 39 brain tumor patients (29/3/7 for training/validation/testing), we developed a deep learning framework using various sequence models for individual proton pencil beam dose prediction. The framework processes beam-eye-view patches from 2000 random beam configurations per patient, varying in angles and energy, with corresponding MC dose distributions pre-calculated on CT. Models using CT images were trained for comparison. Results: The xLSTM architecture performed best for both MR- and CT-based scenarios among the evaluated sequence models. For full treatment plans, our model achieved gamma pass rates with a median of 99.8% (range: 98.6%–99.9%, 1 mm/1%), and median percentage dose errors of 0.2% (range: 0.1%–0.4%) within patient bodies and 1.3% (range: 0.8%–3.7%) in high-dose regions (>90% of the prescription dose). The model required only 3 ms per beam prediction compared to 2 s for MC simulation. Conclusion: This study demonstrated the feasibility of MC-quality proton dose calculations directly from MR images for brain tumor patients, achieving comparable accuracy with faster computation and simplified implementation.
- Bridging Magnetic Resonance Imaging and Computed Tomography for Proton Therapy
Item type: Doctoral Thesis | Li, Xia (2025)
Proton therapy, as an advanced radiotherapy technique, enables oncologists to treat cancer with high precision.
Leveraging the unique Bragg-peak property, proton therapy delivers the majority of its energy directly to the target area, minimizing radiation exposure to surrounding healthy tissues. However, this precision relies on accurate anatomical details, as proton therapy's steep dose gradients around tumor boundaries require exact targeting. Given that proton therapy occurs over multiple fractions, patient anatomy and respiratory motion can vary significantly between sessions. To address these challenges, daily adaptive proton therapy and image-guided proton therapy have been developed to incorporate daily or on-board patient imaging into dose calculation and treatment planning. These adaptive approaches depend on advanced imaging modalities, primarily magnetic resonance imaging (MRI) and computed tomography (CT). MRI and CT provide complementary contrasts and tissue characteristics, enhancing the accuracy of treatment planning when used together. However, integrating information from both modalities remains challenging due to each modality's limitations. This integration typically involves value mapping through MR-based CT synthesis (MR-to-CT) and spatial alignment via deformable image registration. Despite extensive ongoing research, achieving high accuracy, robustness, and efficiency in these processes remains an open challenge. This thesis introduces several innovative methods designed to improve these critical aspects. As the first contribution, an uncertainty-conditioned MR-to-CT synthesis method generates reliable synthetic CT images by accounting for uncertainties in the MR imaging process. Estimating these uncertainties is crucial for improving dose calculation accuracy and treatment outcomes. The second contribution involves joint MR-CT registration to improve synthesis quality and anatomical fidelity. Accurate alignment between MR-CT image pairs ensures the effectiveness of image-to-image synthesis, reducing errors in dose calculation and treatment delivery.
The third contribution introduces continuous spatial and temporal representation models for deformable motion modeling and 4D image frame interpolation. These models effectively handle complex anatomical regions with large deformations and sliding boundaries, ensuring more accurate deformation fields. The fourth contribution focuses on Neural Graphics Primitives for real-time deformable registration, balancing computational efficiency and accuracy. This approach supports rapid adaptation to patient movements during treatment, making it suitable for intra-fraction motion tracking and compensation in the future. The final contribution introduces Posterior Agreement (PA) for hyperparameter optimization in deformable image registration. PA provides a systematic framework for selecting robust hyperparameter settings, ensuring consistent performance across varied clinical scenarios. Collectively, the proposed methods improve the quality, efficiency, and adaptability of high-precision radiotherapy, thereby contributing to the broader vision of precision treatment in proton therapy.
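Several of the synthesis papers above report mean absolute error in Hounsfield units and a Dice score for bone between synthetic and reference CT volumes. A minimal sketch of those two metrics follows; the function names, array layout, and the 150 HU bone threshold are illustrative assumptions, not values taken from the publications:

```python
import numpy as np

def mae_hu(sct: np.ndarray, ct: np.ndarray) -> float:
    """Mean absolute error between synthetic and reference CT, in HU."""
    return float(np.mean(np.abs(sct - ct)))

def bone_dice(sct: np.ndarray, ct: np.ndarray, threshold: float = 150.0) -> float:
    """Dice overlap of bone masks obtained by thresholding both volumes.

    The 150 HU cut-off is an illustrative choice; each study defines its
    own bone segmentation procedure.
    """
    a = sct >= threshold
    b = ct >= threshold
    inter = np.logical_and(a, b).sum()
    denom = a.sum() + b.sum()
    return float(2.0 * inter / denom) if denom else 1.0

# Tiny synthetic example: two 2x2 "volumes" of HU values.
ct = np.array([[0.0, 200.0], [400.0, -100.0]])
sct = np.array([[10.0, 190.0], [380.0, -90.0]])
print(mae_hu(sct, ct))     # 12.5
print(bone_dice(sct, ct))  # 1.0
```

In practice both metrics would be evaluated over full 3D volumes, often restricted to a body or bone mask rather than the whole array.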
- Exploring the effect of training set size and number of categories on ice crystal classification through a contrastive semi-supervised learning algorithm
Item type: Journal Article
Atmospheric Measurement Techniques | Chu, Yunpei; Zhang, Huiying; Li, Xia; et al. (2025)
The shapes of ice crystals play an important role in global precipitation formation and the radiation budget. Classifying ice crystal shapes can improve our understanding of in-cloud conditions and these processes. Existing classification methods rely on features such as the aspect ratio of ice crystals, environmental temperature, and so on, which bring high instability to the classification performance, or employ supervised machine learning algorithms that heavily rely on human labeling. This poses significant challenges, including human subjectivity in classification and a substantial labor cost in manual labeling. In addition, previous deep learning algorithms for ice crystal classification are often trained and evaluated on datasets with varying sizes and classification schemes, each with distinct criteria and a different number of categories, making it difficult to compare algorithm performance fairly. To overcome these limitations, a contrastive semi-supervised learning (CSSL) algorithm for the classification of ice crystals is proposed. The algorithm consists of an upstream unsupervised learning network tasked with extracting meaningful representations from a large number of unlabeled ice crystal images, and a downstream supervised network that is fine-tuned with a small subset of labeled images from the entire dataset to perform the classification task. To determine the minimum number of ice crystal images that require human labeling while maintaining the algorithm performance, the algorithm is trained and evaluated on datasets with varying sizes and numbers of categories. The ice crystal data used in this study were collected during the NASCENT campaign at Ny-Ålesund and the CLOUDLAB project on the Swiss Plateau using a holographic imager mounted on a tethered balloon system.
In general, the CSSL algorithm outperforms a purely supervised algorithm in classifying 19 categories. Approximately 154 h of manual labeling can be avoided using just 11% (2048 images) of the training set for fine-tuning, sacrificing only 3.8% in overall precision compared to a fully supervised model trained on the entire dataset. In the four-category classification task, the CSSL algorithm also outperforms the purely supervised algorithm. When fine-tuned on just 2048 images (25% of the dataset), it achieves an overall accuracy of 89.6%, nearly matching the 91.0% accuracy of the supervised algorithm trained on 8192 images. When tested on the unseen CLOUDLAB dataset, CSSL shows superior generalization, improving accuracy by an average of 2.19%. Our analysis also reveals that both CSSL and purely supervised algorithms exhibit inherent instability when trained on small dataset sizes, and the performance difference between them converges as the training set size exceeds 2048 samples. These results highlight the strength and practical effectiveness of CSSL in comparison to purely supervised methods and the potential of the CSSL algorithm to perform well on datasets collected under different conditions.
- HYRE: Hybrid Regressor for 3D Human Pose and Shape Estimation
Item type: Journal Article
IEEE Transactions on Image Processing | Li, Wenhao; Liu, Mengyuan; Liu, Hong; et al. (2025)
Regression-based 3D human pose and shape estimation often falls into one of two different paradigms. Parametric approaches, which regress the parameters of a human body model, tend to produce physically plausible but image-mesh misaligned results. In contrast, non-parametric approaches directly regress human mesh vertices, resulting in pixel-aligned but unreasonable predictions. In this paper, we consider these two paradigms together for a better overall estimation. To this end, we propose a novel HYbrid REgressor (HYRE) that greatly benefits from the joint learning of both paradigms. The core of our HYRE is a hybrid intermediary across paradigms that provides complementary clues to each paradigm at the shared feature level and fuses their results at the part-based decision level, thereby bridging the gap between the two. We demonstrate the effectiveness of the proposed method through both quantitative and qualitative experimental analyses, resulting in improvements for each approach and ultimately leading to better hybrid results. Our experiments show that HYRE outperforms previous methods on challenging 3D human pose and shape benchmarks.
- Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning
Item type: Conference Paper
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | Wang, Xinshun; Fang, Zhongbin; Li, Xia; et al. (2024)
In-context learning provides a new perspective for multi-task modeling for vision and NLP. Under this setting, the model can perceive tasks from prompts and accomplish them without any extra task-specific head predictions or model fine-tuning. However, skeleton sequence modeling via in-context learning remains unexplored. Directly applying existing in-context models from other areas onto skeleton sequences fails due to the similarity between inter-frame and cross-task poses, which makes it exceptionally hard to perceive the task correctly from a subtle context. To address this challenge, we propose Skeleton-in-Context (SiC), an effective framework for in-context skeleton sequence modeling. Our SiC is able to handle multiple skeleton-based tasks simultaneously after a single training process and accomplish each task from context according to the given prompt. It can further generalize to new, unseen tasks according to customized prompts. To facilitate context perception, we additionally propose a task-unified prompt, which adaptively learns tasks of different natures, such as partial joint-level generation, sequence-level prediction, or 2D-to-3D motion prediction. We conduct extensive experiments to evaluate the effectiveness of our SiC on multiple tasks, including motion prediction, pose estimation, joint completion, and future pose estimation. We also evaluate its generalization capability on unseen tasks such as motion-in-between. These experiments show that our model achieves state-of-the-art multi-task performance and even outperforms single-task methods on certain tasks.
- Enhancing Brain MRI Super-Resolution Through Multi-Slice Aware Matching and Fusion
Item type: Journal Article
CAAI Transactions on Intelligence Technology | Xiang, Jie; Zhao, Ang; Li, Xia; et al. (2025)
In clinical diagnosis, magnetic resonance imaging (MRI) allows different contrast images to be obtained. High-resolution (HR) MRI presents fine anatomical structures, which is important for improving the efficiency of expert diagnosis and realising smart healthcare. However, due to the cost of scanning equipment and the time required for scanning, obtaining an HR brain MRI is quite challenging. Therefore, to improve the quality of images, reference-based super-resolution technology has come into existence. Nevertheless, the existing methods still have some drawbacks: (1) The advantages of different contrast images are not fully utilised. (2) The slice-by-slice scanning nature of magnetic resonance imaging is not considered. (3) The ability to capture contextual information and to match and fuse multi-scale, multi-contrast features is lacking. In this paper, we propose the multi-slice aware matching and fusion (MSAMF) network, which makes full use of multi-slice reference image information by introducing a multi-slice aware module and a multi-scale matching strategy to capture corresponding contextual information in reference features at other scales. To further integrate matching features, a multi-scale fusion mechanism is also designed to progressively fuse multi-scale matching features, thereby generating more detailed super-resolution images. The experimental results support the benefits of our network in enhancing the quality of brain MRI reconstruction.
- Gaussian primitives for deformable image registration
Item type: Journal Article
Physics and Imaging in Radiation Oncology | Li, Jihe; Liu, Xiang; Zhang, Fabian; et al. (2025)
Background and Purpose: Deformable image registration (DIR) plays a critical role in radiotherapy by compensating for anatomical deformations. However, existing iterative and data-driven methods are often hindered by computational inefficiency or limited generalization. In response, our objective was to develop a novel optimization-based DIR method that reduces computational overhead and preserves the robust generalization of iterative methods while enhancing interpretability. Materials and Methods: We proposed GaussianDIR, a novel DIR framework that explicitly represents the deformation field using a sparse set of adaptive Gaussian primitives. Each primitive is characterized by its centre, covariance, and associated local rigid deformation. Voxel-wise displacements are derived by blending the local rigid deformations of neighbouring primitives, enabling flexible yet efficient motion modelling. Results: On the DIRLab lung dataset, GaussianDIR achieved a target registration error (TRE) of 1.00 ± 1.11 mm in about 2.5 s, offering an effective trade-off between speed and precision for high-resolution images. On the OASIS brain and ACDC cardiac datasets, the Dice similarity coefficient (DSC) improved from 80.6% to 81.3% and from 81.0% to 81.2% over previous state-of-the-art methods, respectively. Moreover, we compared the generalization performance of GaussianDIR and a data-driven method on the IXI dataset, and found that GaussianDIR outperformed the data-driven method by 6.3% in DSC. Conclusion: GaussianDIR combines high registration accuracy with computational efficiency, interpretability, and strong generalization performance. It challenges the conventional notion that iterative methods are inherently slow and overcomes the generalization limitations of data-driven methods, with potential for real-time clinical applications in radiotherapy.
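The GaussianDIR abstract describes deriving voxel-wise displacements by blending the local deformations of neighbouring Gaussian primitives. The toy sketch below illustrates only that blending idea, simplified to isotropic Gaussians and pure translations; the published method uses full local rigid transforms (rotation plus translation) and anisotropic covariances, and all names here are my own:

```python
import numpy as np

def blended_displacement(points, centers, sigmas, translations):
    """Blend per-primitive translations into per-point displacements.

    Each primitive contributes its translation weighted by an isotropic
    Gaussian of the distance between its centre and the query point;
    weights are normalized over primitives. This is a simplification of
    the rigid-deformation blending described in the abstract.
    """
    # Squared distances, shape (n_points, n_primitives).
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    w = np.exp(-0.5 * d2 / sigmas[None, :] ** 2)
    w /= w.sum(axis=1, keepdims=True)  # normalize over primitives
    return w @ translations            # shape (n_points, dim)

# Two primitives pulling in opposite x-directions.
centers = np.array([[0.0, 0.0], [10.0, 0.0]])
sigmas = np.array([2.0, 2.0])
t = np.array([[1.0, 0.0], [-1.0, 0.0]])
pts = np.array([[0.0, 0.0], [5.0, 0.0], [10.0, 0.0]])
disp = blended_displacement(pts, centers, sigmas, t)
# The midpoint receives equal weights, so its displacement cancels to 0;
# points at each centre follow that centre's translation almost exactly.
```

In a real DIR setting the primitive parameters would be optimized against an image-similarity loss, and the blended field applied to resample the moving image.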