Platonic Transformers: A Solid Choice For Equivariance
Islam, Mohammad Mohaiminul, Anand, Rishabh, Wessels, David R., Kruiff, Friso, Kuipers, Thijs P., Ying, Rex, Sánchez, Clara I., Vadgama, Sharvaree, Bökman, Georg, and Bekkers, Erik J.
While widespread, Transformers lack inductive biases for geometric symmetries common in science and computer vision. Existing equivariant methods often sacrifice the efficiency and flexibility that make Transformers so effective through complex, computationally intensive designs. We introduce the Platonic Transformer to resolve this trade-off. By defining attention relative to reference frames from the Platonic solid symmetry groups, our method induces a principled weight-sharing scheme. This enables combined equivariance to continuous translations and Platonic symmetries, while preserving the exact architecture and computational cost of a standard Transformer. Furthermore, we show that this attention is formally equivalent to a dynamic group convolution, which reveals that the model learns adaptive geometric filters and enables a highly scalable, linear-time convolutional variant. Across diverse benchmarks in computer vision (CIFAR-10), 3D point clouds (ScanObjectNN), and molecular property prediction (QM9, OMol25), the Platonic Transformer achieves competitive performance by leveraging these geometric constraints at no additional cost.
arXiv
Probing Equivariance and Symmetry Breaking in Convolutional Networks
Vadgama, Sharvaree, Islam, Mohammad Mohaiminul, Buracas, Domas, Shewmake, Christian, Moskalev, Artem, and Bekkers, Erik
In this work, we explore the trade-offs of explicit structural priors, particularly group equivariance. We address this through theoretical analysis and a comprehensive empirical study. To enable controlled and fair comparisons, we introduce Rapidash, a unified group convolutional architecture that allows for different variants of equivariant and non-equivariant models. Our results suggest that more constrained equivariant models outperform less constrained alternatives when aligned with the geometry of the task, and increasing representation capacity does not fully eliminate performance gaps. We see improved performance of models with equivariance and symmetry-breaking through tasks like segmentation, regression, and generation across diverse datasets. Explicit symmetry breaking via geometric reference frames consistently improves performance, while breaking equivariance through geometric input features can be helpful when aligned with task geometry. Our results provide task-specific performance trends that offer a more nuanced way for model selection.
arXiv
Longitudinal Flow Matching for Trajectory Modeling
Islam, Mohammad Mohaiminul, Kuipers, Thijs P., Vadgama, Sharvaree, Vente, Coen, Khan, Afsana, Sánchez, Clara I., and Bekkers, Erik J.
Generative models for sequential data often struggle with sparsely sampled and high-dimensional trajectories, typically reducing the learning of dynamics to pairwise transitions. We propose Interpolative Multi-Marginal Flow Matching (IMMFM), a framework that learns continuous stochastic dynamics jointly consistent with multiple observed time points. IMMFM employs a piecewise-quadratic interpolation path as a smooth target for flow matching and jointly optimizes drift and a data-driven diffusion coefficient, supported by a theoretical condition for stable learning. This design captures intrinsic stochasticity, handles irregular sparse sampling, and yields subject-specific trajectories. Experiments on synthetic benchmarks and real-world longitudinal neuroimaging datasets show that IMMFM outperforms existing methods in both forecasting accuracy and further downstream tasks.
2024
MIDL
Uncertainty-aware retinal layer segmentation in OCT through probabilistic signed distance functions
Islam, Mohammad Mohaiminul, Vente, Coen, Liefers, Bart, Klaver, Caroline, Bekkers, Erik J, and Sánchez, Clara I
In this paper, we present a new approach for uncertainty-aware retinal layer segmentation in Optical Coherence Tomography (OCT) scans using probabilistic signed distance functions (SDF). Traditional pixel-wise and regression-based methods primarily encounter difficulties in precise segmentation and a lack of geometrical grounding, respectively. To address these shortcomings, our methodology refines the segmentation by predicting a signed distance function (SDF) that effectively parameterizes the retinal layer shape via level set. We further enhance the framework by integrating probabilistic modeling, applying Gaussian distributions to encapsulate the uncertainty in the shape parameterization. This ensures a robust representation of the retinal layer morphology even in the presence of ambiguous input, imaging noise, and unreliable segmentations. Both quantitative and qualitative evaluations demonstrate superior performance when compared to other methods. Additionally, we conducted experiments on artificially distorted datasets with various noise types (shadowing, blinking, speckle, and motion) common in OCT scans to showcase the effectiveness of our uncertainty estimation. Our findings demonstrate the possibility of obtaining reliable segmentation of retinal layers, as well as an initial step towards the characterization of layer integrity, a key biomarker for disease progression.
TVST
Generalizable deep learning for the detection of incomplete and complete retinal pigment epithelium and outer retinal atrophy: a Macustar report
Vente, Coen, Valmaggia, Philippe, Hoyng, Carel B, Holz, Frank G, Islam, Mohammad M, Klaver, Caroline CW, Boon, Camiel JF, Schmitz-Valckenberg, Steffen, Tufail, Adnan, Saßmannshausen, Marlene, and others
High anisotropy in volumetric medical images can lead to the inconsistent quantification of anatomical and pathological structures. Particularly in optical coherence tomography (OCT), slice spacing can substantially vary across and within datasets, studies, and clinical practices. We propose to standardize OCT volumes to less anisotropic volumes by conditioning 3D diffusion models with en face scanning laser ophthalmoscopy (SLO) imaging data, a 2D modality already commonly available in clinical practice. We trained and evaluated on data from the multicenter and multimodal MACUSTAR study. While upsampling the number of slices by a factor of 8, our method outperforms tricubic interpolation and diffusion models without en face conditioning in terms of perceptual similarity metrics. Qualitative results demonstrate improved coherence and structural similarity. Our approach allows for better informed generative decisions, potentially reducing hallucinations. We hope this work will provide the next step towards standardized high-quality volumetric imaging, enabling more consistent quantifications.
2022
MICCAI
Deep treatment response assessment and prediction of colorectal cancer liver metastases
Islam, Mohammad Mohaiminul, Badic, Bogdan, Aparicio, Thomas, Tougeron, David, Tasu, Jean-Pierre, Visvikis, Dimitris, and Conze, Pierre-Henri
In International Conference on Medical Image Computing and Computer-Assisted Intervention 2022
Evaluating treatment response is essential in patients who develop colorectal liver metastases to decide the necessity for second-line treatment or the admissibility for surgery. Currently, RECIST1.1 is the most widely used criteria in this context. However, it involves time-consuming, precise manual delineation and size measurement of main liver metastases from Computed Tomography (CT) images. Moreover, an early prediction of the treatment response given a specific chemotherapy regimen and the initial CT scan would be of tremendous use to clinicians. To overcome these challenges, this paper proposes a deep learning-based treatment response assessment pipeline and its extension for prediction purposes. Based on a newly designed 3D Siamese classification network, our method assigns a response group to patients given CT scans from two consecutive follow-ups during the treatment period. Further, we extended the network to predict the treatment response given only the image acquired at the first time point. The pipelines are trained on the PRODIGE20 dataset collected from a phase-II multi-center clinical trial in colorectal cancer with liver metastases and exploit an in-house dataset to integrate metastases delineations derived from a U-Net-inspired network as additional information. Our approach achieves overall accuracies of 94.94% and 86.86% for treatment response assessment and early prediction, respectively, suggesting that both treatment response assessment and prediction issues can be effectively solved with deep learning.
2021
ICEEICT
Interpreting and comparing convolutional neural networks: A quantitative approach
Islam, Mohammad Mohaiminul, and Tushar, Zahid Hassan
In 2021 5th International Conference on Electrical Engineering and Information Communication Technology (ICEEICT) 2021