publications
publications by categories in reversed chronological order.
2025
- Rethinking Artistic Copyright Infringements In the Era Of Text-to-Image Generative ModelsMazda Moayeri, Sriram Balasubramanian, Samyadeep Basu, Priyatham Kattakinda, Atoosa Chegini, Robert Brauneis, and Soheil FeiziIn The Thirteenth International Conference on Learning Representations 2025
The advent of text-to-image generative models has led artists to worry that their individual styles may be copied, creating a pressing need to reconsider the lack of protection for artistic styles under copyright law. This requires answering challenging questions, like what defines style and what constitutes style infringment. In this work, we build on prior legal scholarship to develop an automatic and interpretable framework to quantitatively assess style infringement. Our methods hinge on a simple logical argument: if an artist’s works can consistently be recognized as their own, then they have a unique style. Based on this argument, we introduce ArtSavant, a practical (i.e., efficient and easy to understand) tool to (i) determine the unique style of an artist by comparing it to a reference corpus of works from hundreds of artists, and (ii) recognize if the identified style reappears in generated images. We then apply ArtSavant in an empirical study to quantify the prevalence of artistic style copying across 3 popular text-to-image generative models, finding that under simple prompting, 20 % of 372 prolific artists studied appear to have their styles be at risk of copying by today’s generative models. Our findings show that prior legal arguments can be operationalized in quantitative ways, towards more nuanced examination of the issue of artistic style infringements.
@inproceedings{moayeri2025rethinking, title = {Rethinking Artistic Copyright Infringements In the Era Of Text-to-Image Generative Models}, author = {Moayeri, Mazda and Balasubramanian, Sriram and Basu, Samyadeep and Kattakinda, Priyatham and Chegini, Atoosa and Brauneis, Robert and Feizi, Soheil}, booktitle = {The Thirteenth International Conference on Learning Representations}, year = {2025}, url = {https://openreview.net/forum?id=0OTVNEm9N4} } - A Survey on Mechanistic Interpretability for Multi-Modal Foundation ModelsZihao Lin, Samyadeep Basu, Mohammad Beigi, Varun Manjunatha, Ryan A. Rossi, Zichao Wang, Yufan Zhou, Sriram Balasubramanian, Arman Zarei, Keivan Rezaei, Ying Shen, Barry Menglong Yao, Zhiyang Xu, Qin Liu, Yuxiang Zhang, Yan Sun, Shilong Liu, Li Shen, Hongxuan Li, Soheil Feizi, and Lifu Huang2025
The rise of foundation models has transformed machine learning research, prompting efforts to uncover their inner workings and develop more efficient and reliable applications for better control. While significant progress has been made in interpreting Large Language Models (LLMs), multimodal foundation models (MMFMs) - such as contrastive vision-language models, generative vision-language models, and text-to-image models - pose unique interpretability challenges beyond unimodal frameworks. Despite initial studies, a substantial gap remains between the interpretability of LLMs and MMFMs. This survey explores two key aspects: (1) the adaptation of LLM interpretability methods to multimodal models and (2) understanding the mechanistic differences between unimodal language models and crossmodal systems. By systematically reviewing current MMFM analysis techniques, we propose a structured taxonomy of interpretability methods, compare insights across unimodal and multimodal architectures, and highlight critical research gaps.
@misc{lin2025surveymechanisticinterpretabilitymultimodal, title = {A Survey on Mechanistic Interpretability for Multi-Modal Foundation Models}, author = {Lin, Zihao and Basu, Samyadeep and Beigi, Mohammad and Manjunatha, Varun and Rossi, Ryan A. and Wang, Zichao and Zhou, Yufan and Balasubramanian, Sriram and Zarei, Arman and Rezaei, Keivan and Shen, Ying and Yao, Barry Menglong and Xu, Zhiyang and Liu, Qin and Zhang, Yuxiang and Sun, Yan and Liu, Shilong and Shen, Li and Li, Hongxuan and Feizi, Soheil and Huang, Lifu}, year = {2025}, eprint = {2502.17516}, archiveprefix = {arXiv}, primaryclass = {cs.LG}, } - Seeing What’s Not There: Spurious Correlation in Multimodal LLMsParsa Hosseini, Sumit Nawathe, Mazda Moayeri, Sriram Balasubramanian, and Soheil Feizi2025
Unimodal vision models are known to rely on spurious correlations, but it remains unclear to what extent Multimodal Large Language Models (MLLMs) exhibit similar biases despite language supervision. In this paper, we investigate spurious bias in MLLMs and introduce SpurLens, a pipeline that leverages GPT-4 and open-set object detectors to automatically identify spurious visual cues without human supervision. Our findings reveal that spurious correlations cause two major failure modes in MLLMs: (1) over-reliance on spurious cues for object recognition, where removing these cues reduces accuracy, and (2) object hallucination, where spurious cues amplify the hallucination by over 10x. We validate our findings in various MLLMs and datasets. Beyond diagnosing these failures, we explore potential mitigation strategies, such as prompt ensembling and reasoning-based prompting, and conduct ablation studies to examine the root causes of spurious bias in MLLMs. By exposing the persistence of spurious correlations, our study calls for more rigorous evaluation methods and mitigation strategies to enhance the reliability of MLLMs.
@misc{hosseini2025seeingwhatstherespurious, title = {Seeing What's Not There: Spurious Correlation in Multimodal LLMs}, author = {Hosseini, Parsa and Nawathe, Sumit and Moayeri, Mazda and Balasubramanian, Sriram and Feizi, Soheil}, year = {2025}, eprint = {2503.08884}, archiveprefix = {arXiv}, primaryclass = {cs.CV}, } - Hop, Skip, and Overthink: Diagnosing Why Reasoning Models Fumble during Multi-Hop AnalysisAnushka Yadav, Isha Nalawade, Srujana Pillarichety, Yashwanth Babu, Reshmi Ghosh, Samyadeep Basu, Wenlong Zhao, Ali Nasaeh, Sriram Balasubramanian, and Soundararajan Srinivasan2025
The emergence of reasoning models and their integration into practical AI chat bots has led to breakthroughs in solving advanced math, deep search, and extractive question answering problems that requires a complex and multi-step thought process. Yet, a complete understanding of why these models hallucinate more than general purpose language models is missing. In this investigative study, we systematicallyexplore reasoning failures of contemporary language models on multi-hop question answering tasks. We introduce a novel, nuanced error categorization framework that examines failures across three critical dimensions: the diversity and uniqueness of source documents involved ("hops"), completeness in capturing relevant information ("coverage"), and cognitive inefficiency ("overthinking"). Through rigorous hu-man annotation, supported by complementary automated metrics, our exploration uncovers intricate error patterns often hidden by accuracy-centric evaluations. This investigative approach provides deeper insights into the cognitive limitations of current models and offers actionable guidance toward enhancing reasoning fidelity, transparency, and robustness in future language modeling efforts.
@misc{yadav2025hopskipoverthinkdiagnosing, title = {Hop, Skip, and Overthink: Diagnosing Why Reasoning Models Fumble during Multi-Hop Analysis}, author = {Yadav, Anushka and Nalawade, Isha and Pillarichety, Srujana and Babu, Yashwanth and Ghosh, Reshmi and Basu, Samyadeep and Zhao, Wenlong and Nasaeh, Ali and Balasubramanian, Sriram and Srinivasan, Soundararajan}, year = {2025}, eprint = {2508.04699}, archiveprefix = {arXiv}, primaryclass = {cs.CL}, } - Decomposition-Enhanced Training for Post-Hoc Attributions In Language ModelsSriram Balasubramanian, Samyadeep Basu, Koustava Goswami, Ryan Rossi, Varun Manjunatha, Roshan Santhosh, Ruiyi Zhang, Soheil Feizi, and Nedim Lipka2025
Large language models (LLMs) are increasingly used for long-document question answering, where reliable attribution to sources is critical for trust. Existing post-hoc attribution methods work well for extractive QA but struggle in multi-hop, abstractive, and semi-extractive settings, where answers synthesize information across passages. To address these challenges, we argue that post-hoc attribution can be reframed as a reasoning problem, where answers are decomposed into constituent units, each tied to specific context. We first show that prompting models to generate such decompositions alongside attributions improves performance. Building on this, we introduce DecompTune, a post-training method that teaches models to produce answer decompositions as intermediate reasoning steps. We curate a diverse dataset of complex QA tasks, annotated with decompositions by a strong LLM, and post-train Qwen-2.5 (7B and 14B) using a two-stage SFT + GRPO pipeline with task-specific curated rewards. Across extensive experiments and ablations, DecompTune substantially improves attribution quality, outperforming prior methods and matching or exceeding state-of-the-art frontier models.
@misc{balasubramanian2025decompositionenhancedtrainingposthocattributions, title = {Decomposition-Enhanced Training for Post-Hoc Attributions In Language Models}, author = {Balasubramanian, Sriram and Basu, Samyadeep and Goswami, Koustava and Rossi, Ryan and Manjunatha, Varun and Santhosh, Roshan and Zhang, Ruiyi and Feizi, Soheil and Lipka, Nedim}, year = {2025}, eprint = {2510.25766}, archiveprefix = {arXiv}, primaryclass = {cs.CL}, } - A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language ModelsSriram Balasubramanian, Samyadeep Basu, and Soheil FeiziIn Findings of the Association for Computational Linguistics: EMNLP 2025 Nov 2025
Chain-of-thought (CoT) reasoning enhances performance of large language models, but questions remain about whether these reasoning traces faithfully reflect the internal processes of the model. We present the first comprehensive study of CoT faithfulness in large vision-language models (LVLMs), investigating how both text-based and previously unexplored image-based biases affect reasoning and bias articulation. Our work introduces a novel, fine-grained evaluation pipeline for categorizing bias articulation patterns, enabling significantly more precise analysis of CoT reasoning than previous methods. This framework reveals critical distinctions in how models process and respond to different types of biases, providing new insights into LVLM CoT faithfulness. Our findings reveal that subtle image-based biases are rarely articulated compared to explicit text-based ones, even in models specialized for reasoning. Additionally, many models exhibit a previously unidentified phenomenon we term “inconsistent” reasoning - correctly reasoning before abruptly changing answers, serving as a potential canary for detecting biased reasoning from unfaithful CoTs. We then apply the same evaluation pipeline to revisit CoT faithfulness in LLMs across various levels of implicit cues. Our findings reveal that current language-only reasoning models continue to struggle with articulating cues that are not overtly stated.
@inproceedings{balasubramanian-etal-2025-closer, title = {A Closer Look at Bias and Chain-of-Thought Faithfulness of Large (Vision) Language Models}, author = {Balasubramanian, Sriram and Basu, Samyadeep and Feizi, Soheil}, editor = {Christodoulopoulos, Christos and Chakraborty, Tanmoy and Rose, Carolyn and Peng, Violet"}, booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2025}, month = nov, year = {2025}, address = {Suzhou, China}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2025.findings-emnlp.723/}, doi = {10.18653/v1/2025.findings-emnlp.723}, pages = {13406--13439}, isbn = {979-8-89176-335-7} } - Tool Preferences in Agentic LLMs are UnreliableKazem Faghih, Wenxiao Wang, Yize Cheng, Siddhant Bharti, Gaurang Sriramanan, Sriram Balasubramanian, Parsa Hosseini, and Soheil FeiziIn Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing Nov 2025
Large language models (LLMs) can now access a wide range of external tools, thanks to the Model Context Protocol (MCP). This greatly expands their abilities as various agents. However, LLMs rely entirely on the text descriptions of tools to decide which ones to use—a process that is surprisingly fragile. In this work, we expose a vulnerability in prevalent tool/function-calling protocols by investigating a series of edits to tool descriptions, some of which can drastically increase a tool’s usage from LLMs when competing with alternatives. Through controlled experiments, we show that tools with properly edited descriptions receive **over 10 times more usage** from GPT-4.1 and Qwen2.5-7B than tools with original descriptions. We further evaluate how various edits to tool descriptions perform when competing directly with one another and how these trends generalize or differ across a broader set of 17 different models. These phenomena, while giving developers a powerful way to promote their tools, underscore the need for a more reliable foundation for agentic LLMs to select and utilize tools and resources.
@inproceedings{faghih-etal-2025-tool, title = {Tool Preferences in Agentic {LLM}s are Unreliable}, author = {Faghih, Kazem and Wang, Wenxiao and Cheng, Yize and Bharti, Siddhant and Sriramanan, Gaurang and Balasubramanian, Sriram and Hosseini, Parsa and Feizi, Soheil}, editor = {Christodoulopoulos, Christos and Chakraborty, Tanmoy and Rose, Carolyn and Peng, Violet}, booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing}, month = nov, year = {2025}, address = {Suzhou, China}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2025.emnlp-main.1060/}, doi = {10.18653/v1/2025.emnlp-main.1060}, pages = {20965--20980}, isbn = {979-8-89176-332-6} } - Can AI-Generated Text be Reliably Detected? Stress Testing AI Text Detectors Under Various AttacksVinu Sankar Sadasivan, Aounon Kumar, Sriram Balasubramanian, Wenxiao Wang, and Soheil FeiziNov 2025
The rapid progress of Large Language Models (LLMs) has made them capable of performing astonishingly well on various tasks including document completion and question answering. The unregulated use of these models, however, can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc. Therefore, reliable detection of AI-generated text can be critical to ensure the responsible use of LLMs. Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques that imprint specific patterns onto them. In this paper, both empirically and theoretically, we show that these detectors are not reliable in practical scenarios. Empirically, we show that paraphrasing attacks, where a light paraphraser is applied on top of the generative text model, can break a whole range of detectors, including the ones using the watermarking schemes as well as neural network-based detectors and zero-shot classifiers. We then provide a theoretical impossibility result indicating that for a sufficiently good language model, even the best-possible detector can only perform marginally better than a random classifier. Finally, we show that even LLMs protected by watermarking schemes can be vulnerable against spoofing attacks where adversarial humans can infer hidden watermarking signatures and add them to their generated text to be detected as text generated by the LLMs, potentially causing reputational damages to their developers. We believe these results can open an honest conversation in the community regarding the ethical and reliable use of AI-generated text.
@misc{sadasivan2023aigenerated, title = {Can {AI}-Generated Text be Reliably Detected? Stress Testing {AI} Text Detectors Under Various Attacks}, author = {Sadasivan, Vinu Sankar and Kumar, Aounon and Balasubramanian, Sriram and Wang, Wenxiao and Feizi, Soheil}, journal = {Transactions on Machine Learning Research}, issn = {2835-8856}, year = {2025}, url = {https://openreview.net/forum?id=OOgsAZdFOt} }
2024
- Decomposing and Interpreting Image Representations via Text in ViTs Beyond CLIPSriram Balasubramanian, Samyadeep Basu, and Soheil FeiziIn The Thirty-eighth Annual Conference on Neural Information Processing Systems Nov 2024
Recent work has explored how individual components of the CLIP-ViT model contribute to the final representation by leveraging the shared image-text representation space of CLIP. These components, such as attention heads and MLPs, have been shown to capture distinct image features like shape, color or texture. However, understanding the role of these components in arbitrary vision transformers (ViTs) is challenging. To this end, we introduce a general framework which can identify the roles of various components in ViTs beyond CLIP. Specifically, we (a) automate the decomposition of the final representation into contributions from different model components, and (b) linearly map these contributions to CLIP space to interpret them via text. Additionally, we introduce a novel scoring function to rank components by their importance with respect to specific features. Applying our framework to various ViT variants (e.g. DeiT, DINO, DINOv2, Swin, MaxViT), we gain insights into the roles of different components concerning particular image features. These insights facilitate applications such as image retrieval using text descriptions or reference images, visualizing token importance heatmaps, and mitigating spurious correlations. We release our code to reproduce the experiments in the paper.
@inproceedings{balasubramanian2024decomposing, title = {Decomposing and Interpreting Image Representations via Text in ViTs Beyond {CLIP}}, author = {Balasubramanian, Sriram and Basu, Samyadeep and Feizi, Soheil}, booktitle = {The Thirty-eighth Annual Conference on Neural Information Processing Systems}, year = {2024}, url = {https://openreview.net/forum?id=Vhh7ONtfvV} }
2023
- Towards Improved Input Masking for Convolutional Neural NetworksSriram Balasubramanian, and Soheil FeiziIn Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Oct 2023
The ability to remove features from the input of machine learning models is very important to understand and interpret model predictions. However, this is non-trivial for vision models since masking out parts of the input image and replacing them with a baseline color like black or grey typically causes large distribution shifts. Masking may even make the model focus on the masking patterns for its prediction rather than the unmasked portions of the image. In recent work, it has been shown that vision transformers are less affected by such issues as one can simply drop the tokens corresponding to the masked image portions. They are thus more easily interpretable using techniques like LIME which rely on input perturbation. Using the same intuition, we devise a masking technique for CNNs called layer masking, which simulates running the CNN on only the unmasked input. We find that our method is (i) much less disruptive to the model’s output and its intermediate activations, and (ii) much better than commonly used masking techniques for input perturbation based interpretability techniques like LIME. Thus, layer masking is able to close the interpretability gap between CNNs and transformers, and even make CNNs more interpretable in many cases.
@inproceedings{Balasubramanian_2023_ICCV, author = {Balasubramanian, Sriram and Feizi, Soheil}, title = {Towards Improved Input Masking for Convolutional Neural Networks}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, month = oct, year = {2023}, pages = {1855-1865} } - Exploring Geometry of Blind Spots in Vision modelsSriram Balasubramanian*, Gaurang Sriramanan*, Vinu Sankar Sadasivan, and Soheil FeiziIn Thirty-seventh Conference on Neural Information Processing Systems (Spotlight) Oct 2023
Despite the remarkable success of deep neural networks in a myriad of settings, several works have demonstrated their overwhelming sensitivity to near-imperceptible perturbations, known as adversarial attacks. On the other hand, prior works have also observed that deep networks can be under-sensitive, wherein large-magnitude perturbations in input space do not induce appreciable changes to network activations. In this work, we study in detail the phenomenon of under-sensitivity in vision models such as CNNs and Transformers, and present techniques to study the geometry and extent of “equi-confidence” level sets of such networks. We propose a Level Set Traversal algorithm that iteratively explores regions of high confidence with respect to the input space using orthogonal components of the local gradients. Given a source image, we use this algorithm to identify inputs that lie in the same equi-confidence level set as the source image despite being perceptually similar to arbitrary images from other classes. We further observe that the source image is linearly connected by a high-confidence path to these inputs, uncovering a star-like structure for level sets of deep networks. Furthermore, we attempt to identify and estimate the extent of these connected higher-dimensional regions over which the model maintains a high degree of confidence.
@inproceedings{balasubramanian2023exploring, title = {Exploring Geometry of Blind Spots in Vision models}, author = {Balasubramanian*, Sriram and Sriramanan*, Gaurang and Sadasivan, Vinu Sankar and Feizi, Soheil}, booktitle = {Thirty-seventh Conference on Neural Information Processing Systems (Spotlight)}, year = {2023}, url = {https://openreview.net/forum?id=uJ3qNIsDGF} } - Simulating Network Paths with Recurrent Buffering UnitsDivyam Anshumaan*, Sriram Balasubramanian*, Shubham Tiwari, Nagarajan Natarajan, Sundararajan Sellamanickam, and Venkata N. PadmanabhanProceedings of the AAAI Conference on Artificial Intelligence Jun 2023
Simulating physical network paths (e.g., Internet) is a cornerstone research problem in the emerging sub-field of AI-for-networking. We seek a model that generates end-to-end packet delay values in response to the time-varying load offered by a sender, which is typically a function of the previously output delays. The problem setting is unique, and renders the state-of-the-art text and time-series generative models inapplicable or ineffective. We formulate an ML problem at the intersection of dynamical systems, sequential decision making, and time-series modeling. We propose a novel grey-box approach to network simulation that embeds the semantics of physical network path in a new RNN-style model called Recurrent Buffering Unit, providing the interpretability of standard network simulator tools, the power of neural models, the efficiency of SGD-based techniques for learning, and yielding promising results on synthetic and real-world network traces.
@article{Anshumaan_Balasubramanian_Tiwari_Natarajan_Sellamanickam_Padmanabhan_2023, title = {Simulating Network Paths with Recurrent Buffering Units}, volume = {37}, url = {https://ojs.aaai.org/index.php/AAAI/article/view/25820}, doi = {10.1609/aaai.v37i6.25820}, number = {6}, journal = {Proceedings of the AAAI Conference on Artificial Intelligence}, author = {Anshumaan*, Divyam and Balasubramanian*, Sriram and Tiwari, Shubham and Natarajan, Nagarajan and Sellamanickam, Sundararajan and Padmanabhan, Venkata N.}, year = {2023}, month = jun, pages = {6684-6692} }
2020
- What’s in a Name? Are BERT Named Entity Representations just as Good for any other Name?Sriram Balasubramanian*, Naman Jain*, Gaurav Jindal*, Abhijeet Awasthi, and Sunita SarawagiIn Proceedings of the 5th Workshop on Representation Learning for NLP Jul 2020
We evaluate named entity representations of BERT-based NLP models by investigating their robustness to replacements from the same typed class in the input. We highlight that on several tasks while such perturbations are natural, state of the art trained models are surprisingly brittle. The brittleness continues even with the recent entity-aware BERT models. We also try to discern the cause of this non-robustness, considering factors such as tokenization and frequency of occurrence. Then we provide a simple method that ensembles predictions from multiple replacements while jointly modeling the uncertainty of type annotations and label predictions. Experiments on three NLP tasks shows that our method enhances robustness and increases accuracy on both natural and adversarial datasets.
@inproceedings{balasubramanian-etal-2020-whats, equal_authors = {3}, title = {What{'}s in a Name? Are {BERT} Named Entity Representations just as Good for any other Name?}, author = {Balasubramanian*, Sriram and Jain*, Naman and Jindal*, Gaurav and Awasthi, Abhijeet and Sarawagi, Sunita}, booktitle = {Proceedings of the 5th Workshop on Representation Learning for NLP}, month = jul, year = {2020}, address = {Online}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/2020.repl4nlp-1.24}, doi = {10.18653/v1/2020.repl4nlp-1.24}, pages = {205--214} }