Computation and Language
☆ Charting and Navigating Hugging Face's Model Atlas
With millions of neural networks now publicly available, searching
and analyzing large model repositories has become increasingly important.
Navigating so many models requires an atlas, but because most models are poorly
documented, charting such an atlas is challenging. To explore the hidden
potential of model repositories, we chart a preliminary atlas representing the
documented fraction of Hugging Face. It provides stunning visualizations of the
model landscape and evolution. We demonstrate several applications of this
atlas including predicting model attributes (e.g., accuracy), and analyzing
trends in computer vision models. However, as the current atlas remains
incomplete, we propose a method for charting undocumented regions.
Specifically, we identify high-confidence structural priors based on dominant
real-world model training practices. Leveraging these priors, our approach
enables accurate mapping of previously undocumented areas of the atlas. We
publicly release our datasets, code, and interactive atlas.
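
To make the structural reasoning described above concrete, here is a minimal
sketch (not the authors' code) of building a lineage graph from documented
base-model relations and filling in a missing attribute from documented
neighbors; the records, field names, and the mean-of-neighbors rule are
illustrative assumptions.

import networkx as nx

# Hypothetical records: (model_id, documented base model or None, documented accuracy or None)
records = [
    ("org/base-vit", None, 0.81),
    ("org/vit-finetune-a", "org/base-vit", None),
    ("org/vit-finetune-b", "org/base-vit", 0.84),
]

atlas = nx.DiGraph()
for model_id, base_id, acc in records:
    atlas.add_node(model_id, accuracy=acc)
    if base_id is not None:
        atlas.add_edge(base_id, model_id)  # edge points from parent to fine-tune

# Naive structural prior: a model's missing accuracy is estimated from the
# documented accuracies of its parent and sibling fine-tunes.
for node, data in atlas.nodes(data=True):
    if data.get("accuracy") is None:
        parents = set(atlas.predecessors(node))
        siblings = {s for p in parents for s in atlas.successors(p)} - {node}
        known = [atlas.nodes[n].get("accuracy") for n in parents | siblings]
        known = [a for a in known if a is not None]
        data["accuracy"] = sum(known) / len(known) if known else None

print(dict(atlas.nodes(data=True)))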
☆ SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems
The rapid advancement of Large Multi-modal Models (LMMs) has enabled their
application in scientific problem-solving, yet their fine-grained capabilities
remain under-explored. In this paper, we introduce SciVerse, a multi-modal
scientific evaluation benchmark to thoroughly assess LMMs across 5,735 test
instances in five distinct versions. We aim to investigate three key dimensions
of LMMs: scientific knowledge comprehension, multi-modal content
interpretation, and Chain-of-Thought (CoT) reasoning. To unveil whether LMMs
possess sufficient scientific expertise, we first transform each problem into
three versions containing different levels of knowledge required for solving,
i.e., Knowledge-free, -lite, and -rich. Then, to explore how LMMs interpret
multi-modal scientific content, we annotate another two versions, i.e.,
Vision-rich and -only, which move progressively more of the question
information from the text into diagrams. By comparing results across these
versions, SciVerse systematically examines the professional knowledge and
visual perception skills of LMMs
in scientific domains. In addition, to rigorously assess CoT reasoning, we
propose a new scientific CoT evaluation strategy, conducting a step-wise
assessment on knowledge and logical errors in model outputs. Our extensive
evaluation of different LMMs on SciVerse reveals critical limitations in their
scientific proficiency and provides new insights into future developments.
Project page: https://sciverse-cuhk.github.io
comment: Initially released in September 2024. Project page:
https://sciverse-cuhk.github.io
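
As a rough illustration of the kind of step-wise CoT scoring the abstract
describes, the sketch below labels each reasoning step as correct, a knowledge
error, or a logic error and tallies the counts; the judge_step callable and the
label names are assumptions, not SciVerse's actual protocol.

from collections import Counter
from typing import Callable, List

def evaluate_cot(steps: List[str],
                 judge_step: Callable[[str, List[str]], str]) -> Counter:
    """Label each step given the steps before it and tally the labels."""
    tally = Counter()
    for i, step in enumerate(steps):
        # Expected labels: "correct", "knowledge_error", or "logic_error".
        tally[judge_step(step, steps[:i])] += 1
    return tally

# Toy judge standing in for an LLM-based judge.
def toy_judge(step: str, context: List[str]) -> str:
    if "g = 4.9" in step:          # wrong physical constant -> knowledge error
        return "knowledge_error"
    if step.startswith("Therefore") and not context:  # conclusion with no premises
        return "logic_error"
    return "correct"

print(evaluate_cot(["Assume g = 4.9 m/s^2.", "So F = m * g."], toy_judge))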
★ Transformers without Normalization CVPR 2025
Normalization layers are ubiquitous in modern neural networks and have long
been considered essential. This work demonstrates that Transformers without
normalization can achieve the same or better performance using a remarkably
simple technique. We introduce Dynamic Tanh (DyT), an element-wise operation
$\mathrm{DyT}(x) = \tanh(\alpha x)$, as a drop-in replacement for normalization
layers in Transformers. DyT is inspired by the observation that layer
normalization in Transformers often produces tanh-like, $S$-shaped input-output
mappings. By incorporating DyT, Transformers without normalization can match or
exceed the performance of their normalized counterparts, mostly without
hyperparameter tuning. We validate the effectiveness of Transformers with DyT
across diverse settings, ranging from recognition to generation, supervised to
self-supervised learning, and computer vision to language models. These
findings challenge the conventional understanding that normalization layers are
indispensable in modern neural networks, and offer new insights into their role
in deep networks.
comment: CVPR 2025; Project page: https://jiachenzhu.github.io/DyT/
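
A minimal PyTorch sketch of DyT as given by the formula above, with a learnable
alpha plus a per-channel scale and shift so it can stand in for a normalization
layer; the initialization values are assumptions, and the official
implementation is linked from the project page.

import torch
import torch.nn as nn

class DyT(nn.Module):
    """tanh(alpha * x) with a per-channel affine, used in place of LayerNorm."""
    def __init__(self, dim: int, init_alpha: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), init_alpha))  # learnable scalar
        self.weight = nn.Parameter(torch.ones(dim))              # per-channel scale
        self.bias = nn.Parameter(torch.zeros(dim))                # per-channel shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.alpha * x) * self.weight + self.bias

# Drop-in usage where an nn.LayerNorm(768) would otherwise sit:
x = torch.randn(2, 16, 768)
print(DyT(768)(x).shape)  # torch.Size([2, 16, 768])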
☆ Siege: Autonomous Multi-Turn Jailbreaking of Large Language Models with Tree Search ICLR 2025
We introduce Siege, a multi-turn adversarial framework that models the
gradual erosion of Large Language Model (LLM) safety through a tree search
perspective. Unlike single-turn jailbreaks that rely on one meticulously
engineered prompt, Siege expands the conversation at each turn in a
breadth-first fashion, branching out multiple adversarial prompts that exploit
partial compliance from previous responses. By tracking these incremental
policy leaks and re-injecting them into subsequent queries, Siege reveals how
minor concessions can accumulate into fully disallowed outputs. Evaluations on
the JailbreakBench dataset show that Siege achieves a 100% success rate on
GPT-3.5-turbo and 97% on GPT-4 in a single multi-turn run, using fewer queries
than baselines such as Crescendo or GOAT. This tree search methodology offers
an in-depth view of how model safeguards degrade over successive dialogue
turns, underscoring the urgency of robust multi-turn testing procedures for
language models.
comment: Accepted to ICLR 2025 Trustworthy LLM
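
The breadth-first expansion described above can be sketched roughly as follows;
respond, expand_prompts, and score_compliance are placeholder callables (a
target model, an attacker model, and a compliance judge), and the branching and
beam sizes are assumptions rather than Siege's settings.

from typing import Callable, List, Tuple

Conversation = List[Tuple[str, str]]  # list of (adversarial prompt, model response)

def siege_style_search(seed_prompt: str,
                       respond: Callable[[Conversation, str], str],
                       expand_prompts: Callable[[Conversation], List[str]],
                       score_compliance: Callable[[str], float],
                       max_turns: int = 4, branch: int = 3, beam: int = 8) -> Conversation:
    frontier: List[Conversation] = [[(seed_prompt, respond([], seed_prompt))]]
    for _ in range(max_turns - 1):
        children: List[Conversation] = []
        for conv in frontier:
            # Branch on follow-ups that re-inject partial compliance from earlier turns.
            for prompt in expand_prompts(conv)[:branch]:
                children.append(conv + [(prompt, respond(conv, prompt))])
        if not children:
            break
        # Keep the conversations whose latest response leaks the most.
        frontier = sorted(children, key=lambda c: score_compliance(c[-1][1]),
                          reverse=True)[:beam]
    return max(frontier, key=lambda c: score_compliance(c[-1][1]))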
☆ From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM
Kshitij Ambilduke, Ben Peters, Sonal Sannigrahi, Anil Keshwani, Tsz Kin Lam, Bruno Martins, Marcely Zanon Boito, André F. T. Martins
Large language models (LLMs) have shown remarkable performance and
generalization capabilities across multiple languages and tasks, making them
very attractive targets for multi-modality integration (e.g., images or
speech). In this work, we extend an existing LLM to the speech modality via
speech discretization and continued pre-training. In particular, we are
interested in multilingual LLMs, such as TOWER, as their pre-training setting
allows us to treat discretized speech input as an additional translation
language. The resulting open-source model, SPIRE, is able to transcribe and
translate English speech input while maintaining TOWER's original performance
on translation-related tasks, showcasing that discretized speech input
integration as an additional language is feasible during LLM adaptation. We
make our code and models available to the community.
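
A hedged sketch of the core idea, treating discretized speech units as extra
vocabulary for a TOWER-style LLM; the unit token format, the prompt template,
and the checkpoint path are assumptions for illustration, not SPIRE's actual
recipe.

from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Unbabel/TowerBase-7B-v0.1"  # a TOWER checkpoint; path assumed for illustration
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Register one new token per discrete speech unit (e.g., k-means cluster IDs).
num_units = 500  # assumed codebook size
tokenizer.add_tokens([f"<su_{i}>" for i in range(num_units)])
model.resize_token_embeddings(len(tokenizer))

# An utterance becomes a unit sequence and is prompted like another source language.
units = "<su_12><su_7><su_7><su_331>"
prompt = f"Speech: {units}\nEnglish: "  # assumed template, not SPIRE's actual format
inputs = tokenizer(prompt, return_tensors="pt")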
☆ Compositional Subspace Representation Fine-tuning for Adaptive Large Language Models ICLR 2025
Adapting large language models to multiple tasks can cause cross-skill
interference, where improvements for one skill degrade another. While methods
such as LoRA impose orthogonality constraints at the weight level, they do not
fully address interference in hidden-state representations. We propose
Compositional Subspace Representation Fine-tuning (CS-ReFT), a novel
representation-based approach that learns multiple orthonormal subspace
transformations, each specializing in a distinct skill, and composes them via a
lightweight router. By isolating these subspace edits in the hidden state,
rather than weight matrices, CS-ReFT prevents cross-task conflicts more
effectively. On the AlpacaEval benchmark, applying CS-ReFT to Llama-2-7B
achieves a 93.94% win rate, surpassing GPT-3.5 Turbo (86.30%) while requiring
only 0.0098% of model parameters. These findings show that specialized
representation edits, composed via a simple router, significantly enhance
multi-task instruction following with minimal overhead.
comment: Accepted to ICLR 2025 SCOPE
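
The following sketch illustrates the general idea of composing orthonormal
subspace edits of the hidden state via a lightweight router; the rank, router
design, and exact edit form are assumptions and not the authors'
implementation.

import torch
import torch.nn as nn
from torch.nn.utils.parametrizations import orthogonal

class SubspaceEdit(nn.Module):
    def __init__(self, dim: int, rank: int):
        super().__init__()
        self.R = orthogonal(nn.Linear(dim, rank, bias=False))  # orthonormal rows
        self.proj = nn.Linear(dim, rank)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Edit only the low-rank subspace spanned by R: h + R^T (W h + b - R h)
        return h + (self.proj(h) - self.R(h)) @ self.R.weight

class ComposedReFT(nn.Module):
    def __init__(self, dim: int, rank: int, num_skills: int):
        super().__init__()
        self.edits = nn.ModuleList(SubspaceEdit(dim, rank) for _ in range(num_skills))
        self.router = nn.Linear(dim, num_skills)  # lightweight gating

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        gates = torch.sigmoid(self.router(h))                       # (..., num_skills)
        deltas = torch.stack([e(h) - h for e in self.edits], dim=-1)
        return h + (deltas * gates.unsqueeze(-2)).sum(dim=-1)

h = torch.randn(2, 10, 768)
print(ComposedReFT(768, rank=8, num_skills=4)(h).shape)  # torch.Size([2, 10, 768])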
☆ TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention
Jinhao Duan, Fei Kong, Hao Cheng, James Diffenderfer, Bhavya Kailkhura, Lichao Sun, Xiaofeng Zhu, Xiaoshuang Shi, Kaidi Xu
Object Hallucination (OH) has been acknowledged as one of the major
trustworthiness challenges in Large Vision-Language Models (LVLMs). Recent
advancements in Large Language Models (LLMs) indicate that internal states,
such as hidden states, encode the "overall truthfulness" of generated
responses. However, it remains under-explored how internal states in LVLMs
function and whether they could serve as "per-token" hallucination indicators,
which is essential for mitigating OH. In this paper, we first conduct an
in-depth exploration of LVLM internal states in relation to OH issues and
discover that (1) LVLM internal states are high-specificity per-token
indicators of hallucination behaviors. Moreover, (2) different LVLMs encode
universal patterns of hallucinations in common latent subspaces, indicating
that there exist "generic truthful directions" shared by various LVLMs. Based
on these discoveries, we propose Truthful-Guided Pre-Intervention (TruthPrInt)
that first learns the truthful direction of LVLM decoding and then applies
truthful-guided inference-time intervention during LVLM decoding. We further
propose ComnHallu to enhance both cross-LVLM and cross-data hallucination
detection transferability by constructing and aligning hallucination latent
subspaces. We evaluate TruthPrInt in extensive experimental settings, including
in-domain and out-of-domain scenarios, over popular LVLMs and OH benchmarks.
Experimental results indicate that TruthPrInt significantly outperforms
state-of-the-art methods. Code will be available at
https://github.com/jinhaoduan/TruthPrInt.
comment: 15 pages, 9 figures, the first two authors contributed equally
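
A rough sketch of the general recipe the abstract describes: estimate a
"truthful direction" from hidden states collected offline, then shift hidden
states along it during decoding via a forward hook; the difference-of-means
estimator, the hook placement, and the strength value are assumptions, not
TruthPrInt itself.

import torch

def truthful_direction(truthful_h: torch.Tensor, hallucinated_h: torch.Tensor) -> torch.Tensor:
    """Difference of means over (num_tokens, hidden_dim) hidden states collected offline."""
    d = truthful_h.mean(dim=0) - hallucinated_h.mean(dim=0)
    return d / d.norm()

def make_intervention_hook(direction: torch.Tensor, strength: float = 4.0):
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + strength * direction.to(hidden)  # nudge toward "truthful"
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

# Usage on a hypothetical decoder layer of an LVLM:
# handle = lvlm.language_model.layers[20].register_forward_hook(
#     make_intervention_hook(direction))
# ... generate ...
# handle.remove()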
☆ VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
Vision-Language Models have made significant progress on many
perception-focused tasks; however, their progress on reasoning-focused tasks
seems limited by the lack of high-quality, diverse training data.
In this work, we aim to address the scarcity issue of reasoning-focused
multimodal datasets. We propose VisualWebInstruct, a novel approach that
leverages search engines to create a diverse, high-quality dataset spanning
multiple disciplines such as math, physics, finance, and chemistry. Starting
with 30,000 meticulously selected seed images, we employ Google Image Search to
identify websites containing similar images. We collect and process the HTML
from over 700K unique URL sources. Through a pipeline of content extraction,
filtering and synthesis, we build a dataset of approximately 900K
question-answer pairs, with 40% being visual QA pairs and the rest as text QA
pairs. Models fine-tuned on VisualWebInstruct demonstrate significant
performance gains: (1) training from Llava-OV-mid shows 10-20% absolute point
gains across benchmarks, and (2) training from MAmmoTH-VL shows a 5% absolute gain.
Our best model MAmmoTH-VL2 shows state-of-the-art performance within the 10B
parameter class on MMMU-Pro-std (40.7%), MathVerse (42.6%), and DynaMath
(55.7%). These remarkable results highlight the effectiveness of our dataset in
enhancing VLMs' reasoning capabilities for complex multimodal tasks.
comment: Technical Report
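
As a loose illustration of the collect-extract-filter stages, the snippet below
fetches a page and keeps question-like text blocks that pass a simple length
filter; the heuristics are placeholders and do not reflect the authors'
pipeline.

import requests
from bs4 import BeautifulSoup

def extract_question_blocks(url: str, min_len: int = 40) -> list:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    blocks = [p.get_text(" ", strip=True) for p in soup.find_all(["p", "li"])]
    # Keep blocks that look like questions and are long enough to be meaningful.
    return [b for b in blocks if "?" in b and len(b) >= min_len]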
♻ ☆ Chain-of-Thought Reasoning In The Wild Is Not Always Faithful ICLR 25
Iván Arcuschin, Jett Janiak, Robert Krzyzanowski, Senthooran Rajamanoharan, Neel Nanda, Arthur Conmy
Chain-of-Thought (CoT) reasoning has significantly advanced state-of-the-art
AI capabilities. However, recent studies have shown that CoT reasoning is not
always faithful, i.e., it does not always reflect how models arrive
at conclusions. So far, most of these studies have focused on unfaithfulness in
unnatural contexts where an explicit bias has been introduced. In contrast, we
show that unfaithful CoT can occur on realistic prompts with no artificial
bias. Our results reveal non-negligible rates of several forms of unfaithful
reasoning in frontier models: Sonnet 3.7 (16.3%), DeepSeek R1 (5.3%) and
ChatGPT-4o (7.0%) all answer a notable proportion of question pairs
unfaithfully. Specifically, we find that models rationalize their implicit
biases in answers to binary questions ("implicit post-hoc rationalization").
For example, when separately presented with the questions "Is X bigger than Y?"
and "Is Y bigger than X?", models sometimes produce superficially coherent
arguments to justify answering Yes to both questions or No to both questions,
despite such responses being logically contradictory. We also investigate
restoration errors (Dziri et al., 2023), where models make and then silently
correct errors in their reasoning, and unfaithful shortcuts, where models use
clearly illogical reasoning to simplify solving problems in Putnam questions (a
hard benchmark). Our findings raise challenges for AI safety work that relies
on monitoring CoT to detect undesired behavior.
comment: Accepted to the Reasoning and Planning for Large Language Models
Workshop (ICLR 25), 10 main paper pages, 38 appendix pages
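
The paired-question check described above can be reproduced in a few lines;
ask is a placeholder for a model call constrained to answer Yes or No.

from typing import Callable

def is_contradictory(x: str, y: str, ask: Callable[[str], str]) -> bool:
    a1 = ask(f"Is {x} bigger than {y}? Answer Yes or No.")
    a2 = ask(f"Is {y} bigger than {x}? Answer Yes or No.")
    # For distinct x and y, answering Yes to both (or No to both) is contradictory.
    return a1 == a2

# Toy example with a "model" that always answers Yes:
print(is_contradictory("the Nile", "the Amazon", lambda q: "Yes"))  # True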
♻ ☆ DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback ICLR 2025
The process of creating training data to teach models is currently driven by
humans, who manually analyze model weaknesses and plan how to create data that
improves a student model. Approaches using LLMs as annotators reduce human
effort, but still require humans to interpret feedback from evaluations and
control the LLM to produce data the student needs. Automating this
labor-intensive process by creating autonomous data generation agents - or
teachers - is desirable, but requires environments that can simulate the
feedback-driven, iterative, closed loop of data creation. To enable rapid,
scalable testing for such agents and their modules, we introduce DataEnvGym, a
testbed of teacher environments for data generation agents. DataEnvGym frames
data generation as a sequential decision-making task, involving an agent
consisting of a data generation policy (which generates a plan for creating
training data) and a data generation engine (which transforms the plan into
data), inside an environment that provides student feedback. The agent's goal
is to improve student performance. Students are iteratively trained and
evaluated on generated data, and their feedback (in the form of errors or weak
skills) is reported to the agent after each iteration. DataEnvGym includes
multiple teacher environment instantiations across 3 levels of structure in the
state representation and action space. More structured environments are based
on inferred skills and offer more interpretability and curriculum control. We
support 4 domains (math, code, VQA, and tool-use) and test multiple students
and teachers. Example agents in our teaching environments can iteratively
improve students across tasks and settings. Moreover, we show that the
environments can be used to teach different skill levels and to test variants
of key modules, pointing to future work on improving data generation agents,
engines, and feedback mechanisms.
comment: ICLR 2025 Spotlight; Project Page: https://DataEnvGym.github.io
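
A compact sketch of the feedback loop DataEnvGym formalizes, with the policy,
engine, and student training step as placeholder callables; the state and
feedback fields are assumptions, not the library's API.

from typing import Callable, Dict, List

def teaching_loop(policy: Callable[[Dict], Dict],
                  engine: Callable[[Dict], List[Dict]],
                  train_and_eval: Callable[[List[Dict]], Dict],
                  iterations: int = 5) -> List[float]:
    state = {"errors": [], "weak_skills": []}    # initial student feedback
    scores: List[float] = []
    for _ in range(iterations):
        plan = policy(state)                     # e.g., which skills to target next
        new_data = engine(plan)                  # turn the plan into training examples
        feedback = train_and_eval(new_data)      # retrain the student and evaluate it
        scores.append(feedback["accuracy"])
        state = {"errors": feedback.get("errors", []),
                 "weak_skills": feedback.get("weak_skills", [])}
    return scores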