⚡ STAN: Sparse adapTAtioN (SAE-based PEFT)

Topics: PEFT · Sparse Autoencoder · Interpretability · Multi-Modal
Paper: The Chosen Few: Sparse Adaptation for Large Models

Overview

STAN (Sparse adapTAtioN) is a parameter-efficient fine-tuning method that replaces the rigid low-rank constraint of methods such as LoRA with input-dependent sparse feature selection in a high-dimensional latent space. The key idea is to learn task-specific adaptations via sparse activations inside Sparse Autoencoder (SAE) modules, enabling both stronger expressivity and improved interpretability.

Why sparsity?
Instead of compressing updates into a fixed low-rank subspace, sparse adaptation identifies and selectively activates a small subset of high-dimensional features, supporting a more decomposed and dynamic fine-tuning process.
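To make the input-dependent selection concrete, here is a toy NumPy sketch (the matrix values, sizes, and the `active_features` helper are all illustrative, not from the paper): different inputs activate different latent features, whereas a low-rank update always writes into the same fixed subspace.

```python
import numpy as np

# Toy encoder: each latent feature responds to a simple input pattern.
# (Values and dimensions are illustrative only.)
E = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.5, 0.0],
])

def active_features(x, k=2):
    # Indices of the k largest latent activations: the features TopK keeps.
    return sorted(np.argsort(E @ x)[-k:].tolist())

x1 = np.array([3.0, 0.2, 0.0])   # mostly along the first input direction
x2 = np.array([0.0, 0.2, 3.0])   # mostly along the third input direction
print(active_features(x1))  # [0, 3]
print(active_features(x2))  # [1, 2]
```

The two inputs end up with disjoint active sets, which is the routing behavior a single fixed low-rank subspace cannot express.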

Architecture

(Figure: STAN architecture overview)

Core Formulation

Like LoRA, STAN adds an adaptation term to a frozen pretrained projection; unlike LoRA, it computes the adaptation through an encoder → TopK sparsifier → decoder pipeline:

\[ \begin{aligned} \Delta \mathbf{W}\,\mathbf{x} &= \frac{1}{k}\,\mathbf{D}\,\operatorname{TopK}(\mathbf{E}\mathbf{x}), \\ \mathbf{h} &= \mathbf{W}_0\mathbf{x} + \frac{1}{k}\,\mathbf{D}\,\operatorname{TopK}(\mathbf{E}\mathbf{x}). \end{aligned} \]

Intuition: TopK acts as a dynamic router that selects the most relevant latent features per input, yielding an input-dependent mixture of subspaces rather than a single fixed low-rank subspace.
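The formulation above can be sketched in a few lines of NumPy. This is a minimal illustration of the forward pass only, assuming a hard per-vector TopK; the function names and toy sizes are my own, and the paper's actual implementation, batching, and training procedure are not shown.

```python
import numpy as np

def topk(z, k):
    # Hard TopK sparsifier: keep the k largest activations, zero the rest.
    out = np.zeros_like(z)
    idx = np.argsort(z)[-k:]
    out[idx] = z[idx]
    return out

def stan_forward(x, W0, E, D, k):
    # h = W0 x + (1/k) D TopK(E x); W0 is frozen, E (encoder) and
    # D (decoder) are the trainable adaptation parameters.
    return W0 @ x + (D @ topk(E @ x, k)) / k

# Toy sizes: 4-dim input, 5-dim output, 32 latent features, k = 8.
rng = np.random.default_rng(1)
x = rng.normal(size=4)
W0 = rng.normal(size=(5, 4))
E = rng.normal(size=(32, 4))
D = np.zeros((5, 32))   # zero decoder: layer starts at the frozen output
h = stan_forward(x, W0, E, D, k=8)
```

With `D` initialized to zero, the adapted layer reproduces the frozen projection exactly at the start of training, mirroring the zero-initialized outgoing projection commonly used in PEFT methods such as LoRA.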

Highlights

  • Conceptually distinct PEFT: sparse adaptation via SAEs rather than low-rank decomposition.
  • Dynamic & flexible updates: TopK-driven input-dependent feature selection.
  • Interpretable representations: sparse and more semantically decomposed latent features.
  • Broad validation: language understanding, math & code, vision-language, and diffusion-based generation.

Key Results (Selected)

GLUE (5 tasks, 4 backbones)

| Backbone        | Method | MNLI   | SST-2  | QNLI   | QQP    | CoLA   |
|-----------------|--------|--------|--------|--------|--------|--------|
| RoBERTa-base    | STAN   | 0.9303 | 0.9495 | 0.9408 | 0.9242 | 0.6191 |
| RoBERTa-large   | STAN   | 0.8919 | 0.9610 | 0.9489 | 0.8957 | 0.7400 |
| DeBERTaV3-base  | STAN   | 0.8974 | 0.9622 | 0.9477 | 0.9230 | 0.6904 |
| DeBERTaV3-large | STAN   | 0.9145 | 0.9622 | 0.9590 | 0.9058 | 0.7528 |

STAN achieves strong performance across multiple backbones and tasks compared with common PEFT baselines.

DeBERTaV3-base (More baselines)

| Method  | QNLI   | MNLI   | SST-2  | QQP    | MRPC   | RTE    | STSB   |
|---------|--------|--------|--------|--------|--------|--------|--------|
| LoRA    | 0.9371 | 0.8857 | 0.9438 | 0.9163 | 0.8995 | 0.8520 | 0.9160 |
| AdaLoRA | 0.9440 | 0.8637 | 0.9553 | 0.8952 | 0.9069 | 0.8736 | 0.9163 |
| SoRA    | 0.9322 | 0.8095 | 0.9564 | 0.8540 | 0.8734 | 0.8777 | 0.9222 |
| STAN    | 0.9477 | 0.8974 | 0.9622 | 0.9230 | 0.9166 | 0.9114 | 0.9277 |

Diffusion Style Alignment (SD3)

Beyond language and vision-language tasks, STAN also supports structured interventions in diffusion models, enabling multi-style alignment with strong quantitative and human evaluation results.

| Metric                        | STAN   | LoRA   | None   |
|-------------------------------|--------|--------|--------|
| CLIP-Score                    | 0.6694 | 0.6645 | 0.6556 |
| DINO-Score                    | 0.4283 | 0.4244 | 0.4134 |
| Human win rate (STAN vs LoRA) | 91.02% |        |        |

Efficiency (Example: LLaVA-1.5-13B)

| Method | Avg. inference latency / sample (s) | Train tokens / sec | GPU-hours (3 ep) | Peak GPU mem (GB) |
|--------|-------------------------------------|--------------------|------------------|-------------------|
| STAN   | 0.4192                              | 359.4              | 20.01            | 39.62             |
| LoRA   | 0.4041                              | 533.8              | 10.02            | 33.53             |