We present a GPU-accelerated pipeline for the Red Team track of the ICLR 2026 Re-Align Challenge. Starting from a diverse set of vision models selected via a vectorised Centered Kernel Alignment (CKA) matrix, we identify 1,000 images from the ImageNet validation split that produce maximally divergent representations across models. Divergence is scored via CKA-weighted pairwise cosine distances in a multi-scale randomised-PCA embedding space, and final image rankings are obtained by fusing scores across PCA scales through rank aggregation. The entire pipeline runs on a single H100 GPU.
The Re-Align Red Team challenge asks: which images cause the greatest disagreement in internal representations across vision models? Answering this requires both a scalable inter-model similarity metric and an efficient per-image disagreement score. We adopt Centered Kernel Alignment (CKA) as our similarity metric—consistent with the challenge evaluation protocol—and use it both to select a diverse reference set of models and to weight pairwise disagreements when scoring images.
For each model, activations are extracted from the designated layer using a forward hook registered via timm. Images are preprocessed with standard ImageNet normalisation (resize 256, centre-crop 224) and inference runs under torch.amp.autocast on a single H100. GPU memory is explicitly freed after each model. All images are drawn exclusively from the ImageNet validation split.
For CKA estimation, 2,048 images are subsampled. For divergence scoring, the full ImageNet validation set (~50k images) is used.
We use the unbiased HSIC estimator:
\[\text{HSIC}(K, L) = \frac{\operatorname{tr}(KL) + \dfrac{(\mathbf{1}^\top K \mathbf{1})(\mathbf{1}^\top L \mathbf{1})}{(n-1)(n-2)} - \dfrac{2\,(K\mathbf{1})^\top(L\mathbf{1})}{n-2}}{n(n-3)}\] \[\text{CKA}(K, L) = \frac{\text{HSIC}(K, L)}{\sqrt{\text{HSIC}(K,K)\cdot\text{HSIC}(L,L)}}\]All $K$ centred Gram matrices $G_i \in \mathbb{R}^{n \times n}$ are stacked into a single tensor $\mathbf{G} \in \mathbb{R}^{K \times n \times n}$. The three required pairwise quantities are computed in a fixed number of GPU kernel launches:
einsum("imn,jmn->ij", G, G)
A diverse reference set of 20 models is selected greedily from the full model suite to maximise collective representational dissimilarity. At each step the candidate with the lowest cumulative CKA to the already-selected set is added. A running vector cum_sim $\in \mathbb{R}^K$ is updated with a single $O(K)$ addition per step, yielding $O(K^2)$ total complexity. The seed is the model with the lowest mean-row CKA globally. This set is then used as the reference for all downstream divergence scoring.
Full-dataset features for each reference model are reduced to PCA embeddings at three scales $d \in {16, 32, 64}$ using a randomised SVD implemented in PyTorch, avoiding materialisation of the full $D \times D$ covariance matrix:
Peak GPU memory is $O(Nk + kD)$, typically 1–2 GB for ImageNet-scale features on the H100.
For a fixed scale $d$, all reference model embeddings are stacked into $\mathcal{V} \in \mathbb{R}^{M \times N \times d}$. For each image $x$:
\[\text{score}(x) = \sum_{i < j} w_{ij} \cdot \bigl(1 - \cos(\mathbf{v}_i(x),\, \mathbf{v}_j(x))\bigr)\]where $w_{ij}$ is the normalised CKA similarity between models $i$ and $j$. Upweighting high-CKA pairs means that images are ranked highly when they cause divergence among representationally similar models—disagreements that cannot be explained by architectural idiosyncrasy alone. Scoring is computed in chunked GPU einsum operations (512 images per chunk) with a single host-to-device transfer per scale.
Per-scale disagreement scores are converted to ranks, and the mean rank across the three PCA scales is used as the final score. This makes the ranking robust to scale-specific distributional differences without requiring commensurable score magnitudes.
From the top-5,000 candidates by fused rank score, 1,000 images are selected greedily by combining divergence score and diversity:
\[\text{combined}(x) = (1-\lambda)\,\hat{s}(x) + \lambda\,\hat{d}(x), \quad \lambda = 0.3\]where $\hat{s}$ is the min-max normalised divergence score and $\hat{d}$ is the min-max normalised minimum cosine distance to the already-selected set in a mean-model proxy feature space.
| Hyperparameter | Value |
|---|---|
| Hardware | 1× NVIDIA H100 (80 GB) |
| Image dataset | ImageNet validation split |
| Images for CKA estimation | 2,048 |
| Batch size | 256 |
| Reference models | 20 |
| PCA scales $d$ | 16, 32, 64 |
| GPU scoring chunk size | 512 |
| Diversity weight $\lambda$ | 0.3 |
| Candidate pool | Top-5,000 |
| Final images selected | 1,000 |
The greedy selection procedure produced a set of 20 models with a mean pairwise CKA of 0.087. Most pairwise similarities fall below 0.15, with only a small number of pairs exceeding 0.4, confirming low representational overlap across the selected set.
Figure 1 (top row) shows that the per-image disagreement distributions are approximately unimodal and centred near 0.5 across all three PCA scales, with the score range slightly expanding as $d$ increases. The Spearman $\rho$ between scale pairs (Figure 1, bottom centre) is low across all pairs, with only PCA-16 vs. PCA-64 reaching approximately 0.15 and the remaining pairs near zero. This low inter-scale agreement motivates the rank-fusion step: no single scale dominates, and the fused score integrates complementary signals from all three. The top-1000 threshold falls in the right tail of the fused distribution (Figure 1, bottom left). The top-1000 score curve by rank (Figure 1, bottom right) shows a smooth decay from ~47,000 to ~37,000, indicating no abrupt cliff between selected and excluded images.
Figure 1. (Top row) Per-image weighted cosine distance distributions for PCA-16, PCA-32, and PCA-64. (Bottom left) Fused rank score distribution with top-1000 threshold marked. (Bottom centre) Spearman $\rho$ between scale pairs. (Bottom right) Top-1000 fused score by rank.
Low inter-scale correlation. The near-zero Spearman $\rho$ between most scale pairs (Figure 1) shows that different PCA dimensionalities capture non-overlapping aspects of representational disagreement. This makes rank fusion essential rather than cosmetic—any single scale would leave signal on the table.
CKA-based pair weighting. Upweighting high-CKA pairs focuses the score on images that challenge models with otherwise similar representations. Such disagreements are more likely to reflect genuine ambiguity in the image content rather than incidental differences in model architecture.
Limitations. CKA is estimated on a 2,048-image subsample; larger samples would improve accuracy. The greedy final selection approximates the NP-hard maximum-diversity problem.
We describe a single-GPU pipeline for Red Team stimulus selection. A vectorised CKA matrix, randomised GPU PCA at three scales, CKA-weighted pairwise disagreement scoring, and rank fusion together identify 1,000 ImageNet validation images that maximise representational disagreement across a diverse reference set of vision models, all running on a single H100.
PLACEHOLDER FOR ACADEMIC ATTRIBUTION
BibTeX citation
PLACEHOLDER FOR BIBTEX