Looking for arXiv endorsement (cs.CV) to post my ViT positional embeddings paper [R]
Hi everyone,
I'm looking for someone to endorse me for arXiv submission in cs.CV (computer vision) or cs.LG. I have a completed paper and want to upload it as a preprint.
About the paper:
Title: Positional Encodings in Vision Transformers: A Geometric Account of Spatial Organization and Robustness
Summary: This paper investigates how different positional encoding schemes (learned absolute, sinusoidal, and rotary) shape the internal representations of Vision Transformers. We introduce a metric, Spatial Similarity Distance Correlation (SSDC), to quantify spatial structure in token representations. Using three controlled interventions (random token permutation at inference, training with random token permutation, and positional magnitude scaling), we show that:
- ViTs develop non‑trivial spatial structure even without positional embeddings, but this structure is content‑driven and collapses under token permutation.
- All positional encodings shift models toward index‑anchored spatial organization that persists under content disruption.
- Robustness to distributional shifts (JPEG compression, Gaussian blur) is primarily associated with the presence of a stable positional reference frame and correlates directly with SSDC as measured under intervention.
The paper includes experiments on ImageNet‑100 with ViT‑S models, multiple random seeds, and full statistical reporting.
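For readers who want a concrete picture of what a metric like SSDC might measure, here is a rough sketch. The exact definition is not reproduced in this post, so everything below is an assumption for illustration: the score is taken to be the Pearson correlation between pairwise cosine similarity of patch tokens and their negative distance on the patch grid. The names `ssdc_sketch`, `grid_distances`, `permute_tokens`, and the toy RBF features are all hypothetical and may differ from what the paper does.

```python
import numpy as np


def grid_distances(h, w):
    """Pairwise Euclidean distances between patch positions on an h x w grid."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def ssdc_sketch(tokens, h, w):
    """Assumed SSDC-like score (not the paper's definition).

    Pearson correlation, over unique token pairs, between cosine
    similarity and negative grid distance. Higher means that spatially
    close patches tend to have more similar representations.
    """
    assert tokens.shape[0] == h * w, "expected one token per grid cell"
    normed = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    sim = normed @ normed.T
    dist = grid_distances(h, w)
    iu = np.triu_indices(h * w, k=1)  # each unordered pair once, no diagonal
    return float(np.corrcoef(sim[iu], -dist[iu])[0, 1])


def permute_tokens(tokens, rng):
    """Toy stand-in for the permutation intervention: shuffle token order."""
    return tokens[rng.permutation(tokens.shape[0])]


# Toy data instead of real ViT activations: each token is a vector of
# Gaussian bumps centered on every grid cell, so nearby patches get
# similar features by construction.
h = w = 8
dist = grid_distances(h, w)
tokens = np.exp(-dist ** 2 / (2 * 2.0 ** 2))

rng = np.random.default_rng(0)
structured = ssdc_sketch(tokens, h, w)
shuffled = ssdc_sketch(permute_tokens(tokens, rng), h, w)
print(f"structured: {structured:.3f}  shuffled: {shuffled:.3f}")
```

On this toy data the score should be high for the ordered tokens and drop once the token order is shuffled, which is the kind of collapse under permutation described in the first finding. With a real model you would replace `tokens` with the patch-token activations from a chosen layer.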
PDF available at: https://github.com/mahmoud-mannes/neurips-geometry-paper/blob/main/paper/main.pdf