Pushing the Limits of Sparse Attention: From Theory to Practical Efficiency
Tuesday 8 April 2025
Starts 13:00
Finishes 14:00
Organized by Priberam Labs
Venue: Instituto Superior Técnico, Anfiteatro PA2
Address: Avenida Rovisco Pais 1
1049-001 Lisboa
About this event
Abstract:
Adaptive sparse attention mechanisms have emerged as a powerful alternative to dense attention in transformers, offering greater interpretability for sequence modeling. Nevertheless, their widespread adoption has been limited by computational inefficiencies and by an insufficient understanding of their theoretical properties relative to dense attention models. In this talk, I will present recent advances in adaptive sparse attention, covering its expressivity, its generalization ability, and hardware-aware optimizations. First, I’ll examine the expressivity of sparsemax attention, showing how it relates to linear attention with selective updates, and why entmax with α=1.5 offers even greater expressive power. Second, I’ll discuss our findings on generalization, where sparse attention outperforms dense attention on sequences longer than those seen during training, particularly when an appropriate scaling is applied. Finally, I’ll introduce AdaSplash, our hardware-aware implementation of α-entmax attention that outperforms FlashAttention-2 at high levels of sparsity. Throughout the talk, I’ll highlight how these advances collectively establish adaptive sparse attention as a robust alternative that can redefine the landscape of long-sequence modeling.
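For context: softmax, 1.5-entmax, and sparsemax are the α = 1, 1.5, and 2 members of the same family of transformations, and unlike softmax, the sparse members can assign exactly zero weight to low-scoring tokens. The following is a minimal illustrative sketch using PyTorch and the open-source entmax package (pip install entmax); the example scores are hypothetical and not taken from the talk.

    # A minimal sketch of the entmax family, assuming PyTorch and the
    # "entmax" package (pip install entmax). The scores are illustrative.
    import torch
    from entmax import entmax15, sparsemax

    scores = torch.tensor([[1.2, 0.9, 0.1, -1.0]])

    # softmax (alpha = 1): dense, every token gets a strictly positive weight
    print(torch.softmax(scores, dim=-1))
    # 1.5-entmax (alpha = 1.5): the lowest-scoring token gets exactly zero weight
    print(entmax15(scores, dim=-1))
    # sparsemax (alpha = 2): sparser still, the two lowest scores are zeroed out
    print(sparsemax(scores, dim=-1))

Larger α thus means sparser attention distributions, which is what makes hardware-aware kernels such as the AdaSplash implementation discussed in the talk attractive: entries with exactly zero weight can be skipped entirely.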
Bio:
Marcos Treviso is a Postdoctoral Researcher at Instituto de Telecomunicações, where he focuses on advancing sparse attention mechanisms for natural language processing. His research spans theoretical analysis of sparse attention expressivity, generalization to longer contexts, and hardware-efficient implementations. His recent work includes theoretical connections between sparsemax attention and linear attention, studies of sparse attention’s superior generalization to longer sequence lengths, and hardware-aware optimizations for efficient transformers. Marcos earned his Ph.D. with Distinction and Honour from IST, University of Lisbon, under the supervision of Prof. André Martins. He serves as a reviewer, area chair, and senior area chair at major NLP conferences, including ACL, helping to drive research in efficient language processing techniques.
www.priberam.com