Understanding Flash Attention: Writing the Algorithm from Scratch in Triton
Why is Flash Attention so fast? This post explains how the algorithm works, then cements that understanding by writing it as a GPU kernel in Triton.
It's all about making your models run faster, from flicking a magic “compile” switch to writing your own custom GPU code. In each step, we’ll implement an innocent softmax function, but things get dark by the end.