I'm an AI researcher, machine learning engineer, and M.Sc. Data Science graduate from EPFL in Switzerland.
I like to operate right on the boundary between theoretical machine learning and low-level systems engineering. The modern AI landscape is bottlenecked by compute, and I believe you can't fully understand a system until you can build the stack from scratch. Still, optimizing kernels alone only takes you so far. That's why you'll find me figuring out loss scaling laws for quantization one day and writing custom Triton sparse attention kernels the next.
Experience & Research
During my time at EPFL, I researched the optimization dynamics of LLMs within the Machine Learning and Optimization (MLO) Lab. I have also spent time bridging the gap between industry research and high-scale production:
- Apple (Machine Learning Research Intern): I led end-to-end quantization research in Cupertino. We derived novel loss scaling laws to predict the optimal compute allocation between pretraining and quantization-aware training, enabling up to 50% compute savings in extreme quantization regimes. This work was accepted to ICLR 2026.
- Yandex (ML Engineer & Researcher): I redesigned the real-time speech recognition model stack for the Alice Voice Assistant. By refining the architecture, we decreased response latency by 20% and cut inference costs by 50%, all while accelerating our internal experiment cycles from one month to just four days.
- MIPT: I graduated with my B.Sc. in Informatics & CS, earning a GPA of 9.21/10.0 and ranking in the top 1% of my class.
Things I've Built
I learn best by building from scratch. Here are a few open-source projects I'm particularly proud of:
- chill-attention: A fast, flexible sparse flash attention kernel written in pure Triton. It supports custom masking patterns and is designed to outperform naive PyTorch SDPA without the overhead of FlexAttention.
- optimus-dl: A modular, high-performance deep learning research framework.
- PyTorch Core: I've occasionally contributed to the core PyTorch repository (mostly the MPS backend), and I write about the architecture I discover along the way. If you run an LSTM on your Mac, you are running my code.
Writing & Learning in Public
I believe in demystifying complex things. On my blog, I write about AI, algorithms, and a bit of iOS development (my long-gone past). Some of my favorite pieces include:

Let's Connect
I always enjoy meeting interesting people, collaborating on research, or just discussing the future of AI.
- Email: alex [at] alexdremov.me
- X/Twitter: @aldrmv
- GitHub: alexdremov
- LinkedIn: in/alexdremov
