Running DiffusionGemma on AMD Strix Halo and Decade-Old Tesla P40s
Google's experimental DiffusionGemma generates text by denoising 256-token blocks in parallel instead of predicting one token at a time. The official instructions assume an H100. I got it running on an AMD Strix Halo APU and a rack of four 2016-era Tesla P40s using an unmerged llama.cpp pull request — and the integrated AMD GPU beat all four NVIDIA cards combined by a factor of two.