<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>TinyComputers.io (Posts about ryzen ai)</title><link>https://tinycomputers.io/</link><description></description><atom:link href="https://tinycomputers.io/categories/ryzen-ai.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2026 A.C. Jokela 
&lt;a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"&gt;&lt;img alt="" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/80x15.png" /&gt; Creative Commons Attribution-ShareAlike&lt;/a&gt;
</copyright><lastBuildDate>Wed, 11 Mar 2026 00:05:51 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Upgrading ROCm 7.0 to 7.2 on AMD Strix Halo (gfx1151)</title><link>https://tinycomputers.io/posts/upgrading-rocm-7.0-to-7.2-on-amd-strix-halo-gfx1151.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/upgrading-rocm-7.0-to-7.2-on-amd-strix-halo-gfx1151_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;15 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;h3&gt;Introduction&lt;/h3&gt;
&lt;p&gt;If you're running AMD's Strix Halo hardware -- specifically the Ryzen AI MAX+ 395 with its integrated Radeon 8060S GPU -- you already know the software ecosystem is a moving target. The gfx1151 architecture sits in an awkward spot: powerful hardware that isn't officially listed on AMD's ROCm support matrix, yet functional enough to run real workloads with the right driver stack. When ROCm 7.2 landed in early 2026, upgrading from 7.0.2 was a priority. The newer stack brings an updated HSA runtime, a refreshed amdgpu kernel module, and broader compatibility improvements that matter on bleeding-edge silicon.&lt;/p&gt;
&lt;p&gt;This post documents the complete upgrade procedure from ROCm 7.0.2 to 7.2 on a production Ubuntu 24.04 system. It's not a theoretical exercise -- this was performed on a live server running QEMU virtual machines and network services, with the expectation that everything would come back online after a single reboot.&lt;/p&gt;
&lt;p&gt;AMD's official documentation states that in-place ROCm upgrades are not supported. The recommended path is a full uninstall followed by a clean reinstall. That's exactly what we did, and the entire process took about 20 minutes of wall-clock time (excluding the reboot).&lt;/p&gt;
&lt;h3&gt;System Overview&lt;/h3&gt;
&lt;p&gt;The target system is a &lt;a href="https://baud.rs/WZgnl1"&gt;Bosgame mini PC&lt;/a&gt; running the Ryzen AI MAX+ 395 APU. If you've read the &lt;a href="https://tinycomputers.io/posts/amd-ai-max+-395-system-review-a-comprehensive-analysis/"&gt;earlier review&lt;/a&gt; of this hardware, you'll be familiar with the specs. For context on this upgrade, here's what matters:&lt;/p&gt;
&lt;h4&gt;Hardware&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;CPU&lt;/strong&gt;: AMD Ryzen AI MAX+ 395, 16 cores / 32 threads, Zen 5&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPU&lt;/strong&gt;: Integrated Radeon 8060S, 40 Compute Units, RDNA 3.5 (gfx1151)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory&lt;/strong&gt;: 128 GB LPDDR5X, unified architecture with 96 GB allocatable to GPU&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Peak GPU Clock&lt;/strong&gt;: 2,900 MHz&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Software (Pre-Upgrade)&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;OS&lt;/strong&gt;: Ubuntu 24.04.3 LTS (Noble Numbat)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Kernel&lt;/strong&gt;: 6.14.0-37-generic (HWE, pinned)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ROCm&lt;/strong&gt;: 7.0.2&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;amdgpu-dkms&lt;/strong&gt;: 6.14.14 (from &lt;code&gt;repo.radeon.com/amdgpu/30.10.2&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ROCk Module&lt;/strong&gt;: 6.14.14&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Running Services&lt;/h4&gt;
&lt;p&gt;The system was actively serving several roles during the upgrade:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Five QEMU virtual machines (three x86, two aarch64)&lt;/li&gt;
&lt;li&gt;A PXE boot server (dnsmasq) for the local network&lt;/li&gt;
&lt;li&gt;Docker daemon with various containers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;None of these services are tied to the GPU driver stack, so the plan was to perform the upgrade and reboot without shutting them down first. The VMs and network services would come back automatically after the reboot.&lt;/p&gt;
&lt;h3&gt;Why Upgrade&lt;/h3&gt;
&lt;p&gt;ROCm 7.0.2 worked on this hardware. Models loaded, inference ran, &lt;code&gt;rocminfo&lt;/code&gt; detected the GPU. So why bother upgrading?&lt;/p&gt;
&lt;p&gt;Three reasons:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Driver maturity for gfx1151&lt;/strong&gt;: The amdgpu kernel module jumped from 6.14.14 to 6.16.13 between the two releases. That's not a minor revision -- it represents months of kernel driver development. On hardware that isn't officially supported, newer drivers tend to bring meaningful stability improvements as AMD's internal teams encounter and fix issues on adjacent architectures.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;HSA Runtime improvements&lt;/strong&gt;: ROCm 7.2 ships HSA Runtime Extension version 1.15, up from 1.11 in ROCm 7.0.2. The HSA (Heterogeneous System Architecture) runtime is the lowest layer of the ROCm software stack -- it handles device discovery, memory management, and kernel dispatch. Improvements here affect everything built on top of it.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Ecosystem alignment&lt;/strong&gt;: PyTorch wheels, Ollama builds, and other ROCm-dependent tools increasingly target 7.2 as the baseline. Running 7.0.2 was becoming an exercise in version pinning and compatibility workarounds.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;The Kernel Hold: Why It Matters&lt;/h3&gt;
&lt;p&gt;Before diving into the procedure, a note on kernel management. This system runs the Ubuntu HWE (Hardware Enablement) kernel, which provides newer kernel versions on LTS releases. At the time of this upgrade, the HWE kernel was 6.14.0-37-generic. The upstream kernel had already moved to 6.17, but we didn't want the ROCm upgrade to pull in a kernel that AMD's DKMS module might not build against.&lt;/p&gt;
&lt;p&gt;The solution is &lt;code&gt;apt-mark hold&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;apt-mark&lt;span class="w"&gt; &lt;/span&gt;hold&lt;span class="w"&gt; &lt;/span&gt;linux-generic-hwe-24.04&lt;span class="w"&gt; &lt;/span&gt;linux-headers-generic-hwe-24.04&lt;span class="w"&gt; &lt;/span&gt;linux-image-generic-hwe-24.04
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This prevents &lt;code&gt;apt&lt;/code&gt; from upgrading the kernel meta-packages, effectively pinning the system to 6.14.0-37-generic. The hold was already in place before the upgrade and remained untouched throughout. After the upgrade, we confirmed it was still active:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;apt-mark&lt;span class="w"&gt; &lt;/span&gt;showhold
&lt;/pre&gt;&lt;/div&gt;

&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;linux-generic-hwe-24.04
linux-headers-generic-hwe-24.04
linux-image-generic-hwe-24.04
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If you're running Strix Halo or any other hardware where kernel compatibility with &lt;code&gt;amdgpu-dkms&lt;/code&gt; is uncertain, kernel holds are essential. A kernel upgrade that breaks the DKMS build means no GPU driver after reboot.&lt;/p&gt;
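&lt;p&gt;As a small sketch (not part of the original procedure), a pre-flight check along these lines can confirm that headers for the running kernel are actually installed before you kick off anything DKMS-dependent. The &lt;code&gt;headers_present&lt;/code&gt; helper is hypothetical, and the &lt;code&gt;/usr/src/linux-headers-*&lt;/code&gt; layout it assumes is the Debian/Ubuntu convention:&lt;/p&gt;

```python
import os
import platform

def headers_present(release=None, src_dir="/usr/src"):
    """True if kernel headers for `release` are installed under src_dir,
    i.e. a DKMS build against that kernel can succeed. Defaults to the
    running kernel as reported by platform.release()."""
    release = release or platform.release()
    return os.path.isdir(os.path.join(src_dir, "linux-headers-" + release))

if __name__ == "__main__":
    rel = platform.release()
    print(rel, "headers:", "present" if headers_present(rel) else "MISSING")
```

&lt;p&gt;Running this before the upgrade (and after any kernel package change) catches the no-headers failure mode while it's still a one-line &lt;code&gt;apt install&lt;/code&gt; fix rather than a blank console after reboot.&lt;/p&gt;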
&lt;h3&gt;Upgrade Procedure&lt;/h3&gt;
&lt;h4&gt;Step 1: Uninstall the Current ROCm Stack&lt;/h4&gt;
&lt;p&gt;AMD provides the &lt;code&gt;amdgpu-uninstall&lt;/code&gt; script for exactly this purpose. It removes all ROCm userspace packages and the amdgpu-dkms kernel module in a single operation:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;amdgpu-uninstall&lt;span class="w"&gt; &lt;/span&gt;-y
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This command removed approximately 120 packages, including the full HIP runtime, rocBLAS, MIOpen, MIGraphX, ROCm SMI, the LLVM-based compiler toolchain, and the Mesa graphics drivers that ship with ROCm. The DKMS module was purged, which means the amdgpu kernel module was removed from the 6.14.0-37-generic kernel's module tree.&lt;/p&gt;
&lt;p&gt;After the ROCm stack was removed, we purged the &lt;code&gt;amdgpu-install&lt;/code&gt; meta-package itself:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;apt&lt;span class="w"&gt; &lt;/span&gt;purge&lt;span class="w"&gt; &lt;/span&gt;-y&lt;span class="w"&gt; &lt;/span&gt;amdgpu-install
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This also cleaned up the APT repository entries that &lt;code&gt;amdgpu-install&lt;/code&gt; had configured in &lt;code&gt;/etc/apt/sources.list.d/&lt;/code&gt;. The old repos -- &lt;code&gt;repo.radeon.com/amdgpu/30.10.2&lt;/code&gt;, &lt;code&gt;repo.radeon.com/rocm/apt/7.0.2&lt;/code&gt;, and &lt;code&gt;repo.radeon.com/graphics/7.0.2&lt;/code&gt; -- were all removed automatically.&lt;/p&gt;
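&lt;p&gt;If you want to double-check that cleanup, a quick scan of the APT sources directory works; this &lt;code&gt;radeon_repo_entries&lt;/code&gt; helper is a sketch of ours, not an AMD tool, and it assumes the stock &lt;code&gt;/etc/apt/sources.list.d&lt;/code&gt; layout:&lt;/p&gt;

```python
import glob

def radeon_repo_entries(sources_dir="/etc/apt/sources.list.d"):
    """Return active (non-comment) APT source lines that still point at
    repo.radeon.com. After a clean uninstall this list should be empty."""
    hits = []
    patterns = (sources_dir + "/*.list", sources_dir + "/*.sources")
    for pattern in patterns:
        for path in glob.glob(pattern):
            with open(path) as fh:
                for line in fh:
                    if "repo.radeon.com" in line and not line.lstrip().startswith("#"):
                        hits.append((path, line.strip()))
    return hits

if __name__ == "__main__":
    for path, line in radeon_repo_entries():
        print(path, ":", line)
```

&lt;p&gt;An empty result before Step 3 means the new installer starts from a clean slate, with no risk of &lt;code&gt;apt&lt;/code&gt; mixing 7.0.2-era and 7.2-era packages.&lt;/p&gt;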
&lt;h4&gt;Step 2: Clean Up Leftover Files&lt;/h4&gt;
&lt;p&gt;The package removal was thorough but not perfect. A few leftover directories remained in &lt;code&gt;/opt/&lt;/code&gt;:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;ls&lt;span class="w"&gt; &lt;/span&gt;/opt/&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;grep&lt;span class="w"&gt; &lt;/span&gt;rocm
&lt;/pre&gt;&lt;/div&gt;

&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;rocm-7.0.0
rocm-7.0.2
rocm-7.9.0
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;rocm-7.0.0&lt;/code&gt; directory was from a previous installation attempt. The &lt;code&gt;rocm-7.9.0&lt;/code&gt; was from an earlier experiment with a release candidate build. The &lt;code&gt;rocm-7.0.2&lt;/code&gt; directory contained a single orphaned shared library (&lt;code&gt;libamdhip64.so.6&lt;/code&gt;) that dpkg couldn't remove because the directory wasn't empty. All three were cleaned up manually:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;rm&lt;span class="w"&gt; &lt;/span&gt;-rf&lt;span class="w"&gt; &lt;/span&gt;/opt/rocm-7.0.0&lt;span class="w"&gt; &lt;/span&gt;/opt/rocm-7.0.2&lt;span class="w"&gt; &lt;/span&gt;/opt/rocm-7.9.0
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;It's worth checking for stale ROCm directories after any uninstall. They consume negligible disk space but can confuse build systems and scripts that scan &lt;code&gt;/opt/rocm*&lt;/code&gt; for active installations.&lt;/p&gt;
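&lt;p&gt;That check is easy to script. A minimal sketch (the &lt;code&gt;stale_rocm_dirs&lt;/code&gt; name is ours) that treats the target of the &lt;code&gt;/opt/rocm&lt;/code&gt; symlink as the one live installation and flags everything else:&lt;/p&gt;

```python
import os

def stale_rocm_dirs(opt="/opt", active_link="rocm"):
    """List rocm-* directories under `opt` that are not the target of the
    /opt/rocm symlink, i.e. leftovers from previous installations."""
    target = None
    link = os.path.join(opt, active_link)
    if os.path.islink(link):
        target = os.path.basename(os.readlink(link))
    return sorted(
        d for d in os.listdir(opt)
        if d.startswith("rocm-") and d != target
    )

if __name__ == "__main__":
    print(stale_rocm_dirs())
```

&lt;p&gt;After a full uninstall there is no symlink at all, so every &lt;code&gt;rocm-*&lt;/code&gt; directory it reports is a candidate for removal.&lt;/p&gt;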
&lt;h4&gt;Step 3: Install the ROCm 7.2 Installer&lt;/h4&gt;
&lt;p&gt;AMD distributes ROCm through a meta-package called &lt;code&gt;amdgpu-install&lt;/code&gt;. Each ROCm release has its own version of this package, which configures the appropriate APT repositories. The 7.2 installer was downloaded directly from AMD's repository:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nb"&gt;cd&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;/tmp
wget&lt;span class="w"&gt; &lt;/span&gt;https://repo.radeon.com/amdgpu-install/7.2/ubuntu/noble/amdgpu-install_7.2.70200-1_all.deb
sudo&lt;span class="w"&gt; &lt;/span&gt;apt&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;-y&lt;span class="w"&gt; &lt;/span&gt;./amdgpu-install_7.2.70200-1_all.deb
sudo&lt;span class="w"&gt; &lt;/span&gt;apt&lt;span class="w"&gt; &lt;/span&gt;update
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After installation and &lt;code&gt;apt update&lt;/code&gt;, three new repositories were active:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;https://repo.radeon.com/amdgpu/30.30/ubuntu noble&lt;/code&gt; -- the kernel driver and Mesa components&lt;/li&gt;
&lt;li&gt;&lt;code&gt;https://repo.radeon.com/rocm/apt/7.2 noble&lt;/code&gt; -- the ROCm userspace stack&lt;/li&gt;
&lt;li&gt;&lt;code&gt;https://repo.radeon.com/graphics/7.2/ubuntu noble&lt;/code&gt; -- graphics libraries&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The version numbering can be confusing. The &lt;code&gt;amdgpu-install&lt;/code&gt; package version is &lt;code&gt;30.30.0.0.30300000-2278356.24.04&lt;/code&gt;, which maps to the amdgpu driver release 30.30. The ROCm version is 7.2.0. These are different version tracks that AMD maintains in parallel.&lt;/p&gt;
&lt;h4&gt;Step 4: Install ROCm 7.2&lt;/h4&gt;
&lt;p&gt;With the repositories configured, the actual installation was a single command:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;amdgpu-install&lt;span class="w"&gt; &lt;/span&gt;-y&lt;span class="w"&gt; &lt;/span&gt;--usecase&lt;span class="o"&gt;=&lt;/span&gt;graphics,rocm
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The &lt;code&gt;--usecase=graphics,rocm&lt;/code&gt; flag tells the installer to include both the Mesa graphics drivers and the full ROCm compute stack. This is the right choice for a system that needs both display output and GPU compute capabilities.&lt;/p&gt;
&lt;p&gt;The installation took approximately 10 minutes and included:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;amdgpu-dkms 6.16.13&lt;/strong&gt;: The kernel module, compiled via DKMS against the running kernel&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Full ROCm 7.2 stack&lt;/strong&gt;: HIP runtime, hipcc compiler, rocBLAS, rocFFT, MIOpen, MIGraphX, RCCL, ROCm SMI, ROCProfiler, and dozens of other libraries&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mesa graphics&lt;/strong&gt;: Updated EGL, OpenGL, and Vulkan drivers from the amdgpu Mesa fork&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ROCm LLVM toolchain&lt;/strong&gt;: The LLVM-based compiler infrastructure that HIP uses for kernel compilation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The DKMS build is the critical step. During installation, DKMS compiled the amdgpu module against the kernel headers for 6.14.0-37-generic. The output confirmed a successful build:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;depmod...
update-initramfs: Generating /boot/initrd.img-6.14.0-37-generic
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The initramfs was regenerated to include the new module, ensuring it would be loaded at boot.&lt;/p&gt;
&lt;h4&gt;Step 5: Verify DKMS&lt;/h4&gt;
&lt;p&gt;Before rebooting, we confirmed the DKMS status:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;dkms&lt;span class="w"&gt; &lt;/span&gt;status
&lt;/pre&gt;&lt;/div&gt;

&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;amdgpu/6.16.13-2278356.24.04, 6.14.0-37-generic, x86_64: installed
virtualbox/7.0.16, 6.14.0-36-generic, x86_64: installed
virtualbox/7.0.16, 6.14.0-37-generic, x86_64: installed
virtualbox/7.0.16, 6.8.0-100-generic, x86_64: installed
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The new amdgpu module (6.16.13) was built and installed for 6.14.0-37-generic. Note that it was built only for the currently running kernel, unlike VirtualBox, which still has modules for older kernels. This is expected: the VirtualBox modules were compiled back when 6.14.0-36 and 6.8.0-100 were installed and have simply persisted, while the freshly installed amdgpu-dkms package built against the kernel running at install time.&lt;/p&gt;
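&lt;p&gt;If you script your verification, the classic &lt;code&gt;module/version, kernel, arch: state&lt;/code&gt; output format shown above is easy to parse. This &lt;code&gt;parse_dkms_status&lt;/code&gt; helper is a sketch of ours and assumes that format; newer dkms releases tweak the layout slightly, so treat it as illustrative:&lt;/p&gt;

```python
def parse_dkms_status(text):
    """Parse classic `dkms status` lines of the form
    'module/version, kernel, arch: state' into
    (module, version, kernel, state) tuples."""
    rows = []
    for line in text.strip().splitlines():
        head, _, state = line.rpartition(": ")
        parts = [p.strip() for p in head.split(",")]
        module, _, version = parts[0].partition("/")
        rows.append((module, version, parts[1], state.strip()))
    return rows
```

&lt;p&gt;A one-line assertion over the parsed rows (amdgpu present, state &lt;code&gt;installed&lt;/code&gt;, kernel matching &lt;code&gt;uname -r&lt;/code&gt;) makes a nice gate before you allow the reboot to proceed.&lt;/p&gt;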
&lt;h4&gt;Step 6: Reboot&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;reboot
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The server came back online in approximately 50 seconds.&lt;/p&gt;
&lt;h3&gt;Post-Reboot Verification&lt;/h3&gt;
&lt;h4&gt;rocminfo&lt;/h4&gt;
&lt;p&gt;The first check after reboot was &lt;code&gt;rocminfo&lt;/code&gt;, which queries the HSA runtime for available agents:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;rocminfo
&lt;/pre&gt;&lt;/div&gt;

&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;ROCk&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;module&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;6.16&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;13&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;loaded&lt;/span&gt;
&lt;span class="o"&gt;=====================&lt;/span&gt;
&lt;span class="n"&gt;HSA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;System&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Attributes&lt;/span&gt;
&lt;span class="o"&gt;=====================&lt;/span&gt;
&lt;span class="n"&gt;Runtime&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="mf"&gt;1.18&lt;/span&gt;
&lt;span class="n"&gt;Runtime&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Ext&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="mf"&gt;1.15&lt;/span&gt;
&lt;span class="o"&gt;...&lt;/span&gt;
&lt;span class="o"&gt;==========&lt;/span&gt;
&lt;span class="n"&gt;HSA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Agents&lt;/span&gt;
&lt;span class="o"&gt;==========&lt;/span&gt;
&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;AMD&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;RYZEN&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;AI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MAX&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;395&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Radeon&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8060&lt;/span&gt;&lt;span class="n"&gt;S&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CPU&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;gfx1151&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GPU&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;Marketing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="n"&gt;AMD&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Radeon&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Graphics&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;Compute&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Unit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="mi"&gt;40&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;Max&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Clock&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Freq&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MHz&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="mi"&gt;2900&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;Memory&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="n"&gt;APU&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;ISA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;amdgcn&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;amd&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;amdhsa&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;gfx1151&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;ISA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;amdgcn&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;amd&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;amdhsa&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;gfx11&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;generic&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Key observations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;ROCk module 6.16.13&lt;/strong&gt;: The new kernel module loaded successfully.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Runtime Ext Version 1.15&lt;/strong&gt;: Upgraded from 1.11 in ROCm 7.0.2.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;gfx1151 detected&lt;/strong&gt;: The GPU was recognized with its correct ISA identifier.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;gfx11-generic ISA&lt;/strong&gt;: ROCm 7.2 also exposes a generic gfx11 ISA, which allows software compiled for the broader RDNA 3 family to run on this device without gfx1151-specific builds.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;APU memory&lt;/strong&gt;: The memory properties correctly identify this as an APU with unified memory.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;ROCm SMI&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;rocm-smi
&lt;/pre&gt;&lt;/div&gt;

&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;Device  Node  Temp    Power     SCLK  MCLK     Fan  Perf  VRAM%  GPU%
0       1     33.0C   9.087W    N/A   1000Mhz  0%   auto  0%     0%
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The GPU was visible and reporting telemetry. The 0% VRAM reading is expected on an APU -- &lt;code&gt;rocm-smi&lt;/code&gt; reports dedicated VRAM usage, but on a unified memory architecture, GPU memory allocations come from system RAM and aren't reflected in this counter.&lt;/p&gt;
&lt;h4&gt;ROCm Version&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;cat&lt;span class="w"&gt; &lt;/span&gt;/opt/rocm/.info/version
&lt;/pre&gt;&lt;/div&gt;

&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="mf"&gt;7.2.0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h4&gt;DKMS&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;dkms&lt;span class="w"&gt; &lt;/span&gt;status
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Confirmed &lt;code&gt;amdgpu/6.16.13&lt;/code&gt; remained installed for 6.14.0-37-generic after reboot.&lt;/p&gt;
&lt;h3&gt;PyTorch Validation&lt;/h3&gt;
&lt;p&gt;With the driver stack verified, the next step was confirming that PyTorch could see and use the GPU. ROCm 7.2 ships with prebuilt PyTorch wheels on AMD's repository.&lt;/p&gt;
&lt;h4&gt;Installing PyTorch for ROCm 7.2&lt;/h4&gt;
&lt;p&gt;We set up a Python virtual environment and installed the ROCm-specific wheels:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;python3&lt;span class="w"&gt; &lt;/span&gt;-m&lt;span class="w"&gt; &lt;/span&gt;venv&lt;span class="w"&gt; &lt;/span&gt;.venv
&lt;span class="nb"&gt;source&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;.venv/bin/activate
pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;--upgrade&lt;span class="w"&gt; &lt;/span&gt;pip
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The PyTorch wheel for ROCm 7.2 requires a matching ROCm-specific build of Triton. Both are available from AMD's manylinux repository. The order matters -- Triton must be installed first, since the PyTorch wheel declares it as a dependency with a specific version that doesn't exist on PyPI:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/triton-3.5.1%2Brocm7.2.0.gita272dfa8-cp312-cp312-linux_x86_64.whl
pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/torch-2.9.1%2Brocm7.2.0.lw.git7e1940d4-cp312-cp312-linux_x86_64.whl
pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;https://repo.radeon.com/rocm/manylinux/rocm-rel-7.2/torchvision-0.24.0%2Brocm7.2.0.gitb919bd0c-cp312-cp312-linux_x86_64.whl
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;These are the ROCm 7.2 builds for Python 3.12. AMD also provides wheels for Python 3.10, 3.11, and 3.13.&lt;/p&gt;
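&lt;p&gt;Because a stray &lt;code&gt;pip install torch&lt;/code&gt; can silently replace AMD's wheel with the PyPI CUDA/CPU build, it's worth confirming that the installed versions still carry the &lt;code&gt;+rocm&lt;/code&gt; local tag. A sketch using the standard library (the &lt;code&gt;rocm_tagged&lt;/code&gt; helper is ours):&lt;/p&gt;

```python
from importlib import metadata

def rocm_tagged(pkg):
    """Return the installed version string for `pkg` if it carries a
    'rocm' local tag (i.e. AMD's wheel is active rather than a PyPI
    build), else None."""
    try:
        version = metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None
    return version if "rocm" in version else None

if __name__ == "__main__":
    for pkg in ("torch", "torchvision", "triton"):
        print(pkg, "-&gt;", rocm_tagged(pkg))
```

&lt;p&gt;All three should report a &lt;code&gt;+rocm7.2.0&lt;/code&gt; suffix; a &lt;code&gt;None&lt;/code&gt; means a dependency resolver pulled in a stock PyPI wheel behind your back.&lt;/p&gt;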
&lt;h4&gt;Smoke Test&lt;/h4&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="kn"&gt;import&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nn"&gt;torch&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"PyTorch:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__version__&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"CUDA available:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_available&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"Device:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_device_name&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"VRAM:"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cuda&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_device_properties&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;total_memory&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;1e9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="s2"&gt;"GB"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="n"&gt;PyTorch&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;2.9&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="n"&gt;rocm7&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;git7e1940d4&lt;/span&gt;
&lt;span class="n"&gt;CUDA&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;available&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;True&lt;/span&gt;
&lt;span class="n"&gt;Device&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;AMD&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Radeon&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Graphics&lt;/span&gt;
&lt;span class="n"&gt;VRAM&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;103.1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GB&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;PyTorch detected the GPU through ROCm's HIP runtime. The 103.1 GB figure represents the total addressable memory on this unified-memory APU, which includes both the 96 GB GPU allocation and additional system memory accessible through the HSA runtime.&lt;/p&gt;
&lt;p&gt;Note the use of &lt;code&gt;torch.cuda&lt;/code&gt; despite this being an AMD GPU. ROCm's HIP runtime presents itself through PyTorch's CUDA interface, so all CUDA API calls in PyTorch (device selection, memory management, kernel launches) work transparently with AMD hardware.&lt;/p&gt;
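&lt;p&gt;If you need to tell the two apart programmatically, ROCm wheels expose the HIP version on &lt;code&gt;torch.version.hip&lt;/code&gt; while leaving &lt;code&gt;torch.version.cuda&lt;/code&gt; as &lt;code&gt;None&lt;/code&gt; (NVIDIA builds are the reverse). A small sketch that degrades gracefully when torch isn't installed:&lt;/p&gt;

```python
def torch_backend():
    """Report which GPU backend the installed PyTorch build targets.
    ROCm wheels set torch.version.hip to a string and leave
    torch.version.cuda as None; CUDA builds are the opposite."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    hip = getattr(torch.version, "hip", None)
    if hip:
        return "HIP " + hip
    if torch.version.cuda:
        return "CUDA " + torch.version.cuda
    return "CPU-only build"

if __name__ == "__main__":
    print(torch_backend())
```

&lt;p&gt;On this machine it reports a HIP version, which is the definitive confirmation that the ROCm wheel, not a PyPI CUDA build, is the one answering &lt;code&gt;torch.cuda&lt;/code&gt; calls.&lt;/p&gt;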
&lt;h3&gt;Before and After Summary&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;ROCm 7.0.2&lt;/th&gt;
&lt;th&gt;ROCm 7.2.0&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ROCm Version&lt;/td&gt;
&lt;td&gt;7.0.2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;7.2.0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;amdgpu-dkms&lt;/td&gt;
&lt;td&gt;6.14.14&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6.16.13&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ROCk Module&lt;/td&gt;
&lt;td&gt;6.14.14&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;6.16.13&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;HSA Runtime Ext&lt;/td&gt;
&lt;td&gt;1.11&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.15&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;amdgpu Repo&lt;/td&gt;
&lt;td&gt;30.10.2&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;30.30&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PyTorch&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;2.9.1+rocm7.2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Triton&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;3.5.1+rocm7.2.0&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kernel&lt;/td&gt;
&lt;td&gt;6.14.0-37-generic&lt;/td&gt;
&lt;td&gt;6.14.0-37-generic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Kernel Holds&lt;/td&gt;
&lt;td&gt;In place&lt;/td&gt;
&lt;td&gt;In place&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;Notes on gfx1151 Support&lt;/h3&gt;
&lt;p&gt;It's worth being explicit about the support situation. As of February 2026, gfx1151 (Strix Halo) is &lt;strong&gt;not listed&lt;/strong&gt; on AMD's official ROCm support matrix. The supported RDNA 3 targets are gfx1100 (Navi 31, RX 7900 XTX) and gfx1101 (Navi 32). Strix Halo's gfx1151 is an RDNA 3.5 derivative that shares much of the ISA with gfx1100 but has architectural differences in the memory subsystem and compute unit layout.&lt;/p&gt;
&lt;p&gt;In practice, ROCm 7.2 works on gfx1151. The kernel driver loads, &lt;code&gt;rocminfo&lt;/code&gt; detects the GPU, and PyTorch can allocate tensors and dispatch compute kernels. The &lt;code&gt;gfx11-generic&lt;/code&gt; ISA target in ROCm 7.2 is particularly helpful -- it provides a compatibility path for software that hasn't been explicitly compiled for gfx1151.&lt;/p&gt;
&lt;p&gt;However, "works" and "fully supported" are different things. There are known quirks:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;rocm-smi VRAM reporting&lt;/strong&gt;: Always shows 0% on the APU since it only tracks discrete VRAM&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No official PyTorch gfx1151 builds&lt;/strong&gt;: The ROCm PyTorch wheels target gfx1100. They run on gfx1151 through ISA compatibility, but performance may not be optimal&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Large model loading latency&lt;/strong&gt;: Moving large models to the GPU device can be slow on the unified memory architecture, as the HSA runtime handles page migration differently than discrete GPU DMA transfers&lt;/li&gt;
&lt;/ul&gt;
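&lt;p&gt;The model-loading latency is easy to observe directly. This hedged sketch times a host-to-device copy through PyTorch (assuming a ROCm build; it returns &lt;code&gt;None&lt;/code&gt; when no GPU is visible, so it is safe to run anywhere, and the size is illustrative rather than a measurement from this post):&lt;/p&gt;

```python
# Sketch: time a host-to-device tensor copy to observe page-migration cost
# on the unified memory architecture. Returns None without a visible GPU.
import time

try:
    import torch
except ImportError:
    torch = None

def transfer_seconds(mib: int = 256):
    """Seconds to move a `mib`-MiB float32 tensor to the GPU, or None without one."""
    if torch is None or not torch.cuda.is_available():
        return None
    x = torch.empty(mib * 1024 * 1024 // 4, dtype=torch.float32)
    t0 = time.perf_counter()
    y = x.cuda()                 # HIP path on ROCm builds
    torch.cuda.synchronize()     # wait for the copy to actually finish
    return time.perf_counter() - t0

print(transfer_seconds(64))
```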
&lt;p&gt;If you're considering this hardware for production AI workloads, treat ROCm support as "functional but experimental." It works well enough for development, testing, and moderate inference workloads. For production training or latency-sensitive deployment, stick with hardware on AMD's official support list.&lt;/p&gt;
&lt;h3&gt;Rollback Plan&lt;/h3&gt;
&lt;p&gt;If the upgrade fails -- the DKMS module doesn't build, the GPU isn't detected after reboot, or something else goes wrong -- the rollback path is straightforward:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Uninstall ROCm 7.2:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;sudo&lt;span class="w"&gt; &lt;/span&gt;amdgpu-uninstall&lt;span class="w"&gt; &lt;/span&gt;-y
sudo&lt;span class="w"&gt; &lt;/span&gt;apt&lt;span class="w"&gt; &lt;/span&gt;purge&lt;span class="w"&gt; &lt;/span&gt;-y&lt;span class="w"&gt; &lt;/span&gt;amdgpu-install
&lt;/pre&gt;&lt;/div&gt;

&lt;ol start="2"&gt;
&lt;li&gt;Reinstall ROCm 7.0.2:&lt;/li&gt;
&lt;/ol&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;wget&lt;span class="w"&gt; &lt;/span&gt;https://repo.radeon.com/amdgpu-install/30.10.2/ubuntu/noble/amdgpu-install_30.10.2.0.30100200-2226257.24.04_all.deb
sudo&lt;span class="w"&gt; &lt;/span&gt;apt&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;-y&lt;span class="w"&gt; &lt;/span&gt;./amdgpu-install_30.10.2.0.30100200-2226257.24.04_all.deb
sudo&lt;span class="w"&gt; &lt;/span&gt;apt&lt;span class="w"&gt; &lt;/span&gt;update
sudo&lt;span class="w"&gt; &lt;/span&gt;amdgpu-install&lt;span class="w"&gt; &lt;/span&gt;-y&lt;span class="w"&gt; &lt;/span&gt;--usecase&lt;span class="o"&gt;=&lt;/span&gt;graphics,rocm
sudo&lt;span class="w"&gt; &lt;/span&gt;reboot
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The entire rollback takes about 15 minutes. Keep the old &lt;code&gt;amdgpu-install&lt;/code&gt; deb URL handy -- it's not linked from AMD's current download pages once a newer version is published.&lt;/p&gt;
&lt;h3&gt;Conclusion&lt;/h3&gt;
&lt;p&gt;Upgrading ROCm on hardware that isn't officially supported always carries some risk, but this upgrade from 7.0.2 to 7.2 on gfx1151 was uneventful. The procedure follows AMD's documented uninstall-reinstall approach with no deviations. The kernel hold strategy kept the kernel stable, the DKMS module built cleanly against 6.14.0-37-generic, and all post-reboot checks passed.&lt;/p&gt;
&lt;p&gt;The improvements in ROCm 7.2 -- particularly the HSA runtime bump to 1.15 and the introduction of the &lt;code&gt;gfx11-generic&lt;/code&gt; ISA target -- represent meaningful progress for Strix Halo users. The ecosystem is slowly catching up to the hardware. It's not there yet, but each release closes the gap.&lt;/p&gt;
&lt;p&gt;For anyone running a Ryzen AI MAX+ 395 or similar Strix Halo hardware on Ubuntu 24.04, this upgrade is worth doing. The procedure is well-defined, the rollback path is clear, and the newer driver stack brings tangible benefits. Just remember to hold your kernel first.&lt;/p&gt;
&lt;h3&gt;Recommended Resources&lt;/h3&gt;
&lt;h4&gt;Hardware&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://baud.rs/WZgnl1"&gt;Bosgame M5 AI Mini PC (Ryzen AI MAX+ 395)&lt;/a&gt; - The system used in this post&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/q87EAZ"&gt;GMKtec EVO X2 (Ryzen AI MAX+ 395)&lt;/a&gt; - Another Strix Halo mini PC option on Amazon&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Books&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://baud.rs/NTAPGg"&gt;&lt;em&gt;Deep Learning with PyTorch&lt;/em&gt;&lt;/a&gt; by Stevens, Antiga, Huang, Viehmann - Comprehensive guide to building, training, and tuning neural networks with PyTorch&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/Iu8KR4"&gt;&lt;em&gt;Programming PyTorch for Deep Learning&lt;/em&gt;&lt;/a&gt; by Ian Pointer - Practical guide to creating and deploying deep learning applications&lt;/li&gt;
&lt;li&gt;&lt;a href="https://baud.rs/zmKSQj"&gt;&lt;em&gt;Understanding Deep Learning&lt;/em&gt;&lt;/a&gt; by Simon Prince - Modern treatment of deep learning fundamentals&lt;/li&gt;
&lt;/ul&gt;</description><category>amd</category><category>amdgpu</category><category>dkms</category><category>driver upgrade</category><category>gfx1151</category><category>gpu computing</category><category>linux</category><category>pytorch</category><category>rocm</category><category>ryzen ai</category><category>strix halo</category><category>ubuntu</category><guid>https://tinycomputers.io/posts/upgrading-rocm-7.0-to-7.2-on-amd-strix-halo-gfx1151.html</guid><pubDate>Wed, 18 Feb 2026 16:00:00 GMT</pubDate></item><item><title>AMD AI Max+ 395 System Review: A Comprehensive Analysis</title><link>https://tinycomputers.io/posts/amd-ai-max%2B-395-system-review-a-comprehensive-analysis.html?utm_source=feed&amp;utm_medium=rss&amp;utm_campaign=rss</link><dc:creator>A.C. Jokela</dc:creator><description>&lt;div class="audio-widget"&gt;
&lt;div class="audio-widget-header"&gt;
&lt;span class="audio-widget-icon"&gt;🎧&lt;/span&gt;
&lt;span class="audio-widget-label"&gt;Listen to this article&lt;/span&gt;
&lt;/div&gt;
&lt;audio controls preload="metadata"&gt;
&lt;source src="https://tinycomputers.io/amd-ai-max+-395-system-review-a-comprehensive-analysis_tts.mp3" type="audio/mpeg"&gt;
&lt;/source&gt;&lt;/audio&gt;
&lt;div class="audio-widget-footer"&gt;29 min · AI-generated narration&lt;/div&gt;
&lt;/div&gt;

&lt;h3&gt;Executive Summary&lt;/h3&gt;
&lt;p&gt;The AMD AI Max+ 395 system represents AMD's latest entry into the high-performance computing and AI acceleration market, featuring the company's cutting-edge Strix Halo architecture. This comprehensive review examines the system's performance characteristics, software compatibility, and overall viability for AI workloads and general computing tasks. While the hardware shows impressive potential with its 16-core CPU and integrated Radeon 8060S graphics, significant software ecosystem challenges, particularly with PyTorch/ROCm compatibility for the gfx1151 architecture, present substantial barriers to immediate adoption for AI development workflows.&lt;/p&gt;
&lt;p&gt;&lt;img src="https://tinycomputers.io/images/IMG_3733.jpg" alt="AMD AI Max+ 395 Bosgame" style="float: left; width: 40%; margin: 0 20px 20px 0;"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Note: An Orange Pi 5 Max was photobombing this photograph&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;System Specifications and Architecture Overview&lt;/h3&gt;
&lt;h4&gt;CPU Specifications&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Processor&lt;/strong&gt;: AMD RYZEN AI MAX+ 395 w/ Radeon 8060S&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Architecture&lt;/strong&gt;: x86_64 with Zen 5 cores&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cores/Threads&lt;/strong&gt;: 16 cores / 32 threads&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Base Clock&lt;/strong&gt;: 599 MHz (minimum)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Boost Clock&lt;/strong&gt;: 5,185 MHz (maximum)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cache Configuration&lt;/strong&gt;:&lt;ul&gt;
&lt;li&gt;L1d Cache: 768 KiB (16 instances, 48 KiB per core)&lt;/li&gt;
&lt;li&gt;L1i Cache: 512 KiB (16 instances, 32 KiB per core)&lt;/li&gt;
&lt;li&gt;L2 Cache: 16 MiB (16 instances, 1 MiB per core)&lt;/li&gt;
&lt;li&gt;L3 Cache: 64 MiB (2 instances, 32 MiB per CCX)&lt;/li&gt;
&lt;/ul&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Instruction Set Extensions&lt;/strong&gt;: Full AVX-512, AVX-VNNI, BF16 support&lt;/li&gt;
&lt;/ul&gt;
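&lt;p&gt;The aggregate cache figures follow directly from the per-core numbers, which makes a quick sanity check when reading &lt;code&gt;lscpu&lt;/code&gt; output:&lt;/p&gt;

```python
# Sanity check: per-core cache sizes multiplied out to the totals listed above.
CORES, CCX = 16, 2

l1d_kib = CORES * 48   # 768 KiB total L1 data cache
l1i_kib = CORES * 32   # 512 KiB total L1 instruction cache
l2_mib  = CORES * 1    # 16 MiB total L2 (1 MiB per core)
l3_mib  = CCX * 32     # 64 MiB total L3 (32 MiB per CCX)

print(l1d_kib, l1i_kib, l2_mib, l3_mib)  # → 768 512 16 64
```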
&lt;h4&gt;Memory Subsystem&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Total System Memory&lt;/strong&gt;: 32 GB DDR5&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory Configuration&lt;/strong&gt;: Unified memory architecture with shared GPU/CPU access&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory Bandwidth&lt;/strong&gt;: Achieved ~13.5 GB/s in multi-threaded tests&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Graphics Processing Unit&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;GPU Architecture&lt;/strong&gt;: Strix Halo (RDNA 3.5 based)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;GPU Designation&lt;/strong&gt;: gfx1151&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compute Units&lt;/strong&gt;: 40 CUs (80 reported in ROCm, likely accounting for dual SIMD per CU)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Peak GPU Clock&lt;/strong&gt;: 2,900 MHz&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;VRAM&lt;/strong&gt;: 96 GB shared system memory (103 GB total addressable) - &lt;em&gt;Note: This allocation was intentionally configured to maximize GPU memory for large language model inference&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory Bandwidth&lt;/strong&gt;: Shared with system memory&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenCL Compute Units&lt;/strong&gt;: 20 (as reported by clinfo)&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Platform Details&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Operating System&lt;/strong&gt;: Ubuntu 24.04.3 LTS (Noble)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Kernel Version&lt;/strong&gt;: 6.8.0-83-generic&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Architecture&lt;/strong&gt;: x86_64&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Virtualization&lt;/strong&gt;: AMD-V enabled&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Performance Benchmarks&lt;/h3&gt;
&lt;p&gt;&lt;img alt="AMD AI Max+ 395 System Analysis Dashboard" src="https://tinycomputers.io/images/amd_system_analysis.png"&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Figure 1: Comprehensive performance analysis and compatibility overview of the AMD AI Max+ 395 system&lt;/em&gt;&lt;/p&gt;
&lt;h4&gt;CPU Performance Analysis&lt;/h4&gt;
&lt;h5&gt;Single-Threaded Performance&lt;/h5&gt;
&lt;p&gt;The sysbench CPU benchmark with prime number calculation revealed strong single-threaded performance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Events per second&lt;/strong&gt;: 6,368.92&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Average latency&lt;/strong&gt;: 0.16 ms&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;95th percentile latency&lt;/strong&gt;: 0.16 ms&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This performance places the AMD AI Max+ 395 in the upper tier of modern processors for single-threaded workloads, demonstrating the effectiveness of the Zen 5 architecture's IPC improvements and high boost clocks.&lt;/p&gt;
&lt;h5&gt;Multi-Threaded Performance&lt;/h5&gt;
&lt;p&gt;Multi-threaded testing across all 32 threads showed excellent scaling:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Events per second&lt;/strong&gt;: 103,690.35&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Speedup&lt;/strong&gt;: 16.3x over single-threaded (theoretical maximum 32x)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Thread fairness&lt;/strong&gt;: Excellent distribution with minimal standard deviation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;A scaling efficiency of roughly 51% against the 32-thread theoretical maximum is reasonable for a 16-core SMT design: the speedup already exceeds the physical core count, and the second thread on each core shares execution resources rather than doubling throughput. Workloads that saturate all 32 threads should expect diminishing returns past 16.&lt;/p&gt;
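&lt;p&gt;The efficiency figure is simple arithmetic on the two sysbench results:&lt;/p&gt;

```python
# Worked numbers: multi-threaded speedup and efficiency from the sysbench runs.
single = 6368.92      # events/sec, 1 thread
multi = 103690.35     # events/sec, 32 threads
threads = 32

speedup = multi / single          # ≈ 16.3x
efficiency = speedup / threads    # ≈ 0.51 → ~51%
print(f"{speedup:.1f}x, {efficiency:.0%}")
```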
&lt;h4&gt;Memory Performance&lt;/h4&gt;
&lt;h5&gt;Memory Bandwidth Testing&lt;/h5&gt;
&lt;p&gt;Memory performance testing using sysbench revealed:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Single-threaded bandwidth&lt;/strong&gt;: 9.3 GB/s&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Multi-threaded bandwidth&lt;/strong&gt;: 13.5 GB/s (16 threads)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Latency characteristics&lt;/strong&gt;: Sub-millisecond access times&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These figures should be read with some care: sysbench's memory test exercises a simple per-thread copy loop and typically reports far less than a platform's theoretical peak, so treat them as a lower bound rather than a bus limit. Even so, AI applications that demand very high sustained memory bandwidth may find the shared memory subsystem a limiting factor compared to discrete GPU solutions with dedicated VRAM.&lt;/p&gt;
&lt;h4&gt;GPU Performance and Capabilities&lt;/h4&gt;
&lt;h5&gt;Hardware Specifications&lt;/h5&gt;
&lt;p&gt;The integrated Radeon 8060S GPU presents impressive specifications on paper:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Architecture&lt;/strong&gt;: RDNA 3.5 (Strix Halo)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compute Units&lt;/strong&gt;: 40 CUs with 2 SIMDs each&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory Access&lt;/strong&gt;: Full 96 GB of shared system memory&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Clock Speed&lt;/strong&gt;: Up to 2.9 GHz&lt;/li&gt;
&lt;/ul&gt;
&lt;h5&gt;OpenCL Capabilities&lt;/h5&gt;
&lt;p&gt;OpenCL enumeration reveals solid compute capabilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Device Type&lt;/strong&gt;: GPU with full OpenCL 2.1 support&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Max Compute Units&lt;/strong&gt;: 20 (OpenCL reporting)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Max Work Group Size&lt;/strong&gt;: 256&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Image Support&lt;/strong&gt;: Full 2D/3D image processing capabilities&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Memory Allocation&lt;/strong&gt;: Up to 87 GB maximum allocation&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Network Performance Testing&lt;/h4&gt;
&lt;p&gt;Network infrastructure testing using iperf3 demonstrated excellent localhost performance:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Loopback Bandwidth&lt;/strong&gt;: 122 Gbits/sec sustained&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Reliability&lt;/strong&gt;: Zero TCP retransmissions during the test&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Consistency&lt;/strong&gt;: Stable performance across 10-second test duration&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This indicates robust internal networking capabilities suitable for distributed computing scenarios and high-bandwidth data transfer requirements.&lt;/p&gt;
&lt;h3&gt;PyTorch/ROCm Compatibility Analysis&lt;/h3&gt;
&lt;h4&gt;Current State of ROCm Support&lt;/h4&gt;
&lt;p&gt;We installed ROCm 7.0 and related components:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;ROCm Version&lt;/strong&gt;: 7.0.0&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;HIP Version&lt;/strong&gt;: 7.0.51831&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;PyTorch Version&lt;/strong&gt;: 2.5.1+rocm6.2&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;gfx1151 Compatibility Issues&lt;/h4&gt;
&lt;p&gt;The most significant finding of this review centers on the gfx1151 architecture compatibility with current AI software stacks. Testing revealed critical limitations:&lt;/p&gt;
&lt;h5&gt;PyTorch Compatibility Problems&lt;/h5&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;rocBLAS error: Cannot read TensileLibrary.dat: Illegal seek for GPU arch : gfx1151
List of available TensileLibrary Files:
&lt;span class="k"&gt;-&lt;/span&gt; TensileLibrary_lazy_gfx1030.dat
&lt;span class="k"&gt;-&lt;/span&gt; TensileLibrary_lazy_gfx906.dat
&lt;span class="k"&gt;-&lt;/span&gt; TensileLibrary_lazy_gfx908.dat
&lt;span class="k"&gt;-&lt;/span&gt; TensileLibrary_lazy_gfx942.dat
&lt;span class="k"&gt;-&lt;/span&gt; TensileLibrary_lazy_gfx900.dat
&lt;span class="k"&gt;-&lt;/span&gt; TensileLibrary_lazy_gfx90a.dat
&lt;span class="k"&gt;-&lt;/span&gt; TensileLibrary_lazy_gfx1100.dat
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This error indicates that PyTorch's ROCm backend lacks pre-compiled optimized kernels for the gfx1151 architecture. The absence of gfx1151 in the TensileLibrary files means:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;No Optimized BLAS Operations&lt;/strong&gt;: Matrix multiplication, convolutions, and other fundamental AI operations cannot leverage GPU acceleration&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Training Workflows Broken&lt;/strong&gt;: Most deep learning training pipelines will fail or fall back to CPU execution&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Inference Limitations&lt;/strong&gt;: Even basic neural network inference is compromised&lt;/li&gt;
&lt;/ol&gt;
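&lt;p&gt;The same check can be scripted: rocBLAS ships one &lt;code&gt;TensileLibrary&lt;/code&gt; file per supported ISA, so scanning the filenames (taken here from the error output; on a live system you would glob them from the rocBLAS library directory) shows at a glance whether an architecture has pre-built kernels:&lt;/p&gt;

```python
# Sketch: determine which GPU ISAs have Tensile kernel libraries available.
# Filenames copied from the rocBLAS error above, not read from disk.
import re

tensile_files = [
    "TensileLibrary_lazy_gfx1030.dat", "TensileLibrary_lazy_gfx906.dat",
    "TensileLibrary_lazy_gfx908.dat",  "TensileLibrary_lazy_gfx942.dat",
    "TensileLibrary_lazy_gfx900.dat",  "TensileLibrary_lazy_gfx90a.dat",
    "TensileLibrary_lazy_gfx1100.dat",
]

def supported_archs(files):
    """Extract the gfx identifiers that have pre-compiled Tensile kernels."""
    return sorted({m.group(1) for f in files if (m := re.search(r"(gfx\w+)", f))})

archs = supported_archs(tensile_files)
print("gfx1151" in archs)  # → False: no optimized kernels for Strix Halo
```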
&lt;h5&gt;Root Cause Analysis&lt;/h5&gt;
&lt;p&gt;The gfx1151 architecture represents a newer GPU design that hasn't been fully integrated into the ROCm software stack. While the hardware is detected and basic OpenCL operations function, the optimized compute libraries essential for AI workloads are missing.&lt;/p&gt;
&lt;h5&gt;Workaround Attempts&lt;/h5&gt;
&lt;p&gt;Testing various workarounds yielded limited success:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;HSA_OVERRIDE_GFX_VERSION=11.0.0&lt;/strong&gt;: Failed to resolve compatibility issues&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;CPU Fallback&lt;/strong&gt;: PyTorch operates normally on CPU, but defeats the purpose of GPU acceleration&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Basic GPU Operations&lt;/strong&gt;: Simple tensor allocation succeeds, but compute operations fail&lt;/li&gt;
&lt;/ul&gt;
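&lt;p&gt;For reference, the override attempt looked like the following. The variable must be set before the HIP runtime initializes (i.e. before importing &lt;code&gt;torch&lt;/code&gt;); on this system it did not fix the rocBLAS failure, since the missing Tensile kernels are a library gap rather than a runtime-detection problem:&lt;/p&gt;

```python
# Sketch of the attempted workaround: spoof the reported gfx version so the
# runtime treats gfx1151 as gfx1100. Did NOT resolve the rocBLAS errors here.
import os

os.environ["HSA_OVERRIDE_GFX_VERSION"] = "11.0.0"  # must precede `import torch`

try:
    import torch
    gpu_visible = torch.cuda.is_available()
except ImportError:
    gpu_visible = False  # torch not installed; nothing to probe

print("GPU visible after override:", gpu_visible)
```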
&lt;h4&gt;Software Ecosystem Gaps&lt;/h4&gt;
&lt;p&gt;Beyond PyTorch, the gfx1151 compatibility issues extend to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;TensorFlow&lt;/strong&gt;: Likely similar rocBLAS dependency issues&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;JAX&lt;/strong&gt;: ROCm backend compatibility uncertain&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scientific Computing&lt;/strong&gt;: NumPy/SciPy GPU acceleration unavailable&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Machine Learning Frameworks&lt;/strong&gt;: Most frameworks dependent on rocBLAS will encounter issues&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;AMD GPU Software Support Ecosystem Analysis&lt;/h3&gt;
&lt;h4&gt;Current State Assessment&lt;/h4&gt;
&lt;p&gt;AMD's GPU software ecosystem has made significant strides but remains fragmented compared to NVIDIA's CUDA platform:&lt;/p&gt;
&lt;h5&gt;Strengths&lt;/h5&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Open Source Foundation&lt;/strong&gt;: ROCm's open-source nature enables community contributions&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Standard API Support&lt;/strong&gt;: OpenCL 2.1 and HIP provide industry-standard interfaces&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Linux Integration&lt;/strong&gt;: Strong kernel-level support through AMDGPU drivers&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Professional Tools&lt;/strong&gt;: rocm-smi and related utilities provide comprehensive monitoring&lt;/li&gt;
&lt;/ol&gt;
&lt;h5&gt;Weaknesses&lt;/h5&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Fragmented Architecture Support&lt;/strong&gt;: New architectures like gfx1151 lag behind in software support&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Limited Documentation&lt;/strong&gt;: Less comprehensive than CUDA documentation&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Smaller Developer Community&lt;/strong&gt;: Fewer third-party tools and optimizations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compatibility Matrix Complexity&lt;/strong&gt;: Different software versions support different GPU architectures&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;Long-term Viability Concerns&lt;/h4&gt;
&lt;p&gt;The gfx1151 compatibility issues highlight broader ecosystem challenges:&lt;/p&gt;
&lt;h5&gt;Release Coordination Problems&lt;/h5&gt;
&lt;ul&gt;
&lt;li&gt;Hardware releases outpace software ecosystem updates&lt;/li&gt;
&lt;li&gt;Critical libraries (rocBLAS, Tensile) require architecture-specific optimization&lt;/li&gt;
&lt;li&gt;Coordination between AMD hardware and software teams appears insufficient&lt;/li&gt;
&lt;/ul&gt;
&lt;h5&gt;Market Adoption Barriers&lt;/h5&gt;
&lt;ul&gt;
&lt;li&gt;Developers hesitant to adopt platform with uncertain software support&lt;/li&gt;
&lt;li&gt;Enterprise customers require guaranteed compatibility&lt;/li&gt;
&lt;li&gt;Academic researchers need stable, well-documented platforms&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Recommendations for AMD&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Accelerated Software Development&lt;/strong&gt;: Prioritize gfx1151 support in rocBLAS and related libraries&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Pre-release Testing&lt;/strong&gt;: Ensure software ecosystem readiness before hardware launches&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Better Documentation&lt;/strong&gt;: Comprehensive compatibility matrices and migration guides&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Community Engagement&lt;/strong&gt;: More responsive developer relations and support channels&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Network Infrastructure and Connectivity&lt;/h3&gt;
&lt;p&gt;The system demonstrates excellent network performance characteristics suitable for modern computing workloads:&lt;/p&gt;
&lt;h4&gt;Internal Performance&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Memory-to-Network Efficiency&lt;/strong&gt;: 122 Gbps loopback performance indicates minimal bottlenecks&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;System Integration&lt;/strong&gt;: Unified memory architecture benefits network-intensive applications&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: Architecture suitable for distributed computing scenarios&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;External Connectivity Assessment&lt;/h4&gt;
&lt;p&gt;While specific external network testing wasn't performed, the system's infrastructure suggests:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Support for high-speed Ethernet (2.5GbE+)&lt;/li&gt;
&lt;li&gt;Low-latency interconnects suitable for cluster computing&lt;/li&gt;
&lt;li&gt;Adequate bandwidth for data center deployment scenarios&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Power Efficiency and Thermal Characteristics&lt;/h3&gt;
&lt;p&gt;Limited thermal data was available during testing:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Idle Temperature&lt;/strong&gt;: 29°C (GPU sensor)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Idle Power&lt;/strong&gt;: 8.059W (GPU subsystem)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Thermal Management&lt;/strong&gt;: Appears well-controlled under light loads&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The unified architecture's power efficiency represents a significant advantage over discrete GPU solutions, particularly for mobile and edge computing applications.&lt;/p&gt;
&lt;h3&gt;Competitive Analysis&lt;/h3&gt;
&lt;h4&gt;Comparison with Intel Arc&lt;/h4&gt;
&lt;p&gt;Intel's Arc GPUs face similar software ecosystem challenges, though Intel has made more aggressive investments in AI software stack development. The Arc series benefits from Intel's deeper software engineering resources but still lags behind NVIDIA in AI framework support.&lt;/p&gt;
&lt;h4&gt;Comparison with NVIDIA&lt;/h4&gt;
&lt;p&gt;NVIDIA maintains a substantial advantage in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Software Maturity&lt;/strong&gt;: CUDA ecosystem is mature and well-supported&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI Framework Integration&lt;/strong&gt;: Native support across all major frameworks&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Developer Tools&lt;/strong&gt;: Comprehensive profiling and debugging tools&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Documentation&lt;/strong&gt;: Extensive, well-maintained documentation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;AMD's advantages include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Open Source Approach&lt;/strong&gt;: More flexible licensing and community development&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Unified Memory&lt;/strong&gt;: Simplified programming model for certain applications&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cost&lt;/strong&gt;: Potentially more cost-effective solutions&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Market Positioning&lt;/h4&gt;
&lt;p&gt;The AMD AI Max+ 395 occupies a unique position as a high-performance integrated solution, but software limitations significantly impact its competitiveness in AI-focused markets.&lt;/p&gt;
&lt;h3&gt;Use Case Suitability Analysis&lt;/h3&gt;
&lt;h4&gt;Recommended Use Cases&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;General Computing&lt;/strong&gt;: Excellent performance for traditional computational workloads&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Development Platforms&lt;/strong&gt;: Strong for general software development (non-AI)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Edge Computing&lt;/strong&gt;: Unified architecture benefits power-constrained deployments&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Future AI Workloads&lt;/strong&gt;: When software ecosystem matures&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;Not Recommended For&lt;/h4&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Current AI Development&lt;/strong&gt;: gfx1151 compatibility issues are blocking&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Production AI Inference&lt;/strong&gt;: Unreliable software support&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Machine Learning Research&lt;/strong&gt;: Limited framework compatibility&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Time-Critical Projects&lt;/strong&gt;: Uncertain timeline for software fixes&lt;/li&gt;
&lt;/ol&gt;
&lt;h3&gt;Large Language Model Performance and Stability&lt;/h3&gt;
&lt;h4&gt;Ollama LLM Inference Testing&lt;/h4&gt;
&lt;p&gt;Testing with Ollama reveals a mixed picture for LLM inference on the AMD AI Max+ 395 system. The platform successfully runs various models through CPU-based inference, though GPU acceleration faces significant challenges.&lt;/p&gt;
&lt;h5&gt;Performance Metrics&lt;/h5&gt;
&lt;p&gt;Testing with various model sizes revealed the following performance characteristics:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;GPT-OSS 20B Model Performance:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Prompt evaluation rate: 61.29 tokens/second&lt;/li&gt;
&lt;li&gt;Text generation rate: 8.99 tokens/second&lt;/li&gt;
&lt;li&gt;Total inference time: ~13 seconds for 117 tokens&lt;/li&gt;
&lt;li&gt;Memory utilization: ~54 GB VRAM usage&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Llama 4 (67B) Model:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Successfully loads and runs&lt;/li&gt;
&lt;li&gt;Generation coherent and accurate&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The system demonstrates adequate performance for smaller models (20B parameters and below) when running through Ollama, though it lags well behind NVIDIA GPUs with proper CUDA acceleration. The unified memory configuration, deliberately maximized at 96 GB of VRAM specifically to evaluate large language model workloads, allows loading models that would typically require multiple GPUs or extensive system RAM on other platforms.&lt;/p&gt;
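&lt;p&gt;The GPT-OSS 20B numbers are internally consistent, which is a useful cross-check when reading Ollama's timing summary:&lt;/p&gt;

```python
# Cross-check: generation time implied by the reported Ollama metrics.
gen_tokens = 117
gen_rate = 8.99            # tokens/second (text generation)

gen_seconds = gen_tokens / gen_rate
print(f"{gen_seconds:.1f} s")  # ≈ 13.0 s, matching the ~13 s total observed
```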
&lt;h4&gt;Critical Stability Issues with Large Models&lt;/h4&gt;
&lt;h5&gt;Driver Crashes with Advanced AI Workloads&lt;/h5&gt;
&lt;p&gt;Testing revealed severe stability issues when attempting to run larger models or when using AI-accelerated development tools:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Affected Scenarios:&lt;/strong&gt;&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Large Model Loading&lt;/strong&gt;: GPT-OSS 120B model causes immediate amdgpu driver crashes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI Development Tools&lt;/strong&gt;: Continue.dev with certain LLMs triggers GPU reset&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;OpenAI Codex Integration&lt;/strong&gt;: Consistent driver failures with models exceeding 70B parameters&lt;/li&gt;
&lt;/ol&gt;
&lt;h5&gt;GPU Reset Events&lt;/h5&gt;
&lt;p&gt;System logs reveal frequent GPU reset events during AI workload attempts:&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 1030.960155&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;amdgpu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0000&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="nl"&gt;c5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;00.0&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;amdgpu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GPU&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;reset&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;begin&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 1033.972213&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;amdgpu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0000&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="nl"&gt;c5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;00.0&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;amdgpu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;MODE2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;reset&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 1034.002615&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;amdgpu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0000&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="nl"&gt;c5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;00.0&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;amdgpu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GPU&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;reset&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;succeeded&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;trying&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;resume&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 1034.003141&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;drm&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;VRAM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;lost&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;due&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GPU&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;reset&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt;
&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt; 1034.037824&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;amdgpu&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0000&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="nl"&gt;c5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="mf"&gt;00.0&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;amdgpu&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;GPU&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;reset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;succeeded&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;These crashes result in:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Complete loss of VRAM contents&lt;/li&gt;
&lt;li&gt;Application termination&lt;/li&gt;
&lt;li&gt;Potential system instability requiring reboot&lt;/li&gt;
&lt;li&gt;Interrupted workflows and data loss&lt;/li&gt;
&lt;/ul&gt;
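&lt;p&gt;For long-running jobs it is worth watching the kernel log for these events (e.g. with &lt;code&gt;sudo dmesg -w&lt;/code&gt;) so a reset is caught the moment it happens. A minimal illustrative Python sketch — the regular expression is modeled on the dmesg excerpts above, not on any official interface:&lt;/p&gt;

```python
import re

# Matches amdgpu reset / VRAM-loss lines like the dmesg excerpts above.
RESET_RE = re.compile(r"amdgpu.*GPU reset.*succeeded|VRAM is lost due to GPU reset")

def find_gpu_resets(log_lines):
    """Return the kernel log lines that record an amdgpu reset or VRAM loss."""
    return [line for line in log_lines if RESET_RE.search(line)]

# Sample lines taken from the crash log above, plus one unrelated line.
sample = [
    "[ 1034.002615] amdgpu 0000:c5:00.0: amdgpu: GPU reset succeeded, trying to resume",
    "[ 1034.003141] [drm] VRAM is lost due to GPU reset!",
    "[ 1034.100000] usb 1-2: new high-speed USB device",
]
hits = find_gpu_resets(sample)
```

&lt;p&gt;Piping &lt;code&gt;dmesg --follow&lt;/code&gt; into a filter like this makes it easy to correlate application crashes with driver resets after the fact.&lt;/p&gt;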
&lt;h4&gt;Root Cause Analysis&lt;/h4&gt;
&lt;p&gt;The driver instability appears to stem from the same underlying issue as the PyTorch/ROCm incompatibility: &lt;strong&gt;immature driver support for the gfx1151 architecture&lt;/strong&gt;. The driver struggles with:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Memory Management&lt;/strong&gt;: Large model allocations exceed the driver's tested parameters&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Compute Dispatch&lt;/strong&gt;: Complex kernel launches trigger unhandled edge cases&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Power State Transitions&lt;/strong&gt;: Rapid load changes cause driver state machine failures&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Synchronization Issues&lt;/strong&gt;: Multi-threaded inference workloads expose race conditions&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;Implications for AI Development&lt;/h4&gt;
&lt;p&gt;The combination of LLM testing results and driver stability issues reinforces that the AMD Ryzen AI Max+ 395 system, despite impressive hardware specifications, remains unsuitable for production AI workloads. The platform shows promise for future AI applications once driver maturity improves, but current limitations include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Unreliable Large Model Support&lt;/strong&gt;: Models over 70B parameters risk system crashes&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Limited Tool Compatibility&lt;/strong&gt;: Popular AI development tools cause instability&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Workflow Interruptions&lt;/strong&gt;: Frequent crashes disrupt development productivity&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Data Loss Risk&lt;/strong&gt;: VRAM resets can lose unsaved work or model states&lt;/li&gt;
&lt;/ul&gt;
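&lt;p&gt;Until the driver stabilizes, the data-loss risk can be reduced by checkpointing aggressively, so a VRAM reset costs at most a few steps of work. A framework-agnostic sketch — the names here (&lt;code&gt;work_fn&lt;/code&gt;, &lt;code&gt;save_path&lt;/code&gt;) are illustrative; in PyTorch you would persist &lt;code&gt;model.state_dict()&lt;/code&gt; with &lt;code&gt;torch.save&lt;/code&gt; instead of JSON:&lt;/p&gt;

```python
import json
from pathlib import Path

def checkpointed_loop(steps, work_fn, state, save_path, every=50):
    """Run work_fn over steps, persisting state every `every` steps so a
    GPU reset loses at most `every` steps of progress."""
    path = Path(save_path)
    for i, step in enumerate(steps, start=1):
        state = work_fn(state, step)
        if i % every == 0:
            path.write_text(json.dumps(state))  # swap in torch.save for tensors
    path.write_text(json.dumps(state))  # final checkpoint
    return state

# Toy usage: accumulate a running sum, checkpointing every 2 steps.
result = checkpointed_loop(
    steps=range(5),
    work_fn=lambda s, x: {"total": s["total"] + x},
    state={"total": 0},
    save_path="/tmp/ckpt.json",
    every=2,
)
```

&lt;p&gt;The same pattern applies to inference servers: snapshot any accumulated state to disk on a timer rather than trusting VRAM to survive.&lt;/p&gt;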
&lt;h3&gt;Future Outlook and Development Roadmap&lt;/h3&gt;
&lt;h4&gt;Short-term Expectations (3-6 months)&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;ROCm updates likely to address gfx1151 compatibility&lt;/li&gt;
&lt;li&gt;PyTorch/TensorFlow support should improve&lt;/li&gt;
&lt;li&gt;Community-driven workarounds may emerge&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Medium-term Prospects (6-18 months)&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;Full AI framework support expected&lt;/li&gt;
&lt;li&gt;Optimization improvements for Strix Halo architecture&lt;/li&gt;
&lt;li&gt;Better documentation and developer resources&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Long-term Considerations (18+ months)&lt;/h4&gt;
&lt;ul&gt;
&lt;li&gt;AMD's commitment to open-source ecosystem should pay dividends&lt;/li&gt;
&lt;li&gt;Potential for superior price/performance ratios&lt;/li&gt;
&lt;li&gt;Growing developer community around ROCm platform&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;Conclusions and Recommendations&lt;/h3&gt;
&lt;p&gt;The AMD Ryzen AI Max+ 395 system represents impressive hardware engineering with its unified memory architecture, strong CPU performance, and substantial GPU compute capabilities. However, critical software ecosystem gaps, particularly the gfx1151 compatibility issues with PyTorch and ROCm, severely limit its immediate utility for AI and machine learning workloads.&lt;/p&gt;
&lt;h4&gt;Key Findings Summary&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Hardware Strengths:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Excellent CPU performance with 16 Zen 5 cores&lt;/li&gt;
&lt;li&gt;Innovative unified memory architecture with 96 GB addressable&lt;/li&gt;
&lt;li&gt;Strong integrated GPU with 40 compute units&lt;/li&gt;
&lt;li&gt;Efficient power management and thermal characteristics&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Software Limitations:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Critical gfx1151 architecture support gaps in ROCm ecosystem&lt;/li&gt;
&lt;li&gt;PyTorch integration completely broken for GPU acceleration&lt;/li&gt;
&lt;li&gt;Limited AI framework compatibility across the board&lt;/li&gt;
&lt;li&gt;Insufficient documentation for troubleshooting&lt;/li&gt;
&lt;/ul&gt;
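&lt;p&gt;A quick way to verify whether GPU acceleration is actually reachable from PyTorch — ROCm builds expose the GPU through the &lt;code&gt;torch.cuda&lt;/code&gt; API — is a guarded probe. This is a generic sanity check, not specific to this machine, and it degrades gracefully when no ROCm-enabled build is installed:&lt;/p&gt;

```python
# Guarded probe: runs whether or not a ROCm-enabled torch is installed.
try:
    import torch
    gpu_available = torch.cuda.is_available()       # True only if the ROCm stack is healthy
    hip_version = getattr(torch.version, "hip", None)  # HIP version string on ROCm builds, else None
except ImportError:
    gpu_available, hip_version = False, None

print("GPU available:", gpu_available, "| HIP:", hip_version)
```

&lt;p&gt;On a working gfx1151 setup this should print &lt;code&gt;True&lt;/code&gt; with a HIP version; anything else points back at the compatibility gaps described above.&lt;/p&gt;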
&lt;p&gt;&lt;strong&gt;Market Position:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Competitive hardware specifications&lt;/li&gt;
&lt;li&gt;Unique integrated architecture advantages&lt;/li&gt;
&lt;li&gt;Significant software ecosystem disadvantages versus NVIDIA&lt;/li&gt;
&lt;li&gt;Uncertain timeline for compatibility improvements&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Purchasing Recommendations&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Buy If:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Primary use case is general computing or traditional HPC workloads&lt;/li&gt;
&lt;li&gt;Willing to wait 6-12 months for AI software ecosystem maturity&lt;/li&gt;
&lt;li&gt;Value an open-source software development approach&lt;/li&gt;
&lt;li&gt;Need a power-efficient integrated solution&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;Avoid If:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Immediate AI/ML development requirements&lt;/li&gt;
&lt;li&gt;Production AI inference deployments planned&lt;/li&gt;
&lt;li&gt;Time-critical project timelines&lt;/li&gt;
&lt;li&gt;Require guaranteed software support&lt;/li&gt;
&lt;/ul&gt;
&lt;h4&gt;Final Verdict&lt;/h4&gt;
&lt;p&gt;The AMD Ryzen AI Max+ 395 system shows tremendous promise as a unified computing platform, but its immature software ecosystem makes it unsuitable for current AI workloads. Organizations should monitor ROCm development progress closely, as this hardware could become highly competitive once software support matures. For general computing applications, the system offers excellent performance and value, representing AMD's continued progress in processor design and integration.&lt;/p&gt;
&lt;p&gt;The AMD Ryzen AI Max+ 395 offers a glimpse into the future of integrated computing platforms, but early adopters should be prepared for software ecosystem growing pains. As AMD continues investing in ROCm development and the open-source community contributes solutions, this platform has the potential to become a compelling alternative to NVIDIA's ecosystem dominance.&lt;/p&gt;