Though AMD’s all-new Radeon RX 7900 series cards have been relatively well received by the gaming public, the news about them post-launch has been surprisingly negative. Reports have already surfaced of broken shader prefetch hardware in some cards using early silicon, a claim AMD has denied, and now German site Hardwareluxx has received numerous reports in its forums of Radeon RX 7900 XTX cards overheating. There are apparently enough reports to get AMD’s attention, as the company says it’s investigating the issue.
So far, the problem only appears on the AMD-designed boards. Those are the black-and-red cards sold by AMD as well as some of its partners. On affected cards, there’s a huge delta between the temperature of the main compute die and an adjacent hotspot. The delta is so large that it’s beyond the spec designed by AMD, and it’s causing GPUs to throttle thermally as the hotspot climbs past 100C. For comparison, on custom cards from Asus, XFX, and Sapphire the delta between temps isn’t bigger than 20C. On some of the AMD-designed boards, however, it’s as high as 53C. In that worst case, the Graphics Compute Die (GCD) sits at 56C while the hotspot hits 109C.
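To make the arithmetic concrete, here is a minimal Python sketch of how one might flag a suspicious reading using only the figures quoted above. The function names and both thresholds are assumptions for illustration: the 20C value is simply the largest delta reported on custom cards, and the 100C value is the hotspot level at which throttling was reported. Neither is an official AMD spec.

```python
# Hypothetical thresholds taken from the figures in this article,
# not from any AMD specification.
CUSTOM_CARD_MAX_DELTA_C = 20   # largest delta reported on custom cards
THROTTLE_HOTSPOT_C = 100       # hotspot level where throttling was reported


def hotspot_delta(gcd_temp_c, hotspot_temp_c):
    """Return the gap between the hotspot and the GCD temperature, in C."""
    return hotspot_temp_c - gcd_temp_c


def looks_affected(gcd_temp_c, hotspot_temp_c):
    """Flag a reading whose delta exceeds the custom-card range
    and whose hotspot is above the reported throttling level."""
    delta = hotspot_delta(gcd_temp_c, hotspot_temp_c)
    return delta > CUSTOM_CARD_MAX_DELTA_C and hotspot_temp_c > THROTTLE_HOTSPOT_C


if __name__ == "__main__":
    # Worst case reported on an AMD-designed board: GCD 56C, hotspot 109C.
    print(hotspot_delta(56, 109))   # 53
    print(looks_affected(56, 109))  # True
```

A card with a 60C GCD and a 78C hotspot, by contrast, would show an 18C delta and would not be flagged by this check.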
Though it’s unclear why only AMD’s boards are affected, there’s at least one theory. Given the GPU’s chiplet design, it’s possible the cooler is making uneven contact across the seven chiplets. As you may recall, six memory cache dies surround the main compute die. AMD doesn’t use a heat spreader on top, so all seven chiplets are in direct contact with the cold plate on the cooler. It’s possible that the memory cache dies sit at a slightly different z-height than the main die. Of course, AMD would have tried to ensure this was not the case, but when it comes to cooling a chip this small and complex, even tiny variances can have a big impact. AMD is able to sidestep this issue on its Ryzen chiplet CPUs by using a heat spreader.
Hardwareluxx has ruled out any kind of voltage fluctuation, as all the boards are power-limited and can’t exceed 355W under any circumstances. Our sister site PCMag didn’t see any overheating in its test of the card. However, it did report that fan noise was noticeable during testing. That could result from a hotspot causing the fans to spin faster than they should.
This issue is vaguely reminiscent of the problems surrounding Intel’s LGA 1700 socket. Its elongated design has led some users to report that their coolers are applying uneven pressure. Intel has denied that this causes any harm, but contact frames have been made to rectify the issue. These frames typically lift the CPU up in the socket about 1mm, to give you an idea of the kind of tolerances at play here.
We’ll have to wait and see what AMD eventually says is causing the issue, and whether it’s something that’s only happening in Germany or if it affects US-based gamers as well. It’s a curious thing, as we don’t recall hearing anything about outrageous temperatures during the initial batch of online reviews. In general, people applauded the compact design of the GPUs, especially in relation to Nvidia’s latest cards. People did comment on the fan noise, however, which seemed excessive in some cases. Since the issue only affects the AMD-designed cards, that points to a flaw in the design. Hopefully, AMD will clear the air on this issue shortly.