amdgpu / x570 / sea islands / IO_PAGE_FAULT
After turning on the amdgpu
driver on my new system
(radeon.si_support=0 amdgpu.si_support=1
) so I could
play with vulkan it seemed nice and stable.
But I was only playing around in emacs/netbeans/firefox and wasn't
doing much with the graphics system. When I ran a JavaFX thing
i've been playing with (the genetic art) and eventually it crashed
the graphics driver. So I went back to radeon
for
the time being until I decided to look at it again.
Anyway today I had another look, and whilst logged on remotely I got the error log:
[ 993.135454] amdgpu 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0003 address=0xffffe05000 flags=0x0000] [ 993.135461] amdgpu 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0003 address=0xffffe0a040 flags=0x0000] [ 993.135466] amdgpu 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0003 address=0xffffe338c0 flags=0x0000] [ 993.135471] amdgpu 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0003 address=0xffffe1d100 flags=0x0000] [ 993.135476] amdgpu 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0003 address=0xffffe38900 flags=0x0000] [ 993.135481] amdgpu 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0003 address=0xffffe5dd80 flags=0x0000] [ 993.135486] amdgpu 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0003 address=0xffffe23580 flags=0x0000] [ 993.135491] amdgpu 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0003 address=0xffffe25c00 flags=0x0000] [ 993.135496] amdgpu 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0003 address=0xffffe4ed00 flags=0x0000] [ 993.135501] amdgpu 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0003 address=0xffffe07a80 flags=0x0000] [ 993.135507] AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.0 domain=0x0003 address=0xffffe52a00 flags=0x0000] [ 993.135512] AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.0 domain=0x0003 address=0xffffe55200 flags=0x0000] [ 993.135517] AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.0 domain=0x0003 address=0xffffe5a200 flags=0x0000] [ 993.135521] AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.0 domain=0x0003 address=0xffffe5f2c0 flags=0x0000] [ 993.135526] AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.0 domain=0x0003 address=0xffffe10700 flags=0x0000] [ 993.135531] AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.0 domain=0x0003 address=0xffffe51600 flags=0x0000] [ 993.135535] AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.0 domain=0x0003 address=0xffffe606c0 flags=0x0000] [ 993.135540] AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.0 domain=0x0003 address=0xffffe4c7c0 flags=0x0000] [ 993.135544] AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.0 domain=0x0003 address=0xffffe53e00 flags=0x0000] [ 993.135549] AMD-Vi: Event logged [IO_PAGE_FAULT device=09:00.0 domain=0x0003 address=0xffffe58e00 flags=0x0000] [ 1003.712058] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=144954, emitted seq=144956 [ 1003.712116] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1301 thread Xorg:cs0 pid 1303 [ 1003.712118] [drm] GPU recovery disabled.
There wasn't a lot to be found on the internet
regarding amdgpu
but there were some other similar
problems. The suggested fix was to add iommu=soft
to the
kernel arguments. I'm trying this now and so far it looks good!
I have been tracking the 5.4.x kernel releases and thought that
might have something to do with it but fortunately not. When it
crashed initially i'd just done a slackpkg
upgrade-all
and all sorts of things broke horribly. But
that was my fault for overzealously removing packages that didn't
look important: polkit uses a lot of snot unfortunately so I broke
that. Fucking firefox lost all my settings though!