Fixing random restarts on my late 2011 MacBook Pro


My constant love-hate relationship with Apple products has been going on for well over a decade. My first Apple product was the (back then) ubiquitous 2006 white MacBook "Core Duo" 2.0. Man, that was a sweet machine. The first time I used it, it felt like I was hacking on a cold marble slate with thoughtful design and an impeccable BSD operating system. Little did I know that I was going to get addicted to that line of products, from iPhones to iPads to more and more MacBooks.

The problem is that there is a price to pay. The extrinsic price is the one everyone knows: paying more than a grand. The intrinsic price is the constant turmoil with design flaws and stability issues.

I have always likened a MacBook to a Fender Made in Mexico (MiM): if you manage to get a sweet-toned MiM, consider yourself lucky! Chances are, however, that you won't be that lucky, and you'll end up with an instrument that needs continuous maintenance and attention due to poor QA processes.

Random Restarts

My latest battle was with a beautiful MacBook Pro 8,2. Manufactured in late 2011 / early 2012, it was a heck of a beast of a machine for its time (and, I admit, it still is today). Sporting a 2.4GHz quad-core Sandy Bridge i7 processor, a 15.4" TFT screen and 8GB of 1333MHz DDR3 RAM, it is a workhorse that is a pleasure to use.

It had one serious design defect, though, one that led to the infamous RadeonGate: a class-action lawsuit which forced Apple to extend its repair extension program by two years. Needless to say, Mr Murphy postulated his law for people like me, and this program is not available anymore.

So what is this serious design defect?

All 15" and 17" MacBook Pros from 2011 have a design flaw: the discrete AMD graphics chip is not robust enough to cope with the heat the machine generates and the way that heat is managed. Of course, Apple knew this and acted like a typical Soapy Smith, only reacting after the aforementioned class-action lawsuit.

So what?

This MacBook Pro has two graphics chips: an integrated Intel HD Graphics 3000 and a discrete AMD Radeon HD 6000-series chip (driven by the AMDRadeonX3000 kernel extension), the latter being the defective one. The AMD chip heats up and causes the MacBook to restart randomly. If this chip goes bust, the whole logic board becomes unusable. With recent OS X updates, Apple wanted to keep the AMD chip alive by forcing more reboots to avoid TLC (total laptop combustion - ok sorry, I invented this acronym myself).

Now, get this, and please bear with me.

When I got this MacBook a couple of months ago, I barely knew what these random restarts were. I did everything a systems engineer could possibly do: ran fsck on the disk, reinstalled the OS, kept the fans clean, checked all the logic board solder joints, measured voltages at critical points on the logic board, etc. Then I started noticing something strange on the chip, which led me to this discovery.

Determining the problem actually took a month. Due to my hectic schedule, I outsourced this process to two technicians who are deemed to be the best in my country. I forked out EUR50 to get jack-shit fixed. One of them even had the audacity to tell me that his 'secret' magic sauce was installing new firmware (a one-minute job), and wanted to charge me EUR30 for that. Of course, I had already taken a system information report before handing him my MacBook, and showed him that the latest firmware was already installed.

So, with the bullshit cut and diced, I decided to dive in head-first, all by myself.

The Solution(-ish)

Suffice to say, once you know the problem, you're halfway to finding the solution. To my disappointment, I found that the only real way to fix this problem was to replace the AMD chip alone. Not the logic board. Not "re-balling", not "reflowing". Apple kept replacing a failed chip with a failing chip, time and time again. Replacing only the graphics chip is still a costly hardware procedure for such a vintage laptop: I am talking in the range of EUR500. FFS, with that amount of money, I could buy an entry-level GPU to mine bitcoins!

So I decided to step away from my electrical engineering mentality and start thinking from a software point of view. Could I fool BSD into ignoring the graphics card? I knew it could be done: back in my hardcore Linux days, when I used to build my own custom kernels, I stripped the kernel of all the bells and whistles that my hardware didn't need. So this was the approach I was determined to take.

Now let me make it clear: this blog / guide assumes you are comfortable hacking firmware (NVRAM) settings and executing basic Unix shell commands. I will also assume that your kernel extensions (kexts) are all in their default location (/System/Library/Extensions).

To get some display acceleration back, the machine has to be forced to boot directly into integrated graphics (iGPU) instead of discrete graphics (dGPU), and to stay in that mode.

Booting into dGPU mode is the default on Macs with two switchable graphics cards. The procedure below will set an NVRAM variable that disables the dGPU and forces the system to only use the integrated Intel graphics even when booting.

The NVRAM variable is undocumented but appears to apply to all Macs with two switchable graphics cards, which means it should work on iMacs and MacBook Pros, whether they have AMD or NVIDIA chips. This guide only covers the AMD drivers that need to be moved out of the way, but the NVRAM variable will bypass the discrete graphics chip in any case.
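To check whether the variable actually stuck (for example after step 5 in Part 1 below and a reboot), you can read it back by passing its name to nvram. The helper below is a minimal sketch of interpreting that output. The function name is made up, and it assumes nvram prints the name, a tab, then the value with non-printable bytes percent-encoded (so the %01 set in Part 1 reads back as %01, possibly padded with %00). Verify that format on your own machine before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an Apple tool): checks whether the gpu-power-prefs
# NVRAM variable forces the integrated GPU.
# Assumption: `nvram <name>` prints "<name><TAB><value>" with non-printable bytes
# percent-encoded, so a value of 1 shows up as %01 (maybe followed by %00 padding).

GPU_PREFS_VAR="fa4ce28d-b62f-4c99-9cc3-6815686e30f9:gpu-power-prefs"

# Reads one line of nvram output on stdin.
# Returns 0 if the value's first byte is 0x01, non-zero otherwise.
gpu_prefs_forces_igpu() {
  local line value tab=$'\t'
  # Empty input means the variable is not set (nvram prints its error on stderr).
  IFS= read -r line || [[ -n "$line" ]] || return 1
  [[ "$line" == *"$tab"* ]] || return 1   # unexpected format: no tab separator
  value="${line#*"$tab"}"                 # keep only what follows the tab
  [[ "$value" == %01* ]]
}
```

On the Mac itself: nvram "$GPU_PREFS_VAR" 2>/dev/null | gpu_prefs_forces_igpu && echo "dGPU disabled at boot".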

This will give you back your machine - but you will lose some features, e.g. the ability to drive an external display from the DisplayPort and a bit of 3D performance. Thunderbolt data connections should still work.

In case this guide fails or is no longer wanted: this procedure is pure software configuration and therefore fully reversible at any time. An NVRAM reset re-enables the dGPU, and the moved kext can simply be put back in /System/Library/Extensions.
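For the reverse direction, here is a minimal sketch of putting the kext back. It is my own illustration, not an Apple tool: the function name and the optional root-prefix argument (handy for a dry run against a scratch directory) are made up, and it assumes you are in single-user mode with SIP disabled and the root volume mounted read-write, as in Part 1 below.

```shell
#!/usr/bin/env bash
# Hypothetical helper: moves AMDRadeonX3000.kext back from Extensions-off.
# Optional $1 is a root prefix for dry runs; on the real machine pass nothing.

restore_x3000_kext() {
  local root="${1:-/}"
  root="${root%/}"                       # "/" becomes "", so paths start at /System
  local ext="$root/System/Library/Extensions"
  local off="$root/System/Library/Extensions-off/AMDRadeonX3000.kext"

  if [[ ! -d "$off" ]]; then
    echo "nothing to restore: $off not found" >&2
    return 1
  fi
  mv "$off" "$ext/" || return 1
  # Bumping the Extensions directory's timestamp asks macOS to rebuild its kext cache.
  touch "$ext"
}
```

On the Mac you would run restore_x3000_kext with no argument, then delete the boot variable with sudo nvram -d fa4ce28d-b62f-4c99-9cc3-6815686e30f9:gpu-power-prefs (or simply reset the NVRAM), remove the login hook with sudo defaults delete com.apple.loginwindow LoginHook if you set one up, and re-enable SIP with csrutil enable from Recovery.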

Part 1: Bypassing the (failed) AMD chip by disabling the dGPU and moving the offending kernel module

  1. To start from a clean slate, reset the SMC and NVRAM: shut down, unplug everything except power, then hold leftShift+Ctrl+Opt+Power and release all of them at the same time
  2. Now power on again and hold Cmd+Opt+P+R at the same time until you hear the startup chime twice
  3. Boot into Single User Recovery by holding Cmd+R+S
  4. Disable SIP by entering: csrutil disable
  5. Disable the dGPU on boot by setting the following variable: nvram fa4ce28d-b62f-4c99-9cc3-6815686e30f9:gpu-power-prefs=%01
  6. Enable verbose boot mode: nvram boot-args="-v"
  7. Reboot into single-user mode by holding Cmd+S on boot
  8. Mount the root partition as writable: /sbin/mount -uw /
  9. Make a kext backup directory: mkdir -p /System/Library/Extensions-off
  10. Move only ONE offending kext out of the way: mv /System/Library/Extensions/AMDRadeonX3000.kext /System/Library/Extensions-off/
  11. Tell the system to update its kext cache: touch /System/Library/Extensions/
  12. Reboot normally
You should now have an iGPU-accelerated display, but the system doesn't know how to power-manage the failed AMD chip. (In this state the GPU is always idling at relatively high power, draining quite a bit of battery when unplugged and running at temperatures from 60°C upwards [on average 60-85°C], despite not being used for anything by the system.)
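Steps 8 to 11 are the only ones that touch the filesystem, so they can be collected into a small shell function. This is a sketch under my own assumptions: the function name is hypothetical, and the optional root-prefix argument only exists so the logic can be dry-run against a scratch directory. On the real machine you would call it with no argument from single-user mode, right after /sbin/mount -uw /.

```shell
#!/usr/bin/env bash
# Hypothetical helper covering steps 9-11: park AMDRadeonX3000.kext in
# Extensions-off and trigger a kext cache rebuild.
# Optional $1 is a root prefix for dry runs; on the real machine pass nothing.

park_x3000_kext() {
  local root="${1:-/}"
  root="${root%/}"                       # "/" becomes "", so paths start at /System
  local ext="$root/System/Library/Extensions"
  local off="$root/System/Library/Extensions-off"
  local kext="AMDRadeonX3000.kext"

  if [[ ! -d "$ext/$kext" ]]; then
    echo "$ext/$kext not found (already moved?)" >&2
    return 1
  fi
  mkdir -p "$off" || return 1            # step 9
  # Step 10: move only this one kext; the other AMD kexts stay for power management.
  mv "$ext/$kext" "$off/" || return 1
  # Step 11: bumping the Extensions directory's timestamp triggers a cache rebuild.
  touch "$ext"
}
```

Refusing to run when the kext is already gone keeps a second invocation from silently doing nothing.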

Part 2: Improve Thermal and Power Management

For improved power management of the disabled GPU, you have to load the one crucial kext manually after boot:

sudo kextload /System/Library/Extensions-off/AMDRadeonX3000.kext 

If you have a temperature sensor application you might want to have it open before issuing the above command and watch the temps drop…

Automate this with the following LoginHook, which runs every time a user logs in (so from the next reboot onwards):

sudo mkdir -p /Library/LoginHook
sudo vim /Library/LoginHook/LoadX3000.sh

(The script name is up to you; LoadX3000.sh is just an example.) Give it the following content:

#!/bin/bash
kextload /System/Library/Extensions-off/AMDRadeonX3000.kext
pmset -a force gpuswitch 0 # undocumented/experimental
exit 0

then make it executable and register it as the login hook:

sudo chmod a+x /Library/LoginHook/LoadX3000.sh
sudo defaults write com.apple.loginwindow LoginHook /Library/LoginHook/LoadX3000.sh
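If you'd rather not type the hook script by hand, it can be generated. A minimal sketch: the function name is hypothetical and the target path is an argument, so any file name under /Library/LoginHook works. It writes the same three commands as above plus a #!/bin/bash line, which the script needs in order to be executed directly, and marks it executable. Registering it with defaults write com.apple.loginwindow LoginHook remains a separate step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes the LoginHook script to the given path.
# Registering it is still up to you (as root, on the Mac):
#   defaults write com.apple.loginwindow LoginHook <path>

write_x3000_loginhook() {
  local target="$1"
  [[ -n "$target" ]] || { echo "usage: write_x3000_loginhook <path>" >&2; return 2; }
  mkdir -p "$(dirname "$target")" || return 1
  # Quoted heredoc delimiter: the body is written literally, nothing is expanded.
  cat > "$target" <<'EOF' || return 1
#!/bin/bash
# Login hooks run as root, so no sudo is needed here.
kextload /System/Library/Extensions-off/AMDRadeonX3000.kext
pmset -a force gpuswitch 0 # undocumented/experimental
exit 0
EOF
  chmod a+x "$target"
}
```

For example, as root: write_x3000_loginhook /Library/LoginHook/LoadX3000.sh, then register that same path with defaults write.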

For proper power management, the minimal set of kexts loaded on boot is (versions for 10.12.6; check with kextstat | grep AMD): (1.5.1) (1.5.1) (1.5.1) (1.5.1)

And if the manual loading above succeeded, this should be added to the list: (1.5.1)
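Instead of eyeballing kextstat | grep AMD, the check can be scripted. A rough sketch, assuming each kextstat line carries the kext's bundle ID followed by its version in parentheses; the function names are hypothetical, and it only looks for the substring "AMDRadeonX3000" rather than an exact bundle ID.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that read kextstat output on stdin.
# Assumption: a loaded kext's line contains its bundle ID as one field,
# immediately followed by its version in parentheses, e.g. "(1.5.1)".

# Prints "<bundle-id> <version>" for every field containing "AMD".
list_amd_kexts() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i ~ /AMD/ && $(i + 1) ~ /^\(.*\)$/) {
        v = $(i + 1)
        gsub(/[()]/, "", v)      # strip the parentheses around the version
        print $i, v
      }
    }
  }'
}

# Returns 0 if the X3000 accelerator kext shows up among the loaded kexts.
x3000_loaded() {
  list_amd_kexts | grep -q 'AMDRadeonX3000'
}
```

On the Mac: kextstat | list_amd_kexts to see the list, or kextstat | x3000_loaded && echo "X3000 kext loaded".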

Is that all?

If you managed to read this far, holy cow!

I tested this solution by looking at the OS X preferences and making sure that the MacBook was always using the integrated Intel graphics card (and never switching to the discrete AMD card).

I also used a free tool called Macs Fan Control. It let me measure the temperature of the logic board at various places: each CPU core, the memory slots and, most importantly, the graphics chips.

The temperatures were now lower, having dropped by at least 20°C!

However, the inevitable happened: the restarts came back after a little less than an hour. I was pissed off at this stage. Then I remembered Murphy's Law again: if several things can go wrong, the one that will cause the most damage will be the one to go wrong. But that doesn't exclude other things also going wrong in the background.

So I started looking at the kernel logs, trying to eliminate variables one by one, and postulated that something at the disk level must also be triggering restarts. Lo and behold, I ran a disk check and found that all these restarts caused by the AMD chip had also screwed up my filesystem. Nothing that an fsck -iy didn't fix (make sure you are in single-user mode).

My MacBook hasn't restarted in the last 72 hours, and to be sure, I put its Intel graphics chip, disk and memory through all sorts of workouts, making it sweat to the point of no return. I downloaded about 30GB of data, played 3D-accelerated games, and abused the disk by dumping random disk sectors to the null device using dd.
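The dd part of that torture test could look roughly like this. It is a sketch of the idea, not the exact commands I ran: the function name and its arguments are made up, it only ever reads (output goes to /dev/null), and you have to supply the size of the disk in MiB yourself (for example from diskutil info).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read randomly chosen 1 MiB chunks of a disk or file
# and discard them. Read-only: the output side is always /dev/null.
# Usage: random_read_stress <device|file> <size-in-MiB> [reads]

random_read_stress() {
  local src="$1" size_mib="$2" reads="${3:-100}"
  local i offset
  [[ -r "$src" && "$size_mib" -gt 0 ]] || {
    echo "usage: random_read_stress <device|file> <size-in-MiB> [reads]" >&2
    return 2
  }
  for ((i = 0; i < reads; i++)); do
    # RANDOM only spans 0..32767, so combine two draws to reach far offsets.
    offset=$(( (RANDOM * 32768 + RANDOM) % size_mib ))
    dd if="$src" of=/dev/null bs=1048576 skip="$offset" count=1 2>/dev/null || return 1
  done
}
```

Reading a raw device such as /dev/rdisk0 (usually the internal disk) needs root; since nothing is ever written, the worst case is a slow disk for a while.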

(Screenshot: downloading and playing Dungeons & Dragons Online)

How do I get an external display working now?

The million dollar question. Those Apple engineers, of course, wired the external display port directly to the AMD chip, so with the dGPU disabled the port is now completely unusable!

After some research, I found a nifty USB-to-VGA adapter that should work with this range of MacBooks, which only have USB 2.0 ports (ergo no USB-C or USB 3.0). The Diamond Multimedia USB 2.0 to VGA adapter supports high resolutions and has had very good reviews. In fact, I just ordered mine a couple of days ago!

With that said, I hope this guide spares you the ordeal I went through!
