I had some serious Leopard stability problems recently. I thought I had found the problem, but I was wrong.
This time I’m pretty sure I’ve gotten rid of all the badness. I’ve had almost 2 days of continuous uptime, which may not sound like something to get excited about, but it sure beats the 4-6 hour average uptime I’ve had for the past few weeks.
I’ve spent so much time on this that I’m not willing to put in the extra 4-8 hours it would take to really truly isolate this issue. But my best guess is that it was the Canon drivers for my recently purchased Canon PIXMA MX310 multifunction printer. Specifically I think it’s the scanner driver, but I’m not certain. If so, that would mean that the culprits were BJUSBLoad.kext and BJUSBMP.kext in /System/Library/Extensions.
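If you want to check whether those two bundles are present on your own machine, here is a minimal Python sketch. The kext names and the directory come from this post; the function name `find_suspect_kexts` is just something I made up for illustration, and it only looks for the bundles on disk, it does not tell you whether they are loaded.

```python
from pathlib import Path

# Names and location as given in the post; adjust if your install differs.
SUSPECT_KEXTS = ["BJUSBLoad.kext", "BJUSBMP.kext"]
EXTENSIONS_DIR = Path("/System/Library/Extensions")


def find_suspect_kexts(extensions_dir=EXTENSIONS_DIR, names=SUSPECT_KEXTS):
    """Return the paths of the named kext bundles that exist in extensions_dir."""
    base = Path(extensions_dir)
    return [base / name for name in names if (base / name).exists()]


if __name__ == "__main__":
    found = find_suspect_kexts()
    if not found:
        print("Neither kext is present.")
    for path in found:
        print(path)
```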
I got fed up a couple of days ago and started using sudo rm -rf on anything that looked suspicious, based on the output of launchctl list. That included completely removing a recently installed copy of VMware Fusion, and everything that had been installed for my prior printer, which was an HP PSC 1600. I also re-disabled all my fonts, leaving only the required ones (see the prior post for info about that).
The clue came when I had to scan something this past weekend. I had already reinstalled Leopard on an external drive and brought it up to 10.5.2 with the Leopard Graphics Update 1.0, and it was fine. So, I was using the external drive’s clean Leopard install to get work done. (As a recap, Safe Boot mode on the internal hard drive had worked fine also. So I knew it had to be something that was not yet installed on the external drive and that was loaded only in non-Safe Boot mode.)
In order to scan a document I had to install the Canon drivers for the printer and scanner. A couple of hours after doing so, the system froze up. I hadn’t yet made the mental association between the Canon drivers and the subsequent crash, so I kept using the system for a while, even though it was crashing and requiring a reboot every few hours. At that point I decided to punt and go back to using my internal drive, since it was no worse now than the reinstall.
So, given that info, and the fact that I had never installed the HP drivers nor VMware Fusion on the external drive, I’m pretty sure it was one of the Canon drivers. Otherwise there would have to be something in my user directory (which I copied over from the internal drive) which managed to start up something that was not installed on the active boot disk. That seems unlikely to me, but possible. In the launchd config files I’ve seen, the paths are hard-coded to where the installer puts things, so even though the internal drive was mounted, those things should have failed to start because the actual installed drivers and daemons are not where they were expected to be.
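That reasoning about hard-coded paths can be checked rather than guessed at. Below is a rough Python sketch that reads the launchd property lists in a directory and reports jobs whose executable does not exist on the current boot disk. It assumes the jobs name their executable through the standard `Program` or `ProgramArguments` keys; the function names are hypothetical, and I have not run this against the actual Canon, HP, or VMware job files.

```python
import plistlib
from pathlib import Path


def launchd_program(plist_path):
    """Return the executable path a launchd job names, or None if it names none."""
    with open(plist_path, "rb") as fp:
        job = plistlib.load(fp)
    program = job.get("Program")
    if not program:
        # Fall back to the first element of ProgramArguments.
        args = job.get("ProgramArguments") or []
        program = args[0] if args else None
    return program if isinstance(program, str) else None


def jobs_with_missing_programs(plist_dir):
    """List (plist file name, program path) pairs whose absolute path is absent."""
    missing = []
    for plist_path in sorted(Path(plist_dir).glob("*.plist")):
        try:
            program = launchd_program(plist_path)
        except Exception:
            continue  # unreadable or malformed plist; skip it
        if program and program.startswith("/") and not Path(program).exists():
            missing.append((plist_path.name, program))
    return missing
```

You would point `jobs_with_missing_programs` at a directory such as `~/Library/LaunchAgents` in the copied user profile. Any job it lists refers to a program that is not on the active boot disk, so that job should not have been able to start.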
So, which Canon driver is it? Probably the scanner driver. From what I can tell, the printer driver is more or less an outgoing thing: try and print, and it looks on the USB bus for the printer, and prints to it. However, the scanner driver has to deal with the case where the user presses the “Scan” button on the printer’s control panel. If that happens it launches either Image Capture.app or the “MP Navigator EX” application which is Canon’s multifunction dashboard app. My suspicion is that this is what those two kexts are for.
So, if you get this kind of trouble, try blowing away those two kexts (wouldn’t it be grand if Canon provided an uninstall feature for their scanner driver?) and rebooting and see if that fixes it. Better yet, delete them one at a time and reboot each time, and see if you can figure out which one is really the problem.
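For the one-at-a-time approach, here is a hedged Python sketch that moves a single kext into a holding folder instead of deleting it, so it can be put back if it turns out to be innocent. Moving rather than deleting is my substitution for the `rm -rf` approach above, the holding folder is a hypothetical location of your choosing, and on a real system this would need to run as root and be followed by a reboot.

```python
import shutil
from pathlib import Path


def set_aside_kext(name, extensions_dir, holding_dir):
    """Move one kext bundle out of extensions_dir into holding_dir.

    Returns the new path. Refuses to overwrite a bundle already set aside.
    """
    source = Path(extensions_dir) / name
    if not source.exists():
        raise FileNotFoundError(source)
    holding = Path(holding_dir)
    holding.mkdir(parents=True, exist_ok=True)
    target = holding / name
    if target.exists():
        raise FileExistsError(target)
    shutil.move(str(source), str(target))
    return target
```

Set aside BJUSBLoad.kext first, reboot, and use the machine for a day; if the freezes continue, move it back and try BJUSBMP.kext instead.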
Update: It was actually an intermittent hardware problem and Apple has replaced the logic board of this computer.
i almost commented on your last post but decided not as i had nothing to contribute but this: “i’m amazed at your tenacity and commend you”. however, after seeing the saga drag on, i have to say, i’m amazed at your tenacity and commend you.
Really? As far as I can tell I followed the path of least resistance given the info I had at the time. If I had done a wipe-and-reinstall that would have taken a couple of days in total to get all the crap I use installed again. I’m talking about reinstalling apps, MacPorts, etc. Forget about using the computer efficiently when it’s busy shoving little files all over the hard drive and forcing reboots every half hour. No way.
But in fact, I did “give up” more or less and reinstall to an external drive, but preserving my user profile, and copying over the Applications folder as-is using rsync (which only works on software that isn’t expecting to find a serial number buried somewhere secret). But I found that partway through reinstalling my essential apps (printer driver), the problem came back.
So a wipe and reinstall would have failed to eliminate the problem, while wasting a huge amount of time. Like I said, this was the lowest-effort approach I could think of.
It would be really, really helpful if Apple would actually document what Safe Boot vs. normal boot looks like, or even better, write a “skipping xxxx” log that shows what was NOT run.
Forgive my ignorance of OS X system-level programming, but I’m surprised to see stability problems coming from a USB driver… don’t they have user mode USB drivers for async devices like this? (http://blogs.msdn.com/iliast/archive/2006/10/10/Introduction-to-the-User_2D00_Mode-Driver-Framework.aspx)
Does Apple make their class driver source or libs available so that writers can just modify the class driver? (thus lowering their odds of mucking up things like interrupt handling, dma, etc.)
Hmmmm…. it’s just strange given Apple’s emphasis on constrained hardware environments in order to provide greater user experience consistency
I have no idea how the USB drivers and kernel extensions work, and no desire to have to find out just to get work done. I use a Mac largely in order to avoid this sort of nonsense. Otherwise I’d buy a Dell and put Linux on it.
I re-enabled my disabled fonts and a few apps, and got a hang last night. So, at this point it’s hard to say what the issue is. I have some possible confounding going on, or maybe a combination of things that makes it act breaky. Currently 10 hours uptime with all non-Leopard-installed fonts disabled.
I played with the authorizations and f. up the system. I reset them with the keychain, but a few problems remain inside Library/Extensions, among them BjusBload.kext and BjusBMP.
As I have a Canon MP830, I am going to reinstall that. Thank you for the tip!
BJ in BJUSB probably = Bubble Jet. Had the same problems….and just deleted the kext files.