Topic: DIE BIOS VENDORS, DIE!
This is a story about what our computers have to ingurgitate every time
we let them talk to the BIOS. The short story is: HUGE AMOUNTS OF CRAP!
The long story follows...
We call this the age of information and technology, the age of
communication. Well, we don't, but marketers, politicians and all-around
bullshitters do. Actually people are less technical now than they used to
be at the beginning of the computer age. They use technology, yes. But
they do just that: USE IT. Sure you can ride the bus, that doesn't make
you knowledgeable about its inner workings.
Anyway, to get back on the subject, people tend to think that if they
take some computer science courses or attend/graduate from a university
then they're all-knowing. Actually it's quite the contrary nowadays. IN
REAL LIFE, most of the teachers and graduating students are incapable
of doing even the simplest tasks. Students come rushing to job
interviews with buzz words such as XML, encapsulation, Java(tm), OO,
templating, generalization, abstraction, cloud computing, AJAX... must I
go on?! But ask them to do the simplest of tasks, like reversing a
string and they fuck up. Why? They don't have access to something like:
System.IO::STDLIB.string.reverse.in-place()
At that point all their ``skills'' are gone and you can see a big black
hole forming in their heads.
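For the record, the task really is that small. A minimal sketch in C,
assuming a plain NUL-terminated byte string (reverse_in_place is just a
name picked for this example, and multi-byte encodings are ignored):

```c
#include <stddef.h>
#include <string.h>

/*
 * Reverse a NUL-terminated string in place by swapping characters
 * from both ends until the two indices meet in the middle.
 * Byte-wise only: multi-byte encodings such as UTF-8 are not handled.
 */
char *
reverse_in_place(char *s)
{
    size_t i, j;
    char tmp;

    /* Nothing to do for a NULL pointer or an empty string. */
    if (s == NULL || *s == '\0')
        return s;

    for (i = 0, j = strlen(s) - 1; i < j; i++, j--) {
        tmp = s[i];
        s[i] = s[j];
        s[j] = tmp;
    }
    return s;
}
```

Call it on a writable buffer (char buf[] = "..."), not on a string
literal.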

``But they are cheap labor. They are. So what if they get it wrong?
They're 10 times cheaper than that weird 40-ish bearded developer that
we have!''
This would be the less stupid variant of the following:
``Come to think about it, if they're not cheap enough we can always
outsource. Yes, yes. So what if it's a sensitive piece of hardware that we
need to get right, they can fix it with software... better yet, we can
sell them the fix afterwards and make them sign an NDA for it!''
Skip forward a few years. The product ends up on the market. They sell
it like it's the best thing ever. Flawless. Optimized for Windows(c)
even. So it must be great.
I think that by this point you get the general idea.
So what actually happened at h2k9? We started taking suspend/resume
code further and further. Laptops started suspending properly. So we
needed to resume them as well. Some did, some did not. But some acted
really weird, like halting, rebooting or just freezing randomly when we
got back from suspend.
We started fixing the almost working ones. Working, testing, poking the
hardware to see what would happen, the works. We got some results and
some bottlenecks as expected. As we got further and further people
started to join in.
At first it was only me and mlarkin@; after a few days kettenis@ and
deraadt@ jumped in, and then more and more people joined, fixing the
drivers for suspend/resume, testing and adding bits and pieces to the
process. This meant a great speed-up for the suspend/resume effort.
But then, one dreaded night I decided to go back to those broken
machines and see what was happening there. Most of the other laptops got
pretty damn far except those.
It all started somewhere around 6pm. I plugged in my serial cable and
started spilling printfs all over the suspend code path to see how
far I got. To my surprise the laptop suspended, then tried to resume
right away and rebooted instead.
So I started putting printfs in the resume code path. The framework has
an unwind mechanism for when suspend fails. This is a cool feature, thanks
to deraadt@, that saves you from a failed suspend by unrolling and
resuming the suspended devices.
Nothing showed up. Not a single line. This was bad. This meant that the
wakeup locore code was breaking it. This meant that things happened
before we switched to protected mode or very soon after. How do you
debug that? Well... print something, right? So I added some assembly to
print to the serial port. And it worked!
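
For illustration only, here is roughly what such a debug print boils
down to. This is a sketch in C, not the real-mode assembly that was
actually used. The portio hooks and the fake UART are assumptions made
up so the sketch runs in userland; the only standard bits are the COM1
base (0x3f8) and the 16550 line status register layout.

```c
#include <stddef.h>
#include <stdint.h>

#define COM1_BASE 0x3f8 /* conventional PC COM1 I/O base */
#define UART_THR  0     /* transmit holding register (write) */
#define UART_LSR  5     /* line status register (read) */
#define LSR_THRE  0x20  /* transmit holding register empty */

/* Hypothetical port-I/O hooks; real code would use inb/outb directly. */
struct portio {
    uint8_t (*inb)(uint16_t port);
    void (*outb)(uint16_t port, uint8_t val);
};

/* Busy-wait until the UART can take a byte, then write it. */
void
serial_putc(const struct portio *io, char c)
{
    while ((io->inb(COM1_BASE + UART_LSR) & LSR_THRE) == 0)
        continue;
    io->outb(COM1_BASE + UART_THR, (uint8_t)c);
}

void
serial_puts(const struct portio *io, const char *s)
{
    for (; *s != '\0'; s++)
        serial_putc(io, *s);
}

/* Fake UART for testing: always ready, captures transmitted bytes. */
char fake_tx[64];
size_t fake_len;

uint8_t
fake_inb(uint16_t port)
{
    return (port == COM1_BASE + UART_LSR) ? LSR_THRE : 0;
}

void
fake_outb(uint16_t port, uint8_t val)
{
    if (port == COM1_BASE + UART_THR && fake_len < sizeof(fake_tx) - 1)
        fake_tx[fake_len++] = (char)val;
}

const struct portio fake_io = { fake_inb, fake_outb };
```

Dropping a serial_puts(&io, "A"), "B", "C" after each step of an early
code path tells you which step was the last one reached.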
Great, that meant I could see where it panicked. And it was just before
making the jump into protected mode. Yes...
By this time I got mlarkin@ in on the issue and he started looking into
it as well. Everybody, except dms@, had already gone for beer, but we were
still in the hackingroom looking into this. It was very strange. Finally
at about 2-3AM it hit us! The BIOS was trashing the GDT and the IDT!!!
This was incredible! The BIOS was screwing us over.
One hour later we had a fix that worked and got these laptops all the
way through resume. It was incredible! We could not believe it. We were so
happy we fixed it and so pissed at the BIOS vendors. We went for beer
and the bar was empty. Everyone had gone to sleep already. It was 3-4AM.
Great, so next day I got the okays and committed the patch. This fixed more
than one brand, so other computers started to get further. But after one
or two days I got some that did not, precisely because of that patch. How
could that be? It just cleaned the GDT and IDT and reinitialized them;
it was supposed to be redundant and harmless on most laptops.
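
To give an idea of what ``reinitializing'' a descriptor table means,
here is a hedged sketch in C that rebuilds a flat 32-bit GDT from known
values. It is not the committed patch: the function names are invented
for this example, the IDT is left out, and actually loading the table
(lgdt) is privileged and not shown. Only the descriptor bit layout is
the standard x86 one.

```c
#include <stdint.h>

#define GDT_ACCESS_CODE  0x9a /* present, ring 0, code, readable */
#define GDT_ACCESS_DATA  0x92 /* present, ring 0, data, writable */
#define GDT_FLAGS_FLAT32 0xc  /* 4 KiB granularity, 32-bit segment */

/* Pack base, 20-bit limit, access byte and flags into one descriptor. */
uint64_t
gdt_descriptor(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    uint64_t d = 0;

    d |= (uint64_t)(limit & 0xffff);            /* limit 15:0  */
    d |= (uint64_t)(base & 0xffffff) << 16;     /* base 23:0   */
    d |= (uint64_t)access << 40;                /* access byte */
    d |= (uint64_t)((limit >> 16) & 0xf) << 48; /* limit 19:16 */
    d |= (uint64_t)(flags & 0xf) << 52;         /* flags       */
    d |= (uint64_t)((base >> 24) & 0xff) << 56; /* base 31:24  */
    return d;
}

/* Overwrite whatever is in the table with null + flat code + flat data. */
void
gdt_rebuild_flat(uint64_t gdt[3])
{
    gdt[0] = 0; /* mandatory null descriptor */
    gdt[1] = gdt_descriptor(0, 0xfffff, GDT_ACCESS_CODE, GDT_FLAGS_FLAT32);
    gdt[2] = gdt_descriptor(0, 0xfffff, GDT_ACCESS_DATA, GDT_FLAGS_FLAT32);
}
```

The point of rebuilding from constants is that nothing the BIOS left
behind in the table is trusted.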
Some laptops did something else: they purposely messed with these in
order to get video back. So when we were cleaning them and doing our own
BIOS reset routine the BIOS wasn't getting what it expected and froze
the machine. Yes, it's amazing.
So now we have a bunch of cases:
- some need the GDT/IDT clean-up
- some can't have it because they freeze
- some need the x86_emu part to emulate the BIOS video code
- some need a mix of the above
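
One way to keep cases like these apart is a per-machine quirk table.
The sketch below is purely hypothetical and says nothing about how
OpenBSD actually handles it: the model strings, the flag names and the
``do nothing'' default for unknown machines are all invented for this
example.

```c
#include <stddef.h>
#include <string.h>

#define RQ_RESET_GDT_IDT 0x01 /* clean and reinitialize GDT/IDT */
#define RQ_EMULATE_VBIOS 0x02 /* run the BIOS video code under emulation */
#define RQ_DEFAULT       0x00 /* assumption: unknown machines get nothing */

struct resume_quirk {
    const char *model;  /* made-up identifier, not a real laptop */
    unsigned int flags;
};

const struct resume_quirk resume_quirks[] = {
    { "example-needs-cleanup",      RQ_RESET_GDT_IDT },
    { "example-freezes-on-cleanup", 0 },
    { "example-needs-video-emu",    RQ_EMULATE_VBIOS },
    { "example-needs-both",         RQ_RESET_GDT_IDT | RQ_EMULATE_VBIOS },
};

/* Return the quirk flags for a model, or RQ_DEFAULT if it is unknown. */
unsigned int
resume_quirks_for(const char *model)
{
    size_t i;

    for (i = 0; i < sizeof(resume_quirks) / sizeof(resume_quirks[0]); i++)
        if (strcmp(resume_quirks[i].model, model) == 0)
            return resume_quirks[i].flags;
    return RQ_DEFAULT;
}
```

The resume path would then test each flag before running the matching
step, so the ``mix of the above'' case is just two bits set at once.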
This is the crap they push down your throat when you pay them, this is
the thing that they don't make public and only give away for fun and
profit to your local friendly corporation.
THIS IS THE STUFF YOU GET FIXED FOR FREE FROM OPENBSD!
Enjoy your laptop, it's been fixed. Stop buying the next one blindly.
Better have sane hardware than blinken-lights hardware. Be smart, don't
just shop for the pretty colors. And remember to support OpenBSD in any
way you can through donations or CDs.
gopher://sdf.lonestar.org/1/user/bulibuta