Borkopolis

April 4, 2011

My first professional bug

Filed under: 20-minute,history,programming — Mark Dalrymple @ 12:10 am

Clearning with rooking grass

Mike Ash’s recent Friday Q&A mentioned SIGWINCH, the hearing of which always sends me down memory lane.  My first professional bug was centered around SIGWINCH.  By “professional bug”, I mean a bug that someone paid me actual money to fix during a period of employment.

Straight out of college in 1990 I went to work for a company called Visix, which at the time sold a product called Looking Glass, a file browser much like the Macintosh Finder but for Unix.  Looking Glass supported the major graphical windowing systems of the time: X11, Intergraph’s Environ V, and Sun’s SunView.  The image at the top of this posting is the only screen shot I could find of the version of Looking Glass I worked on running on SunView.

I was hired into tech support, and our team’s duties were phone support (typically debugging network configurations and X server font paths) and porting Looking Glass to other platforms.  Being the Lo Mein on the totem pole, I got given the old platform nobody wanted to touch any more: SunView.

SunOS 4.1.X had just come out, and Looking Glass would hang randomly.  It worked fine on 4.0.3.  My job was to find and fix this hang.  This was my first introduction to a lot of things: Unix systems, windowing systems, navigating large code bases, debuggers, and vendor documentation that wasn’t Apple.  Luckily the SunView version didn’t sell terribly well any more because everyone was moving to X11, but there were a couple of customers bitten by this problem.

So what is SunView?  SunView is a windowing system.  Different programs run displaying graphical output into a window.  Nowadays that’s commonplace, but back when SunView came out it was pretty cool.  SunView was one of the earlier windowing systems, so it had a number of peculiarities: the biggest was that each window on the screen was represented by an honest-to-god device.  /dev/wnd5 is a window, as would be /dev/wnd12.  There were a finite number of these window devices, so once the system ran out of windows no more could be opened.

There was a definite assumption of “one window to one process” in SunView.  Your window was your only playground.  Looking Glass was different: it could open multiple windows.  Because of the finite number of windows available system-wide, on launch we created the alert that said “You can’t open any more windows because you’re out of windows”, thereby consuming a precious window resource, and hid it offscreen.  It was the only way we could reliably tell users why they couldn’t open any more windows.  Glad I wasn’t the one that had to make this work in the first place.

The other peculiarity is that you never got window events.  Even in the 1.0 version of the Macintosh toolbox you could figure out if the user dragged the window, or resized it, or changed its stacking order.  In SunView, you just got a signal: SIGWINCH, for Window Change, hence the memory-lane trigger.  The user moved a window?  SIGWINCH.  The user resized it?  SIGWINCH.  The user changed the z-order?  SIGWINCH.

With just one window that’s not too bad.  Just query the window for its current size.  For us, though, we had to cache each window’s location, size, and stacking order.  Upon receipt of a SIGWINCH we’d have to walk all of our windows, compare each one to its cached version to see whether it had been moved, resized, or restacked, and then re-lay out the views of any window that had changed.
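The cache-and-compare idea can be sketched in a few lines of Python.  This is only an illustration, not the Looking Glass code or the SunView API: the Geometry record, the diff_windows function, and the window names are all made up for the example, and the "current" values stand in for whatever the window system would report when queried.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geometry:
    """Hypothetical snapshot of one window's position, size, and stacking order."""
    x: int
    y: int
    width: int
    height: int
    z_order: int


def diff_windows(cached, current):
    """Return {window_id: set of change kinds} for windows that differ from the cache."""
    changes = {}
    for window_id, now in current.items():
        before = cached.get(window_id)
        if before is None or before == now:
            continue  # new or unchanged window: nothing to report
        kinds = set()
        if (before.x, before.y) != (now.x, now.y):
            kinds.add("moved")
        if (before.width, before.height) != (now.width, now.height):
            kinds.add("resized")
        if before.z_order != now.z_order:
            kinds.add("restacked")
        changes[window_id] = kinds
    return changes


# The signal only says "something changed"; the diff recovers what and where.
cached = {
    "browser": Geometry(0, 0, 640, 480, 0),
    "alert": Geometry(-1000, -1000, 300, 100, 1),  # parked offscreen
}
current = {
    "browser": Geometry(0, 0, 800, 480, 0),        # the user resized this one
    "alert": Geometry(-1000, -1000, 300, 100, 1),
}
changes = diff_windows(cached, current)
print(changes)  # {'browser': {'resized'}}
cached.update(current)  # refresh the cache for the next SIGWINCH
```

Only the windows that show up in the result need a relayout; everything else can be left alone.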

So, back to my bug.  It took me a solid month to fix.  Mainly because it was part-time work in amongst my other responsibilities, and also because it was difficult to reproduce.  Spastic clicking and dragging could make it lock up, but not reliably.  Using the debugger was pointless – a 4 meg Sun 3/50 swapped eternally trying to get Looking Glass into gdb.  I ended up using a lot of caveman debugging.

Event queues

The architecture we used is shown in this diagram.  Each window had an event queue (remember that one window to one process assumption).  Upon receipt of events, we would walk our windows: read the events, handle them, then move on to the next window.

I was getting some printouts, though, showing a window receiving mouse-downs and mouse-drags, but no mouse-up.  Occasionally I would see a mouse-up with no mouse-downs.  Ah-ha!  The mouse-up was being delivered to the wrong window’s event queue.  The fix was easy once I found it: just merge the events from all the windows first, and then process them.
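Here is a rough sketch of the merge-then-process idea in Python.  It rests on two assumptions that are not spelled out above: that every event carries a timestamp that can be compared across queues, and that a drag or mouse-up belongs to whichever window saw the matching mouse-down.  The Event record and both function names are invented for the illustration.

```python
import heapq
from typing import NamedTuple


class Event(NamedTuple):
    timestamp: int  # assumed: comparable across all window queues
    kind: str       # "down", "drag", or "up"
    queue_id: str   # the window queue the event arrived on


def merge_queues(queues):
    """Merge per-window queues (each already in time order) into one ordered stream."""
    return list(heapq.merge(*queues.values(), key=lambda e: e.timestamp))


def assign_targets(events):
    """Pair each event with the window that should handle it.

    Assumption: once a window sees a mouse-down, it owns the drags and the
    mouse-up that follow, no matter which queue they arrived on.
    """
    active = None
    routed = []
    for event in events:
        if event.kind == "down":
            active = event.queue_id
        target = active if active is not None else event.queue_id
        routed.append((target, event.kind))
        if event.kind == "up":
            active = None
    return routed


queues = {
    "win1": [Event(1, "down", "win1"), Event(2, "drag", "win1")],
    "win2": [Event(3, "up", "win2")],  # the stray mouse-up
}
routed = assign_targets(merge_queues(queues))
print(routed)  # [('win1', 'down'), ('win1', 'drag'), ('win1', 'up')]
```

Handled one window at a time, win1 never sees its mouse-up and win2 gets one out of nowhere.  Handled as a single stream, the sequence is whole again.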

It was then I learned how expensive malloc is.  I malloc’d and free’d event structures, but performance became dog-slow, especially during mouse drags.  Caching the structures made life fast again.
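One common way to cache structures like this is a free list: hand back a recycled record when one is available and only allocate when the list is empty.  The Python below shows the shape of that idea and nothing more.  The EventRecord and EventPool names are hypothetical, Python's allocator behaves nothing like a 1990 malloc, and the post does not say exactly how the original cache was built.

```python
class EventRecord:
    """Hypothetical reusable event structure."""
    __slots__ = ("kind", "x", "y")

    def __init__(self):
        self.kind = None
        self.x = 0
        self.y = 0


class EventPool:
    """Free list of EventRecord objects; allocates only when the list is empty."""

    def __init__(self):
        self._free = []
        self.allocations = 0  # counts how many records were really created

    def acquire(self, kind, x, y):
        if self._free:
            record = self._free.pop()   # reuse a released record
        else:
            record = EventRecord()      # the expensive path
            self.allocations += 1
        record.kind, record.x, record.y = kind, x, y
        return record

    def release(self, record):
        self._free.append(record)       # keep it around for next time


pool = EventPool()
for i in range(1000):                   # a long mouse drag
    record = pool.acquire("drag", i, i)
    pool.release(record)
print(pool.allocations)  # 1
```

A thousand drag events go by and only one record is ever created, which is the whole point during a drag.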

Memories like these make me so happy with the cool tech we get to play with these days.

 

(subsequently republished and edited quite heavily to the Miniblog)
