Hacker News

The response to the bug report looks depressingly typical. It rejects the working fix with a wall of text speculating on numerous other, possibly better fixes (without deigning to actually choose one). The nirvana fallacy in action!


I know of one commercial Smalltalk UI bug that persisted 12 years -- being reported all the while. To be fair, it was a very tricky low-level race condition, very hard to reproduce, though very serious. (Unhandled exception in the bowels of the UI library. Boom! Application goes down.) Still, the attitude of the vendor was just unbelievable from the POV of the customer. After dozens of reports, hundreds of messages, numerous pieces of documentation, it still took 12 years for engineers to even start thinking it was something besides user error -- even though multiple customers were reporting it. (I know because I worked for 3 of them!) There is a huge perceptual wall there. I know because I used to work for the vendor. I know how apparent this bug is at a production shop and how opaque it appears from inside the vendor's camp. (And despite my being from inside, I still got the "user error" chant!)

EDIT: Oh, and I know of another UI bug that's been in their system for about 8 years. It's a Smalltalk newbie classic -- shoving non-identity keys into an IdentityDictionary. I could describe what it is to a Smalltalker in 2 sentences, and they could then find it and fix it. This vendor seems to have the same attitude about this bug, so I've already learned my lesson. They can keep their damn bug!


1-2 years ago I had the exact same thing with a PHP bug (I know, PHP bugs... shocking!), specifically with mysqli. It would crash on LONGTEXT columns, though not reliably. Different people had reported it in different forms over the previous 2-3 years, all of them getting automated responses ("Please provide...") followed by ("Closed due to no activity for 7 days..."), with the odd dismissive comment from a committer.

It's an incredibly frustrating experience.


This is why forks happen (I've forked two projects because of this, three if you count one I've deployed but haven't publicly released).


If you can fork it, you can include a patch in your bug report. Sometimes, that's all it takes to make things happen.

The most infuriating situations for me are when I submit a working patch, and it is ignored. This is, thankfully, very rare. In some cases, the patch leads to a better fix being written by the maintainer or someone else (an example of this for me was when I needed yum to support authenticated repositories; it didn't, so I patched it, posted the patch to the mailing list, and soon after one of the members of the team rewrote it to be more robust and have nicer configuration syntax within a week).


Sometimes it takes several weeks until the first response to your patch, and even longer to get it accepted (or rejected). But if you rely on that patch for your own stuff and you already know a few other things that need work, your only real option is to fork.


That's why I stopped using PHP outright as well - after several inarguable bugs were just punted on, it became apparent that the core developers did not care and were not seriously interested in outside help. Even now, in the post-DVCS world, nothing kills a project faster than someone realizing that they're going to have to fork the whole thing if they want it to work.


It's a working fix, but it isn't the proper fix. The real issue is the size difference of the long type between 32- and 64-bit architectures. Bruce Evans doesn't explicitly mention this fact, but it's the core of his reasoning.

long is 64 bits on a 64-bit architecture and 32 bits on a 32-bit architecture. That is why 0xffffffff shows up as a non-negative number on the 64-bit machine but as a negative one on the 32-bit machine.

Changing the type to int (which is 32 bits on both x86 and x64), while it does break 16-bit systems (which don't exist anymore), fixes this completely by removing the behavioral difference between x86 and x64 caused by the long type.

The two style issues that he mentions are easily fixed by moving the variable declaration to the top of the function and initializing it there. However, these may be forced by the function structure due to the gotos present in the code...


A better fix would be to use the lmin macro from libkern.h.

http://www.freebsd.org/cgi/cvsweb.cgi/~checkout~/src/sys/sys...


Not really. The line is this:

  long adv = min(recwin, (long)TCP_MAXWIN << tp->rcv_scale) - (tp->rcv_adv - tp->rcv_nxt);

So it's really: long = uint(long, (long)uint32) - (uint32-uint32)

Here's the problem on x64: you're converting a 64-bit long down to a uint, then doing a subtraction with another uint, then placing that unsigned result into a 64-bit long. The assignment zero-extends the unsigned value rather than sign-extending it, which is why you get a large positive number rather than -1.

Changing it to use the lmin macro would make it: long = long(long, (long)uint32) - (uint32-uint32)

This still has the underlying issue (using 32-bit values in 64-bit buckets), which should work out fine, but may have issues down the road.

It makes more sense to change all the long types to int32/uint32 types rather than just cast longs everywhere. If recwin and adv were changed to int32, it would be: int32 = uint32(int32, uint32) - (uint32 - uint32)

While this potentially has issues if the uints are between 0x80000000 and 0xffffffff, it's a safer solution than using longs.

EDIT: added some explanation


The lmin[1] macro will convert the uints to a long int and then do the arithmetic.

[1]static __inline long lmin(long a, long b) { return (a < b ? a : b); }

[2]static __inline u_int min(u_int a, u_int b) { return (a < b ? a : b); }

Edit: see 6.3.1.8 Usual arithmetic conversions in the C99 standard.


You have to do this if you want your codebase to get better over time rather than worse over time.


What he's doing seems useful to the project. There's no better time to get it right. I'm just surprised he's willing to expend so much effort communicating instead of just fixing the patch.

I noticed that C programmers tend to use macros for things where (possibly non-exported) inline functions would make more sense. Why is that? Are they in the habit of building the OS with all optimizations off? Or is it that they're being used as a poor man's generic function?


The inline keyword is best thought of as a hint to the compiler, not a command. The compiler is free to ignore the meatbag telling it to inline functions if it chooses to.

Macros are substituted in before the compiler, so they are always inlined.

EDIT: Hint, not suggestion.


That is true, but the idea that programmers can use macros to force the compiler to emit optimal code is wrong, too. In the early days of C, that was (almost) true, but those days are over.

In theory, a compiler could uninline common code blocks, including macro expansions, into functions to decrease object code size and/or working set size, thus speeding up the program (example: functions f and g with inlined function h each take 2 cache lines; without inlining, each of f, g and h fits in a single cache line).

In practice, using an inline function will give the compiler the opportunity to weigh different objectives (code size, execution speed, debuggability, etc) against each other, and do the better thing.


You can't always rely on the compiler inlining your function, and AFAIK there's no portable way of forcing inlining.


I think it's mostly inertia and culture. Inline functions weren't in the standard until C99.



