C and C++ Prioritize Performance over Correctness

pnutzh4x0r@lemmy.ndlug.org · 1 year ago

qwertyasdef@programming.dev · 1 year ago

The behavior is defined; the behavior is whatever the processor does when you read memory from address 0.

If that were true, there would be no problem. Unfortunately, what actually happens is that compilers use the undefined behavior as an excuse to mangle your program far beyond what mere variation in processor behavior could cause, all in the name of optimization. In the kernel bug, the issue wasn’t that the null pointer dereference was undefined per se; the real issue was that the subsequent null check got optimized out because of the earlier undefined behavior.
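A minimal sketch of the pattern being described, not the actual kernel code: `struct device` and both function names are hypothetical, made up for illustration. In the first function the pointer is dereferenced before it is checked, so a compiler is allowed to assume it is non-null and drop the check.

```c
#include <stddef.h>

/* Hypothetical type, for illustration only. */
struct device {
    int flags;
};

/* Risky ordering: dev is dereferenced before it is checked. Dereferencing a
 * null pointer is undefined behavior, so a compiler may assume dev is
 * non-null after the first statement and remove the check that follows. */
int read_flags_unchecked(struct device *dev)
{
    int flags = dev->flags;   /* dereference happens first */
    if (dev == NULL)          /* this check may be optimized out */
        return -1;
    return flags;
}

/* Safer ordering: check first, dereference afterwards. */
int read_flags_checked(struct device *dev)
{
    if (dev == NULL)
        return -1;
    return dev->flags;
}
```

Both functions behave the same for a valid pointer; only the second one is safe to call with `NULL`. GCC also has a `-fno-delete-null-pointer-checks` option that turns off this particular assumption.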

mo_ztt ✅@lemmy.world · 1 year ago

Well… I partially agree with you. The final step in the failure chain was the optimizer assuming that dereferencing NULL would have blown up the program, but (1) that honestly seems like a pretty defensible choice, since it’s accurate 99.999% of the time, and (2) that has nothing to do with the language design. It’s just an optimizer bug. It’s in the same category as C code that mucks around with its own stack, or single-threaded code that has to have stuff marked volatile because of crazy pointer interactions; you just find complex problems sometimes when your language starts getting too close to machine code.

I guess where I disagree is that I don’t think a NULL pointer dereference is undefined. In the spec, it is. In a running program, I think it’s fair to say it should dereference 0. Like e.g. I think it’s safe for an implementation of assert() to do that to abort the program, and I would be unhappy if a compiler maker said “well the behavior’s undefined, so it’s okay if the program just keeps going even though you dereferenced NULL to abort it.”

The broader assertion that C is a badly-designed language because it has these important things undefined, I would disagree with; I think there needs to be a category of “not nailed down in the spec because it’s machine-dependent,” and any effort to make those things defined machine-independently would mean C wouldn’t fulfill the role it’s supposed to fulfill as a language.

Sonotsugipaa@lemmy.dbzer0.com · 1 year ago

I’m not sure about C, but C++ often describes certain things as “implementation defined”;
one such example is casting a pointer to a sufficiently big integral type.

For example, you can assign a float* p0 to a size_t i, then i to a float* p1 and expect that p0 == p1.
Here the compiler is free to choose how to calculate i, but other than that the compiler’s behavior is predictable.
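A rough sketch of that round trip, with one swap: it uses `uintptr_t` from `<stdint.h>` rather than `size_t`, since `size_t` is not guaranteed to be wide enough to hold a pointer on every platform. The function name `pointer_round_trips` is made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Converts a pointer to an integer and back, then reports whether the
 * result compares equal to the original. The integer value itself is
 * implementation-defined; only the round trip is relied on here. */
bool pointer_round_trips(float *p0)
{
    uintptr_t i = (uintptr_t)(void *)p0;   /* pointer -> integer */
    float *p1 = (float *)(void *)i;        /* integer -> pointer */
    return p0 == p1;
}
```

Calling it with the address of any live `float` should return `true`; what `i` actually contains is up to the implementation.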

“Undefined behavior” is not “machine-dependent” code - it’s “these are seemingly fine instructions that do not make sense when put together, so we’re allowing the compiler to assume that you’re not going to do this” code.
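Signed integer overflow is a commonly cited case of this. The sketch below uses hypothetical function names: the first function only makes sense if `x + 1` wraps around, and since overflow of an `int` is undefined, a compiler may simplify it to always return `true`. The second checks against `INT_MAX` before adding, so it never overflows.

```c
#include <limits.h>
#include <stdbool.h>

/* Looks like an overflow test, but it can only be false if x + 1 overflows.
 * Because that overflow is undefined behavior, a compiler may assume it
 * never happens and fold this to "return true". */
bool increment_is_larger(int x)
{
    return x + 1 > x;
}

/* Checks the bound before adding, so no overflow can occur.
 * Returns false and leaves *out untouched if x + 1 would not fit. */
bool safe_increment(int x, int *out)
{
    if (x == INT_MAX)
        return false;
    *out = x + 1;
    return true;
}
```

For unsigned types the wraparound is defined by the language, so the same concern does not apply there.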

That said, UB is typically the result of “clever” programming that ignores best practices, aside from extreme cases prosecuted by the fiercest language lawyers (like empty while(true) loops that may or may not boot Skynet, or that one time that atan2(0,0) erased from this universe all traces of Half Life 3).

mo_ztt ✅@lemmy.world · edit-2 1 year ago

For example, you can assign a float* p0 to a size_t i, then i to a float* p1 and expect that p0 == p1. Here the compiler is free to choose how to calculate i, but other than that the compiler’s behavior is predictable.

I don’t think this specific example is true, but I get the broader point. Actually, “implementation defined” is maybe a better term for this class of “undefined in the language spec but still reliable” behavior, yes.

“Undefined behavior” is not “machine-dependent” code

In C, that’s exactly what it is (or rather, there is some undefined-in-the-spec behavior which is machine-dependent). I feel like I keep just repeating myself – dereferencing 0 is one of those things, overflowing an int is one of those things. It can’t be in the C language spec because it’s machine-dependent, but it’s also not “undefined” in the sense you’re talking about (“clever” programming that relies on something outside the spec that’s not really official or formally reliable). The behavior you get is defined, in the manual for your OS or processor, and perfectly consistent and reliable.

I’m taking the linked author at his word that these things are termed as “undefined” in the language spec. If what you’re saying is that they should be called “implementation defined” and “undefined” should mean something else, that makes 100% sense to me and I can get behind it.

The linked author seems to think that because those things exist (whatever we call them), C is flawed. I’m not sure what solution he would propose other than doing away with the whole concept of code that compiles down close to the bare metal… in which case what kernel does he want to switch to for his personal machine?

research!rsc: C and C++ Prioritize Performance over Correctness