
I'm a bit skeptical either example is representative of "most" existing software. If anything, the mere existence of __builtin_malloc and its default use should hint that most existing software doesn't care about malloc/free actually being called. That being said...

> As an example, user kentonv wrote: "I patched the memory allocator used by the Cloudflare Workers runtime to overwrite all memory with a static byte pattern on free". And compiler would, like, "nah, let's leave all that data on stack".

Strictly speaking, I don't think eliding malloc/free would "break" those programs because that behavior is there for security if/when something else goes wrong, not as part of the software's regular intended functionality (or at least I sure hope nothing relies on that behavior for proper functioning!).

> Or somebody would try to plug in mimalloc/jemalloc [] and wonder what's going on.

Why would mimalloc/jemalloc/some other general-purpose allocator care that it doesn't have to execute a matching malloc/free pair any more than the default allocator?

I'm not sure debug allocators would care either? If you're trying to debug mismatched malloc/free pairs then the ones the compiler elides are the ones you don't care about anyways since those are the ones that can be statically proven to be "self-contained" and/or correct. If you're gathering statistics then you probably care more about the malloc/free calls that do occur (i.e., the ones that can't be elided), not those that don't.

In any case, if you want to use a malloc/free implementation that promises more than the C standard does (e.g., special byte pattern on free, statistics/debug info tracking, etc.) there's always -fno-builtin-malloc (or memset_explicit if you're lucky enough to be using C23). Of course, the tradeoff is that you give up some potential performance.



Thank you for putting it in a much more correct and understandable language than I could. That is exactly what I am talking about: if you call __builtin_malloc (e.g. via macro definition in the libc header), compiler is free to do whatever it wants. However, calling "malloc" library function should call "malloc" library function, and anything else is unacceptable and a bug. There should be no case where compiler could assume anything about a function it does not see based simply on its name. Neither malloc nor strlen.


> That is exactly what I am talking about: if you call __builtin_malloc (e.g. via macro definition in the libc header), compiler is free to do whatever it wants. However, calling "malloc" library function should call "malloc" library function, and anything else is unacceptable and a bug.

I think that's an overly narrow reading of the footnote. I don't see an obvious reason why "such names" in the footnote should only cover "some macro names beginning with an underscore" and not also "external identifiers". And if implementations are allowed to define special semantics for "external identifiers", then... well, that's exactly what they did!

In addition, there's still the as-if rule. The semantics of malloc/free are defined by the C standard; if the compiler can deduce that there is no observable difference between a version of the program that calls those and a version that does not, why does it matter that the call is emitted? A function call in and of itself is not a side effect, and since the C standard dictates what malloc/free do the compiler knows their possible side effects.
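To make the as-if argument concrete, here is a minimal sketch of a malloc/free pair with no observable effect. The function name sum_squares is made up for illustration. An optimizing compiler that treats malloc/free as builtins may keep the accumulator in a register and emit no allocator call at all; whether it actually does depends on the compiler, version, and flags.

```c
#include <stdlib.h>

/* Hypothetical helper: uses a heap-allocated temporary that never
   escapes the function. The only observable result is the return
   value, so under the as-if rule the compiler may drop the
   malloc/free pair entirely. */
unsigned sum_squares(unsigned n)
{
    unsigned *acc = malloc(sizeof *acc);
    if (acc == NULL)
        return 0;               /* allocation failure: nothing to sum into */

    *acc = 0;
    for (unsigned i = 1; i <= n; i++)
        *acc += i * i;

    unsigned result = *acc;
    free(acc);                  /* matching free; the pointer is not used afterwards */
    return result;
}
```

Comparing the generated assembly with and without -fno-builtin-malloc (e.g. via cc -O2 -S) is one way to see whether the call survives on a given toolchain.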

Furthermore, the addition of memset_explicit and its footnote ("The intention is that the memory store is always performed (i.e. never elided), regardless of optimizations. This is in contrast to calls to the memset function (7.26.6.1)") implies that eliding calls is in fact acceptable behavior when optimizations are enabled. If eliding calls were not permissible when optimizing then what's the point of memset_explicit?
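As a rough illustration of why memset_explicit exists: a memset on a buffer that is never read again is a dead store, and an optimizer may remove it. The sketch below uses the common volatile-pointer idiom as a stand-in for toolchains without C23 support. The names scrub and checksum_with_scrub are invented for this example, and the volatile idiom is a widely used convention that compilers honor in practice rather than a guarantee spelled out as directly as memset_explicit's.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback for memset_explicit: writing through a
   volatile-qualified pointer makes each store a volatile access,
   which compilers in practice do not eliminate. */
void scrub(void *buf, size_t len)
{
    volatile unsigned char *p = buf;
    while (len--)
        *p++ = 0;
}

/* Copies up to 32 bytes into a local buffer, sums them, then wipes
   the local copy before returning. */
unsigned checksum_with_scrub(const unsigned char *data, size_t len)
{
    unsigned char tmp[32];
    size_t n = len < sizeof tmp ? len : sizeof tmp;
    unsigned sum = 0;

    memcpy(tmp, data, n);
    for (size_t i = 0; i < n; i++)
        sum += tmp[i];

    /* A plain memset(tmp, 0, sizeof tmp) here could be removed as a
       dead store, since tmp is never read again. */
    scrub(tmp, sizeof tmp);
    return sum;
}
```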

> There should be no case where compiler could assume anything about a function it does not see based simply on its name.

Again, external identifiers defined by the C standard are reserved. Reserved external identifiers aren't just for show. From the C89 standard:

> If the program defines an external identifier with the same name as a reserved external identifier, even in a semantically equivalent form, the behavior is undefined.

And from C23:

> If the program declares or defines an identifier in a context in which it is reserved (other than as allowed by 7.1.4), the behavior is undefined.

This means that yes, under modern compilers' interpretation of UB, compilers can assume things about functions based on their names, because modern compilers generally optimize assuming UB does not happen. The compiler does not need to see the function's implementation because, as far as the compiler is concerned, it is the function's implementation.
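A small sketch of what "assuming things based on the name" looks like in practice, using strlen since it came up earlier in the thread. The function name greeting_length is made up for this example. With builtins enabled, GCC and Clang will typically fold this call to a constant at compile time without ever consulting the library's strlen; -fno-builtin-strlen (or -ffreestanding) asks them not to.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example: the argument is a string literal, so a
   compiler that knows the standard semantics of strlen can compute
   the result itself instead of emitting a call. */
size_t greeting_length(void)
{
    return strlen("hello");
}
```

Either way the result is the same, which is the point: the program cannot tell whether the library function was called.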


Ah yes, N2625 "What we think we reserve". Basically any C program containing variable or function "top", "END", "strict", "member" and so on is non-conforming and subject to undefined behaviour, so they define "potentially reserved" identifiers and as usual compiler vendors go and do the sane right thing.


That paper isn't relevant here. From the paper (emphasis added):

> 7.1.3 Reserved Identifiers

> [snip]

> Macro names and identifiers with external linkage that are specified in the C standard library clauses.

> This proposal does not propose any changes to these reserved identifiers.

Furthermore, that paper doesn't make the use of reserved external identifiers not UB, so there's no change there either.



