> This is tracked through io_uring's completion queue - we only send a success response after receiving confirmation that the completion record has been persisted to stable storage.
Which completion queue event(s) are you examining here? I ask because the way this is worded makes it sound like you're waiting solely for the completion queue event for the _write_ to the "completion wal".
Doing that (waiting only on the "completion wal" write CQE)
1. doesn't ensure that the "intent wal" has been written (because the "intent wal" write goes through a different io_uring instance and a different submission queue entry than the "completion wal" write), and
2. doesn't indicate that either the "intent wal" data or the "completion wal" data has made it to durable storage (one needs fsync for that; the completion queue events for writes don't make that promise. The CQE for an fsync opcode would indicate the data has made it to durable storage, provided the fsync is ordered correctly with respect to the writes and refers to the appropriate fd and data ranges. Alternatively, there are some flags that have the effect of implying an fsync following a write that could be used, but those aren't mentioned; see the sketch below)
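To make that distinction concrete, here is a minimal liburing sketch (not OP's code; the fd, buffer, and error handling are simplified) of waiting for a write CQE and then an fsync CQE, where only the latter says anything about durability:

#include <liburing.h>
#include <sys/types.h>

/* Minimal sketch: a durable append needs two completions, one for the
 * write and one for the fsync. Waiting only on the write CQE does not
 * mean the data reached stable storage. */
static int append_durably(struct io_uring *ring, int fd, const void *buf,
                          unsigned len, __u64 off)
{
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;

    /* Write: its CQE only says the kernel finished the write; the data
     * may still be sitting in the page cache. */
    sqe = io_uring_get_sqe(ring);
    io_uring_prep_write(sqe, fd, buf, len, off);
    io_uring_submit(ring);
    if (io_uring_wait_cqe(ring, &cqe) < 0 || cqe->res < 0)
        return -1;
    io_uring_cqe_seen(ring, cqe);

    /* Fsync (IORING_OP_FSYNC): only this CQE indicates the data made it
     * to durable storage. */
    sqe = io_uring_get_sqe(ring);
    io_uring_prep_fsync(sqe, fd, IORING_FSYNC_DATASYNC);
    io_uring_submit(ring);
    if (io_uring_wait_cqe(ring, &cqe) < 0 || cqe->res < 0)
        return -1;
    io_uring_cqe_seen(ring, cqe);
    return 0;
}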
> > you lose the durability guarantee that makes databases useful. ... the data might still be sitting in kernel buffers, not yet written to stable storage.
> No! That's because you stopped using fsync. It's nothing to do with your code being async.
From that section, it sounds like OP was tossing data into the io_uring submission queue and calling it "done" at that point (ie: not waiting for the io_uring completion queue to indicate completion). So yes, fsync is needed, but they weren't even waiting for the kernel to start the write before indicating success.
I think to some extent things have been confused because io_uring has a completion concept, but OP also has a separate completion concept in their dual wal design (where the second WAL they call the "completion" WAL).
But I'm not sure if OP really took away the right understanding from their issues with ignoring io_uring completions, as they then create a 5 step procedure that adds one check for an io_uring completion, but still omits another.
> 1. Write intent record (async)
> 2. Perform operation in memory
> 3. Write completion record (async)
> 4. Wait for the completion record to be written to the WAL
> 5. Return success to client
Note the lack of waiting for the io_uring completion of the intent record (and yes, there's still no reference to fsync or an alternative, which is also wrong). There is no ordering guarantee between independent io_urings (OP states they're using separate io_uring instances for each WAL), and even within the same io_uring there is limited ordering around completions. IOSQE_IO_LINK exists, but it doesn't allow traversing submission boundaries, so it won't work here because OP submits the work at separate times. They'd need to use IOSQE_IO_DRAIN, which seems like it would effectively serialize their writes, which is why it seems OP would need to actually wait for completion of the intent write.
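For comparison, here's roughly what IOSQE_IO_LINK can do when everything goes through one ring in one submission. This is a sketch, not OP's two-ring design, and it uses a single WAL fd for brevity (OP's two files would each need their own fsync):

#include <liburing.h>
#include <sys/types.h>

/* Sketch: if the intent write, the completion write, and an fsync all went
 * into ONE ring in ONE submission, IOSQE_IO_LINK could chain them so each
 * starts only after the previous one succeeds. */
static int linked_wal_append(struct io_uring *ring, int wal_fd,
                             const void *intent, unsigned intent_len, __u64 intent_off,
                             const void *done, unsigned done_len, __u64 done_off)
{
    struct io_uring_sqe *sqe;
    struct io_uring_cqe *cqe;
    int i;

    sqe = io_uring_get_sqe(ring);
    io_uring_prep_write(sqe, wal_fd, intent, intent_len, intent_off);
    sqe->flags |= IOSQE_IO_LINK;          /* next SQE runs only if this succeeds */

    sqe = io_uring_get_sqe(ring);
    io_uring_prep_write(sqe, wal_fd, done, done_len, done_off);
    sqe->flags |= IOSQE_IO_LINK;

    sqe = io_uring_get_sqe(ring);
    io_uring_prep_fsync(sqe, wal_fd, 0);  /* durability point for the chain */

    io_uring_submit(ring);

    /* Still have to reap all three CQEs; only the fsync's success implies
     * the records reached stable storage. */
    for (i = 0; i < 3; i++) {
        if (io_uring_wait_cqe(ring, &cqe) < 0 || cqe->res < 0)
            return -1;
        io_uring_cqe_seen(ring, cqe);
    }
    return 0;
}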
Correct, TFA needs to wait for the completion of _all_ writes to the WAL, which is what `fsync()` was doing. Waiting only for the completion of the "completion record" does not ensure that the "intent record" made it to the WAL. In the event of a power failure it is entirely possible that the intent record did not make it but the completion record did, and then on recovery you'll have to panic.
Yes, but I suspect there might be some confusion by the author and others between "io_uring completion of a write" (ie: io_uring posting the completion queue event that corresponds to a previous submission queue entry) and "fsync completion" (what you've put as "completion of all writes", though note that the fsync API is fd-scoped and the io_uring fsync operation has file-range support).
The CQEs on a write indicate something different compared to the CQE of an fsync operation on that same range.
> But database engines are absolutely the target of io_uring's feature set and they're expected to be managing this complexity.
io_uring includes an fsync opcode (with range support). When folks talk about fsync generally here, they're not saying io_uring is unusable; they're saying they'd expect fsync to be used, whether via the io_uring opcode, the system call, or some other mechanism yet to be created.
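As a sketch of the "flags that imply an fsync" route mentioned upthread, and assuming the kernel and filesystem honor RWF_DSYNC on io_uring writes (which I believe recent kernels do):

#include <liburing.h>
#include <linux/fs.h>   /* RWF_DSYNC */

/* Sketch: setting RWF_DSYNC on the write SQE asks for O_DSYNC-like
 * semantics, so the write's CQE implies the data (plus the metadata needed
 * to read it back) reached stable storage. */
static void queue_dsync_write(struct io_uring *ring, int fd,
                              const void *buf, unsigned len, __u64 off)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

    io_uring_prep_write(sqe, fd, buf, len, off);
    sqe->rw_flags = RWF_DSYNC;   /* per-write equivalent of fdatasync */
}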
> Yes, you do need to check if both records are written and then report it back to the client. But that is a non-fsync request and does not tax your system the same as fsync writes.
What mechanism can be used to check that the writes are complete, if not fsync (or the related fdatasync)? What specific io_uring operation or system call?
They do not state that it is exclusively collected for those purposes, only that those purposes are included. As written, collecting that data for any purpose (including those listed) would be in line with their policy.
Yeah, I was thinking that too, but I'm not sure how the law works. They might avoid saying it's only those reasons purely as a CYA. And I wouldn't be surprised if other recording were otherwise illegal without explicit consent, especially for minors. So I'm not saying it isn't recording everything, I'm just not sure that it is.
This seems avoidable as an issue, without needing IP certs, by having the configuration supply both an IP and a hostname, with the hostname used for the TLS validation.
Yes, that is absolutely possible, but it doesn't mean that will be the default. I commented recently [0] about Ubuntu's decision to have only NTS enabled (via domain) by default on 25.10. That raises the question of how the system time can be set if the initial time is outside the cert's validity window. I didn't look, but perhaps Chrony would still use the local network's published NTP servers.
Were there any Microsoft XP security issues caused by "Easter eggs" prior to that policy change? Or was this just put in place as a policy because it was easy to put in place?
I don't think there were any specific security issues caused by Easter eggs but the policy was announced as one of the many changes in their "Trustworthy Computing" initiative.
It seems kinda harsh but it's important to remember the context: at the time, the security situation in Windows and Office was dire and it was (probably correctly) perceived as an existential threat to the company. I think "no Easter eggs" was as much for optics as for its actual effect on the codebase, a way to signal "we know about and stand behind every line of code that gets written; nothing is unaccounted for".
I did something like the system described in this article a few years back. [1]
Instead of splitting the "configure" and "make" steps though, I chose to instead fold much of the "configure" step into the "make".
To clarify, this article describes a system where `./configure` runs a bunch of compilations in parallel, then `make` does stuff depending on those compilations.
If one is willing to restrict what the configure step can detect/do to writing header files (rather than setting variables examined/used in a Makefile), then one can instead have `./configure` generate a `Makefile` (or, in my case, a ninja file), and both the "run the compiler to see what defines to set" work and the "run the compiler to build the executable" work can then happen in a single `make` or `ninja` invocation.
The simple way here results in _almost_ the same behavior: all the "configure"-like stuff running and then all the "build" stuff running. But if one is a bit more careful/clever and doesn't depend on the entire "config.h" for every "<real source>.c" compilation, then one can start to interleave the work perceived as "configuration" with that seen as "build". (I did not get that fancy)
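As a concrete (entirely hypothetical) flavor of a probe that's just another build rule: the rule tries to compile something like the file below and writes the result into a generated header, and nothing else in the Makefile/ninja file needs to know the answer, only the real sources do (via the generated macros).

/* probe_strlcpy.c -- hypothetical configure-style probe compiled by an
 * ordinary make/ninja rule. If this compiles and links, the rule writes
 * "#define HAVE_STRLCPY 1" into the generated header; otherwise it writes
 * 0. The probe failing to build is the signal, not a bug. */
#include <string.h>

int main(void)
{
    char buf[8];
    return (int)strlcpy(buf, "probe", sizeof buf);
}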
The problem is that the various `__has_foo` aren't actually reliable in practice - they don't tell you if the attribute, builtin, include, etc. actually works the way it's supposed to without bugs, or if it includes a particular feature (accepts a new optional argument, or allows new values for an existing argument, etc.).
You should use double underscores on attribute names to avoid conflicts with macros (user-defined macros beginning with double underscores are forbidden, as identifiers beginning with double underscores are reserved).
#if __has_attribute(__cold__)
# warning "This works too"
#endif
static void __attribute__((__cold__))
foo(void)
{
// This works too
}
yep. C's really come a long way with the special operators for checking if attributes exist, if builtins exist, if headers exist, etc.
Covers a very large part of what is needed, making fewer and fewer things need to end up in configure scripts. I think most of what's left is checking for the existence of items (types, functions) and their shape, as you were doing :). I can dream about getting a nice special operator to check for fields/functions; it would let us remove even more from configure time, but I suspect we won't get one because that requires type resolution, and none of the existing special operators do that.
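A short sketch of the kind of checks that no longer need a configure script (the specific header/builtin/attribute names are just illustrative):

/* Sketch: checks that used to require configure-time probing can now be
 * plain preprocessor tests, with fallbacks when the operator is missing. */
#ifdef __has_include
# if __has_include(<threads.h>)
#  include <threads.h>            /* C11 threads, only if the header exists */
# endif
#endif

#ifdef __has_builtin
# if __has_builtin(__builtin_expect)
#  define LIKELY(x) __builtin_expect(!!(x), 1)
# endif
#endif
#ifndef LIKELY
# define LIKELY(x) (x)
#endif

#ifdef __has_attribute
# if __has_attribute(__cold__)
#  define COLD __attribute__((__cold__))
# endif
#endif
#ifndef COLD
# define COLD
#endif

static COLD void report_error(void)
{
    /* rarely-taken path */
}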
You still need a configure step for the "where are my deps" part of it, though both autotools and CMake would be way faster if all they were doing was finding, and not any testing.
That said, that (determining the C flags and LD flags for dependencies) is something that might be mixed into compilation a bit more than it is now. One could imagine that if we annotate which compilation units need a particular system library, we could start building the code that doesn't depend on that library while determining the library location/flags (ie: running pkg-config or doing other funny business) at the same time.
Or since we're in the connected era, perhaps we're downloading the library we require if it's not found and building it as an embedded component.
With that type of design, it becomes clearer why moving as much as possible into the build stage (where we can better parallelize, because most of the work is in that stage) and more accurately describing dependencies (so we don't block work that could run sooner) can be helpful in builds.
Doing that type of thing requires a build system that is more flexible though: we really would need to have the pieces of "work" run by the build system be able to add additional work that is scheduled by the build system dynamically. I'm not sure there are many build systems that support this.
Download/build on demand is cute when it works, but it's a security nightmare and a problem for Nix which runs the build in an environment that's cut off from the network.
This is already a problem for getting Bazel builds to run nicely under Nix, and the current solution (predownload everything into a single giant "deps" archive in the store and then treat that as a fixed-output derivation with a known hash value) is deeply non-optimal. Basically, I hope that any such schemes have a well-tested fallback path for bubbling the "thing I would download" information outward in case there are reasons to want to separate those steps.
I agree that there are problems when laying multiple build systems on top of one another, and I see that often as a user of nix (it's also bad with rust projects that use cargo, and though there are a variety of options folks have written they all have tradeoffs).
To some extent, the issue here is caused by just what I was discussing above: Nix derivations can't dynamically add additional derivations (ie: build steps not being able to dynamically add additional build steps makes things non-optimal).
I am hopeful that Nix's work on dynamic derivations will improve the situation for nix (with respect to bazel, cargo, and others) over time, and I am hopeful that other build systems will recognize how useful dynamically adding build steps can be.
It's true— fundamentally, nothing about a build realizing partway through that it needs more stuff breaks the Nix philosophy, assuming the build is holding a hash for whatever it is it wants to pull so that everything stays hermetic. It's a bit annoying to not know upfront exactly what your build graph looks like but honestly it's not the worst— like, you already don't know how long each derivation is going to take.
In any case, the tvix devs have definitely understood the assignment on this and are making not only ifd a first class citizen, but also the larger issue of allowing the evaluation step to decompose, and for the decomposed pieces to run in parallel with each other and with builds— and that really is the game-changer, particularly with a cluster-backed build, to be able to start work immediately rather than waiting out a 30-60 second single-threaded eval.
On audio sync issues, I unfortunately see them with jellyfin rather frequently on apple tv when using homepods as audio output. I end up having to enable the "native player" in the experimental settings to get the audio in-sync.
I've previously reported this to the developers of the app, and they've closed the issue saying it was a bug in one of their dependencies, without fixing the issue. It remains unfixed.
This is a fundamental misunderstanding of the structure of the linux kernel, the nature of kernels in general, and the ways one performs automated verification of computer code.
Automated verification (including as done by rust in its compiler) does not involve anything popularly known as AI, and automated verification as it exists today is more complete for rust than for any other system (because no other widely used language today places the information needed for verification into the language itself, which results in rust code being widely analyzable for safety).
Human validation is insufficient and error prone, which is why automated verification of code is something developers have been seeking and working on for a long time (before rust, even).
Having "explicit" (manual?) memory management is not a benefit to enabling verification either by humans or by machines. Neither is using a low level language which does not specify enough detail in the type system to perform verification.
Kernel modules aren't that special. One can put a _lot_ of code in them, that can do effectively anything (other than early boot bits, because they aren't loaded yet). Kernel modules exist for distribution reasons, and do not define any strict boundary.
If we're talking about out-of-tree kernel modules, those are not something that tends to exist for a long time. The only real examples today of long-lived out-of-tree modules are zfs (filesystem) and nvidia (gpu driver), and those only exist out-of-tree because of licensing and secrecy. That's because getting code in-tree generally helps keep it up to date with less effort from everyone involved: the people already making in-tree changes can see how certain APIs are being used, and if those in-tree folks are more familiar with the API they can/may improve the now-merged code. And the formerly out-of-tree folks don't have to run their own release process, don't have to deal with constant compatibility issues as kernel APIs change, etc.
>Human validation is insufficient and error prone,
Basically, if you assume that it's impossible for humans to be correct, or that it's impossible to write correct memory-safe C code, you start down the path that leads to things like Java, Haskell, and now Rust. And then when nobody takes you seriously, you wonder why - well, it's because you are telling people who know how to write correct and memory-safe C code that we are insufficient and error prone.
>Kernel modules aren't that special.
By definition, they interface with the core kernel code. They are not core kernel code