dzaima · 2026-03-26T11:59:45 1774526385

> It surprises me that the compiler doesn't still take the inference from the assert and just disable emitting the code to perform the check.

That's because that's what the <assert.h> assert() must do; it's specified to do and imply nothing when assertions are disabled. (the standard literally fully defines it as `#define assert(...) ((void)0)` when NDEBUG is defined)

Whereas `[[assume(...)]]` is a thing specifically for that "infer things from this without actually emitting any code".
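A minimal sketch of the difference, for illustration only (the function names and the `ASSUME` fallback macro are made up here, not taken from any project). With `-DNDEBUG` the `assert` line expands to `((void)0)`, so the optimizer is told nothing about `n`; the assume version emits no check but lets the optimizer treat `n == 8` as a fact, which also means it's undefined behavior if that's ever false:

```cpp
#include <cassert>

// Hypothetical portability macro: use C++23 [[assume]] where the compiler
// reports support, otherwise fall back to builtins with a similar effect.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(assume)
#    define ASSUME(cond) [[assume(cond)]]
#  endif
#endif
#if !defined(ASSUME)
#  if defined(__clang__)
#    define ASSUME(cond) __builtin_assume(cond)
#  elif defined(__GNUC__)
#    define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#  else
#    define ASSUME(cond) ((void)0)
#  endif
#endif

// With -DNDEBUG this assert vanishes entirely: no check, no optimizer hint.
int div_with_assert(int x, int n) {
    assert(n == 8);
    return x / n;
}

// No runtime check is emitted; the compiler may assume n == 8 (and, for
// example, replace the division with shift-based code).
// Calling this with n != 8 is undefined behavior.
int div_with_assume(int x, int n) {
    ASSUME(n == 8);
    return x / n;
}
```

Comparing the generated code for the two functions at `-O2 -DNDEBUG` should show the first one keeping a generic division.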

ralferoo · 2026-03-26T12:07:53 1774526873

Yeah, good point. Honestly it's been so long since I've added that to a project (it's normally hidden in some include that everything else includes) that I'd forgotten it wasn't a compiler level reserved keyword for C++ code.

dzaima · 2026-03-22T22:16:13 1774217773

In git if you, say, do some `git rebase -i`, edit some commit, continue the rebase, and hit a conflict, and realize you edited something wrong that caused the conflict, your only option is aborting the entire rebase and starting over and rebuilding all changes you did.

In jj, you just have a conflict in the descendants, and if you edit the past so that it no longer conflicts, the conflict disappears; kinda as if you were always in an interactive rebase but at all points have the knowledge of what the future would look like if you `git rebase --continue`d.

Also really nice for reordering commits which can result in conflicts, but leaves descendants non-conflicting, allowing delaying resolving the conflicts after doing other stuff, or continuing doing some reordering instead of always starting from scratch as with `git rebase -i`.

dzaima · 2026-03-22T19:57:23 1774209443

But you do have the op log, giving you a full copy of the log (incl. the contents of the workspace) at every operation, so you can get out of such mistakes with some finagling.

You can choose to have a workflow where you're never directly editing any commit to "gain back autonomy" of the working copy; and if you really want to, with some scripting, you can even emulate a staging area with a specially-formatted commit below the working copy commit.

dzaima · 2026-03-19T23:55:43 1773964543

> as it's an app from a verified developer.

Well that's if they go through the verification process, which does not seem like a thing they'd want to do - https://f-droid.org/en/2026/02/24/open-letter-opposing-devel...

kuschku · 2026-03-20T00:26:31 1773966391

If one verified app can install many unverified apps, either aurora droid or fdroid basic or one of the many other frontends would end up offering that feature quickly.

But there's been some comments that even that wouldn't be possible, every app would have to be verified individually, or be signed by a developer with less than 20 installs.

(Which of course then begs the question: Why not build a version of Fdroid that generates its own signing key and re-signs every app on device?)

dzaima · 2026-03-17T13:52:36 1773755556

As far as I know, the ARM (at least aarch64) situation should be about the same as x86-64. Anything specific that's bad about it? (there's aarch32 NEON with no subnormal support or whatever, but you can just not use it if determinism is the goal)

that RECIP14 link is AVX-512, i.e. not available on a bunch of hardware (incl. the newest Intel client CPUs), so you wouldn't ever use it in a deterministic-simulation multiplayer game anyway, even if you restrict yourself to x86-64-only; so you're still stuck to the basic IEEE-754 ops even on x86-64.

x86-64 is worse than aarch64 in one very important aspect - baseline x86-64 doesn't have fused multiply-add, whereas aarch64 does (granted, the x86-64 FMA extension came out not far from aarch64/armv8, but it's still a concern, such is life). Of course you can choose to not use fma, but that's throwing perf away. (regardless you'll want -ffp-contract=off or equivalent to make sure compiler optimizations don't screw things up, so any fma use will need to be manual fma calls anyway)
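To make the "manual fma calls" part concrete, a small sketch (function names are made up for illustration): `std::fma` rounds once, a separate multiply and add rounds twice, and the two can give different bits. The `volatile` temporary is just one blunt way to keep the compiler from fusing the pair in this example; building with `-ffp-contract=off` is the more usual approach.

```cpp
#include <cmath>

// Fused: a*b+c is computed exactly and rounded once.
double mul_add_fused(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Unfused: the product is rounded to double first, then the sum is rounded.
// The volatile store forces that intermediate rounding even if the compiler
// would otherwise contract a*b+c into a single fma instruction.
double mul_add_unfused(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}
```

With a = 1 + 2^-30, b = 1 - 2^-30, c = -1, the exact product is 1 - 2^-60, which rounds to 1.0 as a double; so the unfused version returns 0 while the fused one returns -2^-60. Both results are well-defined, but they are different, which is why a deterministic simulation has to pick one form and use it on every machine.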

kbolino · 2026-03-17T15:24:29 1773761069

The Steam hardware survey currently has FMA support at 97%, which is the same level as F16C, BMI1/2, and AVX2. Personally, I would consider all of these extensions to be baseline now; the amount of hardware not supporting them is too small to be worth worrying about anymore.

dzaima · 2026-03-13T12:06:02 1773403562

> This means that all medium price or high price smartphones that were introduced during the last 4 years have SVE2 support.

Except Qualcomm chipsets, which disable SVE even if all ARM cores used support it. ("Snapdragon 8 Elite Gen 5" supposedly finally supports SVE? but that's like only half a year old)

my123 · 2026-03-13T12:32:44 1773405164

Qualcomm was odd like that for a long time yeah.

And yes the Gen 5 chips (8, 8 Elite and X2) do implement SVE2 and SME.

dzaima · 2026-03-11T12:57:11 1773233831

There is Zmmul for multiplication-but-not-divide.

dzaima · 2026-03-11T08:57:41 1773219461

The option to generate or not generate misaligned loads/stores does exist (-mno-strict-align / -mstrict-align). But of course that's a compile-time option, and of course the preferred state would be to have use of them on by default, but RVA23 doesn't sufficiently guarantee/encourage them not being unreasonably-slow, leaving native misaligned loads/stores still effectively-unusable (and off by default on clang/gcc on -march=rva23u64).

aka, Zicclsm / RVA23 are entirely-useless as far as actually getting to make use of native misaligned loads/stores goes.
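For context, this is the kind of source code affected (a sketch; `load_u64_unaligned` is a made-up name). The usual portable way to express a possibly-misaligned load in C/C++ is a `memcpy` into a local; whether that becomes one load instruction or a sequence of narrower loads and shifts is then down to the target and flags such as `-mstrict-align` / `-mno-strict-align`:

```cpp
#include <cstdint>
#include <cstring>

// Reads 8 bytes from p with no alignment requirement on p.
// Dereferencing a misaligned std::uint64_t* directly would be undefined
// behavior in C++; memcpy is well-defined, and compilers typically replace
// the call with inline load instructions.
std::uint64_t load_u64_unaligned(const void* p) {
    std::uint64_t value;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```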

camel-cdr · 2026-03-11T11:58:34 1773230314

The cursed thing is that RVA23 does basically guarantee that `vle8.v` + `vmv.x.s` on misaligned addresses is fast.

dzaima · 2026-03-11T13:00:30 1773234030

Yeah, that is quite funky; and indeed gcc does that. Relatedly, super-annoying is that `vle64.v` & co could then also make use of that same hardware, but that's not guaranteed. (I suppose there could be awful hardware that does vle8.v via single-byte loads, which wouldn't translate to vle64.v?)

IshKebab · 2026-03-11T09:14:46 1773220486

> RVA23 doesn't guarantee them not being unreasonably-slow

Right, but it doesn't guarantee that anything isn't unreasonably slow, does it? I am free to make an RVA23 compliant CPU with a div instruction that takes 10k cycles. Does that mean LLVM won't output div? At some point you're left with either -mcpu=<specific cpu> or falling back to reasonable assumptions about the actual hardware landscape.

Do ARM or x86 make any guarantees about the performance of misaligned loads/stores? I couldn't find anything.

camel-cdr · 2026-03-11T12:02:37 1773230557

Exactly, I 100% agree, and IMO toolchains should default to assuming fast misaligned load/store for RISC-V.

However, the spec has the explicit note:

> Even though mandated, misaligned loads and stores might execute extremely slowly. Standard software distributions should assume their existence only for correctness, not for performance.

Which was a mistake. As you said any instruction could be arbitrarily slow, and in other aspects where performance recommendations could actually be useful RVI usually says "we can't mandate implementation".

dzaima · 2026-03-11T09:33:23 1773221603

I don't think x86/ARM particularly guarantee fastness, but at least they effectively encourage making use of them via their contributions to compilers that do. They also don't really need to given that they mostly control who can make hardware anyway. (at the very least, if general-purpose HW with horribly-slow misaligned loads/stores came out from them, people would laugh at it, and assume/hope that that's because of some silicon defect requiring chicken-bit-ing it off, instead of just not bothering to implement it)

Indeed one can make any instruction take basically-forever, but I think it's a fairly reasonable expectation that all supported hardware instructions/behaviors (at least non-deprecated ones) are not slower than a software implementation (on at least some inputs), else having said instruction is strictly-redundant.

And if any significant general-purpose hardware actually did a 10k-cycle div around the time the respective compiler defaults were decided, I think there's a good chance that software would have defaulted to calling division through a function such that an implementation can be picked depending on the running hardware. (let's ignore whether 10k-cycle-division and general-purpose-hardware would ever go together... but misaligned-mem-ops+general-purpose-hardware definitely do)
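A rough sketch of what "calling division through a function" could look like (everything here is illustrative: the names are made up, and `hardware_div_is_slow` is a stub standing in for whatever CPU detection a real runtime would use):

```cpp
#include <cstdint>

using DivFn = std::uint32_t (*)(std::uint32_t, std::uint32_t);

// Native path: normally compiles to the hardware divide instruction.
// Precondition: d != 0.
std::uint32_t div_native(std::uint32_t n, std::uint32_t d) {
    return n / d;
}

// Software path: bit-by-bit restoring division, no divide instruction.
// Precondition: d != 0.
std::uint32_t div_soft(std::uint32_t n, std::uint32_t d) {
    std::uint32_t quotient = 0;
    std::uint64_t remainder = 0;
    for (int bit = 31; bit >= 0; --bit) {
        // Bring down the next bit of the dividend.
        remainder = (remainder << 1) | ((n >> bit) & 1u);
        if (remainder >= d) {
            remainder -= d;
            quotient |= std::uint32_t{1} << bit;
        }
    }
    return quotient;
}

// Stub for illustration only; a real version would query the running CPU.
bool hardware_div_is_slow() {
    return false;
}

// Picked once at startup; every division would then go through the result.
DivFn pick_divide() {
    return hardware_div_is_slow() ? div_soft : div_native;
}
```

The price is an indirect call on every division, which is roughly the situation misaligned mem ops are in now, just decided at compile time instead of at runtime.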

IshKebab · 2026-03-11T10:22:34 1773224554

> if general-purpose HW with horribly-slow misaligned loads/stores came out from them

How is that different for RISC-V?

> I think it's a fairly reasonable expectation that all supported hardware instructions/behaviors (at least non-deprecated ones) are not slower than a software implementation

I agree! So just use misaligned loads if Zicclsm is supported. As you observed there's a feedback loop between what compilers output and what gets optimised in hardware. Since RVA23 hardware is basically non-existent at the moment you kind of have the opportunity to dictate to hardware "LLVM will use misaligned accesses on RVA23; if you make an RVA23 chip where this is horribly slow then people will laugh at you and assume it's some sort of silicon defect".

dzaima · 2026-03-11T10:50:15 1773226215

> How is that different for RISC-V?

RISC-V hardware with slow misaligned mem ops does exist to non-insignificant extent, and it seems not enough people have laughed at them, and instead compilers did just surrender and default to not using them.

> As you observed there's a feedback loop between what compilers output and what gets optimised in hardware.

Well, that loop needs to start somewhere, and it has already started, and started wrong. I suppose we'll see what happens with real RVA23 hardware; at the very least, even if it takes a decade for most hardware to support misaligned well, software could retroactively change its defaults while still remaining technically-RVA23-compatible, so I suppose that's good.

brucehoult · 2026-03-11T23:32:06 1773271926

> RISC-V hardware with slow misaligned mem ops does exist to non-insignificant extent

Only U74 and P550, old RV64GC CPUs.

SiFive's RVA23 cores have fast misaligned accesses, as do all THead and SpacemiT cores.

I can't imagine that all the Tenstorrent and Ventana and so forth people doing massively OoO 8-wide cores won't also have fast misaligned accesses.

As a previous poster said: if you're targeting RVA23 then just assume misaligned is fast and if someone one day makes one that isn't then sucks to be them.

dzaima · 2026-03-11T23:54:05 1773273245

P550 is, like, what, only a year old? I suppose there has been some laughing at it at least.

Also Kendryte K230 / C908, but only on vector mem ops, which adds a whole another mess onto this.

I'd hope all the massive OoO will have fast misaligned mem ops, anything else would immediately cause infinite pain for decades.

But of course there'll be plenty of RVA23 hardware that's much smaller eventually too, once it becomes a general expectation instead of "cool thing for the very-top-end to have".

I do agree that it'd be reasonable to just assume fast misaligned ops, but for whatever reason gcc and clang just don't, and that's what we have for defaults.

brucehoult · 2026-03-12T03:13:55 1773285235

> P550 is, like, what, only a year old?

No, it was released to customers in June 2021, almost five years ago.

https://www.sifive.com/press/sifive-performance-p550-core-se...

It has taken a while for this core to appear in an SoC suitable for SBCs, as Intel was originally announced as doing that and got as far as showing a working SoC/Board at the Intel Innovation 2022 event in September 2022.

Someone who attended that event was able to download the source code for my primes benchmark and compile and run it, at the show, and was kind enough to send me the results. They were fine.

For reasons known only to Intel, they subsequently cancelled mass production of the chip.

ESWIN stepped up and made the EIC7700X, as used in the Milk-V Megrez and SiFive HiFive Premier P550, which did indeed ship just over a year ago.

But technically we could have had boards with the Intel chip three years ago.

Heck we should have had the far better/faster Milk-V Oasis with the P670 core (and 16 of them!) two years ago. Again, that was business/politics that prevented it, not technology.

dzaima · 2026-03-12T13:33:12 1773322392

> No, it was released to customers in June 2021, almost five years ago.

Ah, okay. (still, like, at least a couple decades newer than the last x86-64 chip with slow unaligned mem ops, if such ever existed at all? Haven't heard of / can't find anything saying any aarch64 ever had problems with them either, so still much worse for the RISC-V side).

Well, I suppose we can hope that business/politics messes will all never happen again and won't affect anything RVA23.

adgjlsfhk1 · 2026-03-12T03:13:42 1773285222

> I do agree that it'd be reasonable to just assume fast misaligned ops, but for whatever reason gcc and clang just don't, and that's what we have for defaults.

This very much has a "for now" on it. Once there is actually widespread hardware with the feature, I would be very surprised if the compilers don't update their heuristics (at least for RVA23 chips)

dzaima · 2026-03-12T13:33:52 1773322432

Indeed we shall hope heuristics update; but of course if no compilers emit it hardware has no reason to actually bother making fast misaligned ops, so it's primed for going wrong.

adgjlsfhk1 · 2026-03-12T20:31:23 1773347483

hardware devs traditionally have been pretty good at helping the compiler teams with things like this (because it's a lot cheaper to improve the compiler than your chip).

newpavlov · 2026-03-11T10:47:13 1773226033

>So just use misaligned loads if Zicclsm is supported.

LLVM and GCC developers clearly disagree with you. In other words, re-iterating the previously raised point: Zicclsm is effectively useless and we have to wait decades for hypothetical Oilsm.

Most programmers will not know that the misaligned issue even exists, let alone about options like -mno-strict-align. They will just compile their project with default settings and blame RISC-V for being slow.

RISC-V could've easily avoided all this mess by properly mandating misaligned pointer handling as part of the I extension.

dzaima · 2026-03-11T11:36:12 1773228972

Well, we don't necessarily have to wait for Oilsm; software that wants to could just choose to be opinionated and run massively-worse on suboptimal hardware. And, of course, once Oilsm hardware becomes the standard, it'd be fine to recompile RVA23-targeting software to it too.

> RISC-V could've easily avoided all this mess by properly mandating misaligned pointer handling as part of the I extension.

Rather hard to mandate performance by an open ISA. Especially considering that there could actually be scenarios where it may be necessary to chicken-bit it off; and of course the fact that there's already some questionability on ops crossing pages, where even ARM/x86 are very slow.

newpavlov · 2026-03-11T14:07:10 1773238030

I am not saying that RISC-V should mandate performance. If anything, we wouldn't have had the problem with Zicclsm if they had not bothered with the stupid performance note.

I would be fine with any of the following 3 approaches:

1) Mandate that store/loads do not support misaligned pointers and introduce separate misaligned instructions (good for correctness, so it's my personal preference).

2) Mandate that store/loads always support misaligned pointers.

3) Mandate that store/loads do not support misaligned pointers unless Zicclsm/Oilsm/whatever is available.

If hardware wants to implement a slow handling of misaligned pointers for some reason, it's squarely the responsibility of the hardware's vendor. And everyone would know whom to blame for poor performance on some workloads.

We are effectively going to end up with 3, but many years later and with a lot of additional unnecessary mess associated with it. Arguably, this issue should've been long sorted out in the age of ratification of the I extension.

dzaima · 2026-03-11T14:44:00 1773240240

2 is basically infeasible with RISC-V being intended for a wide range of use-cases. 1 might be ok but introduces a bunch of opcode space waste.

Indeed extremely sad that Zicclsm wasn't a thing in the spec, from the very start (never mind that even now it only lives in the profiles spec); going through the git history, seems that the text around misaligned handling optionality goes all the way back to the very start of the riscv/riscv-isa-manual repo, before `Z*` extensions existed at all.

More broadly, it's rather sad that there aren't similar extensions for other forms of optional behavior (thing that was recently brought up is RVV vsetvli with e.g. `e64,mf2`, useful for massive-VLEN>DLEN hardware).

newpavlov · 2026-03-11T15:28:00 1773242880

>1 might be ok but introduces a bunch of opcode space waste.

I wouldn't call it "waste". Moreover, it's fine for misaligned instructions to use a wider encoding or be less rich than their aligned counterparts. For example, they may not have the immediate offset or have a shorter one. One fun potential possibility is to encode the misaligned variant into aligned instructions using the immediate offset with all bits set to one, as a side effect it also would make the offset fully symmetric.

dzaima · 2026-03-11T17:34:49 1773250489

Of course that'd result in entirely-avoidable slowdown for the potentially-misaligned ops. Perhaps fine for a program that doesn't use them frequently, but quite bad for ones that need misaligned ops everywhere.

In terms of correctness, there's also the possibility of partially-misaligned ops (e.g. an 8B load with 4B alignment, loading two adjacent int32_t fields) so you're not handling everything with correct faults anyways.
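A sketch of that partially-misaligned case (the struct and function names are made up): the struct is only required to be 4-byte aligned, yet both fields are read with a single 8-byte access, so the address can be misaligned for the 8B load while being perfectly aligned for each field.

```cpp
#include <cstdint>
#include <cstring>

struct Pair {
    std::int32_t first;
    std::int32_t second;
};
static_assert(sizeof(Pair) == 8, "two int32_t fields with no padding");
static_assert(alignof(Pair) == 4, "only 4-byte alignment is required");

// Loads both fields with one 8-byte copy. Depending on target and flags,
// the compiler may emit a single 8B load or several narrower loads.
std::uint64_t load_both_fields(const Pair* p) {
    std::uint64_t bits;
    std::memcpy(&bits, p, sizeof bits);
    return bits;
}
```

An "aligned-only" 8B load instruction would have to fault here whenever the struct sits at an address that is 4 mod 8, even though nothing about the program is wrong.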

dzaima · 2026-02-24T17:17:48 1771953468

The type seems to just be a small wrapper around a BigDecimal; the actual conversion arithmetic will presumably be relatively extremely slow regardless, a single extra allocation (in addition to BigDecimal's ≥3) won't change much.

rf15 · 2026-02-24T17:57:59 1771955879

fair point, I forgot what a hog BigDecimal is...

dzaima · 2026-02-19T16:41:06 1771519266

Speedrunning is very much modded - ranked (the big content) is just flat out modded (not just the match setup, there are game tweaks too (guaranteed blaze drops after 20 or so iirc, guaranteed dragon perch in ≤3 mins)), and even RSG/SSG/AA/etc have a long list of allowed mods (much quicker seed rerolling, timer, perf improvement mods, etc). Many (/most/all? idk) hermits use mods (esp. freecam, replaymod for creating timelapses / pretty camera perspectives). Never mind shaders sprinkled in a portion of everything.

permo-w · 2026-02-19T17:38:46 1771522726

These are minor tweaks. You could remove these and the speedrunning community/HC would lose little. A second account in spectator mode is a slightly less convenient version of freecam and the speedrunning community is kidding themselves in the first place allowing any tweaks to RNG whatsoever. They could ban that tomorrow and there'd be some grumbling but nothing would change viewership-wise

dmonitor · 2026-02-19T19:23:57 1771529037

Minor tweaks are still a mod. Gameplay overhaul modpacks that turn the game into Factorio are definitely a small minority of the playerbase, but anyone who knows better plays with at least some sort of client-side performance mod (Optifine, Lithium, etc), and that's been true since before 1.0.

Etho's dedication to keeping a purely vanilla singleplayer world is a unique feat. If you want to use Hermitcraft as an example of the median SMP, their modlist is actually quite large: https://github.com/henkelmax/hermitcraft-server

Minecraft simply has a lot of areas for improvement that haven't been touched by Mojang for one reason or another, and a big reason why people stick with Java is because the community has built an ecosystem to tweak the game to their liking.

permo-w · 2026-02-20T04:21:43 1771561303

I never said that modding isn't important, I said that it's not as important as people in this thread are making it out to be

dzaima · 2026-02-19T18:13:02 1771524782

The main actual speedrunning categories don't allow any RNG changes; but I doubt anyone doing RSG would have any interest whatsoever going back to the 20x-or-whatever slower seed rolling, that's just a completely utterly dumb waste of time doing literally nothing except clicking a button every 5 seconds (effectively changing the category from "who can play the game the best" to much more like "who has the most beast of a machine to run as many minecraft instances in parallel to more quickly roll a good seed"). Viewership would definitely go down from there being less actual gameplay.

Ranked is intended to be a fun competitive thing; waiting 10 minutes for a dragon perch doing nothing is Not Fun; waiting forever at a spawner is Not Fun; simple as that, people wouldn't play it if it wasn't fun. (oh, also, I believe Ranked also just generally includes making mob drops consistent for the same seed (and consistent portal locations, and probably other things), without which the whole entire concept of competitively playing the same seed would not work whatsoever, devolving to just who got the better RNG, distinctly Not Fun; also the ability to review a replay of your game afterwards for learning). Viewership and player counts would go down because you'd just be looking at very slow gambling instead of something actually meaningfully-skill-based.

A second account might work for freecam (though it adds more editing work, aka makes you not want to actually use it much), but making pretty timelapses is not feasible that way. Granted, you could still live without it, but the quality of content would undoubtedly go down. The little things go a long, long, long way.

dzaima · 2026-02-19T19:55:46 1771530946

To be clear I do kinda agree with the general idea that modding isn't that important to Minecraft Java; but it's still very important at least indirectly - were there not as large of a modding scene, I'd imagine many more content creators would've long run out of content to make on it (or at least unique ways to do things), and the technical research/farms/whatever would be hampered by less available tooling.

(for what it's worth, last I played minecraft, like 1-2 years ago, I did so lightly-modded - Do A Barrel Roll for much more fun elytra; lithium; Distant Horizons; Hydrophobic Elytra to fix a stupid extremely-annoying elytra bug (might be fixed now?), BetterF3 (kinda superseded by the more recent F3 overhaul now I suppose?))