charleslmunger's comments (Hacker News)

I've spent some brainpower on binary search and have not been able to beat this:

https://github.com/protocolbuffers/protobuf/blob/44025909eb7...

1. Check for dense list O(1)

2. Check upper bound

3. Constant trip count binary search

The constant trip count is great for the branch predictor, and the core loop is pretty tightly optimized for the target hardware, avoiding multiplies. Every attempt to get more clever made the loop worse and did not pay for itself. It's hard because it's an array-of-structs format with a 12-byte element size, and mostly pretty small N.
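For the curious, a minimal sketch of the constant-trip-count idea (plain C, not upb's actual code, and names are mine): for a given N the loop runs the same number of iterations regardless of the key, so its branches are perfectly predictable, and the comparison in the body typically compiles to a conditional select.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Constant-trip-count lower bound over a sorted array. For a fixed n,
// the number of loop iterations never varies with the key.
static size_t lower_bound_fixed(const uint32_t *a, size_t n, uint32_t key) {
    if (n == 0) return 0;
    size_t lo = 0;
    size_t step = 1;
    while (step * 2 <= n) step *= 2;   // largest power of two <= n
    // One probe reduces the search window to a power-of-two size.
    if (a[step - 1] < key) lo = n - step;
    // Fixed halving steps; the 'if' compiles to a conditional move.
    for (step /= 2; step > 0; step /= 2)
        if (a[lo + step - 1] < key) lo += step;
    // lo is a candidate slot; as in the upb code, the caller still
    // checks whether a[lo] actually matches the key.
    return lo;
}
```

As in the upb code above, the function returns a candidate rather than a guaranteed match, and the caller does one final equality check.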


For a pretty small N I've found that less clever can be quite a bit faster. I'd try a linear search - possibly SIMD if you can change the data format to struct-of-arrays. An adaptive approach that uses linear search up to a certain N can also yield some benefit.
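A sketch of the adaptive idea in C (the threshold of 16 below is a guess, not a measured number, and would need tuning on the target hardware):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Adaptive lookup over a sorted array: plain linear scan below a size
// threshold, classic binary search above it. Returns the index of the
// match, or -1 if absent.
static ptrdiff_t find_id(const uint32_t *a, size_t n, uint32_t key) {
    if (n <= 16) {  // threshold is a tuning assumption
        for (size_t i = 0; i < n; i++)
            if (a[i] == key) return (ptrdiff_t)i;
        return -1;
    }
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key) lo = mid + 1;
        else hi = mid;
    }
    return (lo < n && a[lo] == key) ? (ptrdiff_t)lo : -1;
}
```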

The first implementation I encountered was a linear search, starting at the last-found field. Empirically it performed better to do a binary search with early exit and branchless bounds selection, I think due to branch predictor pressure. The data representation could be changed but it's tricky, as there are other traversals that want to go in sorted order, and there are lots of places that pass just one pointer for fields. But I agree any further improvement will probably have to come from that.

SIMD is tricky even with SoA because there is significant latency going between the general registers and the vector units, plus arm little cores can be configured to share a vector unit with another core.


> SIMD is tricky even with SoA because there is significant latency going between the general registers and the vector units

My experience is mostly limited to AMD64, but libraries like glibc use SIMD in many places for faster linear search. Presumably they've done testing and found it worthwhile.


Yeah, arm little cores are a very different story - they aren't out-of-order superscalar architectures; they can dispatch at most two operations per cycle, in order.

Big cores are more like that, dispatching 8 or more operations per cycle, but they're also more expensive, larger, etc.


If you control the layout, the Eytzinger layout will typically give you the best of both worlds: as fast as a linear scan for small N, much faster than binary search over a sorted array for large N.
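For anyone unfamiliar, a sketch of the layout and the branchless descent (my own names, and it uses a GCC/Clang builtin): elements are stored in BFS order of an implicit binary tree, so node k's children sit at 2k and 2k+1, and the first few levels share cache lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Build ez[1..n] (1-indexed, ez[0] unused) from a sorted array via an
// in-order walk of the implicit tree; returns elements consumed so far.
static size_t eytzinger_build(const uint32_t *sorted, uint32_t *ez,
                              size_t n, size_t i, size_t k) {
    if (k <= n) {
        i = eytzinger_build(sorted, ez, n, i, 2 * k);
        ez[k] = sorted[i++];
        i = eytzinger_build(sorted, ez, n, i, 2 * k + 1);
    }
    return i;
}

// Branchless descent: the comparison feeds straight into the index.
// Returns the eytzinger index of the first element >= key, or 0 if
// every element is smaller.
static size_t eytzinger_lower_bound(const uint32_t *ez, size_t n,
                                    uint32_t key) {
    size_t k = 1;
    while (k <= n)
        k = 2 * k + (ez[k] < key);
    // Shift out the trailing 1-bits (plus one) to recover the answer.
    k >>= __builtin_ffsll((long long)~k);
    return k;
}
```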

I know protobuf code is extremely high quality, but I really can't stand the c-style naming conventions.

I know people train themselves into grokking this and reading and emitting this way, but it sounds like writing "bork bork bork bork" runes to me.

I'm glad Rust feels more like Ruby and Python and that method and field names are legible.

My eyes just glaze over:

    UPB_API_INLINE
    const struct upb_MiniTableField* upb_MiniTable_FindFieldByNumber(
        const struct upb_MiniTable* m, uint32_t number) {
      const uint32_t i = number - 1;  // 0 wraps to UINT32_MAX
    
      // Ideal case: index into dense fields
      if (i < m->UPB_PRIVATE(dense_below)) {
        UPB_ASSERT(m->UPB_ONLYBITS(fields)[i].UPB_ONLYBITS(number) == number);
        return &m->UPB_ONLYBITS(fields)[i];
      }
    
      // Early exit if the field number is out of range.
      uint32_t hi = m->UPB_ONLYBITS(field_count);
      uint32_t lo = m->UPB_PRIVATE(dense_below);
      UPB_ASSERT(hi >= lo);
      uint32_t search_len = hi - lo;
      if (search_len == 0 ||
          number > m->UPB_ONLYBITS(fields)[hi - 1].UPB_ONLYBITS(number)) {
        return NULL;
      }
    
      // Slow case: binary search
      const struct upb_MiniTableField* candidate;
    #ifndef NDEBUG
      candidate = UPB_PRIVATE(upb_MiniTable_ArmOptimizedLowerBound)(
          m, lo, search_len, number);
      UPB_ASSERT(candidate ==
                 UPB_PRIVATE(upb_MiniTable_LowerBound)(m, lo, search_len, number));
    #elif UPB_ARM64_ASM
      candidate = UPB_PRIVATE(upb_MiniTable_ArmOptimizedLowerBound)(
          m, lo, search_len, number);
    #else
      candidate = UPB_PRIVATE(upb_MiniTable_LowerBound)(m, lo, search_len, number);
    #endif
    
      return candidate->UPB_ONLYBITS(number) == number ? candidate : NULL;
    }

> I really can't stand the c-style naming conventions.

Honestly I don't see much difference between

  upb_MiniTable_FindFieldByNumber
and

  upb::MiniTable::FindFieldByNumber

Those are fairly indistinguishable. It's when they start removing letters from words to save... debug symbol bytes or something? That's when c-style naming annoys me.

In their defence "hi" sounds very much like "high" in my mind's ear and "lo" like "low" :)

This is also my pet peeve with a lot of code as well as commands like

    npm -g i package-name 
Like why would you teach people to do this? I understand people needed to save precious bytes in the sixties, so we have cat and ls, but saving 192 bytes or whatever with shorter variable names is not a worthwhile tradeoff anymore.

What exactly bothers you about this and what would you prefer to see?

I would prefer to see full names like

    npm install --global @scope/PackageName 
At least the hi and lo can get more meaningful names. And over time we can write this in another language with private/scoped methods.

Yeah namespaces and public/private would be quite nice, but C doesn't have them, so they get hacked on via macros and prefixing. The syntax was not the hard part of working or analyzing this code, though.

I think this needs way more "upb" and "UPB" to make it clear that it is, in fact, dealing with UPBs. Whatever these are.

> μpb (often written 'upb') is a small protobuf implementation written in C.

I had fun exploiting this to detect the calling convention used by some code at runtime - there were two different options depending on OS version; one passed a JNIEnv* as the first param, the other did not. So if I called it with 0, I could tell which was being used based on whether the first argument was NULL or not. Only used for specific architectures with a defined ABI that behaved this way, of course.

>Maintaining both MV2 & MV3 support isn’t easily sustainable long term when you factor in the need to prioritize other features.

There is no feature Firefox provides that is more differentiating than ublock origin. As long as pages load and security issues are patched it is the reason to choose Firefox as a browser. What would they prioritize over it?


And there's nothing in MV2 that uBlock Origin needs that doesn't exist in MV3 on Firefox, unlike Chrome. This issue is completely overblown.


Are you disputing uBlock Origin's list of MV3-incompatible capabilities [1]?

[1] https://github.com/uBlockOrigin/uBOL-home/wiki/Frequently-as...


That list contains issues with the APIs that Chrome exposes via MV3. Firefox still supports APIs that Chrome removed.


Not really, but this FAQ, like almost all articles published on MV3, conflates MV3 the specification with Chrome's MV3 implementation. (FWIW, I'm almost certain that this is either due to sloppy/imprecise writing or intentional, with the authors not wanting to confuse users already appropriately riled up by equally imprecise reporting on MV3. They definitely know the difference.)

In any case, for better or worse, when people say MV3, they now usually mean "Chrome's MV3 implementation", which obviously never applies to Firefox.


That's utter bullshit. The author of uBlock Origin has posted a long list of capabilities that declarativeNetRequest does not support.


Unlike Chrome, Firefox did not remove the older API.


What's this supposed to mean? OP was saying that MV3 is feature-equivalent to MV2 and would like to see MV2 support removed from Firefox just as it was from Chrome. I replied pointing out that's utterly false.


MV2 and MV3 are feature equivalent on Firefox when it comes to request blocking.


I’d like to see more investment in their new profile manager. It feels pretty barebones at the moment. Arc had the ability to link profiles to “spaces” and you could easily switch between them without opening a new window. It was very nice to so easily swap between personal, work, & side business.


The multi user containers are also very nice.


And to go one step further, for achieving a profile-per-Firefox-window workflow, I suggest having a look at the underrated extension Sticky Window Containers [0]

While far from perfect, I find it good enough for keeping things separated, especially when using a desktop/workspace workflow. For example, in workspace/desktop 2 I have a Firefox window open with the first tab set to "container A", so hitting ctrl-t there opens new tabs in the same container "A", and I'm logged in for everything in project A. In another Firefox window in workspace 3 I work with "business project B" tabs (where I'm logged into different atlassian, github, cloud, gmail, ...)

Then with a Window Manager like i3wm or Sway I set keybinds to jump directly to the window (and workspace), using the mark feature [1]

It's also possible to open websites directly in specific containers, so it's flexible. For example on my desktop 8 I have all my AI webchats in "whichever my company pays for" tabs: `firefox --new-window 'ext+container:name=loggedInPersonnal&url=https://chat.mistral.ai' 'ext+container:name=loggedInBusinessA&url=https://chatgpt.com' 'ext+container:name=loggedInBusinessB&url=https://gemini.google.com' 'ext+container:name=loggedInBusinessB&url=https://claude.ai'`

It's also the only way I found to keep multiple chat apps open (Teams, Slack, Discord, ...). The alternative electron apps are just as resource-hungry, and in my experience never handled multiple accounts well (especially Teams).

[0] https://addons.mozilla.org/en-US/firefox/addon/sticky-window...

[1] https://i3wm.org/docs/userguide.html#vim_like_marks


If you can trigger address sanitizer from input outside the program, and the program may interact with untrusted input, isn't that always worth reporting and fixing?


>With Kotlin/Spring Boot, compilation is annoyingly slow. That's what you get with modern languages and rich syntax.

This is because the kotlin compiler is not written in the way people write fast compilers. It has almost no backend to speak of (if you are targeting the jvm), and yet it can be slower at compilation than gcc and clang when optimizing.

Modern fast compilers follow a sort of emerging pattern where AST nodes are identified by integers, and stored in a standard traversal order in a flat array. This makes extremely efficient use of cache when performing repeated operations on the AST. The Carbon, Zig, and Jai compiler frontends are all written this way. The kotlin compiler is written in a more object oriented and functional style that involves a lot more pointer chasing and far less data-dense structures.
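A sketch of that style in C (illustrative names, not Carbon/Zig/Jai's actual code): node handles are 32-bit indices into one flat array rather than pointers, so traversals are mostly sequential memory access and node storage is dense.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t NodeId;  // index into the flat node array, not a pointer

typedef enum { NODE_INT, NODE_ADD, NODE_MUL } NodeKind;

typedef struct {
    NodeKind kind;
    union {
        uint64_t value;                  // NODE_INT
        struct { NodeId lhs, rhs; } bin; // NODE_ADD / NODE_MUL
    } u;
} Node;

typedef struct { Node *nodes; uint32_t len, cap; } Ast;

// Append a node and hand back its index (error handling omitted).
static NodeId ast_push(Ast *a, Node n) {
    if (a->len == a->cap) {
        a->cap = a->cap ? a->cap * 2 : 8;
        a->nodes = realloc(a->nodes, a->cap * sizeof(Node));
    }
    a->nodes[a->len] = n;
    return a->len++;
}

// Children are appended before their parents here, so nodes sit in a
// standard traversal order; passes over the AST scan memory forward.
static uint64_t ast_eval(const Ast *a, NodeId id) {
    const Node *n = &a->nodes[id];
    switch (n->kind) {
    case NODE_INT: return n->u.value;
    case NODE_ADD: return ast_eval(a, n->u.bin.lhs) + ast_eval(a, n->u.bin.rhs);
    case NODE_MUL: return ast_eval(a, n->u.bin.lhs) * ast_eval(a, n->u.bin.rhs);
    }
    return 0;
}
```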

Then, if run on a non-graal environment, you also have to pay for the interpreter and JIT warmup, which for short-lived tasks represents nontrivial overhead.

But unlike header inclusion or templates, which are language level features that have major effects on compilation time, I don't think kotlin the language is inherently slow to compile.


The default Gboard keyboard has settings for always showing the number row, or only showing it when entering a password. There is also a setting for the "suggestion strip" under "corrections & suggestions". You can also drag to resize the keyboard itself in the Gboard menu, to scale the height.

Now, whether your users will do that to play your game is a different story, but the options exist.


Not OP, but I used Arch for a while in 2011, and at some point an update moved /bin to /usr/bin or something like that and gave me an unbootable system. This was a massive inconvenience; it took me many hours to un-hose that system, and I switched to Ubuntu. Then Ubuntu became terrible with snaps and other user-hostile software, so I switched to PopOS; then I got frustrated with out-of-date software and Cosmic being incomplete, and am now on Arch with KDE.

Back then I used Arch because I thought it would be cool and it's what Linux wizards use. Now Arch has gotten older, I've gotten older, and now I'm using Arch again because I've become (more of a) Linux wizard.


The silly move from /bin to /usr/bin broke lots of distros. It probably would have worked out if they'd had cp --reflink=auto --update to help ease migration of files from /bin to /usr/bin and then just symlinked /bin to /usr/bin. However, any setup where /usr is a distinct filesystem from / would then hard-require an initramfs to set that up before handoff.

The python (meaning python2) transition was even sillier, though. Breaking changes to the API, and they wanted to (and did!) force re-pointing the command to python3? That's still actively breaking stuff today in places that are forsaken enough to have to support python2 legacy systems.


Arch + KDE is pretty sweet. It looks gorgeous out of the box, and gives you a system that mostly just works but is still everything you love about Arch


Also not OP; I gave up Arch around 2011 as well, after I wasn't able to mount a USB pendrive at the uni as I was rushing somewhere. This was embarrassing and actually a serious issue; it took some time to fix upstream, and finding a workaround was also annoying. That's when I gave up on it and never looked back, but I did, indeed, learn all about Linux internals from dailying Arch for 3 or so years.


You have a list of IDs, and want to make them compact for storage or transport - fast and simple way is to sort and delta encode.
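A minimal sketch of sort-then-delta in C (a real encoder would follow this with varint or similar byte-level compression, since the deltas of sorted IDs are small non-negative numbers that pack well):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

// Sort, then encode in place: ids[0] stays absolute, each later slot
// becomes the gap from its predecessor.
static void delta_encode(uint64_t *ids, size_t n) {
    qsort(ids, n, sizeof *ids, cmp_u64);
    for (size_t i = n; i-- > 1; )   // back to front so sources survive
        ids[i] -= ids[i - 1];
}

// Decode in place by prefix-summing the gaps.
static void delta_decode(uint64_t *ids, size_t n) {
    for (size_t i = 1; i < n; i++)
        ids[i] += ids[i - 1];
}
```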


Hmm. That’s fair, though I’d probably use set operations instead. What you find though is that for most other problems besides diffing, ID order is not chronological order, so you need to sort by a date stamp instead. But I’m typically letting the database do that, so I’m a consumer of sorted numbers, but not an implementor. Because what I sort is nearly always compound sorts. By field A, then field B and field C if those two still don’t cut it.


C99, but with a million macros backporting features from newer language versions and compiler extensions. Lovely features you don't get with ordinary c99:

free_sized

#embed

static_assert

Types for enum

Alignof, alignas, aligned_alloc

_Atomic
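For example, a hypothetical C99 backport of static_assert in that macro style, built from a negative array size (upb's actual macros differ; the names here are illustrative):

```c
#include <assert.h>
#include <stdint.h>

// On C11+ compilers, defer to the real thing; on C99, a typedef of a
// char array whose size goes negative when the condition fails.
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#  define MY_STATIC_ASSERT(cond, msg) _Static_assert(cond, msg)
#else
#  define MY_SA_CAT2(a, b) a##b
#  define MY_SA_CAT(a, b) MY_SA_CAT2(a, b)
#  define MY_STATIC_ASSERT(cond, msg) \
      typedef char MY_SA_CAT(my_static_assert_, __LINE__)[(cond) ? 1 : -1]
#endif

MY_STATIC_ASSERT(sizeof(uint32_t) == 4, "uint32_t must be 4 bytes");
```

The __LINE__ concatenation gives each assertion a unique typedef name so several can appear in one scope.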


What hardware are you running on where the cost of a relaxed 64-bit load and a branch is significant compared to a (possibly contended) CAS?

You could always use LDSET on arm for this.
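The pattern I mean, sketched with C11 atomics (names are my own): a relaxed load and branch in front of the read-modify-write skips the expensive atomic entirely when the bit is already set, and on arm64 with LSE atomics the atomic_fetch_or typically compiles to an LDSET instruction.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

// Set a bit at most once; returns true only for the caller that
// actually set it.
static bool set_flag_once(_Atomic uint64_t *flags, uint64_t bit) {
    // Cheap relaxed load first: if the bit is already set, no RMW.
    if (atomic_load_explicit(flags, memory_order_relaxed) & bit)
        return false;
    // Otherwise do the real atomic OR (LDSET under arm64 LSE).
    uint64_t old = atomic_fetch_or_explicit(flags, bit, memory_order_acq_rel);
    return (old & bit) == 0;
}
```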

