
> Good for obscure leetcode problems (...)

God, I hate how pointless leetcode has become, and how it's used in interviews. Smug assholes using obscure data structures and algorithms trivia that no one has ever used in any professional setting, and which the interviewer only learned about because he spent a week searching for a coding problem, to decide whether you are competent at putting a button on a window.



It's also about your communication skills and ability to use existing knowledge even when it isn't the most suitable. A while ago in an interview I was asked a problem for which the correct solution is in fact an order statistic tree, but not knowing that, I solved the problem using more standard data structures with worse complexity, and pointed out their limitation. I still passed that interview.
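To illustrate that trade-off (this is my sketch, not the interview problem itself): a plain sorted list plus `bisect` answers the classic order-statistic queries — k-th smallest and rank — with O(n) inserts, where an order statistic tree would give O(log n) for everything. The class and method names here are hypothetical.

```python
import bisect

class OrderStatistics:
    """Order-statistic queries on a plain sorted list.

    insert: O(n) shift (an order statistic tree gets O(log n))
    kth:    O(1) indexed access
    rank:   O(log n) binary search
    """
    def __init__(self):
        self._items = []

    def insert(self, x):
        bisect.insort(self._items, x)  # the "worse complexity" part

    def kth(self, k):
        return self._items[k]  # 0-based k-th smallest

    def rank(self, x):
        return bisect.bisect_left(self._items, x)  # count of items < x

os_ = OrderStatistics()
for v in [5, 1, 4, 2, 3]:
    os_.insert(v)
print(os_.kth(0), os_.kth(2), os_.rank(4))  # 1 3 3
```

Pointing out exactly this limitation — correct answers, worse insert complexity — is the kind of communication the interview is probing for.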


> that no one ever used at all in any professional setting, (…), to decide whether you are competent at putting a button on a window.

Some software jobs involve more than putting a button on a window.


Maybe they should keep interviewing for jobs that require nothing more complex, important, or dangerous than that.


I mean tbh, a list structure with logN indexed access, insert, and removal seems like a really useful structure.

I just wish it was built into languages as one implementation of the list type
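The idea behind such a structure is simple enough to sketch: augment a balanced tree with subtree sizes and index into it by rank. Below is a toy Python treap with implicit keys — my own sketch with made-up names, not any language's stdlib — giving expected O(log n) indexed get, insert, and delete.

```python
import random

class Node:
    __slots__ = ("val", "prio", "size", "left", "right")
    def __init__(self, val):
        self.val = val
        self.prio = random.random()  # random heap priority keeps it balanced
        self.size = 1                # subtree size enables indexed access
        self.left = self.right = None

def _size(n):
    return n.size if n else 0

def _pull(n):
    n.size = 1 + _size(n.left) + _size(n.right)
    return n

def _split(n, k):
    """Split so the first k elements go left, the rest right."""
    if n is None:
        return None, None
    if _size(n.left) < k:
        l, r = _split(n.right, k - _size(n.left) - 1)
        n.right = l
        return _pull(n), r
    l, r = _split(n.left, k)
    n.left = r
    return l, _pull(n)

def _merge(a, b):
    if a is None or b is None:
        return a or b
    if a.prio > b.prio:
        a.right = _merge(a.right, b)
        return _pull(a)
    b.left = _merge(a, b.left)
    return _pull(b)

class TreapList:
    """List with expected O(log n) indexed access, insert, and removal."""
    def __init__(self):
        self.root = None
    def insert(self, i, val):
        l, r = _split(self.root, i)
        self.root = _merge(_merge(l, Node(val)), r)
    def delete(self, i):
        l, r = _split(self.root, i)
        _, r = _split(r, 1)  # drop the single element at position i
        self.root = _merge(l, r)
    def __getitem__(self, i):
        n = self.root
        while True:
            ls = _size(n.left)
            if i < ls:
                n = n.left
            elif i == ls:
                return n.val
            else:
                i -= ls + 1
                n = n.right
    def __len__(self):
        return _size(self.root)

t = TreapList()
for i, ch in enumerate("abcde"):
    t.insert(i, ch)
t.insert(2, "X")   # a b X c d e
t.delete(0)        # b X c d e
print([t[i] for i in range(len(t))])  # ['b', 'X', 'c', 'd', 'e']
```

A production version would use wide B-tree-style nodes rather than one element per node, for cache locality — which is exactly what libraries like blist do.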


> I mean tbh, a list structure with logN indexed access, insert, and removal seems like a really useful structure.

Indeed B-trees are hardly novel.

The point is that no one has a practical need to implement them themselves, unless you're rolling your own database engine. If that's the case then either you're doing it for fun, or there were steps in the decision process whose justification I'd love to hear.


> I'd love to hear the justification.

This is the right data structure to use for collaborative text editing. I made another comment in this thread going into detail. The fastest implementation I know of is Cola, and the author has some great posts going into detail if you’re interested:

https://nomad.foo/blog/cola


Sometimes, we get paid to hit the high notes, even if we don't sing them all the time, or even every day. See sibling josephg comment on use in CRDT implementation.


A lot of PLangs have this outside their stdlibs. For example, in Python: https://pypi.org/project/blist/

Some C & Nim variants are elsethread (https://news.ycombinator.com/item?id=40441832 & https://news.ycombinator.com/user?id=cb321 , respectively).

--

There seems to be a lot of disagreement about how big stdlibs should be. Personally, I think a large Python one was a selling point (batteries included), just as the C++ STL and similar Java libraries won over many back in the 1990s. However, in today's networked-by-assumption world, stdlib maintainers seem to feel overwhelmed pretty easily and eager to "slough off" as much as they can to the ecosystem. It's likely a sub-discussion of the even more broadly contentious question of "how narrow/wide should dependency trees be". What seems to be missing, IMO, are agreed-upon "metrics" beyond "saleability of the PL/ecosystem".

As guidance for which granularity metrics are relevant, it should be observed that most PLs have some kind of module/separate compilation units. A good answer would be able to work at (and across) many & various levels of the containment hierarchy - at least 3..4 of (procedure, type|module, package, developer, organization). Maybe someone has a set of such metrics in their back pocket, but if so they don't get a lot of exposure in PL chit-chat circles like this one. My guess would be there just isn't enough agreement, but it's probably been fodder for like 100s of software engineering PhD theses. Lol. ( As well as even broader human organization discussions like https://en.wikipedia.org/wiki/Conway's_law )


a relevant fact here is that python was, to a significant extent, guido's attempt to reimplement the abc teaching programming language without the design features that made it useless for anything but programming exercises; one of those, if i recall correctly, was that abc implemented lists as some kind of balanced tree structures (maybe avl trees?), in exactly the way this discussion is about, and was very slow as a result


Just to be clear, I never meant to suggest "only a b-tree", simply the option in the stdlib, as per zug_zug (whom I also did not take as suggesting "only"). Different data structures for different use cases, as always.

JavaScript famously implemented arrays in terms of hash tables. In a similar vein, I've also known people who mistakenly thought B-trees could replace a good hash table as the default &| go-to "associative container", but ordering costs something (on one side or another), as you said elsewhere.

It's hard to say how much choice needs to be in a stdlib. Hence my exploring how to better say it. :-) E.g., Go has no b-tree in the stdlib, but has a bunch of crypto stuff. Nim has little to no crypto stuff, but has a critbits trie.

EDIT: Oh, and FWIW, at least abc-unix used B-trees for lists & tables, not AVL trees.


thank you for the correction about abc!

b-trees can sometimes replace a general hash table as the default associative array type. in part it depends on programming style and other parts of the language; in lisp, for example, the default associative array type was alists (linked lists of key-value pairs) for a long time, and emacs lisp still uses them pretty extensively. and i think you would be hard-pressed to find an sql implementation for which associative arrays are not usually b-trees—in part because in sql you tend to ask questions like 'which values correspond to these 1000 keys?' more often than 'which value corresponds to this key?', and in part for hysterical raisins. (when they wrote system r, oracle, and ingres, main memory was small, so locality of access was more important than it is now.)
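The flavor of batch query described above can be sketched against any ordered structure. Here a sorted array of pairs stands in for a B-tree's ordered storage (the names and data are mine, purely illustrative): sorting the query keys means the lookups walk the structure front-to-back, which is where ordered containers' locality pays off over a hash table.

```python
import bisect

# sorted (key, value) pairs standing in for a B-tree's ordered storage
pairs = sorted((k, k * k) for k in range(0, 1000, 3))
keys = [k for k, _ in pairs]

def batch_lookup(query_keys):
    """'Which values correspond to these keys?' against ordered storage."""
    out = {}
    # sorting the queries makes successive probes land in nearby,
    # already-cached regions of the ordered structure
    for q in sorted(query_keys):
        i = bisect.bisect_left(keys, q)
        if i < len(keys) and keys[i] == q:
            out[q] = pairs[i][1]
    return out

print(batch_lookup([9, 4, 27]))  # {9: 81, 27: 729} (4 isn't a key here)
```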


FYI this is not that uncommon in functional languages and persistent data structure libraries.


And with predictably good cache locality on iteration. Given the ever-increasing gap between the fast and the slow end of our volatile memory hierarchies, I'm surprised that we don't see more of an identifiable trend of using algorithms/structures traditionally associated with spinning disks in nonpersistent memory.


B+ Trees (which store data at leaves) rather than B-Trees are what you want for good iteration performance usually.

An ordered B-Tree iteration would jump between different arrays in the tree often, but a B+ Tree would only have you jump to another array once you’re done iterating over all the elements in the current array.

The following article says the same. https://www.scylladb.com/2021/11/23/the-taming-of-the-b-tree....
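The one-jump-per-leaf behavior is easy to sketch (toy Python, leaves only; a real B+ tree of course also has interior index nodes, which the scan below never touches):

```python
class Leaf:
    """B+-tree leaf: keys live here; `nxt` links to the next leaf,
    so an ordered scan never re-enters the interior nodes."""
    def __init__(self, keys):
        self.keys = keys
        self.nxt = None

def link(chunks):
    """Build a chain of leaves from lists of keys; return the head."""
    leaves = [Leaf(c) for c in chunks]
    for a, b in zip(leaves, leaves[1:]):
        a.nxt = b
    return leaves[0]

def scan(leaf):
    # exhaust each leaf's contiguous array, then hop exactly once
    while leaf is not None:
        yield from leaf.keys
        leaf = leaf.nxt

head = link([[1, 2, 3], [4, 5], [6, 7, 8]])
print(list(scan(head)))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

In a plain B-tree, the same ordered scan would interleave interior and leaf nodes, jumping between arrays far more often.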


I mean, fwiw sometimes "obscure data structures" really do cleanly, efficiently, and performantly solve a problem that would have been a nightmare to address with simpler, more well-known structures. That said, for most of us, it's rare enough to be memorable - but it does happen.

I think you're right that requiring such knowledge off the top of one's head and in the context of a one-hour interview is almost always unreasonable, though.

What I'd really want to know is how well a person can do the research to arrive at performant solutions involving, as you say, obscure data structures. In an interview context though, I think you basically just have to ask them - would be nice if you could get a sense of it through doing, though.



