One of the most important reasons to reinvent the wheel, one not mentioned by the author, is to avoid adding complexity through unnecessary dependencies.
100% this, and I'll add that libraries become popular because they solve an issue in many different scenarios.
That means that almost by definition, if a library is popular, it contains huge amounts of code that just isn't relevant to your use case.
The tradeoff should be whether you can code your version quickly (assuming it's not a crypto library; never roll your own crypto), because if you can, you'll be more familiar with it and carry a smaller dependency.
“Never roll your own crypto” is just this year’s “never roll your own date library”. There will always be something. Could I code this? Even if it’s quick, there’s ongoing maintenance cost and you lose out on the FOSS community identifying and fixing vulnerabilities as well as new features that you may need to use. Yes, the library might be large and contain things you don’t need, but that’s the tradeoff. You can mitigate this (depending on the platform and language)—for example, with ESM tree-shaking.
I’d rather install date-fns or moment and let it decide what the fourth Sunday of a given month is in 2046, and also audit for the latest browser attack vectors.
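To make the tree-shaking and date-fns points concrete, here's a minimal sketch (assuming a bundler that tree-shakes ESM; the `fourthSundayOf` helper is mine, not part of date-fns). Named imports let the bundler drop the rest of the library, and the calendar math stays someone else's problem:

```ts
import { startOfMonth, isSunday, nextSunday, addWeeks } from "date-fns";

// Hypothetical helper (not part of date-fns): the fourth Sunday of a month.
// month is 0-based, as in the Date constructor.
function fourthSundayOf(year: number, month: number): Date {
  const first = startOfMonth(new Date(year, month, 1));
  // nextSunday is strictly after its argument, so handle a Sunday the 1st.
  const firstSunday = isSunday(first) ? first : nextSunday(first);
  return addWeeks(firstSunday, 3);
}

fourthSundayOf(2046, 0); // fourth Sunday of January 2046
```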
I also agree about avoiding complexity from unnecessary dependencies.
> if a library is popular, it contains huge amounts of code that just isn't relevant to your use case.
It is true that many libraries contain such code, whether or not they have dependencies. For example, SQLite has no dependencies but still includes code that is not necessarily relevant to your use. Some programs (including SQLite) offer conditional compilation; that sometimes helps, but in many cases it is not enough, since it is still the same program, and conditional compilation does not change it into an entirely different one that is better suited to your use.
Also, I often find that programs include some features I do not want and exclude many that I do want, and existing programs can be difficult to change in that way. So that might be another reason to write my own, too.
Unfortunately, if you depend on any libraries, there's a decent chance one of them depends on some support library. Possibly for just one function. And then your build tool downloads the entire Internet.
OP said that was for an entire app, not dependencies for a single gem. And it’s not really that many. A bone-stock Rails app includes almost 120 gems out of the box. Add a few additional gems that each have their own dependencies and you can get up to over 200 total packages pretty quick.
That depends a lot on your language / build system. The easier it is to add a dependency, the more likely that is to be how they work, broadly speaking.
It almost always means bloat though, because any library that goes a year without updates is considered “abandoned”, so maintainers keep shipping changes and the library succumbs to feature creep.
"Never roll your own crypto" usually means "never devise your own crypto algorithms". Implementing an established algorithm yourself is OK provided you can prove your implementation works correctly. And... well, as Heartbleed showed, that's hard even with established crypto libraries.
Note that there are quite a few ways a crypto implementation can be insecure even if it's proven to be "correct" (in terms of inputs and outputs). For instance, it may leak information through timing, or fail to clear sensitive memory due to a compiler optimization.
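To illustrate the timing point, a minimal Node.js sketch (crypto.timingSafeEqual is real; the verify functions around it are hypothetical examples):

```ts
import { timingSafeEqual } from "node:crypto";

// Naive comparison: string equality can bail out at the first differing
// byte, so response time can leak how much of the MAC matched.
function insecureVerify(expected: Buffer, received: Buffer): boolean {
  return expected.toString("hex") === received.toString("hex");
}

// Constant-time comparison: examines every byte regardless of mismatches.
// timingSafeEqual throws if the lengths differ, so check that first.
function verify(expected: Buffer, received: Buffer): boolean {
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```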
Frameworks are clearly worse, that's true. But there are also kitchen-sink libraries that are too shallow in relation to their API surface, or libraries that perform certain background work or modify some external global state in a way that is unnecessary for your use case, or libraries that pull in transitive dependencies of that sort. You really want to minimize the code that executes in order to fulfill your use case, and also minimize the temptation to depend on additional, tangential functions of a library that you wouldn’t have added for those functions alone.
Any fool can write an encryption algorithm that he himself can't break. The NSA would greatly prefer that you did, too. Security is an arms race - you have to counter the latest generation of attackers.
It's okay to write a compiler or a database if you only know the basic principles, but it's not okay to write an encryption algorithm, or even an implementation of one, using only basic principles, because someone who knows more than you will break it.
For instance, were you aware that every time you write array[index], it leaks data to other threads in other processes on the same CPU, including JavaScript code running inside web browsers?
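As a sketch of the classic countermeasure (illustrative only; a JavaScript JIT makes real constant-time guarantees hard, which is rather the point): read every entry of the table so the memory access pattern is independent of the secret index.

```ts
// Illustrative sketch, not production code. Assumes table.length < 2^31.
function constantTimeLookup(table: Uint8Array, secretIndex: number): number {
  let result = 0;
  for (let i = 0; i < table.length; i++) {
    // eq is 1 when i === secretIndex, 0 otherwise, computed without branching.
    const eq = (((i ^ secretIndex) - 1) >>> 31) & 1;
    // -eq is an all-ones mask when eq is 1, zero otherwise.
    result |= table[i] & -eq;
  }
  return result;
}
```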
Yes, of course, but do you know exactly who wrote your encryption libraries, what their qualifications are, who they work for, or what their conflicts of interest might be?
I really doubt people give it even a second thought.
That holds not only for cryptography libraries, but generalizes to the entire computing stack. It's why, for example, coreboot exists, as well as various open source hardware projects. If it's fully open, you can inspect it yourself. Anywhere I see a branching statement in a cryptography context, I'll know something's up.
The problems introduced in xz are still fresh, but Dual_EC_DRBG[0] also comes to mind within the cryptography context.
(Besides, getting cryptography right goes way beyond "just writing a library". As the parent commenter wrote, simple operations are the tip of the iceberg with regards to a correct implementation)
If you’re not aggressively vetting the crypto libraries you’re using, you’re more or less exposing yourself to the same probability of risk as rolling your own crypto.
An underrated middle ground, at least when it comes to open source, is vendoring the dependency, cutting out the stuff you don't need, and adapting the API so that it's not introducing more complexity than it has to.
This is also generally helpful when you have performance requirements: third-party code, even when well optimized in general, often isn't well optimized for your particular use case.
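As a sketch of the "adapt the API" part (every name here is hypothetical; './vendor/fuzzy' stands in for a trimmed, vendored copy of some search library):

```ts
// Thin facade over the vendored code: callers see one function, not the
// library's whole option surface, so swapping it out later stays cheap.
import { search, type SearchOptions } from "./vendor/fuzzy";

export function closestMatch(candidates: string[], query: string): string | undefined {
  const opts: SearchOptions = { limit: 1, caseSensitive: false };
  return search(candidates, query, opts)[0];
}
```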
That's the main reason that I tend to "Reinvent the wheel."
Also, the dependencies often have a lot of extra "baggage," and I may only want a tiny bit of the functionality. Why should I use an 18-wheeler, when all I want to do is drive to the corner store?
Also, and this is really all on me (it tends to be a bit of a minority stance, but it's mine), I tend to distrust opaque code.
If I do use a dependency, it's usually something that I could write, myself, if I wanted to devote the time, and something that I can audit, before integrating it.
I won't use opaque executables unless I pay for them. If something costs no money, I expect to be able to see the source.
Custom solutions, while potentially less complex at first, gradually grow in complexity. There may come a time when it's worth throwing out your custom solution and replacing it with a more general dependency. There's still a benefit: a dependency introduced at this stage is used far more thoughtfully, because you know the problem it solves inside out.
It might also change your psychological relationship with the dependency. Instead of being disgusted by yet another external dependency bringing poorly understood complexity into your project, you are thankful that there exists a piece of code, maintained and tested by someone else, that does the thing you know you need done and lets you remove a whole mess of complexity you constructed yourself.
Yeah, I built a library to run tasks based on a directed acyclic graph (DAG), where each task can optionally belong to a queue.
So I had to write a simple queue. Since I wanted demos to work in the browser, it has an IndexedDB backend; I wanted demos to work in an Electron app, so there is a SQLite backend; and I'll likely want a multi-user, server-based one, so there is a Postgres backend.
And I wanted to use it for rate limiting, etc, so limiters were needed.
And then there is the graph stuff, and the task stuff.
There are a lot of wheels to create, actually, if you don't want any dependencies.
I do have a branch that uses TypeBox to build and validate the input and output JSON schemas for the tasks, so the core may not stay dependency-free forever.
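Not the parent's actual code, but a sketch of the kind of shape being described, with the queue behind an interface so IndexedDB, SQLite, or Postgres backends can slot in:

```ts
// Hypothetical interfaces, not the parent's library.
interface Queue {
  enqueue(taskId: string): Promise<void>;
  dequeue(): Promise<string | undefined>;
}

interface Task {
  id: string;
  dependsOn: string[];   // DAG edges: ids of tasks that must finish first
  queue?: string;        // optional queue assignment
  run(input: unknown): Promise<unknown>;
}

// Each backend (IndexedDB, SQLite, Postgres) implements Queue; an in-memory
// version is enough for tests and demos.
class InMemoryQueue implements Queue {
  private items: string[] = [];
  async enqueue(taskId: string): Promise<void> { this.items.push(taskId); }
  async dequeue(): Promise<string | undefined> { return this.items.shift(); }
}
```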
Not the grandparent, but Airflow is painfully slow and inefficient.
Our reinvented wheel using PostgreSQL, RabbitMQ, and EC2 runners has ~10x better throughput and scales linearly with the number of pending tasks, whereas Airflow falls apart and fails to keep the runners fully occupied the moment you put any real load on it.
I'll agree with this, though in a lot of cases reinventing the wheel is a bad idea.
A previous coworker insisted on writing everything instead of using libraries, so I had to maintain a crappy, undocumented, buggy version of what was available in a library.