I wonder why not reimplement coreutils as library functions, to be used within an ad-hoc REPL. It would be much more flexible and extensible.
AFAIK, originally the reason why they were made as programs was that the available programming languages were too cumbersome for such use. Now we have plenty of experience making lightweight scripting languages that are pleasant to use in a live environment (vs premade script), so why give up the flexibility of ad-hoc scripting?
what if it were written as a library, with the traditional cli implementation as a thin layer over it?
I'm thinking about the wider FOSS ecosystem. For example, if Firefox were built as a GUI gluing together a modular collection of libraries, people could do all sorts of cool things with them.
Monolithic applications make sense for proprietary software, not so much for FOSS.
You're proposing abandoning the Unix/Posix standards and philosophy in favor of an untested strategy.
"Move fast and break things" makes sense for acquiring investors, not so much for core infrastructure that everyone everywhere depends on.
This could work, but you'd need at least a decade of widespread usage to work out all the problems before it would even be worth considering for core infrastructure tools.
Why would that be relevant if the thin wrapper is fully compliant with the POSIX/GNU standards?
Busybox, with its single-binary, depend-on-argv[0] hack implementation, was at one point an "untested strategy", yet look at it now. The Rust coreutils need to offer a real unique value proposition if they want to see real adoption. Providing coreutils as a library rather than forking processes would certainly qualify as that.
If stability is a concern, exposing a greater surface for user interaction would surely slow down development as these interfaces would have to be reworked with care.
If stability is not a concern, then any user tools built upon these interfaces would be subject to breakage at the rate of upstream development. That’s got to be frustrating.
One of the nice things about the POSIX command‐line interface is that the build systems that interact with them know what to expect, because the interface has been much the same for a very long time, while still providing hugely useful capability.
As stable as, say, golang's standard library.
Sure, it needs upfront thinking and commitment, but it's not that difficult and might be well worth it.
In the case of coreutils, the problem space is fairly simple and well-understood, so it should be quite easy to commit to a stable interface. Even for something exceptionally complex like a web browser, I'd expect most components to be easily kept backwards-compatible in terms of public api.
> As stable as, say, golang's standard library. Sure, it needs upfront thinking and commitment, but it's not that difficult and might be well worth it.
That's actually far less stable than is needed for core utils.
> what if it were written as a library, with the traditional cli implementation as a thin layer over it?
That's kind of the way it is. Most of the core utils are thin wrappers around C libraries.
- - - -
It sounds like you're thinking of things like the Oberon OS, where there were no separate applications, instead the system was extended by adding new commands to a unitary GUI. Or the Canon Cat.
Nope, it's still a traditional binary meant to be used like a traditional binary from the POSIX shell. What I mean is: replace both the binaries and the shell with a library equivalent of coreutils running from a REPL.
Sometimes, hitting those roadblocks leads to a better solution.
Maybe the new model is slower, and somebody looks into it, and realizes that if they add a caching layer between the "REPL" module and the kernel ioctl, or service, or whatever, it will speed things up.
I run find and grep a lot. And I'm sure the kernel caches a lot of the FS stuff, but there are higher-level things that could be cached and shared with other "REPL" modules. Like predictive URL middleware in browsers. Pluggable middleware that can be enabled or disabled.
Available now on the OS module store:
Larry's Grep Count Document Prefetch Module. Certified Safe by BlahCorp.
This isn't a new idea, and I'm sure others have had it before me.
It's not just about forking processes. Instead of a single binary that needs to satisfy as many use cases as possible while remaining small and general, you would have a lot of smaller, more atomic functions that users can mix and swap as needed, case by case.
> Instead of a single binary that needs to satisfy as many use cases as possible while remaining small and general, you would have a lot of smaller, more atomic functions that users can mix and swap as needed, case by case.
Maybe I'm missing something here (it's been a long time since I last looked at the busybox code), but isn't busybox a single file that has a lot of atomic functions that callers can mix and swap as needed, using the shell as a REPL?
IIRC (and please correct me if I am wrong), all those little functions in busybox are simply single functions. There's a `cat` function, and a `head` function, and a `cp` function, etc.
I don't see what can be gained by moving them into a library file, and using the shell to call those functions, instead of leaving them in the shell program and calling them.
`which` is not a bash builtin (on Mac or Linux); use `type` instead:
$ type echo
echo is a shell builtin
$ type cat
cat is /bin/cat
$ type which
which is /usr/bin/which
$ alias a=true
$ type a
a is aliased to `true'
$ function f { true; }
$ type f
f is a function
f ()
{
true
}
Incidentally, zsh, the current default Mac shell, has both type and which as internal commands, with different output:
% which echo
echo: shell built-in command
% type echo
echo is a shell builtin
% which cat
/bin/cat
% type cat
cat is /bin/cat
% which which
which: shell built-in command
% type which
which is a shell builtin
% alias a=true
% which a
a: aliased to true
% type a
a is an alias for true
% function f { true; }
% which f
f () {
true
}
% type f
f is a shell function
Note that, on zsh, the "native" command is actually whence; which and type are equivalent to "whence -c" and "whence -v", where
% man -W zshbuiltins \
| xargs groff -Tutf8 -mandoc -P -cbdu \
| awk '
/^ [^ ]/ { out = 0 }
/^ whence / { out = 1 }
{ if (out) print }
'
whence [ -vcwfpamsS ] [ -x num ] name ...
For each name, indicate how it would be interpreted if used as a
command name.
If name is not an alias, built-in command, external command,
shell function, hashed command, or a reserved word, the exit
status shall be non-zero, and -- if -v, -c, or -w was passed --
a message will be written to standard output. (This is differ‐
ent from other shells that write that message to standard er‐
ror.)
whence is most useful when name is only the last path component
of a command, i.e. does not include a `/'; in particular, pat‐
tern matching only succeeds if just the non-directory component
of the command is passed.
-v Produce a more verbose report.
-c Print the results in a csh-like format. This takes
precedence over -v.
-w For each name, print `name: word' where word is one of
alias, builtin, command, function, hashed, reserved or
none, according as name corresponds to an alias, a
built-in command, an external command, a shell function,
a command defined with the hash builtin, a reserved word,
or is not recognised. This takes precedence over -v and
-c.
-f Causes the contents of a shell function to be displayed,
which would otherwise not happen unless the -c flag were
used.
-p Do a path search for name even if it is an alias, re‐
served word, shell function or builtin.
-a Do a search for all occurrences of name throughout the
command path. Normally only the first occurrence is
printed.
-m The arguments are taken as patterns (pattern characters
should be quoted), and the information is displayed for
each command matching one of these patterns.
-s If a pathname contains symlinks, print the symlink-free
pathname as well.
-S As -s, but if the pathname had to be resolved by follow‐
ing multiple symlinks, the intermediate steps are
printed, too. The symlink resolved at each step might be
anywhere in the path.
-x num Expand tabs when outputting shell functions using the -c
option. This has the same effect as the -x option to the
functions builtin.
Finally, note that the bash type command also has many options,
$ info bash -n 'Bash Builtins' \
> | awk "
> /^'/ { out = 0 }
> /^'type'/ { out = 1 }
> { if (out) print }
> "
'type'
type [-afptP] [NAME ...]
For each NAME, indicate how it would be interpreted if used as a
command name.
If the '-t' option is used, 'type' prints a single word which is
one of 'alias', 'function', 'builtin', 'file' or 'keyword', if NAME
is an alias, shell function, shell builtin, disk file, or shell
reserved word, respectively. If the NAME is not found, then
nothing is printed, and 'type' returns a failure status.
If the '-p' option is used, 'type' either returns the name of the
disk file that would be executed, or nothing if '-t' would not
return 'file'.
The '-P' option forces a path search for each NAME, even if '-t'
would not return 'file'.
If a command is hashed, '-p' and '-P' print the hashed value, which
is not necessarily the file that appears first in '$PATH'.
If the '-a' option is used, 'type' returns all of the places that
contain an executable named FILE. This includes aliases and
functions, if and only if the '-p' option is not also used.
If the '-f' option is used, 'type' does not attempt to find shell
functions, as with the 'command' builtin.
The return status is zero if all of the NAMEs are found, non-zero
if any are not found.
Because most of the coreutils functionality is already available in the libraries of most languages. The article mentions that there are crates for the logic. The hard part is command line parsing and output formatting, and your library should have neither of those.
I've seen plenty of shell scripts rewritten in Python because they grew too big, and most of the time the coreutils commands just get replaced with standard library calls. There are exceptions (like sorting files which do not fit in memory), but otherwise the standard library is good enough.
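For the common cases, the Rust standard library alone already covers a lot of ground; a minimal sketch (the file names are just placeholders):

use std::fs;
use std::io;

fn main() -> io::Result<()> {
    // mv file renamed
    fs::rename("file", "renamed")?;
    // cp renamed copy
    fs::copy("renamed", "copy")?;
    // mkdir -p a/b/c
    fs::create_dir_all("a/b/c")?;
    // cat copy
    print!("{}", fs::read_to_string("copy")?);
    // rm copy
    fs::remove_file("copy")?;
    Ok(())
}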
The problem is POSIX. It says operating systems must have mv, cp and all that stuff. This is the reason why people say Linux is not an operating system.
> I wonder why not reimplement coreutils as library functions, to be used within an ad-hoc REPL.
Funny you mention that. I've been working privately on such a "systems programming REPL" in my free time. Basically a freestanding Lisp with pointers and built-in Linux system calls. It's been a huge challenge trying to bootstrap and get the garbage collector working without any libc support, still haven't cracked it.
Languages like Python and Ruby already have system call capabilities. You can literally do anything with those calls. So this already exists in some form, albeit not in the extreme form I envisioned.
> I've been working privately on such a "systems programming REPL" in my free time. Basically a freestanding Lisp with pointers and built-in Linux system calls.
Are you building something similar to babashka? Would you be able to figure out what they did with babashka to figure out what you've been unable to do, or are you challenging yourself?
Thanks, that's a nice project I didn't know about! Always happy to see more projects along these lines!! I'm not sure to what extent it permits systems programming though. I searched the repository for common system calls like mmap and didn't find anything. I assume it links to either libc or JVM.
I suppose I'm challenging myself. What I had in mind is much lower level: a Lisp thing where I can use the Linux system calls directly. It's gonna look like this:
; mmap some memory
(set memory (mmap 0 4096 '(read write) '(private anonymous) -1 0))
; query the kernel for some data
; terminal size for example
; have the kernel put the data at the start of that memory
(ioctl 1 'TIOCGWINSZ memory)
; memory now points to a struct winsize
; decode the 4 unsigned shorts
; first two unsigned shorts are the terminal's rows and columns
The language runtime is completely freestanding: it doesn't link to any library at all, not even libc. I made it so eval supports a special system-call function which executes a Linux system call from C, and I want to build literally everything else on top of that. I want to be able to run strace on any coreutils binaries, see what system calls they make and then implement the same thing on top of the system-call primitive. It should be possible to make a coreutils module that contains an mv function, for example.
; boils down to:
; (renameat2 'fd-cwd "file" 'fd-cwd "renamed" 'no-replace)
(mv "file" "renamed")
I had to use static allocation to pre-allocate a stack of Lisp cells when the process is loaded just to get it to evaluate at all. Now I'm trying to get the garbage collector to work so I can get it to bootstrap to a point where it can allocate memory, read files and load more code. I wish I had something real to show for all this effort but right now it's not real yet.
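(For anyone curious what the no-libc route looks like outside of Lisp, here is a rough Rust sketch of issuing a raw write(2) system call with inline assembly on x86_64 Linux; it is not code from the project described above, just an illustration of the same idea.)

use std::arch::asm;

// Raw write(2): syscall number in rax, args in rdi/rsi/rdx;
// the syscall instruction clobbers rcx and r11.
unsafe fn sys_write(fd: usize, buf: *const u8, len: usize) -> isize {
    let ret: isize;
    asm!(
        "syscall",
        inlateout("rax") 1usize => ret, // 1 = SYS_write on x86_64
        in("rdi") fd,
        in("rsi") buf,
        in("rdx") len,
        out("rcx") _,
        out("r11") _,
        options(nostack),
    );
    ret
}

fn main() {
    let msg = b"hello from a raw syscall\n";
    unsafe { sys_write(1, msg.as_ptr(), msg.len()) };
}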
$ txr -i winsz.tl
TXR doesn't really whip the llama's ass so much as the lambda's.
1> (let ((ws (new winsize)))
(ioctl-winsz 0 TIOCGWINSZ ws)
ws)
#S(winsize ws-row 37 ws-col 80 ws-xpixel 0 ws-ypixel 0)
I'd like to have system call support as a feature built right into eval. Maybe a JIT compiler that emits code with Linux system call calling conventions whenever eval encounters a (system-call ...) form.
Basically all/most Common Lisp implementations have a foreign function interface. Those who run on some UNIX or Linux need support for low level access.
See for example the SBCL sources for mmap and ioctl.
The Rust flavor of coreutils is 16MB in its release build. After stripping it cuts down to 10MB; that's the smallest size you can get.
Compared to the C coreutils, which total 5.8MB, Rust does have a slight size "problem": it is about 70% larger even with the busybox-all-in-one style.
You're actually forced to do it busybox-style, because otherwise each single small utility of coreutils would be about that size, say 6MB each, and the total would blow up really fast.
I commented somewhere else: the Rust stdlib is statically linked by default (AND by design), which is totally different from the shared stdlibs of C and C++. That leads to large sizes once you have a few Rust release binaries. I have never figured out why Rust cannot do a shared stdlib just like C and C++.
Is the 'unstable ABI' by design too, or is it just still evolving? Why can't Rust have a stable ABI like libc and libstdc++? This is the major reason I have not used Rust so far, and I don't know Rust well enough to understand why it is the way it is as far as static linking of the stdlib goes.
It’s partly by design and partly because it’s evolving.
The "by design" part is that Rust is free to reorder struct fields to pack them more tightly. This is a nice optimization, but the resulting layout is not something that is currently stable between versions.
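A minimal illustration (the struct and field names are made up): with the default repr the compiler is free to reorder fields, so the layout below may change between compiler versions, while #[repr(C)] pins it down, which is what you'd use at an ABI boundary.

use std::mem::size_of;

// Default repr: the compiler may reorder fields to minimize padding;
// the exact layout is not guaranteed to stay the same across versions.
#[allow(dead_code)]
struct Reordered {
    a: u8,
    b: u64,
    c: u8,
}

// repr(C): fields stay in declaration order with C-style padding,
// which is what you need for a stable, shareable layout.
#[allow(dead_code)]
#[repr(C)]
struct CLayout {
    a: u8,
    b: u64,
    c: u8,
}

fn main() {
    // Typically 16 vs 24 bytes on x86_64, but only the repr(C) number
    // is something you're allowed to rely on.
    println!("default repr: {} bytes", size_of::<Reordered>());
    println!("repr(C):      {} bytes", size_of::<CLayout>());
}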
E.g. once upon a time I thought it would be fun to add a flag to ls to limit its results to a certain kind of file, so you could list only directories, for example. It came up as something I needed, so I did it. Somebody on the GNU coreutils mailing list rejected it on the basis of ls "having a high bar" for a new flag. `man ls` suggests this hasn't been a consistent policy. It wasn't clear to me whether that person was in charge, or whether it was just a vague notion of theirs, or anything useful, so obviously I dropped it; I figure I'd have heard something different from someone if there was any interest.
The Rust implementation might come to precisely the same conclusion that it isn't worthwhile. But they also might not, in some other case if not that one.
Do they do what's better, or do they do what GNU does, always and everywhere?
Do they wait until they have significant traction and only then consider such things?
Interesting questions for them to ponder and maybe they have?
One of the maintainers of uutils here. We have a few flags that are not in GNU for one reason or another. Some were rejected by GNU, others come from other coreutils implementations like FreeBSD. We document those at https://uutils.github.io/user/extensions.html
We tend to do this sparingly, however, because even just adding new flags might break existing scripts that use abbreviated long options. For example, if the flag you propose is called `--filter` it might break scripts that use `--fi` as a shorthand for `--file-type`, because the prefix is now ambiguous.
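A small sketch of that failure mode, using the clap crate (the flag names are hypothetical, and infer_long_args here stands in for glibc's getopt_long prefix matching):

use clap::{Arg, ArgAction, Command};

fn main() {
    let cmd = Command::new("ls-like")
        // Accept unambiguous prefixes of long options, like GNU tools do.
        .infer_long_args(true)
        .arg(Arg::new("file-type").long("file-type").action(ArgAction::SetTrue))
        // Adding a new flag that shares the `--fi` prefix...
        .arg(Arg::new("filter").long("filter").action(ArgAction::SetTrue));

    // ...turns a previously valid abbreviation into an ambiguity error,
    // breaking any script that relied on `--fi`.
    if let Err(e) = cmd.try_get_matches_from(["ls-like", "--fi"]) {
        eprintln!("{e}");
    }
}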
Oh wow, I had no idea that was a feature. That significantly hampers extending the tools in the future. Has there ever been a discussion to consider breaking that functionality? (I understand it could be a huge impact & not worth it, but just curious to read the discussion if it exists.)
I can't find any discussion. But I know that some alternative implementations disable this feature. The cause of this behaviour is actually the glibc `getopt_long` function, which does this automatically, so it can't be changed until it's changed in glibc, which has been rejected at least once that I can find: https://sourceware.org/bugzilla/show_bug.cgi?id=6863
That's the first time I hear of this "feature" (using `--fi` instead of `--file`). I tried a few commands in my shell and none actually support it. How common is this?
How about introducing three dashes for these custom parameters/flags?
Regarding filtering: in a busybox situation, depending on some regex crate is a given anyway, but if we are talking about separate binaries, adding it to `ls` makes it bigger for everyone, even for those who will never use this feature. Does piping the results to grep make things so much slower that adding filtering to ls is "worth it"? Is this pushing the "philosophy" of do one thing well and be composable too much? How many kilobytes are we talking about anyway?
Care to share your opinion on these theoretical/pragmatic questions? Thanks!
It's possible I suppose, but three dashes already sometimes appear in GNU for hidden options and, probably more importantly, I think it would be frustrating to have to remember whether it was `--filter` or `---filter` for all long flags.
Maybe uutils could have a build feature that specifically turns off the prefix matching and will break stuff but allows using newer and more useful flags in exchange. I've VERY rarely seen prefix-matched flags being used so I'd wager a distro could be fine deploying it that way.
I.e., set up a feature like "gnu-compatible-opt-matching" and ship it on by default, then gate the extra flags behind not turning on that feature.
It's a good idea to make the prefix matching optional. I think it might be confusing to gate other features behind it though. I guess we'll get to this once we find flags that are important enough. So far, we haven't really had significant issues with this; compatibility remains our primary focus for now.
> In a busybox situation depending on some regex crate is a given anyway
uutils can behave as a busybox-like binary. But I think there's some confusion over the requested feature, because that can't really be done with regex, but you have to inspect the file metadata to check the type of file. That's also why a grep solution doesn't really work, unless you use `ls --classify` and then use the indicators to filter in `grep`.
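The directory-only listing asked for earlier needs exactly that kind of metadata check; a rough sketch (standard library only, no regex involved):

use std::fs;
use std::io;

// Print only the directories in the current directory, deciding by
// each entry's file type rather than by matching its name.
fn main() -> io::Result<()> {
    for entry in fs::read_dir(".")? {
        let entry = entry?;
        if entry.file_type()?.is_dir() {
            println!("{}", entry.file_name().to_string_lossy());
        }
    }
    Ok(())
}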
> if we are talking about separate binaries adding it to `ls` makes it bigger for everyone, even for those who will never use this feature
It's generally not these kinds of features that increase the binary size, but if it does we could also introduce feature flags for it, where you can choose at compile time whether you want the uutils extensions or not. It still adds a bit of a maintenance burden of course.
> Lots of crates (Rust libraries) available - Don't have to reimplement the wheel:
> lscolors, walkdir, tempfile, terminal_size
I believe this isn't always quite the advantage that the slides make it out to be when implementing tools as critical as coreutils. You typically would want internal packages that you can precisely control.
Wholeheartedly yes; this addresses nearly all the concerns I had in my comment.
The only other concern is the ability, and the added time, to make changes to these dependencies. What this sometimes means in practice:
In terms of time: you may have to wait for upstream to accept your change. Alternatively, one could maintain a fork of the package and point the dependency at the fork while waiting for the changes to be accepted; however, doing so adds back-and-forth work.
In terms of ability: upstream may reject a change.
And after the change is merged upstream, you have to vet the commits in the dependency between the last previously vetted commit and your newly merged commit, all at once, before you can upgrade the dependency in your original project.
How many transitive dependencies result? And what licenses are those crates released under?
Edit: I feel like this is often overlooked, but most licenses require including a copy with binary distributions, and wrangling all those text files can be surprisingly cumbersome. Omitting a license can lead to headaches down the road.
Not the original poster, but coreutils are almost given similar design requirements as airplane software. They need to be fast and they need to be perfect. Having a dependency tree which invokes items that (potentially) don't place as much emphasis on being error free can lead to buggy software.
Remember, there is no reason for anyone to choose the rust implementation, other than philosophy. They need to match or surpass the C implementations with an extreme degree of consistency to be worth the risk of transferring.
So they vetted a library and chose a version. That version is the version they will use until they choose another version - that's how Rust works. It's not npm: you specify a specific version and that's the code that's used, no matter how upstream changes things. It's not C, where you load some random .so with the right name and hope that it's compatible (or have entire giant systems built around managing library compatibilities a la Linux distros).
I've got code in prod that uses pre-async versions of tokio and it still builds and runs just fine with the latest rust nightly. If there does turn out to be a problem with the version of some library I chose, and upstream has become incompatible, nothing stops me from vendoring the upstream and fixing the problem my own way. Until then, cargo/crates/rust guarantee that the code I vetted and chose is the code I'll build with. Why is it so vital if the sequence of bytes is stored one place or another?
> It's not npm, you specify a specific version and that's the code that's used, no matter how upstream changes things.
Note that that _is_ how NPM behaves. (At least, when using a lock file, which is the default behaviour, like with Cargo. If you don't use lock files then neither Cargo nor NPM can guarantee this property.)
Vetting, and dependency locking, addresses nearly all the issues. There's one other issue: the ability to easily adjust the code as you see fit, so it is vital that you are able to precisely control the code.
> Not the original poster, but coreutils are almost given similar design requirements as airplane software.
I mean, a lot probably are used in literal airplane software.
It also has to be good from as early as possible; some software still in use uses decades old code and libraries which cannot easily be replaced (think also embedded software).
Code is cleaner too, fewer hacks than typical Coreutils code. You get the benefits of BSD style clean code while still having high performance because there are clear functional boundaries and specific features are compartmentalized in modules and dependencies. Systems programming has finally caught up to 30 years of advances in software engineering.
Long live the Rust Evangelism Strike force. Finally Apache/Linux is now possible.
> Code is cleaner too, fewer hacks than typical Coreutils code
I was under the impression that GNU Coreutils is weird/hacky in implementation because of 3 reasons (in no particular order): 1. Performance, 2. Portability, 3. To make it blatantly obvious that it's not copied from proprietary UNIX™ code. The last point is... unfortunate at this point in time, if unavoidable given the history.
For performance, I'll be interested to see how it goes; Rust should help things compose better and lend itself to saner structuring, but the harder you lean into performance the more unavoidable complexity you incur, so it'll be interesting to find out how well new implementations can do.
And portability, I suspect, will be the dump stat of any new implementation. Sure, it'll work on Linux, on x86 and ARM. Beyond that... I mean, last time I looked Rust didn't even support as many CPU architectures as Linux, let alone as many OSs as GNU does. I'm not sure if that's a problem or not; so far the precedent seems to be to just break anything unusual and not care (looking at you, Python cryptography), which is a win for the masses and unfortunate for everything else.
>the harder you lean into performance the more unavoidable complexity you incur, so it'll be interesting to find out how well new implementations can do
While true, especially with string heavy processing, using the borrow checker with CoW structures to cleanly save allocations does more for boosting performance than some hacky C++ code where you never know when it breaks because there are no guarantees on mutability and lifetime of variables.
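A tiny example of that pattern with std's Cow, where the allocation only happens on the branch that actually has to mutate (the function is just an illustration, not from uutils):

use std::borrow::Cow;

// Strip trailing newlines, allocating a new String only when needed;
// callers passing already-clean input just get the borrow back.
fn trim_newline(s: &str) -> Cow<'_, str> {
    if s.ends_with('\n') {
        Cow::Owned(s.trim_end_matches('\n').to_string())
    } else {
        Cow::Borrowed(s)
    }
}

fn main() {
    assert!(matches!(trim_newline("no newline"), Cow::Borrowed(_)));
    assert!(matches!(trim_newline("line\n"), Cow::Owned(_)));
}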
I've seen that rust based tools seem to be significantly faster on average than their traditional C versions. Seems to be due to using newer algorithms that make use of modern CPU features.
It also helps that Rust is easy to parallelise. When making some sort of FS discovery software concurrent consists of adding a dependency on rayon and a par_iter() to the main loop, it gets easier to take advantage of manycore systems.
Though then you can get the issue that systems don’t really tell you about P/E cores or how they’d want you to use them, and that’s annoying.
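To make the rayon point concrete, roughly this little is needed to fan per-file work out across cores (a sketch assuming the walkdir and rayon crates; summing file sizes stands in for whatever the real per-file work is):

use rayon::prelude::*;
use walkdir::WalkDir;

fn main() {
    // Walk the tree sequentially, then do the per-file work in parallel
    // with a single par_iter() call.
    let entries: Vec<_> = WalkDir::new(".")
        .into_iter()
        .filter_map(Result::ok)
        .collect();

    let total_bytes: u64 = entries
        .par_iter()
        .filter_map(|e| e.metadata().ok())
        .filter(|m| m.is_file())
        .map(|m| m.len())
        .sum();

    println!("{total_bytes} bytes under .");
}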
I even contributed a little, back then. I guess writing basic versions of "ls", for example, is trivial. But there's a lot of work getting all the tools done, with all the flags implemented and behaving as expected.
I guess there are tools like busybox, toybox, and similar, which also implement a lot of "stuff" to varying degrees of completion. From my side the biggest takeaway from those projects is the sheer convenience of deploying a single binary and installing symlinks to change functionality.
I replicated something similar with my sysbox project, collecting tools together in one golang binary with various subcommands:
I currently build my NixOS packages with musl, clang, and uutils, and the difference from gcc/coreutils/glibc is unnoticeable. The uutils project is great.
That's the inverse of my question. If Rust is going to replace things written in C, other stuff is going to want to dynamically link to it.
A statically compiled Rust based replacement for an entire distribution isn't a realistic proposition, unless you fancy downloading a gig or two every time there's a security update and everything has to be rebuilt.
Theoretically, you could dynamically link with other Rust code that exposes the standard C ABI. This used to be common for C++ code, when name mangling was different between different compilers and versions - so a C++ library that wanted to be portable had to expose a C ABI, and C++ apps would dynamically link to it by calling that C ABI. Of course, this meant no exceptions, no destructors, no std:: data structures, but such was the price.
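On the Rust side that boundary looks roughly like this (a sketch; the crate would be built with crate-type = "cdylib", and the function name is made up):

// Exported with an unmangled name and the C calling convention, so any
// language that can load a shared library can dynamically link against it.
// Only C-compatible types cross the boundary: no String, no Result, no panics.
#[no_mangle]
pub extern "C" fn add_u64(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}

The consuming side then uses an ordinary dlopen or extern declaration, exactly as it would with a C library.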
It had some CVEs but not many [0]. I think the better argument is that some of the original code is just really hard to read. Click around the repo [1]
I just wish it was copyleft like the real coreutils instead of being pushover-licensed. Now the corporations are all going to start making proprietary forks of this.
This always seemed like a terrible idea to me, from the corporation's perspective. Sure, great, you spent a few thousand dev hours and found a clever way to significantly improve the performance of your mallocs. Awesome, you saved 5% on your hardware costs. Good deal.
But now you've got a private fork to maintain. The guy who figured out the optimization was promoted and switched orgs, so the other guy on that team is in charge. There's not quite enough work for him to work on merging stuff in full time yet, so he'll do other stuff, but within a few years, keeping your private fork up to date with security patches is a full time job, then more than a full time job, and nobody wants to join that guy's effort because there's no glory and no promotions in it. But there's no getting off this ride anymore, and nobody's doing expensive performance testing anymore because the guy who knew how to do it is long gone, so I hope it's still making a difference. Ugh. Just share the fix.
Or a proprietary fork to not maintain, and now nobody else is doing it either. This works for getting a product out of the door so if your planning horizon is the next three months it looks like a great idea.
Maintaining private forks/patches is certainly not a full-time job with git rerere, it's rather trivial. I do maintain hundreds of forks/patches updated daily fully automatically, and I have to engage in manual fixups for maybe 10hrs a year.
The fixes/extensions are shared, but don't expect them to be looked at. On some projects a PR will take years. And some, like OpenSSL or coreutils, outright refuse to add new features or fix code quality.
This makes sense from an engineering perspective, but misapprehends the purpose of most corporations, which is to funnel rent and laundered government money (subsidies, zero interest rates, contracts) into executive salaries and shareholder returns. This explains why most companies don't care about their products or the people who make them: they're perfunctory.
I think that's mostly not a thing. The problem is largely imaginary. Most MIT/Apache/BSD/etc. licensed stuff isn't commonly forked by big corporations. Why would they? It's mostly not that practical to maintain private forks of software. And so what if they do?
These are still OSI endorsed licenses of course. So, bona-fide free and open source software as defined by the Open Source Initiative.
Referring to them as pushover licenses is not really that constructive. All sorts of respectable OSS software is licensed with these licenses. I'd go even further and state that if you remove all the OSS software that doesn't have a copyleft license, the whole landscape would get a lot less interesting. That gets rid of most popular libraries, lots of popular software packages, all kinds of critical cloud infrastructure components, development tooling, etc.
The vast majority of OSS software I depend on is actually Apache 2.0 and MIT licensed. There's a fair bit of GPL v2 and MPL in my life as well. GPLv3 and AGPL are just much more of a fringe thing. I tend to actively avoid depending on any such software. Just too much hassle in terms of endless debates of what is and isn't allowed with such licenses and people getting upset if you actually try to use the software for something not endorsed by them. Mostly the venn diagram of such people and people producing useful things to me is pretty narrow anyway. I know legal departments in big corporations tend to have similar policies.
Ever heard of Android, cloud, or SaaS? For example, our phones and the cloud services they use are chock-full of FOSS, but we get plenty of surveillance and very little freedom. Same for modern cars, TVs and many other things.
There are plenty of user-hostile services that use FOSS internally. Weak licenses are huge enablers for this behavior.
You make that sound like it's an accident and a scandal instead of the huge success it is. Most corporate software at this point consists of large parts of open source stuff. It's fantastic what businesses you can build with this stuff.
None of that software would have happened if it weren't for big corporations putting their resources behind those things AND pooling their resources by sharing code under an OSI approved license with each other. This is open source working as intended. Millions of developers are creating value and are relying on each other. And yes, some money gets made in the process. That's the whole point of committing that much resources. Most software development simply is not charity. And the willingness of companies to spend resources on software is typically motivated by their ability to use the resulting software.
And people can and of course do fork Android. E.g. Amazon and Huawei, etc. have nice products based on Android that don't include any Google proprietary bits. And there are many other android based and derived things out there. Likewise the various cloud providers have a lot of shared components that they depend on and they also are contributing a lot of code. Many of the smaller ones pretty much just use things like openstack. And they all rely on the same big open source things: Linux, Mysql, Redis, Docker, etc.
This would simply not happen with licenses such as AGPL. Easy to say, because it hasn't happened and shows zero signs of actually starting to happen.
GPL (the original coreutils license) does not solve for Cloud or SAAS in any case. It’s not really even clear that AGPL does in something like coreutils.
But android took linux, which has a copyleft GPL license.
So the argument "use a copyleft license or get your work stolen like android" doesn't really work. You'd need to argue that you need a strong copyleft license. Which is a tougher argument to make, because people dislike those more.
And that's why it's extremely imaginary in this context: these are "core utilities", not a distributed database. Their value is in being the same everywhere and installed on every system, not in a SaaS interface. The very basis of uutils itself is that it is the same as the GNU equivalents.
One irony is -- look at the linked article's comments -- I have to imagine some of the people saying "Rust is not ready for X because it doesn't have multiple implementations" are the same people saying "Don't create an alternatively licensed implementation of coreutils". Just completely unprincipled, untethered reasoning.
And don't forget the king of ironies -- GNU coreutils themselves are a reimplementation of proprietary code.
> There are plenty of user-hostile services that use FOSS internally. Weak licenses are huge enablers for this behavior.
This is obviously a red herring. Any bad behavior would be the same re: the GPL and the MIT license in this particular instance.
The answer is always -- do better than your competitor. If the GPL is better, fork the MIT code, and build a better alternative. The problem is -- the people actually contributing chose the MIT license. And that should be the end of the matter. You don't get to have an opinion on someone else's choice of license if you contribute nothing.
This combination of bullying and whining by GNU/FSF advocates is extremely off-putting. I'm a contributor to uutils (but I don't speak for it!) and watching the apoplectic whinging by Redditors and HN commenters just re: this project has completely turned me off the GPL for my other projects. When will people realize that being a completely insane socialist/MAGA/atheist/Christian/FOSS advocate at a dinner party is a turn off?
Afraid I don't understand how me telling you to "Quit whining and fork the project" is actually the true whining.
> Plus you are resorting to insults and yet you claim I'm the unreasonable one.
I don't think I characterized anyone as unreasonable. Your examples are not analogous to the present situation, but I think your real problem is you're telling other people, very well aware of the Good News of GNU/FSF, something they already know.
I, and others, have taken a very clear-eyed look at the GPL/FSF and said "No, not for me," and I wish you'd understand one of the reasons why is yours and others' rampant religiosity. I don't comment under GPL-licensed projects, "Hey I wish you'd change your license to MIT because [red herring, and bad faith argument...]" because I know it's in incredibly poor taste as a non-contributor. MIT/GPL/AGPL/etc isn't always my cup of tea either, but it's simply not my choice.
That said -- sure, if you/anyone has a merits based case to make as to why something makes more sense as GPL licensed, I'm all ears. But, as it stands, your arguments are in poor taste.
It is, but it uses GPLv2, which has a few accidental loopholes that they tried to close with v3. The net result of this is that with some defensive strategies, you can pretty much do whatever you want with Linux and get away with such things as bundling proprietary drivers or firmware, or running closed source software on it. Which is a reason things like Android can exist.
To copyleft people that's a bug, for most Linux users, that's a key feature of the license. Without that, we'd all be using BSD (like most mac users are) or something else.
> There's literally nothing to gain from doing so.
And yet KHTML was forked into WebKit which was forked into Blink, just to name one.
OSX has forked BSD code, as did Windows.
These things happen all the time. MIT is a fine license, so is GPL, but we can't just say "oh nobody is going to fork this, it's dumb!" because it happens all the time.
> And yet KHTML was forked into WebKit which was forked into Blink, just to name one.
Bad example. KHTML was LGPL-licensed when Apple forked it (looking at https://invent.kde.org/frameworks/khtml, it may be dual LGPLv2/GPLv3-or-later licensed now). That probably is _the_ reason WebKit always was open source.
sure, but this is not what I am talking about, I am just saying that "there's literally nothing to gain from making forks" does not seem to prevent organizations from forking things.
You maintain it as best you can. Many of the licenses require attribution. Sometimes things are announced and so on. Can it be definitive and complete? Obviously not. Could it be useful even so?
There’s nothing more annoying than GPL absolutists. Who really cares if someone makes a closed-source fork of the code? On the other hand, given that closed-source code is going to exist, isn’t it better if it’s able to use the best code available rather than forcing developers to either ignore the existence of something because it’s GPL’d or just ignore the license in the first place?
There’s a reason I choose MIT over GPL. It’s a freer license.
GPL protects the software itself against becoming proprietary. Developers are forced to ignore GPL'd code only if they are unwilling to distribute their code under GPL, because the company is unwilling to make a pledge to respect the freedom of the users. Writing more GPL'd code gives advantage to those who are willing to respect the user's freedom.
> Developers are forced to ignore GPL'd code only if they are unwilling to distribute their code under GPL
If the choice is "you only have one choice", that's not really a choice at all?
I have complete freedom to choose the licenses for my personal projects and I stay away from GPL and its variants (still love copyleft and the MPL2!), because devs are users too, and I (and they) see the GPL as very dev unfriendly.
The GPL treats "users" and "devs" as abstractions, when in a very real sense, the devs are the ones most likely to use the code, and they are more likely to use it when it's more permissively licensed. And many are very pleased/want to use copyleft, if it isn't the GPL, and the FSF's ridiculous interpretations thereof.
"More" or "less" free is a word definition game that is mostly a chimera. It depends on what the purpose of the software is and which business model is the most fit.
Copyleft software has several vendors compete around the same product. Every economic actor has to compete to add business value and consulting, with the knowledge that any extension to the software is made available to everyone else.
Free and non-copyleft software leaves vendors free to compete with different products built from a common base. Extensions to the software has commercial value and every vendor seeks their local maxima.
So the situations are different. GPL and BSD software do not often compete in the same space, with a few exceptions. There used to be commercial variants of BSD, but they are all gone, outcompeted by a common Linux platform. FreeBSD was technically superior for a long time but couldn't compete with a multi-vendor product in the long run.
Products that represent a common platform shared by several products however are more successful non-copylefted. Formats such as gzip and jpeg are completely dominant due to their multi product usage, and any GPLd codecs that competed with them are mostly forgotten.
That is why FreeBSD is everywhere, with nice contributions from all its commercial users, and clang is ISO C++20 compliant thanks to all those nice compiler vendors thinking of upstream.
Linux owes its ubiquity over the BSDs more due to the fact that it didn't have to fend off lawsuits at a critical point rather than due to license choice.
Minix is BSD licensed and is present in Intel CPUs[0]. So I think it would be a toss-up as to whether BSD or GPL-licenses OSes are more widely used overall. To be clear, I'm making no comment on whether Intel using Minix is a good thing for their customers, but it's likely that the BSD license was what Intel wanted and caused it to become the most widely used OS few people know about.
I know that was not your main point at all, but fascinating to see that MSVC is the only compiler to be fully 100% standards compliant (with only GCC being really really close).
Yes, LLVM has achieved the same contribution level as the Linux kernel[0], yet where are the contributions from ARM, Intel, IBM, Apple, Google, Green Hills, Codeplay (now Intel), NVidia, Nintendo, Sony, TI.... into clang?
While Apple and Google have switched focus to their own efforts (Swift, Objective-C, Carbon, C++17 being good enough), there are plenty of compiler vendors on that list with forked clang for their proprietary compilers.
And which of these proprietary Clang forks have greater C++20 compatibility than free software Clang?
The only proprietary fork of LLVM in the compatibility table you linked is less C++20 compliant than free software Clang.
So unless you have some causative explanation, I think the more sensible possibility is simply that the Clang developers have prioritized working on some of the plentiful other features of a compiler toolchain than perfecting their C++20 standards compliance.
Blindly asserting (or implying) that if LLVM were GPLv3 then it would be more standards‐compliant, with nothing to back it up, doesn’t add much to the discussion, IMO.
Who knows for certain? We would need to buy them all.
Even if not, they are clearly waiting for others to do the needful for free, and then gladly recoup the fruits of others' labour while selling their compilers for profit.
Consider the Linux kernel. First it would probably have never been successful with a liberal license in the first place, but let's ignore that for a second and assume it got successful anyway.
If you look at the current situation where each SoC supports only one (generally very old and very vulnerable) kernel, I think we have every reason to be happy that they at least have to release the sources, so hopefully someone can extract the patch and have it work on a recent kernel. It's not very hard to imagine the situation where we'd only have binary kernels if we did not have the GPL.
> If you look at the current situation where each SoC supports only one (generally very old and very vulnerable) kernel, I think we have every reason to be happy that they at least have to release the sources, so hopefully someone can extract the patch and have it work on a recent kernel.
The problem is, many vendors don't release their kernel sources at all - particularly Chinese pseudo-brands based on some knock-off of Mediatek reference designs come to my mind - and those that do release their code often show horrible code quality that would take many man-months of work to clean up enough to be included into the kernel, not to mention the lack of documentation and the tendency for these forks to fix undocumented hardware quirks for specific SoC revisions in kernel code.
And Google, the only entity in town that could push for better behaviour, at least in mobile, doesn't do anything to help out on that front either - there is not a single mention of licenses in the CDD [1] (the requirements if you wish to call a device Android compatible and use the Google ecosystem), outside of a warning that implementers have to take care about software patents for multimedia codecs.
Non-phone/tablet embedded applications are even worse. I admit my knowledge is dated, but I came across a lot of devices over the years whose source code dump had u-boot versions that were half a decade outdated, with so many changes compared to even the u-boot version it claimed to be that it was completely infeasible to even attempt to migrate the changes to a modern u-boot version.
And to make it even worse: Modern "device integrity" crap makes the situation completely impossible to resolve - even if you had a decent-quality code dumps not deviating too much from upstream, you can't deploy your code to the actual device in question to test if the device still works because the device doesn't allow flashing unsigned/self-signed images, test points (e.g. JTAG) are fused-off in hardware, and even if memory chips can be flashed with a programmer (which isn't a given, since if you power the flash chip using a clip probe, often you power the rest of the board with it!) the flash chip content itself is signature validated. Oh, and it still can get worse than that given the rise of "secure element" co-processors that can be used to decrypt flash chip content on the fly - you don't even have a chance to read the firmware content without first having a code execution exploit on the device and then achieving code execution in the "secure element". The people able to do this are short in supply and most of them work in jailbreaking gaming consoles, not a 30 dollar Chromecast or similar appliance.
We need laws and regulations against that crap, but politicians don't even have that on their radar - how would they, given that a lot of politicians are fucking gerontocrats and of those that aren't, no one outside the various European Pirate Parties and affiliates has a tech background to even push for such regulation.
I would rather have transparency and portability (from having access to the source code) than quality code (you would be surprised how many projects are written with extremely bad code quality).
The thing is, we already have transparency and portability, and yet it is effectively useless because the code is in such crap quality that it is completely infeasible to do any serious work with it, and because bootloader security makes third party development very difficult to outright impossible (not to mention fourth parties like banks, games or DRM punishing you for exercising your rights by cutting off your service).
My understanding is that Amazon was not even making a closed source fork of ES, they were merely offering managed hosting of the open source version of ES. As Elastic is making money by selling hosting, they did not like the (admittedly unfair) competition from AWS, they decided to stop doing Open Source. And actually now, AWS is maintaining an open source fork of ES...
> Amazon was not even making a closed source fork of ES, they were merely offering managed hosting of the open source version of ES.
That sounds even worse. Whatever happened, ES didn't gain anything by having an open business friendly license (as it appears from the events).
> As Elastic is making money by selling hosting, they did not like the (admittedly unfair) competition from AWS, they decided to stop doing Open Source.
That's sad. ES went from "business friendly opensource license" to "closed source". Now there's no open-source code at all (from ES). Some restrictive open-source code is better than closed source. Always.
> And actually now, AWS is maintaining an open source fork of ES...
I think, it's not out of generosity. It has customers.
Elastic has never been doing open source for generosity - they have customers. They're admittedly much smaller than AWS, but they're venture backed and have money. They're not a small open source project, even though they sometimes portrayed themselves as such.
They did profit from being open source, being based on apache lucene. They profited from the work that the community did for them - I used to be part of the folks running one of the user groups. I was on their IRC channel, helping other folks adopt ES, for free, because it was open source. And they did flip those folks a finger and went and made it closed source because they ended up having a spat between businesses. It's their code, I get it and they get to do whatever they decide. But Elastic is very much not a paragon of open source virtue in shining armor. They, too, have placed business interest before community interest.
Right. Elastic didn't do it out of generosity, either. But your point that ES benefited from the business-friendly license and then made it closed source just supports my point that businesses will just maintain their forked version rather than putting it out there in the open. I'm all in for open-source software, but I'm just making a point that business-friendly open source licenses don't necessarily beget more open source code. Sometimes a restrictive open source license (like GPL) is far better than a business-friendly license for the open source community.
GPL would not have had any effect in Elastics case, private forks are legal under the GPL. AGPL might have. But then again, AGPL would have been a major hindrance for corporate adoption since no one wants to open source their whole tech stack just to add search.
The interesting thing is that Elasticsearch has an open source competitor, Apache Solr. The community around Solr is organized more like the community around postgres - multiple actors that work on a shared project that not a single one controls. Anyone of them could make a proprietary fork, but the others could quickly band together and punish that.
So the lesson to draw here is maybe that the license itself matters less, but what matters is whether there's a single actor in control of the project. Because in the end, the reason why Elastic could pull this stunt is less about the license, but about the copyright ownership and control over the project. They owned the code, had a CLA in place for any contribution and thus could do whatever, license be damned.
AGPL or GPL doesn't matter here - AWS were fully compliant with the AGPL in their use of ElasticSearch (they were already releasing all the code, even though they were providing it as a managed service).
AGPL would probably matter here - it's viral and would potentially require open sourcing all software in the stack that interacts with it, for example the systems that AWS uses to manage the Elasticsearch instances. In any case, AGPL is on the "do not use" list for many enterprises, regardless whether it would have any effect or not in the specific case. Engineers that want to use AGPL software often would have to go through legal, so it does stifle corporate adoption.
The only difference between the AGPL and the GPL is whether offering access to the software over a network is considered distribution or not by the license itself (AGPL says yes, GPL says no). Otherwise, they are both exactly as viral.
You are quite right to some extent about corporate policy differences. I think the status quo might be trending away from this extreme caution, but it's still there in many corps.
> And they did flip those folks a finger and went and made it closed source because they ended up having a spat between businesses.
I don't think it's fair to say ES is now closed source. It's no longer FOSS nor GPL-compatible, but the source is very much still open, and nothing much has changed for many users of the self-hosted versions (source: shipping a proprietary appliance-like product with ES as a component, with full legal checking that we are in compliance of the license).
It is no longer open source, there’s no doubt about it. You may have a free beer license to it as long as elastic grants you one, but you have no right to modify, patch, redistribute as you see fit.
Open source projects that relied on ES are cut off.
You are still free to modify the code, patch or redistribute it under the terms of the SSPL. If you prefer the ESL, you may only modify the code, but not redistribute it.
Now, if you were distributing a GPL product that included ES, you will have significant problems, since the GPL and SSPL are not compatible - so you may in fact be unable to distribute the whole product anymore, which probably caused huge disruptions to some projects - so I'm not in any way saying that what they did is nice. But it was definitely not making it closed source, not in spirit and not in effect.
I explicitly noted that they are no longer FOSS, but I believe it's wrong to call them "closed-source".
If parent had said "they are no longer open source", I wouldn't have commented this at all, since "open source" and FOSS are essentially synonyms. But the antonym of FOSS/open source is not "closed source".
I don't remember which license ElasticSearch used to be under back when it was open source, but there are others who faced the same problem even with the most copyleft of licenses. MongoDB went through the exact same problems and changes, even though it started under the AGPL. So exactly what kind of OSS license Elastic had been using is irrelevant - you can't compete with AWS on hosting open source code.
This went in the opposite direction you'd expect. The original ElasticSearch is proprietary now, and Amazon's fork of the last FOSS version is itself FOSS.
The ideological position is the whole point of the GPL, after all. If you're using someone else's labor, it's only fair they're allowed to dictate the terms of how you're allowed to use it.
> If you're using someone else's labor, it's only fair they're allowed to dictate the terms of how you're allowed to use it.
And they are, that's what choosing a licence is all about. In case of MIT/ISC/BSD/WTFPL licensed software, this someone else just decided to dictate a lot less than he could have.
Except many of them then come out complaining how Amazon (just to pick an example) is leeching their work, while they are perfectly in line with the chosen license.
And that's why whenever I want to contribute to a MIT/ISC/BSD/WTFPL licensed project, instead I just fork it and make my own changes GPLv3 only.
If companies can make proprietary forks, I can make a GPLv3 fork. Best of all, no company can take my changes and make them proprietary. And if the maintainers want to merge my changes, they've got to relicense to GPL.
When you fix a bug in X11, you keep it in your one‐star‐on‐Gitlab personal fork under GPLv3 to prevent it from ever making it into HP‐UX, rather than contribute it to the upstream project, where tens or hundreds of thousands of individuals using X11 under free and open source licensing in free software distributions (not proprietary forks) could have experienced the benefit of your fix?
I mean, you’re free to do that, but taking pride in it sure feels odd to me.
I don't use X11, but I've had a very negative experience trying to contribute back to MIT/BSD/etc licensed projects anyway, as they're usually just one close team of developers publishing something and rejecting any outside PRs or requests.
License choice is a political choice, and it's not a coincidence that corporate-friendly licenses are usually on projects that have a strict vision and reject outside changes.
I've got the features I want, in the way that I want them. If you want to use them, you're free to switch — for some of my projects, tens of thousands of people have done so.
Plenty of GPL fans try to convince others to choose the same license, almost always implying that the GPL is the only moral/ethical choice (Stallman himself being the most ardent believer of this stance). They often even insist that the GPLv2 is not actually a moral choice anymore, given the evils of TiVoization.
This whole thread is about complaining/supporting this attitude - not about criticizing those who simply chose the GPL as their license.
The fault you imply is exclusive to “non‐copyleft fans” happens in copyleft‐licensed projects too: MongoDB, for example, was AGPL before it decided to switch to an even more restrictive license to counter hosting-service companies.
A: "This is why I don't use GPL"
B: "Yes, you shouldn't have to take an ideological position just to publish some code"
C: "But the ideological position is the whole point of GPL"
D: "but the ideological position is not the whole point of writing code."
C: "Then don't use GPL code?"
D: ".... exactly?"
> How about the freedom to ship something without having an ideological position on when/how it links against other code.
Is that possible? Choosing to publish something to... oh, let's say to release it into the public domain (that seems like the farthest you can get from copyleft) is still an ideological position.
Firstly, closed source would be the true antithesis of the GPL. I refuse to even call it copyleft, because that term is a purity test. I'm a leftist and I BSD/MIT my work so that others may benefit from it. If it's valuable, they may contribute back. If they don't, I still haven't restricted their freedom or choice. An ideological position is usually one that is rigid and uncompromising. GPL/'copyleft' is a way of poisoning the commons.
If you release something and expect to retain copyright — that is, you wish to prevent me from decompiling it, redistributing it etc — then you are also taking an ideological position.
In practice, there's no disaster scenario. The issue is the potential for leaving effort and hard work on the table, because a company could improve upon your code and never release it.
For example, we have Rocky and Alma Linux because so much of Red Hat is based on GPL-licensed GNU/Linux. They technically don't have to release any of the MIT components; they do it because they are nice stewards of open source. Same with Ubuntu. And SUSE. Other companies might not be so nice. For instance, note that Google Chrome is not technically open source - only Chromium, which Chrome is built from, is.
In reality, MIT binaries without the source code are rare.
There are often situations in which the rational approach is to upstream enhancements, if only because it reduces ongoing maintenance of a derivative product. This is especially true of foundational infrastructure. BSD/MIT-like licensing works well for such software (perhaps less well in general).
Yes, the GPL is very important. The BSD license _is_ a problem. Those of you who were around when FreeBSD was maybe going to 'win' remember why Linux (GPL) ended up capturing everyone's attention.
You cannot count on your code staying free if it's under a BSD license. If there is a way to make money on it, someone will fork it privately and force you to pay for it. Even if they don't succeed, you will live under the continual threat of having to pay for it.
FSF/GPL advocate: "I don't like that someone else is reimplementing this code."
Jesus, you think s/he ever wondered how AT&T felt about Linux and the BSDs literally reimplementing all of Unix?
Oh, that's right. This is the FSF/GPL conundrum -- you're likely to end up on both sides of every issue. You will literally be both a copyright maximalist re Linux and a copyright minimalist re proprietary code, music and film.
That's the main thing I have against the Rust people. They seem to like pushover licenses instead of the GPL, perhaps in the hopes of getting hired by one of the big tech corps at some point. If only they realized what made free software so resilient over the years...
Not sure it's fair to generalize over all "Rust people".
I've worked in the community, and it's annoying that it's so against the GPL in some corners, but most people are just pragmatic: they use whatever license is the norm for sharing with their community.
Now that Rust is becoming less tight-knit, it might, hopefully, open up to more diversity in licenses.
It isn't. I don't intend to generalize to everyone using Rust. But there is a large overlap between Rust evangelists and anti-GPL activists, it seems. I don't want to believe this, but it almost looks like a corporate conspiracy to undermine free software.
AFAIK there definitely is a preference for Apache/MIT dual licence but this is probably because of the influence of the Rust compiler itself. Most of the initial Rust code was written by people involved with and in service of the Rust compiler. Since that’s licensed MIT/Apache, their code had to be as well if it wanted to be used in the compiler.
There were people who used funky licenses like the WTFPL (Do What the Fuck You Want To Public License) and the Unlicense (public domain), but they came around and relicensed to MIT/Apache for uniformity with the rest of the ecosystem.
Maybe if the compiler had been GPL the ecosystem might have been as well, but that’s a what if. We’ll never know if a hypothetical GPL Rust would have had the same path to success as the existing Rust.
I'm a big fan of copyleft licenses, but when the objective is to push the industry toward a better standard, as opposed to a specific implementation, permissive OSS licenses make sense.
How do you measure robustness except with failures / years of service x times deployed? You cannot make edge cases and strange unique behavior happen in a controlled environment.
> You cannot make edge cases and strange unique behavior happen in a controlled environment.
Sure you can. Go iterate through the GNU Coreutils bug tracker, find weird bugs, create test cases, feed them to your implementation. Bonus points if you can find an existing test suite (e.g., xfstests turns out to be good for testing arbitrary filesystems). Granted, there will always be edge cases that you don't catch until they show up live in prod, but you can hit a chunk of the space without that.
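To make that concrete, here's a minimal sketch of such a differential check; the multicall binary path and invocation are assumptions (uutils-style), and it relies on bash process substitution:

    # Reproduce a suspected edge case, e.g. numeric sort with mixed signs/blanks.
    printf ' 10\n-1\n2\n' > case.txt

    # Compare GNU sort (on PATH) against a Rust multicall binary byte-for-byte.
    if diff <(sort -n case.txt) <(./coreutils sort -n case.txt); then
        echo "sort -n: outputs match"
    else
        echo "sort -n: divergence found, reduce and file a bug"
    fi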
Also, there are plenty of package systems that use coreutils in their build instructions. Just build the whole thing with coreutils aliased to the Rust implementation and check for errors.
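A rough sketch of that aliasing trick, assuming a BusyBox-style multicall binary that dispatches on argv[0] (the tool list and path are placeholders):

    # Shadow the GNU tools with symlinks to the Rust multicall binary.
    mkdir -p shim
    for tool in cat cp ls mkdir mv rm sort; do
        ln -sf /path/to/coreutils "shim/$tool"    # dispatches on argv[0]
    done

    # Run the package build with the Rust tools first on PATH.
    PATH="$PWD/shim:$PATH" make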
"there will always be edge cases that you don't catch until they show up live in prod..."
in reply to,
"How do you measure robustness except with failures / years of service x times deployed?"
The only viable live testing environment that springs to mind might be running your test code synchronously at the atomic level with production, which I'm convinced only IBM z/OS on a Parallel Sysplex cluster running CICS can do.
I'm not seeing a contradiction. The original claim was that you can't measure robustness without testing live, because you can't reproduce edge cases in a lab. But that's not true; you can reproduce the edge cases that are known in a lab. This isn't 100% effective, granted, but it's effective enough that you certainly can test robustness to a reasonable degree. It's like saying that you don't know how safe a car is until you've driven it 100,000 miles on real roads; Real Life™ will find things you missed in testing, but you can still run enough crash tests to get a decent idea of how safe the car is.
One tool that (I think?) wasn't around back then, or at least wasn't used as much or deployable at scale, is fuzzing; modern-day application of fuzzers to existing libraries and tools has revealed thousands of potential bugs.
If fuzzing, unit tests, and things like mutation tests are written and automatically applied to these libraries/utils, which are on top of that written in a proven memory-safe language like Rust, I would assert they're pretty robust.
Second, there's more known knowns this time around; they can write test cases for the decades of bugs and rare occurrences found in the tools they replace.
That said, I'm sure there's still plenty of unknown unknowns that can only be uncovered with, as you said, years of service and times deployed.
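For what it's worth, wiring a fuzzer up to a Rust reimplementation is cheap these days. A sketch of the cargo-fuzz workflow, where the target name is hypothetical and you'd still write the small fuzz_target! harness around the parsing code yourself:

    # One-time setup inside the crate under test.
    cargo install cargo-fuzz
    cargo fuzz init                     # scaffolds fuzz/fuzz_targets/

    # Hunt for panics and crashes for an hour (libFuzzer flag after the --).
    cargo fuzz run parse_input -- -max_total_time=3600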
This is great. Chimera Linux uses the FreeBSD userland version of the utilities. It's refreshing to see new distros taking charge of their development.
I wish someone would build modern, lean utils. I don't need a bazillion grep flags. There's so much unnecessary repetition in Unix tools. Why make a separate flag for input and output files when I can just redirect input and output with the shell, to give just one example? Theoretically most unix tools could be implemented in very few lines of code and that's their beauty.
> Theoretically most unix tools could be implemented in very few lines of code and that's their beauty.
As usual, theory and practice should be the same in theory, but they are different in practice.
It turns out that the Unix philosophy of gluing together programs by spitting out and re-parsing text is both extraordinarily brittle for automation purposes, and quite tedious for manual work.
So, to make realistic use of these tools, there's lots of flags to force the output to be as structured as possible for various automation tasks, to make parsing the output tractable (and to ask for explicit guarantees on the format).
Then, to achieve many common tasks without having to write your own parser and text manipulation in awk or whatever, you need many other flags for transformations which are trivial on the base data types but complex for formatted text, such as sorting files by last-modified time (which is `ls -lt` with flags, and a terrible amount of work with Unix pipes that I'm not going to go into - at least if you want to support such complex things as file names containing whitespace, or dates in the current locale).
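To illustrate just how much work: here is roughly what the "robust" pipe version of sort-by-last-modified looks like. It leans on GNU extensions (find's -printf, sort's -z) and on bash, none of which POSIX guarantees:

    # Emit "mtime-epoch path" records, NUL-terminated so that names containing
    # whitespace or even newlines survive; numeric-sort newest first; strip key.
    find . -maxdepth 1 -type f -printf '%T@ %p\0' \
        | sort -znr \
        | while IFS= read -r -d '' rec; do
              printf '%s\n' "${rec#* }"
          done

Six lines and several GNU-isms for what plain `ls -t` gives you when file names are tame - which rather proves the point.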