I wonder why not reimplement coreutils as library functions, to be used within an ad-hoc REPL. It would be much more flexible and extensible.
AFAIK, originally the reason why they were made as programs was that the available programming languages were too cumbersome for such use. Now we have plenty of experience making lightweight scripting languages that are pleasant to use in a live environment (vs premade script), so why give up the flexibility of ad-hoc scripting?
what if it were written as a library, with the traditional cli implementation as a thin layer over it?
I'm thinking about the wider FOSS ecosystem. For example, if Firefox were built as a GUI gluing together a modular collection of libraries, people could do all sorts of cool things with them.
Monolithic applications make sense for proprietary software, not so much for FOSS.
You're proposing abandoning the Unix/Posix standards and philosophy in favor of an untested strategy.
"Move fast and break things" makes sense for acquiring investors, not so much for core infrastructure that everyone everywhere depends on.
This could work, but you'd need at least a decade of widespread usage to work out all the problems before it would even be worth considering for core infrastructure tools.
Why would that be relevant if the thin wrapper is fully compliant with the POSIX/GNU standards?
Busybox, with its single-binary, depend-on-argv[0] hack implementation, was at one point an "untested strategy", yet look at it now. The Rust coreutils need to offer a real unique value proposition if they want to see real adoption. Providing coreutils as a library rather than forking processes would certainly qualify as that.
If stability is a concern, exposing a greater surface for user interaction would surely slow down development as these interfaces would have to be reworked with care.
If stability is not a concern, then any user tools built upon these interfaces would be subject to breakage at the rate of upstream development. That’s got to be frustrating.
One of the nice things about the POSIX command‐line interface is that the build systems that interact with them know what to expect, because the interface has been much the same for a very long time, while still providing hugely useful capability.
As stable as, say, golang's standard library.
Sure, it needs upfront thinking and commitment, but it's not that difficult and might be well worth it.
In the case of coreutils, the problem space is fairly simple and well-understood, so it should be quite easy to commit to a stable interface. Even for something exceptionally complex like a web browser, I'd expect most components to be easily kept backwards-compatible in terms of public api.
> As stable as, say, golang's standard library. Sure, it needs upfront thinking and commitment, but it's not that difficult and might be well worth it.
That's actually far less stable than is needed for core utils.
> what if it were written as a library, with the traditional cli implementation as a thin layer over it?
That's kind of the way it is. Most of the core utils are thin wrappers around C libraries.
- - - -
It sounds like you're thinking of things like the Oberon OS, where there were no separate applications, instead the system was extended by adding new commands to a unitary GUI. Or the Canon Cat.
Nope, it's still a traditional binary meant to be used like a traditional binary from the POSIX shell. What I mean is: replace both the binaries and the shell with a library equivalent of coreutils running from a REPL.
Sometimes, hitting those roadblocks leads to a better solution.
Maybe the new model is slower, and somebody looks into it, and realizes that if they add a caching layer between the "REPL" module and the kernel ioctl, or service, or whatever, it will speed things up.
I run find and grep a lot. And I'm sure the kernel caches a lot of the FS stuff, but there are higher-level things that could be cached and shared with other "REPL" modules. Like predictive URL middleware in browsers. Pluggable middleware that can be enabled or disabled.
Available now on the OS module store:
Larry's Grep Count Document Prefetch Module. Certified Safe by BlahCorp.
This isn't a new idea, and I'm sure others have had it before me.
It's not just about forking processes. Instead of a single binary that needs to satisfy as many use cases as possible while remaining small and general, you would have a lot of smaller, more atomic functions that users can mix and swap as needed, case by case.
> Instead of a single binary that needs to satisfy as many use cases as possible while remaining small and general, you would have a lot of smaller, more atomic functions that users can mix and swap as needed, case by case.
Maybe I'm missing something here (it's been a long time since I last looked at the busybox code), but isn't busybox a single file that has a lot of atomic functions that callers can mix and swap as needed, using the shell as a REPL?
IIRC (and please correct me if I am wrong), all those little functions in busybox are simply single functions. There's a `cat` function, and a `head` function, and a `cp` function, etc.
I don't see what can be gained by moving them into a library file, and using the shell to call those functions, instead of leaving them in the shell program and calling them.
`which` is not a bash builtin (on Mac or Linux); use `type` instead:
$ type echo
echo is a shell builtin
$ type cat
cat is /bin/cat
$ type which
which is /usr/bin/which
$ alias a=true
$ type a
a is aliased to `true'
$ function f { true; }
$ type f
f is a function
f ()
{
true
}
Incidentally, zsh, the current default Mac shell, has both type and which as internal commands, with different output:
% which echo
echo: shell built-in command
% type echo
echo is a shell builtin
% which cat
/bin/cat
% type cat
cat is /bin/cat
% which which
which: shell built-in command
% type which
which is a shell builtin
% alias a=true
% which a
a: aliased to true
% type a
a is an alias for true
% function f { true; }
% which f
f () {
true
}
% type f
f is a shell function
Note that, on zsh, the "native" command is actually whence; which and type are equivalent to "whence -c" and "whence -v", where
% man -W zshbuiltins \
| xargs groff -Tutf8 -mandoc -P -cbdu \
| awk '
/^ [^ ]/ { out = 0 }
/^ whence / { out = 1 }
{ if (out) print }
'
whence [ -vcwfpamsS ] [ -x num ] name ...
For each name, indicate how it would be interpreted if used as a
command name.
If name is not an alias, built-in command, external command,
shell function, hashed command, or a reserved word, the exit
status shall be non-zero, and -- if -v, -c, or -w was passed --
a message will be written to standard output. (This is differ‐
ent from other shells that write that message to standard er‐
ror.)
whence is most useful when name is only the last path component
of a command, i.e. does not include a `/'; in particular, pat‐
tern matching only succeeds if just the non-directory component
of the command is passed.
-v Produce a more verbose report.
-c Print the results in a csh-like format. This takes
precedence over -v.
-w For each name, print `name: word' where word is one of
alias, builtin, command, function, hashed, reserved or
none, according as name corresponds to an alias, a
built-in command, an external command, a shell function,
a command defined with the hash builtin, a reserved word,
or is not recognised. This takes precedence over -v and
-c.
-f Causes the contents of a shell function to be displayed,
which would otherwise not happen unless the -c flag were
used.
-p Do a path search for name even if it is an alias, re‐
served word, shell function or builtin.
-a Do a search for all occurrences of name throughout the
command path. Normally only the first occurrence is
printed.
-m The arguments are taken as patterns (pattern characters
should be quoted), and the information is displayed for
each command matching one of these patterns.
-s If a pathname contains symlinks, print the symlink-free
pathname as well.
-S As -s, but if the pathname had to be resolved by follow‐
ing multiple symlinks, the intermediate steps are
printed, too. The symlink resolved at each step might be
anywhere in the path.
-x num Expand tabs when outputting shell functions using the -c
option. This has the same effect as the -x option to the
functions builtin.
Finally, note that the bash type command also has many options,
$ info bash -n 'Bash Builtins' \
> | awk "
> /^'/ { out = 0 }
> /^'type'/ { out = 1 }
> { if (out) print }
> "
'type'
type [-afptP] [NAME ...]
For each NAME, indicate how it would be interpreted if used as a
command name.
If the '-t' option is used, 'type' prints a single word which is
one of 'alias', 'function', 'builtin', 'file' or 'keyword', if NAME
is an alias, shell function, shell builtin, disk file, or shell
reserved word, respectively. If the NAME is not found, then
nothing is printed, and 'type' returns a failure status.
If the '-p' option is used, 'type' either returns the name of the
disk file that would be executed, or nothing if '-t' would not
return 'file'.
The '-P' option forces a path search for each NAME, even if '-t'
would not return 'file'.
If a command is hashed, '-p' and '-P' print the hashed value, which
is not necessarily the file that appears first in '$PATH'.
If the '-a' option is used, 'type' returns all of the places that
contain an executable named FILE. This includes aliases and
functions, if and only if the '-p' option is not also used.
If the '-f' option is used, 'type' does not attempt to find shell
functions, as with the 'command' builtin.
The return status is zero if all of the NAMEs are found, non-zero
if any are not found.
Because most of the coreutils functionality is already available in the libraries of most languages. The article mentions that there are crates for the logic. The hard part is command line parsing and output formatting, and your library should have neither of those.
I've seen plenty of shell scripts rewritten in Python because they grew too big, and most of the time the coreutils commands just get replaced with standard library calls. There are exceptions (like sorting files which do not fit in memory), but otherwise the standard library is good enough.
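For the common cases, the Rust standard library alone already covers a lot of ground; a minimal sketch (the file names are just placeholders):

use std::fs;
use std::io;

fn main() -> io::Result<()> {
    // mv file renamed
    fs::rename("file", "renamed")?;
    // cp renamed copy
    fs::copy("renamed", "copy")?;
    // mkdir -p a/b/c
    fs::create_dir_all("a/b/c")?;
    // cat copy
    print!("{}", fs::read_to_string("copy")?);
    // rm copy
    fs::remove_file("copy")?;
    Ok(())
}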
The problem is POSIX. It says operating systems must have mv, cp and all that stuff. This is the reason why people say Linux is not an operating system.
> I wonder why not reimplement coreutils as library functions, to be used within an ad-hoc REPL.
Funny you mention that. I've been working privately on such a "systems programming REPL" in my free time. Basically a freestanding Lisp with pointers and built-in Linux system calls. It's been a huge challenge trying to bootstrap and get the garbage collector working without any libc support, still haven't cracked it.
Languages like Python and Ruby already have system call capabilities. You can literally do anything with those calls. So this already exists in some form, albeit not in the extreme form I envisioned.
> I've been working privately on such a "systems programming REPL" in my free time. Basically a freestanding Lisp with pointers and built-in Linux system calls.
Are you building something similar to babashka? Would you be able to figure out what they did with babashka to figure out what you've been unable to do, or are you challenging yourself?
Thanks, that's a nice project I didn't know about! Always happy to see more projects along these lines!! I'm not sure to what extent it permits systems programming though. I searched the repository for common system calls like mmap and didn't find anything. I assume it links to either libc or JVM.
I suppose I'm challenging myself. What I had in mind is much lower level: a Lisp thing where I can use the Linux system calls directly. It's gonna look like this:
; mmap some memory
(set memory (mmap 0 4096 '(read write) '(private anonymous) -1 0))
; query the kernel for some data
; terminal size for example
; have the kernel put the data at the start of that memory
(ioctl 1 'TIOCGWINSZ memory)
; memory now points to a struct winsize
; decode the 4 unsigned shorts
; first two unsigned shorts are the terminal's rows and columns
The language runtime is completely freestanding: it doesn't link to any library at all, not even libc. I made it so eval supports a special system-call function which executes a Linux system call from C, and I want to build literally everything else on top of that. I want to be able to run strace on any coreutils binaries, see what system calls they make and then implement the same thing on top of the system-call primitive. It should be possible to make a coreutils module that contains an mv function, for example.
; boils down to:
; (renameat2 'fd-cwd "file" 'fd-cwd "renamed" 'no-replace)
(mv "file" "renamed")
I had to use static allocation to pre-allocate a stack of Lisp cells when the process is loaded just to get it to evaluate at all. Now I'm trying to get the garbage collector to work so I can get it to bootstrap to a point where it can allocate memory, read files and load more code. I wish I had something real to show for all this effort but right now it's not real yet.
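(For anyone curious what the no-libc route looks like outside of Lisp, here is a rough Rust sketch of issuing a raw write(2) system call with inline assembly on x86_64 Linux; it is not code from the project described above, just an illustration of the same idea.)

use std::arch::asm;

// Raw write(2): syscall number in rax, args in rdi/rsi/rdx;
// the syscall instruction clobbers rcx and r11.
unsafe fn sys_write(fd: usize, buf: *const u8, len: usize) -> isize {
    let ret: isize;
    asm!(
        "syscall",
        inlateout("rax") 1usize => ret, // 1 = SYS_write on x86_64
        in("rdi") fd,
        in("rsi") buf,
        in("rdx") len,
        out("rcx") _,
        out("r11") _,
        options(nostack),
    );
    ret
}

fn main() {
    let msg = b"hello from a raw syscall\n";
    unsafe { sys_write(1, msg.as_ptr(), msg.len()) };
}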
$ txr -i winsz.tl
TXR doesn't really whip the llama's ass so much as the lambda's.
1> (let ((ws (new winsize)))
(ioctl-winsz 0 TIOCGWINSZ ws)
ws)
#S(winsize ws-row 37 ws-col 80 ws-xpixel 0 ws-ypixel 0)
I'd like to have system call support as a feature built right into eval. Maybe a JIT compiler that emits code with Linux system call calling conventions whenever eval encounters a (system-call ...) form.
Basically all/most Common Lisp implementations have a foreign function interface. Those who run on some UNIX or Linux need support for low level access.
See for example the SBCL sources for mmap and ioctl.
The Rust flavor of coreutils is 16MB in its release build. After stripping it cuts down to 10MB; that's the smallest size you can get.
Compared to the C coreutils, which total 5.8MB, Rust does have a slight size "problem": it is about 70% larger even with the busybox-all-in-one style.
You're actually forced to do it busybox-style, because otherwise each single small utility of coreutils would be about that size, say 6MB each, and the total would blow up really fast.
I commented somewhere else: the Rust stdlib is statically linked by default (AND by design), which is totally different from the shared stdlibs of C and C++. That leads to large sizes once you have a few Rust release binaries. I have never figured out why Rust cannot do a shared stdlib just like C and C++.
Is the 'unstable ABI' by design too, or is it just still evolving? Why can't Rust have a stable ABI like libc and libstdc++? This is the major reason I have not used Rust so far, and I don't know Rust well enough to understand why it is the way it is as far as static linking of the stdlib goes.
It’s partly by design and partly because it’s evolving.
The "by design" part is that Rust is free to reorder struct fields to pack them more tightly. This is a nice optimization, but the resulting layout is not something that is currently stable between versions.
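A minimal illustration (the struct and field names are made up): with the default repr the compiler is free to reorder fields, so the layout below may change between compiler versions, while #[repr(C)] pins it down, which is what you'd use at an ABI boundary.

use std::mem::size_of;

// Default repr: the compiler may reorder fields to minimize padding;
// the exact layout is not guaranteed to stay the same across versions.
#[allow(dead_code)]
struct Reordered {
    a: u8,
    b: u64,
    c: u8,
}

// repr(C): fields stay in declaration order with C-style padding,
// which is what you need for a stable, shareable layout.
#[allow(dead_code)]
#[repr(C)]
struct CLayout {
    a: u8,
    b: u64,
    c: u8,
}

fn main() {
    // Typically 16 vs 24 bytes on x86_64, but only the repr(C) number
    // is something you're allowed to rely on.
    println!("default repr: {} bytes", size_of::<Reordered>());
    println!("repr(C):      {} bytes", size_of::<CLayout>());
}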
E.g. once upon a time I thought it would be fun to add a flag to ls to limit its results to a certain kind of file, so you could list only directories, for example. It came up as something I needed, so I did it. Somebody on the GNU coreutils mailing list rejected it on the basis of ls "having a high bar" for a new flag. `man ls` suggests this hasn't been a consistent policy. It wasn't clear to me whether that person was in charge, or whether it was just a vague notion of theirs, or anything useful, so obviously I dropped it; I figure I'd have heard something different from someone if there was any interest.
The Rust implementation might come to precisely the same conclusion that it isn't worthwhile. But they also might not, in some other case if not that one.
Do they do what's better, or do they do what GNU does, always and everywhere?
Do they wait until they have significant traction and only then consider such things?
Interesting questions for them to ponder and maybe they have?
One of the maintainers of uutils here. We have a few flags that are not in GNU for one reason or another. Some were rejected by GNU, others come from other coreutils implementations like FreeBSD. We document those at https://uutils.github.io/user/extensions.html
We tend to do this sparingly, however, because even just adding new flags might break existing scripts that use abbreviated long options. For example, if the flag you propose is called `--filter` it might break scripts that use `--fi` as a shorthand for `--file-type`, because the prefix is now ambiguous.
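A small sketch of that failure mode, using the clap crate (the flag names are hypothetical, and infer_long_args here stands in for glibc's getopt_long prefix matching):

use clap::{Arg, ArgAction, Command};

fn main() {
    let cmd = Command::new("ls-like")
        // Accept unambiguous prefixes of long options, like GNU tools do.
        .infer_long_args(true)
        .arg(Arg::new("file-type").long("file-type").action(ArgAction::SetTrue))
        // Adding a new flag that shares the `--fi` prefix...
        .arg(Arg::new("filter").long("filter").action(ArgAction::SetTrue));

    // ...turns a previously valid abbreviation into an ambiguity error,
    // breaking any script that relied on `--fi`.
    if let Err(e) = cmd.try_get_matches_from(["ls-like", "--fi"]) {
        eprintln!("{e}");
    }
}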
Oh wow, I had no idea that was a feature. That significantly hampers extending the tools in the future. Has there ever been a discussion to consider breaking that functionality? (I understand it could be a huge impact & not worth it, but just curious to read the discussion if it exists.)
I can't find any discussion. But I know that some alternative implementations disable this feature. The cause of this behaviour is actually the glibc `getopt_long` function, which does this automatically, so it can't be changed until it's changed in glibc, which has been rejected at least once that I can find: https://sourceware.org/bugzilla/show_bug.cgi?id=6863
That's the first time I hear of this "feature" (using `--fi` instead of `--file`). I tried a few commands in my shell and none actually support it. How common is this?
How about introducing three dashes for these custom parameters/flags?
Regarding filtering: in a busybox situation, depending on some regex crate is a given anyway, but if we are talking about separate binaries, adding it to `ls` makes it bigger for everyone, even for those who will never use this feature. Does piping the results to grep make things so much slower that adding filtering to ls is "worth it"? Is this pushing the "philosophy" of do one thing well and be composable too much? How many kilobytes are we talking about anyway?
Care to share your opinion on these theoretical/pragmatic questions? Thanks!
It's possible I suppose, but three dashes already sometimes appear in GNU for hidden options and, probably more importantly, I think it would be frustrating to have to remember whether it was `--filter` or `---filter` for all long flags.
Maybe uutils could have a build feature that specifically turns off the prefix matching and will break stuff but allows using newer and more useful flags in exchange. I've VERY rarely seen prefix-matched flags being used so I'd wager a distro could be fine deploying it that way.
I.e., set up a feature like "gnu-compatible-opt-matching" and ship it on by default, then gate the extra flags behind not turning on that feature.
It's a good idea to make the prefix matching optional. I think it might be confusing to gate other features behind it though. I guess we'll get to this once we find flags that are important enough. So far, we haven't really had significant issues with this; compatibility remains our primary focus for now.
> In a busybox situation depending on some regex crate is a given anyway
uutils can behave as a busybox-like binary. But I think there's some confusion over the requested feature, because that can't really be done with regex, but you have to inspect the file metadata to check the type of file. That's also why a grep solution doesn't really work, unless you use `ls --classify` and then use the indicators to filter in `grep`.
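The directory-only listing asked for earlier needs exactly that kind of metadata check; a rough sketch (standard library only, no regex involved):

use std::fs;
use std::io;

// Print only the directories in the current directory, deciding by
// each entry's file type rather than by matching its name.
fn main() -> io::Result<()> {
    for entry in fs::read_dir(".")? {
        let entry = entry?;
        if entry.file_type()?.is_dir() {
            println!("{}", entry.file_name().to_string_lossy());
        }
    }
    Ok(())
}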
> if we are talking about separate binaries adding it to `ls` makes it bigger for everyone, even for those who will never use this feature
It's generally not these kinds of features that increase the binary size, but if it does we could also introduce feature flags for it, where you can choose at compile time whether you want the uutils extensions or not. It still adds a bit of a maintenance burden of course.
> Lots of crates (Rust libraries) available - Don't have to reimplement the wheel:
> lscolors, walkdir, tempfile, terminal_size
I believe this isn't always quite the advantage that the slides make it out to be when implementing tools as critical as coreutils. You typically would want internal packages that you can precisely control.
Wholeheartedly yes; this addresses nearly all the concerns I had in my comment.
The only other concern is the ability, and the added time, to make changes to these dependencies. What this sometimes means in practice:
In terms of time: you may have to wait for upstream to accept your change. Alternatively, one could maintain a fork of the package and point the dependency at the fork while waiting for the changes to be accepted; however, doing so adds back-and-forth work.
In terms of ability: upstream may reject a change.
And after the change is merged upstream, you have to vet the commits in the dependency between the last previously vetted commit and your newly merged commit, all at once, before you can upgrade the dependency in your original project.
How many transitive dependencies result? And what licenses are those crates released under?
Edit: I feel like this is often overlooked, but most licenses require including a copy with binary distributions, and wrangling all those text files can be surprisingly cumbersome. Omitting a license can lead to headaches down the road.
Not the original poster, but coreutils are almost given similar design requirements as airplane software. They need to be fast and they need to be perfect. Having a dependency tree which invokes items that (potentially) don't place as much emphasis on being error free can lead to buggy software.
Remember, there is no reason for anyone to choose the rust implementation, other than philosophy. They need to match or surpass the C implementations with an extreme degree of consistency to be worth the risk of transferring.
So they vetted a library and chose a version. That version is the version they will use until they choose another version - that's how Rust works. It's not npm: you specify a specific version and that's the code that's used, no matter how upstream changes things. It's not C, where you load some random .so with the right name and hope that it's compatible (or have entire giant systems built around managing library compatibilities a la Linux distros).
I've got code in prod that uses pre-async versions of tokio and it still builds and runs just fine with the latest rust nightly. If there does turn out to be a problem with the version of some library I chose, and upstream has become incompatible, nothing stops me from vendoring the upstream and fixing the problem my own way. Until then, cargo/crates/rust guarantee that the code I vetted and chose is the code I'll build with. Why is it so vital if the sequence of bytes is stored one place or another?
> It's not npm, you specify a specific version and that's the code that's used, no matter how upstream changes things.
Note that that _is_ how NPM behaves. (At least, when using a lock file, which is the default behaviour, like with Cargo. If you don't use lock files then neither Cargo nor NPM can guarantee this property.)
Vetting, and dependency locking, addresses nearly all the issues. There's one other issue: the ability to easily adjust the code as you see fit, so it is vital that you are able to precisely control the code.
> Not the original poster, but coreutils are almost given similar design requirements as airplane software.
I mean, a lot probably are used in literal airplane software.
It also has to be good from as early as possible; some software still in use uses decades old code and libraries which cannot easily be replaced (think also embedded software).
Code is cleaner too, fewer hacks than typical Coreutils code. You get the benefits of BSD style clean code while still having high performance because there are clear functional boundaries and specific features are compartmentalized in modules and dependencies. Systems programming has finally caught up to 30 years of advances in software engineering.
Long live the Rust Evangelism Strike force. Finally Apache/Linux is now possible.
> Code is cleaner too, fewer hacks than typical Coreutils code
I was under the impression that GNU Coreutils is weird/hacky in implementation because of 3 reasons (in no particular order): 1. Performance, 2. Portability, 3. To make it blatantly obvious that it's not copied from proprietary UNIX™ code. The last point is... unfortunate at this point in time, if unavoidable given the history.
For performance, I'll be interested to see how it goes; Rust should help things compose better and lend itself to saner structuring, but the harder you lean into performance the more unavoidable complexity you incur, so it'll be interesting to find out how well new implementations can do.
And portability, I suspect, will be the dump stat of any new implementation. Sure, it'll work on Linux, on x86 and ARM. Beyond that... I mean, last time I looked Rust didn't even support as many CPU architectures as Linux, let alone as many OSs as GNU does. I'm not sure if that's a problem or not; so far the precedent seems to be to just break anything unusual and not care (looking at you, Python cryptography), which is a win for the masses and unfortunate for everything else.
>the harder you lean into performance the more unavoidable complexity you incur, so it'll be interesting to find out how well new implementations can do
While true, especially with string heavy processing, using the borrow checker with CoW structures to cleanly save allocations does more for boosting performance than some hacky C++ code where you never know when it breaks because there are no guarantees on mutability and lifetime of variables.
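A tiny example of that pattern with std's Cow, where the allocation only happens on the branch that actually has to mutate (the function is just an illustration, not from uutils):

use std::borrow::Cow;

// Strip trailing newlines, allocating a new String only when needed;
// callers passing already-clean input just get the borrow back.
fn trim_newline(s: &str) -> Cow<'_, str> {
    if s.ends_with('\n') {
        Cow::Owned(s.trim_end_matches('\n').to_string())
    } else {
        Cow::Borrowed(s)
    }
}

fn main() {
    assert!(matches!(trim_newline("no newline"), Cow::Borrowed(_)));
    assert!(matches!(trim_newline("line\n"), Cow::Owned(_)));
}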
I've seen that rust based tools seem to be significantly faster on average than their traditional C versions. Seems to be due to using newer algorithms that make use of modern CPU features.
It also helps that Rust is easy to parallelise. When making some sort of FS discovery software concurrent consists of adding a dependency on rayon and a par_iter() to the main loop, it gets easier to take advantage of manycore systems.
Though then you can get the issue that systems don’t really tell you about P/E cores or how they’d want you to use them, and that’s annoying.
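To make the rayon point concrete, roughly this little is needed to fan per-file work out across cores (a sketch assuming the walkdir and rayon crates; summing file sizes stands in for whatever the real per-file work is):

use rayon::prelude::*;
use walkdir::WalkDir;

fn main() {
    // Walk the tree sequentially, then do the per-file work in parallel
    // with a single par_iter() call.
    let entries: Vec<_> = WalkDir::new(".")
        .into_iter()
        .filter_map(Result::ok)
        .collect();

    let total_bytes: u64 = entries
        .par_iter()
        .filter_map(|e| e.metadata().ok())
        .filter(|m| m.is_file())
        .map(|m| m.len())
        .sum();

    println!("{total_bytes} bytes under .");
}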
I even contributed a little, back then. I guess writing basic versions of "ls", for example, is trivial. But there's a lot of work getting all the tools done, with all the flags implemented and behaving as expected.
I guess there are tools like busybox, toybox, and similar, which also implement a lot of "stuff" to varying degrees of completion. From my side the biggest takeaway from those projects is the sheer convenience of deploying a single binary and installing symlinks to change functionality.
I replicated something similar with my sysbox project, collecting tools together in one golang binary with various subcommands:
I currently build my NixOS packages with musl, clang, and uutils, and the difference from gcc/coreutils/glibc is unnoticeable. The uutils project is great.
That's the inverse of my question. If Rust is going to replace things written in C, other stuff is going to want to dynamically link to it.
A statically compiled Rust based replacement for an entire distribution isn't a realistic proposition, unless you fancy downloading a gig or two every time there's a security update and everything has to be rebuilt.
Theoretically, you could dynamically link with other Rust code that exposes the standard C ABI. This used to be common for C++ code, when name mangling was different between different compilers and versions - so a C++ library that wanted to be portable had to expose a C ABI, and C++ apps would dynamically link to it by calling that C ABI. Of course, this meant no exceptions, no destructors, no std:: data structures, but such was the price.
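On the Rust side that boundary looks roughly like this (a sketch; the crate would be built with crate-type = "cdylib", and the function name is made up):

// Exported with an unmangled name and the C calling convention, so any
// language that can load a shared library can dynamically link against it.
// Only C-compatible types cross the boundary: no String, no Result, no panics.
#[no_mangle]
pub extern "C" fn add_u64(a: u64, b: u64) -> u64 {
    a.wrapping_add(b)
}

The consuming side then uses an ordinary dlopen or extern declaration, exactly as it would with a C library.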
It had some CVEs but not many [0]. I think the better argument is that some of the original code is just really hard to read. Click around the repo [1]
I just wish it was copyleft like the real coreutils instead of being pushover-licensed. Now the corporations are all going to start making proprietary forks of this.
This always seemed like a terrible idea to me, from the corporation's perspective. Sure, great, you spent a few thousand dev hours and found a clever way to significantly improve the performance of your mallocs. Awesome, you saved 5% on your hardware costs. Good deal.
But now you've got a private fork to maintain. The guy who figured out the optimization was promoted and switched orgs, so the other guy on that team is in charge. There's not quite enough work for him to work on merging stuff in full time yet, so he'll do other stuff, but within a few years, keeping your private fork up to date with security patches is a full time job, then more than a full time job, and nobody wants to join that guy's effort because there's no glory and no promotions in it. But there's no getting off this ride anymore, and nobody's doing expensive performance testing anymore because the guy who knew how to do it is long gone, so I hope it's still making a difference. Ugh. Just share the fix.
Or a proprietary fork to not maintain, and now nobody else is doing it either. This works for getting a product out of the door so if your planning horizon is the next three months it looks like a great idea.
Maintaining private forks/patches is certainly not a full-time job with git rerere, it's rather trivial. I do maintain hundreds of forks/patches updated daily fully automatically, and I have to engage in manual fixups for maybe 10hrs a year.
The fixes/extensions are shared, but don't expect them to be looked at. On some projects a PR will take years. And some, like OpenSSL or coreutils, outright refuse to add new features or fix code quality.
This makes sense from an engineering perspective, but misapprehends the purpose of most corporations, which is to funnel rent and laundered government money (subsidies, zero interest rates, contracts) into executive salaries and shareholder returns. This explains why most companies don't care about their products or the people who make them: they're perfunctory.
I think that's mostly not a thing. The problem is largely imaginary. Most MIT/Apache/BSD/etc. licensed stuff isn't commonly forked by big corporations. Why would they? It's mostly not that practical to maintain private forks of software. And so what if they do?
These are still OSI endorsed licenses of course. So, bona-fide free and open source software as defined by the Open Source Initiative.
Referring to them as pushover licenses is not really that constructive. All sorts of respectable OSS software is licensed with these licenses. I'd go even further and state that if you remove all the OSS software that doesn't have a copyleft license, the whole landscape would get a lot less interesting. That gets rid of most popular libraries, lots of popular software packages, all kinds of critical cloud infrastructure components, development tooling, etc.
The vast majority of OSS software I depend on is actually Apache 2.0 and MIT licensed. There's a fair bit of GPL v2 and MPL in my life as well. GPLv3 and AGPL are just much more of a fringe thing. I tend to actively avoid depending on any such software. Just too much hassle in terms of endless debates of what is and isn't allowed with such licenses and people getting upset if you actually try to use the software for something not endorsed by them. Mostly the venn diagram of such people and people producing useful things to me is pretty narrow anyway. I know legal departments in big corporations tend to have similar policies.
Ever heard of Android, cloud, or SaaS? For example, our phones and the cloud services they use are chock-full of FOSS, but we get plenty of surveillance and very little freedom. Same for modern cars, TVs and many other things.
There are plenty of user-hostile services that use FOSS internally. Weak licenses are huge enablers for this behavior.
You make that sound like it's an accident and a scandal instead of the huge success it is. Most corporate software at this point consists of large parts of open source stuff. It's fantastic what businesses you can build with this stuff.
None of that software would have happened if it weren't for big corporations putting their resources behind those things AND pooling their resources by sharing code under an OSI approved license with each other. This is open source working as intended. Millions of developers are creating value and are relying on each other. And yes, some money gets made in the process. That's the whole point of committing that much resources. Most software development simply is not charity. And the willingness of companies to spend resources on software is typically motivated by their ability to use the resulting software.
And people can and of course do fork Android. E.g. Amazon and Huawei, etc. have nice products based on Android that don't include any Google proprietary bits. And there are many other android based and derived things out there. Likewise the various cloud providers have a lot of shared components that they depend on and they also are contributing a lot of code. Many of the smaller ones pretty much just use things like openstack. And they all rely on the same big open source things: Linux, Mysql, Redis, Docker, etc.
This would simply not happen with licenses such as AGPL. Easy to say, because it hasn't happened and shows zero signs of actually starting to happen.
GPL (the original coreutils license) does not solve for Cloud or SAAS in any case. It’s not really even clear that AGPL does in something like coreutils.
But android took linux, which has a copyleft GPL license.
So the argument "use a copyleft license or get your work stolen like android" doesn't really work. You'd need to argue that you need a strong copyleft license. Which is a tougher argument to make, because people dislike those more.
And that's why it's extremely imaginary in this context: these are "core utilities", not a distributed database. Their value is in being the same everywhere and installed on every system, not in a SaaS interface. The very basis of uutils itself is that it is the same as the GNU equivalents.
One irony is -- look at the linked article's comments -- I have to imagine some of the people saying "Rust is not ready for X because it doesn't have multiple implementations" are the same people saying "Don't create an alternatively licensed implementation of coreutils". Just completely unprincipled, untethered reasoning.
And don't forget the king of ironies -- GNU coreutils themselves are a reimplementation of proprietary code.
> There are plenty of user-hostile services that use FOSS internally. Weak licenses are huge enablers for this behavior.
This is obviously a red herring. Any bad behavior would be the same re: the GPL and the MIT license in this particular instance.
The answer is always -- do better than your competitor. If the GPL is better, fork the MIT code, and build a better alternative. The problem is -- the people actually contributing chose the MIT license. And that should be the end of the matter. You don't get to have an opinion on someone else's choice of license if you contribute nothing.
This combination of bullying and whining by GNU/FSF advocates is extremely off-putting. I'm a contributor to uutils (but I don't speak for it!) and watching the apoplectic whinging by Redditors and HN commenters just re: this project has completely turned me off the GPL for my other projects. When will people realize that being a completely insane socialist/MAGA/atheist/Christian/FOSS advocate at a dinner party is a turn off?
Afraid I don't understand how me telling you to "Quit whining and fork the project" is actually the true whining.
> Plus you are resorting to insults and yet you claim I'm the unreasonable one.
I don't think I characterized anyone as unreasonable. Your examples are not analogous to the present situation, but I think your real problem is you're telling other people, very well aware of the Good News of GNU/FSF, something they already know.
I, and others, have taken a very clear-eyed look at the GPL/FSF and said "No, not for me," and I wish you'd understand one of the reasons why is yours and others' rampant religiosity. I don't comment under GPL-licensed projects, "Hey I wish you'd change your license to MIT because [red herring, and bad faith argument...]" because I know it's in incredibly poor taste as a non-contributor. MIT/GPL/AGPL/etc isn't always my cup of tea either, but it's simply not my choice.
That said -- sure, if you/anyone has a merits based case to make as to why something makes more sense as GPL licensed, I'm all ears. But, as it stands, your arguments are in poor taste.
It is, but it uses GPLv2, which has a few accidental loopholes that they tried to close with v3. The net result of this is that with some defensive strategies, you can pretty much do whatever you want with Linux and get away with such things as bundling proprietary drivers or firmware, or running closed source software on it. Which is a reason things like Android can exist.
To copyleft people that's a bug, for most Linux users, that's a key feature of the license. Without that, we'd all be using BSD (like most mac users are) or something else.
> There's literally nothing to gain from doing so.
And yet KHTML was forked into WebKit which was forked into Blink, just to name one.
OSX has forked BSD code, as did Windows.
These things happen all the time. MIT is a fine license, so is GPL, but we can't just say "oh nobody is going to fork this, it's dumb!" because it happens all the time.
> And yet KHTML was forked into WebKit which was forked into Blink, just to name one.
Bad example. KHTML was LGPL-licensed when Apple forked it (looking at https://invent.kde.org/frameworks/khtml, it may be dual LGPLv2/GPLv3-or-later licensed now). That probably is _the_ reason WebKit always was open source.
sure, but this is not what I am talking about, I am just saying that "there's literally nothing to gain from making forks" does not seem to prevent organizations from forking things.
You maintain it as best you can. Many of the licenses require attribution. Sometimes things are announced and so on. Can it be definitive and complete? Obviously not. Could it be useful even so?
There’s nothing more annoying than GPL absolutists. Who really cares if someone makes a closed-source fork of the code? On the other hand, given that closed-source code is going to exist, isn’t it better if it’s able to use the best code available rather than forcing developers to either ignore the existence of something because it’s GPL’d or just ignore the license in the first place?
There’s a reason I choose MIT over GPL. It’s a freer license.
GPL protects the software itself against becoming proprietary. Developers are forced to ignore GPL'd code only if they are unwilling to distribute their code under GPL, because the company is unwilling to make a pledge to respect the freedom of the users. Writing more GPL'd code gives advantage to those who are willing to respect the user's freedom.
> Developers are forced to ignore GPL'd code only if they are unwilling to distribute their code under GPL
If the choice is "you only have one choice", that's not really a choice at all?
I have complete freedom to choose the licenses for my personal projects and I stay away from GPL and its variants (still love copyleft and the MPL2!), because devs are users too, and I (and they) see the GPL as very dev unfriendly.
The GPL treats "users" and "devs" as abstractions, when in a very real sense, the devs are the ones most likely to use the code, and they are more likely to use it when it's more permissively licensed. And many are very pleased/want to use copyleft, if it isn't the GPL, and the FSF's ridiculous interpretations thereof.
"More" or "less" free is a word definition game that is mostly a chimera. It depends on what the purpose of the software is and which business model is the most fit.
Copyleft software has several vendors compete around the same product. Every economic actor has to compete to add business value and consulting, with the knowledge that any extension to the software is made available to everyone else.
Free and non-copyleft software leaves vendors free to compete with different products built from a common base. Extensions to the software has commercial value and every vendor seeks their local maxima.
So the situations are different. GPL and BSD software do not often compete in the same space, with a few exceptions. There used to be commercial variants of BSD, but they are all gone, outcompeted by a common Linux platform. FreeBSD was technically superior for a long time but couldn't compete with a multi-vendor product in the long run.
Products that represent a common platform shared by several products however are more successful non-copylefted. Formats such as gzip and jpeg are completely dominant due to their multi product usage, and any GPLd codecs that competed with them are mostly forgotten.
That is why FreeBSD is everywhere, with nice contributions from all its commercial users, and clang is ISO C++20 compliant thanks to all those nice compiler vendors thinking of upstream.
Linux owes its ubiquity over the BSDs more due to the fact that it didn't have to fend off lawsuits at a critical point rather than due to license choice.
Minix is BSD licensed and is present in Intel CPUs[0]. So I think it would be a toss-up as to whether BSD or GPL-licenses OSes are more widely used overall. To be clear, I'm making no comment on whether Intel using Minix is a good thing for their customers, but it's likely that the BSD license was what Intel wanted and caused it to become the most widely used OS few people know about.
I know that was not your main point at all, but fascinating to see that MSVC is the only compiler to be fully 100% standards compliant (with only GCC being really really close).
Yes, LLVM has achieved the same contribution level as the Linux kernel[0], yet where are the contributions from ARM, Intel, IBM, Apple, Google, Green Hills, Codeplay (now Intel), NVidia, Nintendo, Sony, TI.... into clang?
While Apple and Google have switched focus to their own efforts (Swift, Objective-C, Carbon, C++17 being good enough), there are plenty of compiler vendors on that list with forked clang for their proprietary compilers.
And which of these proprietary Clang forks have greater C++20 compatibility than free software Clang?
The only proprietary fork of LLVM in the compatibility table you linked is less C++20 compliant than free software Clang.
So unless you have some causative explanation, I think the more sensible possibility is simply that the Clang developers have prioritized working on some of the plentiful other features of a compiler toolchain than perfecting their C++20 standards compliance.
Blindly asserting (or implying) that if LLVM were GPLv3 then it would be more standards‐compliant, with nothing to back it up, doesn’t add much to the discussion, IMO.
Who knows for certain? We would need to buy them all.
Even if not, they are clearly waiting for others to do the needful for free, and then gladly recoup the fruits of others' labour while selling their compilers for profit.
Consider the Linux kernel. First it would probably have never been successful with a liberal license in the first place, but let's ignore that for a second and assume it got successful anyway.
If you look at the current situation where each SoC supports only one (generally very old and very vulnerable) kernel, I think we have every reason to be happy that they at least have to release the sources, so hopefully someone can extract the patch and have it work on a recent kernel. It's not very hard to imagine the situation where we'd only have binary kernels if we did not have the GPL.
> If you look at the current situation where each SoC supports only one (generally very old and very vulnerable) kernel, I think we have every reason to be happy that they at least have to release the sources, so hopefully someone can extract the patch and have it work on a recent kernel.
The problem is, many vendors don't release their kernel sources at all - particularly Chinese pseudo-brands based on some knock-off of Mediatek reference designs come to my mind - and those that do release their code often show horrible code quality that would take many man-months of work to clean up enough to be included into the kernel, not to mention the lack of documentation and the tendency for these forks to fix undocumented hardware quirks for specific SoC revisions in kernel code.
And Google, the only entity in town that could push for better behaviour, at least in mobile, doesn't do anything to help out on that front either - there is not a single mention of licenses in the CDD [1] (the requirements if you wish to call a device Android compatible and use the Google ecosystem), outside of a warning that implementers have to take care about software patents for multimedia codecs.
Non-phone/tablet embedded applications are even worse. I admit my knowledge is dated, but I came across a lot of devices over the years whose source code dump had u-boot versions that were half a decade outdated, with so many changes compared to even the u-boot version it claimed to be that it was completely infeasible to even attempt to migrate the changes to a modern u-boot version.
And to make it even worse: Modern "device integrity" crap makes the situation completely impossible to resolve - even if you had a decent-quality code dumps not deviating too much from upstream, you can't deploy your code to the actual device in question to test if the device still works because the device doesn't allow flashing unsigned/self-signed images, test points (e.g. JTAG) are fused-off in hardware, and even if memory chips can be flashed with a programmer (which isn't a given, since if you power the flash chip using a clip probe, often you power the rest of the board with it!) the flash chip content itself is signature validated. Oh, and it still can get worse than that given the rise of "secure element" co-processors that can be used to decrypt flash chip content on the fly - you don't even have a chance to read the firmware content without first having a code execution exploit on the device and then achieving code execution in the "secure element". The people able to do this are short in supply and most of them work in jailbreaking gaming consoles, not a 30 dollar Chromecast or similar appliance.
We need laws and regulations against that crap, but politicians don't even have that on their radar - how would they, given that a lot of politicians are fucking gerontocrats and of those that aren't, no one outside the various European Pirate Parties and affiliates has a tech background to even push for such regulation.
I would rather have transparency and portability (from having access to the source code) than quality code (you would be surprised how many projects are written with extremely bad code quality).
The thing is, we already have transparency and portability, and yet it is effectively useless because the code is in such crap quality that it is completely infeasible to do any serious work with it, and because bootloader security makes third party development very difficult to outright impossible (not to mention fourth parties like banks, games or DRM punishing you for exercising your rights by cutting off your service).
My understanding is that Amazon was not even making a closed source fork of ES, they were merely offering managed hosting of the open source version of ES. As Elastic is making money by selling hosting, they did not like the (admittedly unfair) competition from AWS, they decided to stop doing Open Source. And actually now, AWS is maintaining an open source fork of ES...
> Amazon was not even making a closed source fork of ES, they were merely offering managed hosting of the open source version of ES.
That sounds even worse. Whatever happened, ES didn't gain anything by having an open business friendly license (as it appears from the events).
> As Elastic is making money by selling hosting, they did not like the (admittedly unfair) competition from AWS, they decided to stop doing Open Source.
That's sad. ES went from "business friendly opensource license" to "closed source". Now there's no open-source code at all (from ES). Some restrictive open-source code is better than closed source. Always.
> And actually now, AWS is maintaining an open source fork of ES...
I think, it's not out of generosity. It has customers.
Elastic has never been doing open source for generosity - they have customers. They're admittedly much smaller than AWS, but they're venture backed and have money. They're not a small open source project, even though they sometimes portrayed themselves as such.
They did profit from being open source, being based on apache lucene. They profited from the work that the community did for them - I used to be part of the folks running one of the user groups. I was on their IRC channel, helping other folks adopt ES, for free, because it was open source. And they did flip those folks a finger and went and made it closed source because they ended up having a spat between businesses. It's their code, I get it and they get to do whatever they decide. But Elastic is very much not a paragon of open source virtue in shining armor. They, too, have placed business interest before community interest.
Right. Elastic didn't do it out of generosity, either. But your point that ES benefited from the business-friendly license and then made it closed source just supports my point that businesses will just maintain their forked version rather than putting it out there in the open. I'm all in for open-source software, but I'm just making a point that business-friendly open source licenses don't necessarily beget more open source code. Sometimes a restrictive open source license (like GPL) is far better than a business-friendly license for the open source community.
GPL would not have had any effect in Elastics case, private forks are legal under the GPL. AGPL might have. But then again, AGPL would have been a major hindrance for corporate adoption since no one wants to open source their whole tech stack just to add search.
The interesting thing is that Elasticsearch has an open source competitor, Apache Solr. The community around Solr is organized more like the community around postgres - multiple actors that work on a shared project that not a single one controls. Anyone of them could make a proprietary fork, but the others could quickly band together and punish that.
So the lesson to draw here is maybe that the license itself matters less, but what matters is whether there's a single actor in control of the project. Because in the end, the reason why Elastic could pull this stunt is less about the license, but about the copyright ownership and control over the project. They owned the code, had a CLA in place for any contribution and thus could do whatever, license be damned.
AGPL or GPL doesn't matter here - AWS were fully compliant with the AGPL in their use of ElasticSearch (they were already releasing all the code, even though they were providing it as a managed service).
AGPL would probably matter here - it's viral and would potentially require open sourcing all software in the stack that interacts with it, for example the systems that AWS uses to manage the Elasticsearch instances. In any case, AGPL is on the "do not use" list for many enterprises, regardless whether it would have any effect or not in the specific case. Engineers that want to use AGPL software often would have to go through legal, so it does stifle corporate adoption.
The only difference between the AGPL and the GPL is whether offering access to the software over a network is considered distribution or not by the license itself (AGPL says yes, GPL says no). Otherwise, they are both exactly as viral.
You are quite right to some extent about corporate policy differences. I think the status quo might be trending away from this extreme caution, but it's still there in many corps.
> And they did flip those folks a finger and went and made it closed source because they ended up having a spat between businesses.
I don't think it's fair to say ES is now closed source. It's no longer FOSS nor GPL-compatible, but the source is very much still open, and nothing much has changed for many users of the self-hosted versions (source: shipping a proprietary appliance-like product with ES as a component, with full legal checking that we are in compliance of the license).
It is no longer open source, there’s no doubt about it. You may have a free beer license to it as long as elastic grants you one, but you have no right to modify, patch, redistribute as you see fit.
Open source projects that relied on ES are cut off.
You are still free to modify the code, patch or redistribute it under the terms of the SSPL. If you prefer the ESL, you may only modify the code, but not redistribute it.
Now, if you were distributing a GPL product that included ES, you will have significant problems, since the GPL and SSPL are not compatible - so you may in fact be unable to distribute the whole product anymore, which probably caused huge disruptions to some projects - so I'm not in any way saying that what they did is nice. But it was definitely not making it closed source, not in spirit and not in effect.
I explicitly noted that they are no longer FOSS, but I believe it's wrong to call them "closed-source".
If parent had said "they are no longer open source", I wouldn't have commented this at all, since "open source" and FOSS are essentially synonyms. But the antonym of FOSS/open source is not "closed source".
I don't remember which license ElasticSearch used to be under back when it was open source, but there are others who faced the same problem even with the most copyleft of licenses. MongoDB went through the exact same problems and changes, even though it started under the AGPL. So exactly what kind of OSS license Elastic had been using is irrelevant - you can't compete with AWS on hosting open source code.
This went in the opposite direction you'd expect. The original ElasticSearch is proprietary now, and Amazon's fork of the last FOSS version is itself FOSS.
The ideological position is the whole point of the GPL, after all. If you're using someone else's labor, it's only fair they're allowed to dictate the terms of how you're allowed to use it.
> If you're using someone else's labor, it's only fair they're allowed to dictate the terms of how you're allowed to use it.
And they are, that's what choosing a licence is all about. In case of MIT/ISC/BSD/WTFPL licensed software, this someone else just decided to dictate a lot less than he could have.
Except many of them then come out complaining how Amazon (just to pick an example) is leeching their work, while they are perfectly in line with the chosen license.
And that's why whenever I want to contribute to a MIT/ISC/BSD/WTFPL licensed project, instead I just fork it and make my own changes GPLv3 only.
If companies can make proprietary forks, I can make a GPLv3 fork. Best of all, no company can take my changes and make them proprietary. And if the maintainers want to merge my changes, they've got to relicense to GPL.
When you fix a bug in X11, you keep it in your one‐star‐on‐Gitlab personal fork under GPLv3 to prevent it from ever making it into HP‐UX, rather than contribute it to the upstream project, where tens or hundreds of thousands of individuals using X11 under free and open source licensing in free software distributions (not proprietary forks) could have experienced the benefit of your fix?
I mean, you’re free to do that, but taking pride in it sure feels odd to me.
I don't use X11, but I've had a very negative experience trying to contribute back to MIT/BSD/etc licensed projects anyway, as they're usually just one close team of developers publishing something and rejecting any outside PRs or requests.
License choice is a political choice, and it's not a coincidence that corporate-friendly licenses are usually on projects that have a strict vision and reject outside changes.
I've got the features I want, in the way that I want them. If you want to use them, you're free to switch — for some of my projects, tens of thousands of people have done so.
Plenty of GPL fans try to convince others to choose the same license, almost always implying that the GPL is the only moral/ethical choice (Stallman himself being the most ardent believer of this stance). They often even insist that the GPLv2 is not actually a moral choice anymore, given the evils of TiVoization.
This whole thread is about complaining/supporting this attitude - not about criticizing those who simply chose the GPL as their license.
The fault you imply is exclusive to “non‐copyleft fans” happens in copyleft‐licensed projects too: MongoDB, for example, was AGPL before it decided to switch to an even more restrictive license to counter hosting-service companies.
A: "This is why I don't use GPL"
B: "Yes, you shouldn't have to take an ideological position just to publish some code"
C: "But the ideological position is the whole point of GPL"
D: "but the ideological position is not the whole point of writing code."
C: "Then don't use GPL code?"
D: ".... exactly?"
> How about the freedom to ship something without having an ideological position on when/how it links against other code.
Is that possible? Choosing to publish something to... oh, let's say to release it into the public domain (that seems like the farthest you can get from copyleft) is still an ideological position.
Firstly, closed source would be the true antithesis of the GPL. I refuse to even call it copyleft, because that term is a purity test. I'm a leftist and I BSD/MIT my work so that others may benefit from it. If it's valuable, they may contribute back. If they don't, I still haven't restricted their freedom or choice. An ideological position is usually one that is rigid and uncompromising. GPL/'copyleft' is a way of poisoning the commons.
If you release something and expect to retain copyright — that is, you wish to prevent me from decompiling it, redistributing it etc — then you are also taking an ideological position.
In practice, there's no disaster scenario. The issue is the potential for leaving effort and hard work on the table, because a company could improve upon your code and never release it.
For example, we have Rocky and Alma Linux because so much of Red Hat is based on GPL-licensed GNU/Linux. They technically don't have to release any of the MIT components; they do it because they are nice stewards of open source. Same with Ubuntu. And SUSE. Other companies might not be so nice. For instance, note that Google Chrome is not technically open source - only Chromium, which Chrome is built from, is.
In reality, MIT binaries without the source code are rare.
There are often situations in which the rational approach is to upstream enhancements, if only because it reduces ongoing maintenance of a derivative product. This is especially true of foundational infrastructure. BSD/MIT-like licensing works well for such software (perhaps less well in general).
Yes, the GPL is very important. The BSD license _is_ a problem. Those of you who were around when FreeBSD was maybe going to 'win' remember why Linux (GPL) ended up capturing everyone's attention.
You cannot count on your code staying free if it's under a BSD license. If there is a way to make money on it, someone will fork it privately and force you to pay for it. Even if they don't succeed, you will live under the continual threat of having to pay for it.
FSF/GPL advocate: "I don't like that someone else is reimplementing this code."
Jesus, you think s/he ever wondered how AT&T felt about Linux and the BSDs literally reimplementing all of Unix?
Oh, that's right. This is the FSF/GPL conundrum -- you're likely to end up on both sides of every issue. You will literally be both a copyright maximalist re Linux and a copyright minimalist re proprietary code, music and film.
That's the main thing I have against the Rust people. They seem to like pushover licenses instead of the GPL, perhaps in the hopes of getting hired by one of the big tech corps at some point. If only they realized what made free software so resilient over the years...
Not sure it's fair to generalize over all "Rust people".
I've worked in the community, and it's annoying that it's so against the GPL in some corners, but most people are just pragmatic: they use whatever license is the norm for sharing with their community.
Now that Rust is becoming less tight-knit, it might, hopefully, open up to more diversity in licenses.
It isn't. I don't intend to generalize to everyone using Rust. But there is a large overlap between Rust evangelists and anti-GPL activists, it seems. I don't want to believe this, but it almost looks like a corporate conspiracy to undermine free software.
AFAIK there definitely is a preference for Apache/MIT dual licence but this is probably because of the influence of the Rust compiler itself. Most of the initial Rust code was written by people involved with and in service of the Rust compiler. Since that’s licensed MIT/Apache, their code had to be as well if it wanted to be used in the compiler.
There were people who used funky licenses like the WTFPL (Do What the Fuck You Want To Public License) and the Unlicense (public domain), but they came around and relicensed to MIT/Apache for uniformity with the rest of the ecosystem.
Maybe if the compiler had been GPL the ecosystem might have been as well, but that’s a what if. We’ll never know if a hypothetical GPL Rust would have had the same path to success as the existing Rust.
I'm a big fan of copyleft licenses, but when the objective is to push the industry toward a better standard, as opposed to a specific implementation, permissive OSS licenses make sense.
How do you measure robustness except with failures / years of service x times deployed? You cannot make edge cases and strange unique behavior happen in a controlled environment.
> You cannot make edge cases and strange unique behavior happen in a controlled environment.
Sure you can. Go iterate through the GNU Coreutils bug tracker, find weird bugs, create test cases, feed them to your implementation. Bonus points if you can find an existing test suite (e.g., xfstests turns out to be good for testing arbitrary filesystems). Granted, there will always be edge cases that you don't catch until they show up live in prod, but you can hit a chunk of the space without that.
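To make that concrete, here's a minimal sketch of such a differential check; the multicall binary path and invocation are assumptions (uutils-style), and it relies on bash process substitution:

    # Reproduce a suspected edge case, e.g. numeric sort with mixed signs/blanks.
    printf ' 10\n-1\n2\n' > case.txt

    # Compare GNU sort (on PATH) against a Rust multicall binary byte-for-byte.
    if diff <(sort -n case.txt) <(./coreutils sort -n case.txt); then
        echo "sort -n: outputs match"
    else
        echo "sort -n: divergence found, reduce and file a bug"
    fi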
Also, there are plenty of package systems that use coreutils in their build instructions. Just build the whole thing with coreutils aliased to the Rust implementation and check for errors.
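A rough sketch of that aliasing trick, assuming a BusyBox-style multicall binary that dispatches on argv[0] (the tool list and path are placeholders):

    # Shadow the GNU tools with symlinks to the Rust multicall binary.
    mkdir -p shim
    for tool in cat cp ls mkdir mv rm sort; do
        ln -sf /path/to/coreutils "shim/$tool"    # dispatches on argv[0]
    done

    # Run the package build with the Rust tools first on PATH.
    PATH="$PWD/shim:$PATH" make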
"there will always be edge cases that you don't catch until they show up live in prod..."
in reply to,
"How do you measure robustness except with failures / years of service x times deployed?"
The only viable live testing environment that springs to mind might be running your test code synchronously at the atomic level with production, which I'm convinced only IBM z/OS on a Parallel Sysplex cluster running CICS can do.
I'm not seeing a contradiction. The original claim was that you can't measure robustness without testing live, because you can't reproduce edge cases in a lab. But that's not true; you can reproduce the edge cases that are known in a lab. This isn't 100% effective, granted, but it's effective enough that you certainly can test robustness to a reasonable degree. It's like saying that you don't know how safe a car is until you've driven it 100,000 miles on real roads; Real Life™ will find things you missed in testing, but you can still run enough crash tests to get a decent idea of how safe the car is.
One tool that (I think?) wasn't around back then, or at least wasn't used as much or deployable at scale, is fuzzing; modern-day application of fuzzers to existing libraries and tools has revealed thousands of potential bugs.
If fuzzing, unit tests, and things like mutation tests are written and automatically applied to these libraries/utils, which are on top of that written in a proven memory-safe language like Rust, I would assert they're pretty robust.
Second, there's more known knowns this time around; they can write test cases for the decades of bugs and rare occurrences found in the tools they replace.
That said, I'm sure there's still plenty of unknown unknowns that can only be uncovered with, as you said, years of service and times deployed.
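For what it's worth, wiring a fuzzer up to a Rust reimplementation is cheap these days. A sketch of the cargo-fuzz workflow, where the target name is hypothetical and you'd still write the small fuzz_target! harness around the parsing code yourself:

    # One-time setup inside the crate under test.
    cargo install cargo-fuzz
    cargo fuzz init                     # scaffolds fuzz/fuzz_targets/

    # Hunt for panics and crashes for an hour (libFuzzer flag after the --).
    cargo fuzz run parse_input -- -max_total_time=3600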
This is great. Chimera Linux uses the FreeBSD userland version of the utilities. It's refreshing to see new distros taking charge of their development.
I wish someone would build modern, lean utils. I don't need a bazillion grep flags. There's so much unnecessary repetition in Unix tools. Why make a separate flag for input and output files when I can just redirect input and output with the shell, to give just one example? Theoretically most unix tools could be implemented in very few lines of code and that's their beauty.
> Theoretically most unix tools could be implemented in very few lines of code and that's their beauty.
As usual, theory and practice should be the same in theory, but they are different in practice.
It turns out that the Unix philosophy of gluing together programs by spitting out and re-parsing text is both extraordinarily brittle for automation purposes, and quite tedious for manual work.
So, to make realistic use of these tools, there's lots of flags to force the output to be as structured as possible for various automation tasks, to make parsing the output tractable (and to ask for explicit guarantees on the format).
Then, to achieve many common tasks without having to write your own parser and text manipulation in awk or whatever, you need many other flags for transformations which are trivial on the base data types but complex for formatted text, such as sorting files by last-modified time (which is `ls -lt` with flags, and a terrible amount of work with Unix pipes that I'm not going to go into - at least if you want to support such complex things as file names containing whitespace, or dates in the current locale).
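To illustrate just how much work: here is roughly what the "robust" pipe version of sort-by-last-modified looks like. It leans on GNU extensions (find's -printf, sort's -z) and on bash, none of which POSIX guarantees:

    # Emit "mtime-epoch path" records, NUL-terminated so that names containing
    # whitespace or even newlines survive; numeric-sort newest first; strip key.
    find . -maxdepth 1 -type f -printf '%T@ %p\0' \
        | sort -znr \
        | while IFS= read -r -d '' rec; do
              printf '%s\n' "${rec#* }"
          done

Six lines and several GNU-isms for what plain `ls -t` gives you when file names are tame - which rather proves the point.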