> I'm not a go developer and must be misunderstanding something...
I think it's because not every language's lockfile comes with checksums.
So Go's go.mod is functionally equivalent to a Ruby Gem lockfile (which doesn't have checksums), but you need go.sum as well to be equivalent to npm's lockfile (which does come with checksums).
The author just compared it to languages where the lockfile is just a version lock.
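Roughly, for a hypothetical dependency (the entries below are illustrative placeholders, not copied from any real project), go.mod only records the version while go.sum records content hashes:

# go.mod: version lock only
require example.com/foo v1.2.3

# go.sum: checksums of the module zip and of its go.mod file
example.com/foo v1.2.3 h1:<base64 hash of the module zip>
example.com/foo v1.2.3/go.mod h1:<base64 hash of the go.mod file>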
Let's assume I publish a github repo with some go code, and tag a particular commit with tag v1.0.0. People start using it and put v1.0.0 into their go.mod file. They use the golang proxy to fetch the code (and that proxy does the "verification", according to your comment). Now I delete the v1.0.0 tag and re-create the tag to point to different (malicious) commit. Will the golang proxy notice? How does it verify that the people that expect the former commit under the v1.0.0 tag will actually get that and not the other (malicious) commit?
It's stored forever in the proxy cache, and your new tag will never be fetched by users who go through the language's centralized infrastructure (i.e. the proxy).
Go can also validate the checksums (go.sum) against the language's central infrastructure that associates versions with checksums.
I.e. if you cut a release, realize you made a mistake, and try to fix it quietly, no user will ever see the quiet fix under that version if even one user saw the previous version (and that one user is probably you, as you probably fetched it through the proxy to see the mistake).
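Concretely (hypothetical module path, error text paraphrased from memory): anyone who already has the original v1.0.0 hash in their go.sum, or who fetches via the proxy and checksum database, gets a hard failure instead of the re-tagged code, roughly:

$ go mod download example.com/yourmodule
verifying example.com/yourmodule@v1.0.0: checksum mismatch
        downloaded: h1:<hash of the re-tagged commit>
        go.sum:     h1:<hash of the original commit>

SECURITY ERROR
This download does NOT match an earlier download recorded in go.sum.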
This is mistaken. The Go module proxy doesn't make any guarantee that it will permanently store the checksum for any given module. From the outside, we would expect that their policy is to only ever delete checksums for modules that haven't been fetched in a long time. But in general, you should not base your security model on the notion that these checksums are stored permanently.
> The Go module proxy doesn't make any guarantee that it will permanently store the checksum for any given module
Incorrect. Checksums are stored forever, in a Merkle Tree, meaning if the proxy were to ever delete a checksum, it would be detected (and yes, people like me are checking - https://sourcespotter.com/sumdb).
Like any code host, the proxy does not guarantee that the code for a module will be available forever, since code may have to be removed for legal reasons.
But you absolutely can rely on the checksum being preserved and thus you can be sure you'll never be given different code for a particular version.
Ah, my mistake. I had read in the FAQ that it does not guarantee that data is stored forever, but overlooked the part about preserving checksums specifically.
To be very pedantic, there are two separate services: The module proxy (proxy.golang.org) serves cached modules and makes no guarantees about how long cache entries are kept. The sum database (sum.golang.org) serves module checksums, which are kept forever in a Merkle tree/transparency log.
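You can poke at both services directly; e.g. (module picked purely as an example):

# module contents, served from the proxy cache (retention not guaranteed):
$ curl -O https://proxy.golang.org/golang.org/x/text/@v/v0.3.0.zip

# checksums + record number + signed tree head, kept forever in the log:
$ curl https://sum.golang.org/lookup/golang.org/x/text@v0.3.0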
Ok. So to answer the question whether the code for v1.0.0 that I downloaded today is the same as I downloaded yesterday (or whether the code that I get is the same as the one my coworker is getting) you basically have to trust Google.
The checksums are published in a transparency log, which uses a Merkle Tree[1] to make the attack you describe detectable. Source Spotter, which is unaffiliated with Google, continuously verifies that the log contains only one checksum per module version.
If Google were to present you with a different view of the Merkle Tree with different checksums in it, they'd have to forever show you, and only you, that view. If they accidentally show someone else that view, or show you the real view, the go command would detect it. This will eventually be strengthened further with witnessing[2], which will ensure that everyone's view of the log is the same. In the meantime, you / your coworker can upload your view of the log (in $GOPATH/pkg/sumdb/sum.golang.org/latest) to Source Spotter and it will tell you if it's consistent with its view:
$ curl --data-binary "@$(go env GOPATH)/pkg/sumdb/sum.golang.org/latest" https://gossip.api.sourcespotter.com/sum.golang.org
consistent: this STH is consistent with other STHs that we've seen from sum.golang.org
For the question “is the data in the checksum database immutable” you can trust people like the parent, who double checks what Google is doing.
For the question “is it the same data that can be downloaded directly from the repos” you can skip the proxy to download dependencies, then do it again with the proxy, and compare.
So I'd say you don't need to trust Google at all in this case.
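A minimal sketch of that second check (this wipes and re-downloads the local module cache, so run it somewhere disposable):

# hashes of what the proxy serves for your dependency graph
$ go clean -modcache && go mod download -json all > proxy.json

# hashes of what the origin repos serve, bypassing the proxy
$ go clean -modcache && GOPROXY=direct go mod download -json all > direct.json

# compare the "Sum"/"GoModSum" fields (other fields, e.g. VCS origin info, may differ)
$ diff proxy.json direct.json

Either download will also fail loudly on its own if anything doesn't match your existing go.sum.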
Ok, I guess I was wrong about the cache, but not the checksums. I was somewhat under the impression that it was cached forever because of the move away from vendoring. Getting rid of vendoring (to me) only makes sense if it's cached forever (otherwise vendoring has significant value).
Go modules did not get rid of vendoring. You can do 'go mod vendor' and have been able to do so since Go modules were first introduced.
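For reference, that looks like:

$ go mod vendor      # copies the build's dependencies into ./vendor
$ go build ./...     # with vendor/modules.txt present, Go 1.14+ builds from ./vendor by default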
How long the google-run module cache (aka, module proxy or module mirror) at https://proxy.golang.org caches the contents of modules is I think slightly nuanced.
That page includes:
> Whenever possible, the mirror aims to cache content in order to avoid breaking builds for people that depend on your package
But that page also discusses how modules might need to be removed for legal reasons or if a module does not have a known Open Source license:
> proxy.golang.org does not save all modules forever. There are a number of reasons for this, but one reason is if proxy.golang.org is not able to detect a suitable license. In this case, only a temporarily cached copy of the module will be made available, and may become unavailable if it is removed from the original source and becomes outdated.
If interested, there's a good overview of how it all works in one of the older official announcement blog posts (in particular, the "Module Index", "Module Authentication", "Module Mirrors" sections there):
Ok, 1) so would it be fair to modify my statement to say that it basically tries to cache forever unless it can't determine that it's legally allowed to cache forever?
2) You're right; I glanced at Kubernetes (it's been a long time since I worked on it) and they still have a vendor directory that gets updated regularly.
You are not misunderstanding anything. I use Go and Rust/TypeScript in my daily work, and you are correct: it is the OP who does not understand why people use lockfiles in CI (to prevent minor updates and changes in upstream, by verifying checksums).
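E.g. the stock CI commands in those ecosystems (nothing project-specific here) refuse to proceed when the lockfile doesn't match:

# npm: install exactly what package-lock.json records, fail on any mismatch
$ npm ci

# Cargo: refuse to update Cargo.lock implicitly
$ cargo build --locked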
They may be an expert in Go, but from their writing they appear to be misunderstanding (or at least misrepresenting) how things work in other languages. See the previous discussion here: https://lobste.rs/s/exv2eq/go_sum_is_not_lockfile
> They may be an expert in Go, but from their writing they appear to be misunderstanding (or at least misrepresenting) how things work in other languages
Thanks for that link.
Based on reading through that whole discussion there just now and my understanding of the different ecosystems, my conclusion is that certainly people there are telling Filippo Valsorda that he is misunderstanding how things work in other languages, but then AFAICT Filippo or others chime in to explain how he is in fact not misunderstanding.
This subthread to me was a seemingly prototypical exchange there:
Someone in that subthread tells Filippo (FiloSottile) that he is misunderstanding cargo behavior, but Filippo then reiterates which behavior he is talking about (add vs. install), Filippo does a simple test to illustrate his point, and some others seem to agree that he is correct in what he originally said.
That said, YMMV, and that overall discussion does certainly seem to have some confusion and people seemingly talking past each other (e.g., some people mixing up "dependents" vs. "dependencies", etc.).
> but then AFAICT Filippo or others chime in to explain how he is in fact not misunderstanding.
I don't get this impression. Rather, as you say, I get the impression that people are talking past each other, a property which also extends to the author, and the overall failure to reach a mutual understanding of terms only contributes to muddying the waters all around. Here's a direct example that's still in the OP:
"The lockfile (e.g. uv.lock, package-lock.json, Cargo.lock) is a relatively recent innovation in some ecosystems, and it lists the actual versions used in the most recent build. It is not really human-readable, and is ignored by dependents, allowing the rapid spread of supply-chain attacks."
At the end there, what the author is talking about has nothing to do with lockfiles specifically, let alone when they are applied or ignored, but rather with the difference between minimum-version selection (which Go uses) and maximum-compatible-version selection.
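To make the distinction concrete (example.com/a is a hypothetical module, not something from the article): under minimum-version selection a build stays on the version your go.mod asks for even when newer compatible releases exist, and only moves when you explicitly upgrade:

$ grep example.com/a go.mod
require example.com/a v1.2.0

$ go list -m -u example.com/a    # shows the available upgrade, but doesn't take it
example.com/a v1.2.0 [v1.5.0]

$ go get example.com/a@v1.5.0    # upgrades happen only on request

A max-compatible resolver would instead pick v1.5.0 on the next fresh resolve.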
Here's another one:
"In other ecosystems, package resolution time going down below 1s is celebrated"
This is repeating the mistaken claims that Russ Cox made years ago when he designed Go's current packaging system. Package resolution in e.g. Cargo is almost too fast to measure, even on large dependency trees.
I just recently learned of Meshtastic (https://en.wikipedia.org/wiki/Meshtastic) and MeshCore (https://meshcore.nz/), which provide a platform for private and group messaging over P2P LoRa. They don't depend on the internet, rely on the community to provide routing nodes, and are thus harder for governments to block. It's gaining steam in Europe and can already be used for messaging across wide distances. It's slow though, so forget streaming videos or images. It can only carry messages. But that's often enough to coordinate or spread news.
In my area there are now just enough Meshtastic nodes that I can (somewhat unreliably) talk between my office and home, about 5 miles.
However, it does heavily rely on the internet for setup and distribution (app stores, or else lots of pip install, git clone, pnpm install, etc.)
I've been working on a virtual machine with all the dependencies preinstalled just so I'll have offline access, and it's surprisingly difficult (though I'm not super familiar with typical webdev stuff). I'd have to think a regular user who really needs to rely on it doesn't stand a chance, which doesn't seem to mesh (ha) well with how loudly its "off-gridness" is touted.
Then again, you probably need the internet to be able to obtain the hardware in the first place, but that's another problem.
The bad part is that it cannot create a worldwide mesh, as it has a low max hop limit (7) and the nodes need line of sight. So more than 200 km in a mostly flat city is almost impossible.
I wish we had an HF ISM band that could be used for this purpose without needing a license; combined with LoRa radios it would yield great results.
I repurposed old M1/M4 Mac Minis at my workplace into GitHub Actions runners. Works like a charm, and made our workflows simpler and faster. Persisting the working directory between runs was a big performance boost.
I just ordered the BD790i X3D mainboard. A while ago Minisforum was known for slow BIOS updates, but I hope they have improved their processes since. I'll see…
We got rid of all Rails apps (that needed a backend). We've moved our Postgres databases to Neon, and run our docker containers on Google Cloud Run (these are containers that don't need to run 24/7, we're paying just a few cents each month, also cold starts are much faster and more reliable than on Heroku).
>> and what did you use to manage git push deployments, setting env vars to replicate the heroku features?
Yes, Digital Ocean did all this; they were very feature-close to Heroku. We have over time migrated everything stable/prod to AWS, just because AWS has more products and hence you have everything in one place inside a VPC (e.g. a vector DB).
For Replit, I'd use it for anything I can in the early stages. It helps to prototype ideas you are testing. You can iterate rapidly. For prod we'd centralize onto AWS given the ecosystem.
> and last q :-) re AWS - once you moved there, did you use something like elasticbean or app runner? or did you roll your own CI/CD/logging/scaling...?
We started with Lambdas because you can split work across people and keep dependencies to a minimum. Once your team gels and your product stabilizes, it is helpful to Dockerize it and move to ECS; that is what we did. Some teams in the past used EKS, but IMHO it required too much knowledge for the team to maintain, hence we've stuck with ECS.
All CI/CD goes via GitHub --> ECS. This is a very standard pipeline and works well locally for development too. ECS does the scaling quite well, and provides a natural path to EKS when you need to scale big time.
For logging, if I could choose I'd go Datadog but often you go with whatever the budget solution is.
We're a Ruby shop and we have pretty much zero commented code. Ruby's intended to be readable enough not to need them and when we do need them, it's a sure sign we need some refactoring.
A year ago I was traveling through Uzbekistan while also partly working remotely. IKEv2 VPN was blocked but thankfully I was able to switch to SSL VPN which worked fine. I didn't expect that, everything else (people, culture) in the country seemed quite open.
OpenSSF Scorecard is easy to integrate (it's a CLI tool, or you can run it as a GitHub action); one of the checks it performs is "Pinned-Dependencies": https://github.com/ossf/scorecard/blob/main/docs/checks.md#p.... Checks that fail generate a security alert under Security -> Code scanning.
> The check works by looking for unpinned dependencies in Dockerfiles, shell scripts, and GitHub workflows which are used during the build and release process of a project.
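For reference, running just that check locally looks roughly like this (the repo path is a placeholder):

$ scorecard --repo=github.com/your-org/your-repo --checks=Pinned-Dependencies --show-details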
Does it detect an unpinned version (e.g. a Docker tag) of a pinned dependency?
If go.sum has "no observable effect on builds", you don't know what you're building, and the go command can download and run unverified code.
I'm not a go developer and must be misunderstanding something...