Hi moxie, might I ask if you've considered SHA-384 instead?
If I understand correctly, SHA-256 is part of the SHA-2 family of hash algorithms, and like SHA-1, when used alone it is subject to length extension attacks.
SHA-384 is also a member of the SHA-2 algorithm family, but is immune to length extension attacks because it runs with an internal state size of 512 bits -- by emitting fewer bits than its total internal state, length extensions are ruled out. (Wikipedia has a great table for clarifying all these confusing names and families of hashes: [5].) Other hashes like BLAKE2 [1], though young, also promise built-in immunity to length-extension attacks. mdm [2] is immune to this because the relevant git datastructures all include either explicit field lengths as a prefix, or are sorted lists with null terminators, both of which defuse length extension attacks by virtue of breaking their data format if extended.
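As a quick concrete check of the truncation point, a minimal sketch with Python's hashlib (the input byte string is arbitrary):

```python
import hashlib

# SHA-384 and SHA-512 share the same compression function and the same
# 512-bit internal state; SHA-384 simply truncates the output.
assert hashlib.sha512(b"x").digest_size == 64  # full 512-bit state emitted
assert hashlib.sha384(b"x").digest_size == 48  # only 384 of 512 state bits

# Because a SHA-384 digest reveals only part of the final internal state,
# an attacker cannot resume the hash chain from it -- whereas a SHA-512
# (or SHA-256) digest IS the full final state.
```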
Not that it's by any means easy to find a SHA-256 collision at present; but should collisions be found in the future, a length extension attack would increase the leverage for using those collisions to produce binaries that slip past this verification. The MD5 Collision Demo[3] by Peter Selinger is my favourite site for concretely demonstrating what this looks like (though I think this[4] publication by CITS mentions the relationship to length extension more explicitly).
(I probably don't need to lecture to you of all people about length extensions :) but it's a subject I just recently refreshed myself on, and I wanted to try to leave a decent explanation here for unfamiliar readers.)
--
I'm also curious how you handled management of checksums for transitive dependencies. I recall we talked about this subject in private back in April, and one of the concerns you had with mdm was the challenge of integrating it with existing concepts of "artifacts" from the maven/gradle/etc world -- though there is an automatic importer from maven now, mdm still requires explicitly specifying every dependency.
Have you found ways to insulate against gradle downloading updates to plugins or components of itself?
What happens when a dependency adds new transitive dependencies? I guess that's not a threat during normal rebuilds, since specifying hashes ahead of time already essentially forbids loosely semver-ish resolution of dependencies at every rebuild, but if it does happen during an upgrade, does gradle-witness hook into gradle deeply enough that it can generate warnings for new dependencies that aren't watched?
This plugin looks like a great hybrid approach that keeps what you like from gradle while starting to layer on "pinning" integrity checks. I'll recommend it to colleagues building their software with gradle.
P.S. is the license on gradle-witness such that I can fork or use the code as inspiration for writing an mdm+gradle binding plugin? I'm not sure if it makes more sense to produce a gradle plugin, or to just add transitive resolution tools to mdm so it can do first-time setup like gradle-witness does on its own, but I'm looking at it!
--
Edited: to also link the wikipedia chart of hash families.
Tldr: The length extension property of the SHA-2 family has nothing to do with collisions. If you are afraid of future cryptanalytic breakthroughs regarding the collision resistance of SHA-2, use the concatenation of SHA-256 and SHA3-256.
You are mightily confused about the connection between length extensions and collisions. (Bear with me; I know you are already familiar with length extensions, but I need to introduce the notation to explain your misunderstanding. It is also nice to introduce other readers to the topic.)

All cryptographic hash functions in practical use are iterative. That is, the message is first padded to a multiple of the block size and then cut into blocks b_1, ..., b_n. The core of the hash function is a compression function c (or maybe we should better call it a mix-in function if we also have the SHA-3 winner Keccak in mind) that takes the internal state i_k and a block b_(k+1) and outputs the new internal state i_(k+1) = c(i_k, b_(k+1)). The hash function has a pre-defined initial state i_0. Finally, there is a finalization function f that takes the last internal state i_n (the state after processing the last message block) and outputs the result of the hash function.

MD5, SHA-1, and the SHA-2 family have the problem that the finalization step is just the identity function, i.e. i_n is used directly as the output of the hash function. Thus, given the output, you can continue the hash chain directly and calculate the hash of a longer message b_1, ..., b_n, b_(n+1), ..., b_(n+m).
Normally this is not that interesting (you could have just calculated the hash of the long message directly, avoiding the padding complications I'm ignoring here to keep it simple). But in some situations you might not actually know b_1, ..., b_n. For example, suppose you calculate an authentication tag as t = h(k||m), where k is a secret key you share with the API and m is the message you want to authenticate. Since t is published, an attacker can pick up the internal state i_n = t and calculate the authentication tag of an extension: t2 = c(i_n, m2) = h(k||m||m2), which authenticates m||m2. Flickr had this problem with its API: http://vnhacker.blogspot.de/2009/09/flickrs-api-signature-fo... . If you want to do authentication tags right, use HMAC: https://en.wikipedia.org/wiki/Hash-based_message_authenticat...
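To make the iteration and the extension concrete, here is a toy sketch in Python. The compression function, block size, and initial state are made up for illustration (not real primitives); everything else follows the Merkle-Damgård shape described above, with SHA-2-style length padding:

```python
import struct

BLOCK = 8                 # toy block size in bytes
IV = 0x0123456789ABCDEF   # fixed initial state i_0

def compress(state: int, block: bytes) -> int:
    # Toy compression function mixing one 8-byte block into a 64-bit state.
    # (Illustrative only; not a real cryptographic primitive.)
    x = int.from_bytes(block, "big")
    s = ((state ^ x) * 0x9E3779B97F4A7C15) % 2**64
    return ((s << 7) | (s >> 57)) % 2**64

def pad(msg: bytes) -> bytes:
    # SHA-2-style padding, simplified: zero-fill, then the message length.
    n = len(msg)
    return msg + b"\x00" * (-(n + 8) % BLOCK) + struct.pack(">Q", n)

def absorb(state: int, data: bytes) -> int:
    # Iterate the compression function over block-aligned data.
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def toy_hash(msg: bytes) -> int:
    # Merkle-Damgard with identity finalization: the final internal
    # state IS the output -- exactly the property enabling extension.
    return absorb(IV, pad(msg))

key, msg, ext = b"key", b"message", b",evil"
tag = toy_hash(key + msg)   # published tag t = h(k||m)

# The attacker knows only tag, len(key), msg, and ext -- not key itself.
glue = pad(key + msg)[len(key + msg):]          # predictable padding bytes
total = len(key + msg) + len(glue) + len(ext)   # length of extended message
suffix = ext + b"\x00" * (-(len(ext) + 8) % BLOCK) + struct.pack(">Q", total)
forged = absorb(tag, suffix)   # resume the chain from the published state

# The forgery matches an honest hash of the extended message:
assert forged == toy_hash(key + msg + glue + ext)
```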
Now, a collision describes the case where two messages b_1, ..., b_k and d_1, ..., d_l produce the same hash, i.e. the output of the finalization function is identical. SHA-2 does not have a finalization function, and I would not believe that collisions in SHA-3, if they are possible at all, would only be possible in the finalization step (the squeezing step, in the terminology of the Keccak sponge function), so I'll ignore finalization as a source of collisions for the moment. A collision thus means that after processing the k blocks b_1, ..., b_k and the l blocks d_1, ..., d_l, the internal state is identical.

As the hash function is iterative, we can continue both block lists with identical blocks e_1, ..., e_j and still have the same internal state. Since the state is identical, it does not matter which finalization function we apply; the output is identical, and b_1, ..., b_k, e_1, ..., e_j and d_1, ..., d_l, e_1, ..., e_j hash to the same value. Thus, whenever you have one collision in the internal state, you can produce infinitely many colliding messages. This property is independent of the finalization function, and BLAKE2, SHA-384, Skein, and SHA-3 all allow such attacks (once one collision is found). If anything, a fancy finalization function can make things worse by mapping two different internal states to the same output. But because of the problems with length extension attacks (see the API example above), a one-way finalization function is a standard requirement for hash functions nowadays.
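This propagation of state collisions can be demonstrated with a deliberately tiny toy hash (a made-up 16-bit iterative construction, small enough that a birthday search actually finds a state collision; the pigeonhole principle guarantees one among the first 2^16 + 1 distinct messages):

```python
import itertools

def compress16(state: int, byte: int) -> int:
    # Toy 16-bit compression function, one input byte per "block".
    # (Made up for illustration; not a real primitive.)
    s = ((state ^ byte) * 0x9E37) % 2**16
    return ((s << 3) | (s >> 13)) % 2**16

def state_after(msg: bytes, state: int = 0xBEEF) -> int:
    # Internal state after absorbing all blocks of msg (no finalization).
    for byte in msg:
        state = compress16(state, byte)
    return state

# Birthday-search 3-byte messages for an internal-state collision.
seen = {}
a = b = None
for m in itertools.product(range(256), repeat=3):
    m = bytes(m)
    s = state_after(m)
    if s in seen:
        a, b = seen[s], m   # two distinct messages, same internal state
        break
    seen[s] = m

assert a != b and state_after(a) == state_after(b)

# Once the internal states collide, ANY common suffix preserves the
# collision -- and any finalization function applied to equal states
# yields equal outputs, so finalization cannot repair this.
suffix = b"any common continuation"
assert state_after(a + suffix) == state_after(b + suffix)
```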
If you are really concerned about future cryptanalysis of SHA-2, you can spread the risk across several hash functions, where each has to be broken to break the security of the overall construction. These are called hash combiners. The simplest one is the concatenation of two hashes: SHA-256(m)||SHA3-256(m). This one will be collision resistant as long as either SHA-256 or SHA3-256 is collision resistant. There are also combiners for other properties, like pseudorandomness, and also multi-property combiners. See for example http://tuprints.ulb.tu-darmstadt.de/2094/1/thesis.lehmann.pd... .
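For completeness, the concatenation combiner is a one-liner with Python's hashlib, which ships both SHA-256 and SHA3-256 (the input bytes here are just a placeholder):

```python
import hashlib

def combined_hash(data: bytes) -> bytes:
    # Concatenation combiner: collision resistant as long as EITHER
    # component stays collision resistant, since a collision on the
    # concatenation requires a collision on both halves simultaneously.
    return hashlib.sha256(data).digest() + hashlib.sha3_256(data).digest()

tag = combined_hash(b"some artifact bytes")
assert len(tag) == 64  # 32 bytes of SHA-256 || 32 bytes of SHA3-256
```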
It amplifies collision finding, yielding an r-collision in time log2(r) * 2^(n/2) if r = 2^t for some t. Figure 1 in section 3 is a very intuitive picture of how.
The third paragraph of section 5 states that the attack is inapplicable on hashes with truncated output.
Although, now that you mention it, my dusty crypto knowledge is not enough for me to explain why this technique would be inapplicable to the internal states of a decomposed hash even if it's inapplicable to the final output. I shall have to read more.
I agree that if one can find a way to apply an HMAC, it should also obviate concerns around length extension; and also that (carefully) combining hashes should yield a construction that breaks only when both components are broken.
> The third paragraph of section 5 states that the attack is inapplicable on hashes with truncated output.
They actually refer to the large internal state size, which makes the generic attack infeasible (for a state size of n bits you need about 2^(n/2) tries to find a collision on average).
> in the second, the attacker needs collisions in the full internal state of the hash function, rather than on the truncated states.
But as both SHA-256 and SHA3-256 have internal state sizes >= 256 bits, these are definitely enough for the foreseeable future to protect against generic attacks. More interesting is the question whether you can combine specialized cryptanalysis of two different hashes to build multicollisions. Apparently you can, at least for MD5 and SHA-1: http://www.iacr.org/archive/asiacrypt2009/59120136/59120136....
--
[1] https://blake2.net/
[2] https://github.com/polydawn/mdm/
[3] http://www.mscs.dal.ca/~selinger/md5collision/
[4] http://web.archive.org/web/20071226014140/http://www.cits.ru...
[5] https://en.wikipedia.org/wiki/SHA-512#Comparison_of_SHA_func...