
> - rose tree "integrated shrinking" (eg. Hedgehog) follows the constraints of the generators, but has issues with monadic bind.

We're at the limits of my amateur knowledge, but I believe this is a fundamental limitation of monadic bind/generators. Instead, you should prefer applicative generators for optimal shrinking. https://github.com/hedgehogqa/haskell-hedgehog/issues/473#is...

In other words, applicative generators do not use "results of generators to dispatch to another generator"; instead, shrinking is optimal due to the "parallel" nature of applicatives (I'm using "parallel" in the monadic sense, not in the article's "threading" sense). Since applicatives are "parallel", they can shrink the generators independently. (Whereas monadic generators are in "series", so shrinking one necessarily changes the behavior of the subsequent generator, as you noted.)
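To make that concrete, here's a toy Python sketch (hypothetical names, not Hedgehog's actual representation): a "generator" is modeled as a value plus its shrink candidates. The applicative pair shrinks each side independently; the bind version must re-run the dependent generator whenever the first value shrinks.

```python
# Toy model: a "generator" result is (value, shrinks), where shrinks
# lists smaller candidate values. This is a sketch, not Hedgehog's API.
def gen_int(n):
    return (n, list(range(n)))  # candidates smaller than n

# Applicative pair: the two components shrink independently ("parallel").
def pair_applicative(ga, gb):
    (a, sa), (b, sb) = ga, gb
    shrinks = [(a2, b) for a2 in sa] + [(a, b2) for b2 in sb]
    return ((a, b), shrinks)

# Monadic bind: the second generator depends on the first value ("series"),
# so shrinking `a` forces re-running the dependent generator for `b`.
def pair_bind(ga, f):
    a, sa = ga
    b, _ = f(a)
    shrinks = [(a2, f(a2)[0]) for a2 in sa]  # b changes whenever a does
    return ((a, b), shrinks)
```

With `pair_applicative(gen_int(2), gen_int(1))` the shrink list contains candidates that change only one side at a time; with `pair_bind(gen_int(2), lambda a: gen_int(a + 1))` every shrink of the first value drags the second along with it.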

Please feel free to link your talk if it's public!



Re the talk: it's unfortunately one of the Haskell Exchange talks that got pulled from the internet when the organizer, SkillsMatter, imploded. The Haskell Foundation is trying to revive and reupload those recordings, but my particular one isn't back on YouTube yet. But here's at least the slide deck: https://drive.google.com/drive/folders/1DjfhdeFW-qCCSlJUv_Xw...

I agree that using applicatives helps somewhat; it basically minimizes your usage of bind.

Hypothesis works around this (and gets a bind that shrinks nicely) by recording those choices into (in principle) a flat list, and shrinking that. So your shrinker can work on both "before the bound fn" and "after it" at the same time, but the result might be something that the generator can't parse back into a value.
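A minimal sketch of that idea (hypothetical code, not Hypothesis's actual internals): the generator is a parser of a flat choice list, shrinking edits the list and replays, and some shrunk lists simply fail to parse back into a value.

```python
# Sketch of "internal shrinking": the generator reads from a recorded,
# flat list of choices; the shrinker edits the list and replays.
def gen(choices):
    """Parse a flat choice list back into a value; reject lists that
    no longer correspond to any run of the generator."""
    n, elements = choices[0], choices[1:]   # first choice: a length,
    if len(elements) != n:                  # remaining choices: elements
        raise ValueError("choice list does not parse back into a value")
    return elements

# The recorded run:
assert gen([3, 10, 20, 30]) == [10, 20, 30]

# The shrinker can touch "before the bound fn" (the length) and
# "after it" (the elements) in one move:
assert gen([2, 10, 20]) == [10, 20]

# ...but some shrunk sequences can't be parsed back into a value:
try:
    gen([3, 10])   # claims 3 elements, provides only 1
    raise AssertionError("should not parse")
except ValueError:
    pass
```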


I now realize how far I am from amateur knowledge, I don't understand any of this.

But always interesting to discover an avenue I haven't explored before :)


Sequence points in C and CPS/ANF compilation in Lisp are kind of related to this. Both monads and applicatives enable you to define f(g(), h()), where f, g, and h are stateful computations of some sort, but monads force you to specify which of g and h is invoked first [so it’s more like x = g(), y = h(), f(x, y)] while with applicatives the implementor of the stateful model can decide what happens in such cases.
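A toy Writer-style sketch of that distinction in Python (all names hypothetical): bind hard-codes that g's effects come before h's, while an applicative combinator leaves the order up to the model's implementor.

```python
# Toy effect model: a computation is (value, log of side effects).
def g():
    return (1, ["g"])

def h():
    return (2, ["h"])

def bind(m, k):
    """Monadic sequencing: m's effects strictly precede k's."""
    v, log = m
    v2, log2 = k(v)
    return (v2, log + log2)

def ap2(f, ma, mb, reverse=False):
    """Applicative pair: the implementor chooses the effect order."""
    (a, la), (b, lb) = ma, mb
    log = lb + la if reverse else la + lb
    return (f(a, b), log)

# Monadic style: x = g(), y = h(), f(x, y) -- g is forced to run first.
val, log = bind(g(), lambda x: bind(h(), lambda y: (x + y, [])))
assert (val, log) == (3, ["g", "h"])

# Applicative style: the model may run h's effects "first" without
# changing the program text.
assert ap2(lambda a, b: a + b, g(), h(), reverse=True) == (3, ["h", "g"])
```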

[Disclaimer: I don’t know how QuickCheck actually works, so I’d appreciate a sanity check (hah) of the following from somebody with actual knowledge on the matter.]

GP’s point, if I understood it correctly, is as follows. If you’re doing randomized testing, then g and h could perhaps be defined as { for (i=0; i<n; i++) { int x = rand(); if (!fork()) return x; } } (and yes, in monad-land the moral equivalent of fork() is admissible as what I vaguely called a “stateful computation”). You see how strict ordering of side effects enforced by monads essentially forces you into a depth-first search of the state space (although I guess you could do iterative deepening?), while applicatives can allow for different exploration strategies (ones more interesting than C’s official behaviour of “that’s UB, you fool”).


C sequence points are arbitrarily imposed by imperial decree in the standard; they are not linked to any evaluation causality.

Dependency-driven de facto evaluation orders have always existed in C though: things that you can figure out must be ordered even though there isn't a sequence point.

The standard spells out that expressions like i = i + i are okay, but actually it's not necessary to do so. The assignment cannot take place until the assigned value is known. The assigned value is not known until the + is evaluated and the + cannot be evaluated until the operand values are retrieved. Therefore, the retrieval of the prior value of i necessarily precedes the movement of the new value into i.

This, rather than sequence points, is a bit like monadic sequencing.


> The assignment cannot take place until the assigned value is known.

Of course it can. It's the retrieval of the value out of the lvalue that cannot be done, and even then you can stretch this delay further with the "as-if" rule. Haskell does something similar with its lazy evaluation, IIRC.


> C sequence points are arbitrarily imposed [and] not linked to any evaluation causality.

That’s exactly my point. Writing g >>= \x -> h >>= \y -> f x y is like writing (x = g(), y = h(), f(x,y)), with an arbitrary ordering between computations g and h that the programmer is forced to choose, whereas f <$> g <*> h is like f(g(), h()), with no ordering between g and h imposed by the applicative model itself, similar to the absence of a sequence point between sibling arguments in C.


Let’s say your test data is a record with two fields that can be generated independently. After the property test framework finds a bug, it can independently shrink each field to find a simpler failing test.

Contrast with a test data generator that randomly chooses the first field, and then calculates the second field based on the first field’s value. The framework can’t directly mutate the second field because there’s a data dependency. If it just changes the second field, it will get a record that couldn’t have been generated.
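A tiny illustration of that data dependency (hypothetical record, assuming the second field is derived from the first):

```python
# Hypothetical generator: the second field is computed from the first,
# so valid records always satisfy second == 2 * first.
def make_record(first):
    return {"first": first, "second": first * 2}

rec = make_record(10)

# Directly shrinking the second field produces a record no run of the
# generator could have created -- the invariant is broken:
rec["second"] = 3
assert rec["second"] != 2 * rec["first"]

# Shrinking the generator's *input* instead always yields a valid record:
assert make_record(3) == {"first": 3, "second": 6}
```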

The Hypothesis test framework took a different approach. Instead of mutating the test data directly, it mutates the “random” input and reruns the generator. If you do shrinking that way, you always get valid test data, though it might not be very similar to the original.

This is similar to what fuzzers do - you start with unstructured binary data and build data structures out of it. Your “random” test data generator is really a parser of random (or maybe not so random) input.
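Here's a sketch of that parser view (hypothetical format, not any particular fuzzer's encoding): the "generator" consumes unstructured bytes and builds a structure from them, so shrinking means handing it simpler bytes, not editing its output.

```python
# Fuzzer-style view: the "random" generator is a parser of a byte stream.
def parse(data: bytes):
    """Build a small structure out of unstructured bytes."""
    if len(data) < 1:
        raise ValueError("not enough input")
    n = data[0] % 4                 # first byte chooses a list length
    if len(data) < 1 + n:
        raise ValueError("not enough input")
    return list(data[1:1 + n])      # following bytes become the elements

# 6 % 4 == 2, so two elements are read:
assert parse(bytes([6, 9, 7])) == [9, 7]

# Shrinking = feeding the parser simpler input bytes:
assert parse(bytes([1, 0])) == [0]
```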


To get an idea of monads being sequential, think of a JS Promise:

  promise.then(...).then(...).then(...)...
Promises are (almost) monadic, and the chain is sequential. You can't make it parallel, and that's the point: monadic binding represents sequential computation, aka "the semicolon operator".

To get an idea of applicatives being parallel, think of a list of functions and a list of values; each function is applied to the corresponding value, yielding a list of results:

  results = []
  for ix in range(len(functions)):
    f = functions[ix]
    x = values[ix]
    results.append(f(x))
It's pretty obvious that you can do each f(x) in parallel and in any order, instead of the sequential code above.
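The same loop in "applicative" style, where nothing in the code imposes an order; actually running on a thread pool is then just one possible strategy (a sketch, using sample functions and values):

```python
from concurrent.futures import ThreadPoolExecutor

# Sample inputs for illustration:
functions = [lambda x: x + 1, lambda x: x * 2]
values = [10, 20]

# Zip-apply: each f(x) is independent, so the order is free to vary.
results = [f(x) for f, x in zip(functions, values)]
assert results == [11, 40]

# One possible strategy: actually run the applications in parallel.
with ThreadPoolExecutor() as pool:
    parallel_results = list(pool.map(lambda fx: fx[0](fx[1]),
                                     zip(functions, values)))
assert parallel_results == results
```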

(Why would one care about this all? Because many daily things are functors, applicatives, and monads, e.g. a list is usually all three.)


Haha yeah I kinda went off the deep end with applicatives. Here's a short primer on applicative vs monadic shrinking behavior using F# syntax https://github.com/hedgehogqa/fsharp-hedgehog/issues/419#iss...

You can think of `let!` as `let + await`, and `let! x ... and! y` as `await Parallel([x, y])`.

Please feel free to ask any questions if it's still confusing!


Not quite accurate with the Parallel example, if I understand you correctly. Don Syme is explicit that applicative `async` should not implicitly start work in the thread pool (https://github.com/dotnet/fsharp/issues/10301#issuecomment-7...).


My usage of "parallel/await" is entirely metaphorical; I was going for JavaScript-esque syntax with "await Parallel", since I'm assuming most people aren't familiar with F#'s `and!`. It doesn't make sense for applicative generators to (necessarily) use the thread pool.



