
For the for-loop regression in .NET 9, please submit an issue at dotnet/runtime. If my guess is correct, it's yet another loop-tearing miscompilation caused by suboptimal loop-lowering changes.


No problem, I've raised the issue as https://github.com/dotnet/runtime/issues/114047 .


Thanks!


19 hours in, and that issue already has hands-on attention from multiple people at MS. Incredible.


UPD: For those interested, it was an interaction between the microbenchmark's algorithm and tiered compilation, not a regression.

https://github.com/dotnet/runtime/issues/114047#issuecomment...


This is a ten line function that takes half a second to run.

Why do you have to call it more than 50 times before it gets fully optimized?? Is the decision-maker completely unaware of the execution time?


Long-running methods (like the one here) transition mid-execution to more optimized versions, via on-stack replacement (OSR), after roughly 50K iterations. So you end up running optimized code either if the method is called a lot or loops frequently.

The OSR transition happens here, but between .NET 8 and .NET 9 some aspects of loop optimization in OSR code regressed.
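A quick way to check whether tiering/OSR explains a timing anomaly is to toggle the runtime's behaviour with environment variables. These are documented .NET runtime knobs, but the exact set and defaults vary by release, so treat this as a sketch:

```shell
# Force full optimization from the first call (disables tiering entirely):
DOTNET_TieredCompilation=0 dotnet run -c Release

# Keep tiering, but compile methods containing loops at the optimized
# tier immediately instead of relying on OSR transitions:
DOTNET_TC_QuickJitForLoops=0 dotnet run -c Release
```

If the slow numbers disappear under either setting, the cost was warmup/OSR code rather than a steady-state codegen regression.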


So there actually was a regression and it wasn't an intentional warmup delay?


There indeed is a regression if the method is only called a few times. But not if it is called frequently.

With BenchmarkDotNet it may not be obvious which scenario you intend to measure and which one you end up measuring. BDN runs the benchmark method enough times to exceed some overall "goal" time for measuring (250 ms I think). This may require many calls or may just require one.
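The effect of that goal time can be sketched with a small simulation. The numbers and function are illustrative, not BDN's actual pilot algorithm:

```python
def invocations_per_iteration(method_time_ms: float, goal_ms: float = 250.0) -> int:
    """Roughly how many calls a harness needs per measured iteration
    to exceed its goal time (illustrative, not BenchmarkDotNet's code)."""
    count = 0
    elapsed = 0.0
    while elapsed < goal_ms:
        count += 1
        elapsed += method_time_ms
    return count

# A fast (~0.25 ms) method is invoked a thousand times per iteration,
# easily crossing the JIT's call-count promotion thresholds...
print(invocations_per_iteration(0.25))   # 1000
# ...while a ~500 ms method is invoked exactly once per iteration, so
# the measurement may keep hitting unoptimized (tier-0/OSR) code.
print(invocations_per_iteration(500.0))  # 1
```

This is why the same benchmark body can measure fully optimized code in one scenario and warmup-tier code in another.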


> Why do you have to call it more than 50 times before it gets fully optimized?? Is the decision-maker completely unaware of the execution time?

If you read the linked conversation, you'll notice that there are multiple factors at play.

Here's the document that roughly outlines the tiered compilation and DPGO flows: https://github.com/dotnet/runtime/blob/main/docs/design/feat... Note that it may be slightly dated, since the exact tuning is subject to change between releases.


The optimiser doesn't know how long optimisation will take or how much time it will save before starting the work, so it has to hold off on optimising infrequently called functions.

There are also often multiple concrete types that can be passed in; optimising for one will not help if the function is also called with other concrete types.


> The optimiser doesn't know how long optimisation will take or how much time it will save before starting the work, therefore it has to hold off on optimising not frequently called functions.

I don't buy that logic.

It can use the length of the function to estimate how long it will take.

It can estimate the time savings by the total amount of time the function uses. Time used is a far better metric than call count. And the math to track it is not significantly more complicated than a counter.
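The commenter's proposal can be sketched as two promotion policies. The class names and thresholds here are invented for illustration; the real JIT's heuristics are considerably more involved:

```python
class CallCountPolicy:
    """Promote after a fixed number of calls, regardless of their cost."""
    def __init__(self, threshold_calls: int = 30):
        self.calls = 0
        self.threshold = threshold_calls

    def record(self, duration_ms: float) -> bool:
        """Record one call; return True once the method should be promoted."""
        self.calls += 1
        return self.calls >= self.threshold

class TimeSpentPolicy:
    """Promote once total time spent in the method exceeds a budget."""
    def __init__(self, budget_ms: float = 100.0):
        self.total_ms = 0.0
        self.budget = budget_ms

    def record(self, duration_ms: float) -> bool:
        self.total_ms += duration_ms
        return self.total_ms >= self.budget

# A 500 ms method: the time-based policy promotes after the first call,
# while the count-based one still says "not hot yet".
print(CallCountPolicy().record(500.0))  # False
print(TimeSpentPolicy().record(500.0))  # True
```

The bookkeeping for the time-based variant is indeed just an accumulator, though measuring per-call durations cheaply and accurately is where real runtimes get into trouble.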


> It can use the length of the function to estimate how long it will take.

Ah, yes, because a function that defines and then prints a 10,000-line string will take 1,000× longer to run than a 10-line function which does matrix multiplication over several billion elements.


I think he meant how long it will take to optimize it.

It is naive either way.


It's naive, but it's so much better than letting a single small function run for 15 CPU seconds and deciding it's still not worth optimizing because that was only 30 calls.



