Quite to the contrary, I'd say this update is evidence of the inner loop being hyperoptimized!
MSVC's support for musttail is hot off the press:
> The [[msvc::musttail]] attribute, introduced in MSVC Build Tools version 14.50, is an experimental x64-only Microsoft-specific attribute that enforces tail-call optimization. [1]
MSVC Build Tools version 14.50 was released last month, and it only took a few weeks for the CPython crew to turn that around into a performance improvement.
Python’s goal was never really to be fast. If that were its goal, it would’ve had a JIT long ago instead of toying with optimizing the interpreter. Guido prioritized code simplicity over speed. Many of the speed improvements, including the JIT (PEP 744 – JIT Compilation), came about after he stepped down.
I doubt it would have had a JIT long ago. Thing is, people have been making JIT compilers for Python for a long time now, but the semantics of the language itself are such that it's often hard to benefit from one: most of the time isn't spent in the bytecode interpreter loop itself, it's spent dispatching things. People like comparing Python to JavaScript, but Python is much more flexible: all "primitive" types are objects and can be subclassed, for example, and even basic machinery like attribute lookup has a bunch of customization hooks.
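To make that concrete, here's a small illustrative sketch (the class names are made up) of the kind of hooks a JIT has to preserve:

```python
# Both classes are illustrative; the point is that "primitive" behaviour
# can be redefined from ordinary Python code at runtime.

class ChattyInt(int):
    """A subclass of a 'primitive' type that intercepts addition."""
    def __add__(self, other):
        print("intercepted +")
        return int(self) + other

x = ChattyInt(2)
print(x + 3)  # prints "intercepted +", then 5

class Proxy:
    """Attribute lookup itself is programmable."""
    def __getattr__(self, name):  # called for any missing attribute
        return f"computed:{name}"

print(Proxy().whatever)  # prints "computed:whatever"
```

A JIT that wants to compile `x + 3` down to a machine add has to prove that no such override is in play, or emit guards against it.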
So the problem is basically that a simple JIT isn't beneficial for Python: you have to invest a lot of time and effort to get a few percent faster on a typical workload. Or you tighten up the language and/or break the C ABI, but then you break many existing popular libraries.
Those people usually overlook the history of Smalltalk, Self, and Common Lisp, which are just as dynamic if not more so, given image-based development, live debugging, and compilation on the fly, where anything can be changed at any time.
For all its dynamism, Python doesn't have anything close to Smalltalk's become:.
I would say that by now what is holding Python back is the C ABI and the culture that treats C code as Python.
> People like comparing Python to JavaScript, but Python is much more flexible: all "primitive" types are objects and can be subclassed, for example, and even basic machinery like attribute lookup has a bunch of customization hooks.
Most of the time, people don't use any of these customisations, do they?
So you'd need machinery that makes the common path fast but can fall back to the customised path if necessary?
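Pretty much. As a toy model of the idea (nothing like CPython's actual machinery, just the shape of it):

```python
def add_specialized(a, b):
    # Guard: both operands are exactly int, so no subclass can have
    # overridden __add__ and no custom dispatch is involved.
    if type(a) is int and type(b) is int:
        return a.__add__(b)  # stands in for a raw machine-level add
    # Fallback ("deopt"): the fully dynamic path, honouring every hook.
    return a + b
```

CPython 3.11's specializing adaptive interpreter (PEP 659) does a version of this at the bytecode level: instructions rewrite themselves into specialized forms and fall back when a guard fails.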
Descriptors underpin some common language features like method calls (that's how `self` gets bound), properties, etc. You can still do it by special-casing all of those and making sure that your implementations of those primitives behave exactly as if they used descriptors, sure. But at that point it's not exactly a simple JIT anymore.
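For what it's worth, that binding is observable from Python itself; here's a short sketch (the names are made up) of what `c.greet` does under the hood:

```python
class C:
    def greet(self):
        return f"hello from {self!r}"

c = C()
# Attribute access finds the function on the class and calls its
# __get__, which is what binds `self`:
bound = C.__dict__["greet"].__get__(c, C)
assert bound() == c.greet()

# Properties go through the same descriptor protocol:
class D:
    @property
    def answer(self):
        return 42

print(type(D.__dict__["answer"]))  # <class 'property'>
print(D().answer)                  # property.__get__ runs here -> 42
```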
Should probably mention that Guido ended up on the team behind a pretty credible JIT effort (Microsoft's Faster CPython team), though Microsoft subsequently threw a wrench into it with layoffs. Not sure of its status now.
For comparison: when JavaScript was first designed, performance wasn't a goal. Later on, people who did have performance as a goal worked on JavaScript implementations. Thanks to heroic efforts, JavaScript is nowadays one of the languages with decently fast implementations around. The base design of the language hasn't changed much (though how people use it might have changed a bit).
This is (a) wildly beyond expectations for open source, (b) a massive pain to maintain, and (c) not even the biggest time-waster in Python, which is the packaging "system".
> not even the biggest time-waster in Python, which is the packaging "system".
For frequent, short-running scripts: start-up time! Every import has to scan a billion different directories for where the module might live, even for standard modules included with the interpreter.
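(`python -X importtime yourscript.py` prints a per-import timing tree if you want to see where that time goes.) One mitigation is the stdlib's documented lazy-import recipe, `importlib.util.LazyLoader`. It doesn't avoid the path search, but it defers actually running the module's code until first use, which is often the larger cost. A sketch:

```python
import importlib.util
import sys

def lazy_import(name: str):
    """The recipe from the importlib docs: load a module lazily."""
    spec = importlib.util.find_spec(name)    # the path search still happens here
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)               # execution deferred to first access
    return module

json = lazy_import("json")       # cheap until first attribute access
print(json.dumps({"ok": True}))  # the module body actually runs here
```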
This can't come soon enough. Python is great for CLIs until you build something complex and a simple --help takes seconds. It's not something easily worked around without making your code very ugly.
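The least-ugly workaround I know of is to keep module scope free of heavy imports and pull them in inside the command that needs them; `pandas` here is just a stand-in for whatever heavy dependency your tool uses:

```python
import argparse

def cmd_stats(path: str) -> None:
    import pandas as pd  # deferred: only paid when this command actually runs
    print(pd.read_csv(path).describe())

def main() -> None:
    parser = argparse.ArgumentParser(prog="mytool")
    parser.add_argument("--stats", metavar="CSV", help="print summary stats")
    args = parser.parse_args()  # --help prints and exits here; pandas never loads
    if args.stats:
        cmd_stats(args.stats)

if __name__ == "__main__":
    main()
```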
I remember a former colleague (may he rest in peace) who ported a similar optimization to our fork of Python 2.5, circa 2007. We were running Linux on PPC and it gave us a similar 10-15% boost at the time.
Games aren't written in Python as a whole, but Python is used as a scripting language. It's definitely less popular now than it used to be, mostly thanks to Lua, but it still happens.
I'd have expected it to be hand-rolled assembly for the major ISAs, with a C fallback for the less common ones.
How much energy has been wasted worldwide because of a relatively unoptimized interpreter?