That last one is great if it's hidden behind a flag and some loud messages saying "failed to build, consider using --python-only with performance limitations."
Otherwise it’s going to be a source of mysterious performance issues with a single error message hidden somewhere in your CI or other logs.
I'm a bit mixed on a flag, because C library dependencies are a fair support burden, and an awful lot of people don't have workloads where a pure Python implementation is a significant performance issue, because the overhead is lost in the actual work done by the database.
I'd agree with something like a warning on first load, or maybe an optional package, so you could pin `psycopg3[cext]` if you want to guarantee the extension is installed.
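If such an extra existed, pinning it might look like this in a requirements file (a sketch; `cext` is just the name suggested above, not a published extra):

    # requirements.txt -- hypothetical extra that would guarantee the C build
    psycopg3[cext]>=3.0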
There is a mechanism to ensure that there is no unexpected regression: by setting the PSYCOPG_IMPL env var, a program can make sure it gets a specific implementation, and the import fails if that implementation is not available. https://www.psycopg.org/psycopg3/docs/api/pq.html
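A minimal sketch of that mechanism, going by the linked pq docs (the env var must be set before the first import):

    import os

    # force the C implementation; "python" and "binary" are the other values
    os.environ["PSYCOPG_IMPL"] = "c"

    import psycopg  # the import fails if the C implementation isn't available

    print(psycopg.pq.__impl__)  # reports which implementation was loaded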
Just a huge thank you to Daniele for developing and maintaining psycopg! The article shows the level of professionalism and care that goes into this project. Using Django to test the implementation is a genius idea. I can't wait to try the new version in a few of my projects; the changes (especially server-side parameter binding) sound great.
As somebody who first started using DB code from Perl and later learned Python, I always wondered why Python doesn't have a general database interface like DBI in Perl.
In Perl, all the database-specific modules have a DBI backend, and all the higher-level modules (Django-like frameworks, for example) rely on DBI.
In Python, SQLAlchemy has its own psycopg and cmysql integrations, and so does Django, and likely several other frameworks do as well.
(Java has a similar standard, with JDBC, I believe; though I have never used it, so I might be misunderstanding something here).
Creator of SQLAlchemy here; I came from the Perl DBI and JDBC worlds prior to starting Python. We have PEP 249, but it's generally a very loose spec: projects that implement it basically choose how much they want to follow or not follow it, and with the introduction of asyncio people are walking away from the whole thing, as the spec has pretty much not evolved at all for many years. On top of that, there is no actual library these drivers all need to bind to, as is the case with DBI and JDBC.
All of that said, the reality is that databases are so different in how they define client interactions that you're going to have these problems with either approach. The transparency of Python ultimately makes it easier to work through these issues, though I've always wished there was a better PEP 249 story.
I had the chance to complain to Guido once about wanting a better DBAPI, especially on the subject of result set metadata/typing (looking at you, pysqlite, with your worthless cursor.description).
This was after a mypy/typing talk, so highly tangentially relevant, I suppose.
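To make the complaint concrete: sqlite3 fills in only the name slot of each 7-tuple in cursor.description and leaves the six type/size fields as None:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.execute("SELECT 1 AS n, 'x' AS s")

    # only the column name is populated; everything else is None
    print(cur.description)
    # (('n', None, None, None, None, None, None),
    #  ('s', None, None, None, None, None, None))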
While it's not perfect, I think JDBC really is the gold standard, and I wish there were a DBAPI spec a bit closer to it, especially one with proper prepared statements. I haven't used database/sql in Go, but it seems okay too.
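For what it's worth, psycopg3 does expose explicit preparation outside the PEP 249 surface; a sketch based on its docs (DSN assumed):

    import psycopg

    with psycopg.connect(DSN) as conn:
        # prepare=True requests a server-side prepared statement
        cur = conn.execute("SELECT %s::int + %s::int", (40, 2), prepare=True)
        print(cur.fetchone())  # (42,)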
Thanks for sqlalchemy - it has become the de facto DBAPI in many projects I've worked on.
There is https://www.python.org/dev/peps/pep-0249/. I suppose part of the issue is that there is probably deviation from the standard, and lots of extensions. I imagine that the standard means there is a lot of shared code between the different connectors, though.
As far as I know, there is zero or close to zero shared code.
PEP 249 is great to have, and it's nice to at least have a starting point for learning a new database connector library.
But not all database connector libraries implement the specification. One well-known library [0] explicitly and deliberately violates the spec with no spec-compliant "escape hatch". Also, the style of passing parameters as a single sequence (e.g. a list or tuple) tends to be a newbie trap, and having five different placeholder styles can be annoying (see the sketch below).
That said, there are ODBC connector libraries [1] if you really do need a uniform interface. But at that point you might be better off with the SQLAlchemy "Core" query builder [2].
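A sketch of the placeholder mismatch, using sqlite3 (qmark style) with the pyformat equivalent shown in a comment:

    import sqlite3  # sqlite3.paramstyle == "qmark"

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE t (name TEXT)")

    # qmark style: positional "?" placeholders, parameters as a sequence
    cur.execute("INSERT INTO t (name) VALUES (?)", ("alice",))

    # the same statement under pyformat style (e.g. psycopg) would read:
    #   cur.execute("INSERT INTO t (name) VALUES (%s)", ("alice",))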
JDBC exists, but I honestly think it's a mistake. Higher-level frameworks still end up with a bunch of special cases for handling different lower-level implementations, and meanwhile a lot of functionality gets stuck behind the lowest-common-denominator interface that JDBC is (e.g. it took forever to get any kind of async support, even when both the things above and the things below JDBC were doing great at it).
Casually building a Django DB backend as an acceptance test. Nice work and a great write-up!
I had déjà vu reading the section on bind parameters in aggregate queries. The Oracle backend had a similar issue that was fixed/hacked by comparing values and grouping them into named parameters. https://code.djangoproject.com/ticket/27632
This is interesting, thank you for pointing it out. It's worth looking into whether the problem can be fixed the same way, both in the adapter and probably in everything else using server-side binding.
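Roughly the shape of the problem and of the Oracle-style fix, as a sketch (DSN and the `item` table are assumed; it relies on repeated named placeholders mapping to a single server-side parameter):

    import psycopg

    with psycopg.connect(DSN) as conn:
        cur = conn.cursor()

        # breaks under server-side binding: the two %s become $1 and $2, so
        # "price + $1" in SELECT is not matched by "GROUP BY price + $2"
        # cur.execute("SELECT price + %s FROM item GROUP BY price + %s", (10, 10))

        # the ticket's fix/hack: equal values share one named placeholder,
        # which maps to a single parameter on the server side
        cur.execute(
            "SELECT price + %(p0)s FROM item GROUP BY price + %(p0)s",
            {"p0": 10},
        )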
The example syntax for connection handling is perfect. The footgun of not releasing the connection (especially back to a pool at the end of the with block) is real. And the concepts are different (a transaction block vs. a normal with-a-file block).
    with connect(DSN) as conn:
        with conn.transaction():
            do_something()
            with conn.transaction():
                do_something_nested()

        with conn.transaction() as tx:
            do_something_else()
            # we were just testing and we don't really want to do this
            tx.rollback()

    # and here the connection is closed
Any chance of pushing some changes up to the PostgreSQL side? I'm thinking of a block of transactions with the params kept separate? I didn't look at it at all.
For what it's worth, using context managers for this kind of thing is pretty much standard nowadays, and not supporting this behavior feels jarring and archaic.
(Assuming you are only talking about Postgres libraries.)
Maybe it's somewhat obsolete now that Psycopg itself supports async. I think most people used asyncpg because it's "easier" and supports useful stuff like connection pools.
Main takeaways:
- Asyncio from the ground up (sketched below)
- Uses PQexecParams so parameters are bound server-side instead of being escaped and interpolated client-side
- Context manager and transaction API improvements
- Python-only fallback if the C extension fails to build
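A sketch of the async-first API, per the current psycopg docs (DSN assumed):

    import asyncio
    import psycopg

    async def main():
        # the async variants mirror the sync API shape
        async with await psycopg.AsyncConnection.connect(DSN) as aconn:
            async with aconn.cursor() as cur:
                await cur.execute("SELECT 1")
                print(await cur.fetchone())  # (1,)

    asyncio.run(main())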