That last one is great if it's hidden behind a flag and some loud messages saying "failed to build, consider using --python-only with performance limitations."
Otherwise it’s going to be a source of mysterious performance issues with a single error message hidden somewhere in your CI or other logs.
I'm a bit mixed on a flag, because C library dependencies are a fair support burden, and an awful lot of people don't have workloads where a pure Python implementation is a significant performance issue, because the overhead is lost in the actual work done by the database.
I'd agree with something like a warning on first load, or maybe an optional package, so you could pin `psycopg3[cext]` if you want to guarantee the extension is installed.
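If such an extra existed, pinning it might look like this in a requirements file (a sketch; `cext` is just the name suggested above, not a published extra):

    # requirements.txt -- hypothetical extra that would guarantee the C build
    psycopg3[cext]>=3.0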
There is a mechanism to ensure that there is no unexpected regression: by setting the PSYCOPG_IMPL env var, a program can make sure it gets a specific implementation, and the import fails if that implementation is not available. https://www.psycopg.org/psycopg3/docs/api/pq.html
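A minimal sketch of that mechanism, going by the linked pq docs (the env var must be set before the first import):

    import os

    # force the C implementation; "python" and "binary" are the other values
    os.environ["PSYCOPG_IMPL"] = "c"

    import psycopg  # the import fails if the C implementation isn't available

    print(psycopg.pq.__impl__)  # reports which implementation was loaded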
Just a huge thank you to Daniele for developing and maintaining psycopg! The article shows the level of professionalism and care that goes into this project. Using Django to test the implementation is a genius idea. I can't wait to try the new version in a few of my projects; the changes (especially server-side parameter binding) sound great.
As somebody who first started using DB code from Perl and later learned Python, I always wondered why Python doesn't have a general database interface like DBI in Perl.
In Perl, all the database-specific modules have a DBI backend, and all the higher-level modules (Django-like frameworks, for example) rely on DBI.
In Python, SQLAlchemy has its own psycopg and cmysql integrations, and so does Django, and likely several other frameworks do as well.
(Java has a similar standard, with JDBC, I believe; though I have never used it, so I might be misunderstanding something here).
Creator of SQLAlchemy here; I came from the Perl DBI and JDBC worlds prior to starting Python. We have PEP 249, but it's generally a very loose spec: projects that implement it basically choose how much they want to follow or not follow it, and with the introduction of asyncio people are walking away from the whole thing, as the spec has pretty much not evolved at all for many years. On top of that, there is no actual library these drivers all need to bind to, as is the case with DBI and JDBC.
All of that said, the reality is that databases are so different in how they define client interactions that you're going to have these problems with either approach. The transparency of Python ultimately makes it easier to work through these issues, though I've always wished there was a better PEP 249 story.
I had the chance to complain to Guido once about wanting a better DBAPI, especially on the subject of result set metadata/typing (looking at you, pysqlite, with your worthless cursor.description).
This was after a mypy/typing talk, so highly tangentially relevant, I suppose.
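To make the complaint concrete: sqlite3 fills in only the name slot of each 7-tuple in cursor.description and leaves the six type/size fields as None:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    cur = conn.execute("SELECT 1 AS n, 'x' AS s")

    # only the column name is populated; everything else is None
    print(cur.description)
    # (('n', None, None, None, None, None, None),
    #  ('s', None, None, None, None, None, None))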
While it's not perfect, I think JDBC really is the gold standard, and I wish there were a DBAPI spec a bit closer to it, especially one with proper prepared statements. I haven't used database/sql in Go, but it seems okay too.
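For what it's worth, psycopg3 does expose explicit preparation outside the PEP 249 surface; a sketch based on its docs (DSN assumed):

    import psycopg

    with psycopg.connect(DSN) as conn:
        # prepare=True requests a server-side prepared statement
        cur = conn.execute("SELECT %s::int + %s::int", (40, 2), prepare=True)
        print(cur.fetchone())  # (42,)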
Thanks for sqlalchemy - it has become the de facto DBAPI in many projects I've worked on.
There is https://www.python.org/dev/peps/pep-0249/. I suppose part of the issue is that there is probably deviation from the standard, and lots of extensions. I imagine that the standard means there is a lot of shared code between the different connectors, though.
As far as I know, there is zero or close to zero shared code.
PEP 249 is great to have, and it's nice to at least have a starting point for learning a new database connector library.
But not all database connector libraries implement the specification. One well-known library [0] explicitly and deliberately violates the spec with no spec-compliant "escape hatch". Also, the style of passing parameters as a single sequence (e.g. a list or tuple) tends to be a newbie trap, and having five different placeholder styles can be annoying (see the sketch below).
That said, there are ODBC connector libraries [1] if you really do need a uniform interface. But at that point you might be better off with the SQLAlchemy "Core" query builder [2].
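A sketch of the placeholder mismatch, using sqlite3 (qmark style) with the pyformat equivalent shown in a comment:

    import sqlite3  # sqlite3.paramstyle == "qmark"

    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE t (name TEXT)")

    # qmark style: positional "?" placeholders, parameters as a sequence
    cur.execute("INSERT INTO t (name) VALUES (?)", ("alice",))

    # the same statement under pyformat style (e.g. psycopg) would read:
    #   cur.execute("INSERT INTO t (name) VALUES (%s)", ("alice",))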
JDBC exists, but I honestly think it's a mistake. Higher-level frameworks still end up with a bunch of special cases for handling different lower-level implementations, and meanwhile a lot of functionality gets stuck behind the lowest-common-denominator interface that JDBC is (e.g. it took forever to get any kind of async support, even when both the things above and the things below JDBC were doing great at it).
Casually building a Django DB backend as an acceptance test. Nice work and a great write-up!
I had déjà vu reading the section on bind parameters in aggregate queries. The Oracle backend had a similar issue that was fixed/hacked by comparing values and grouping them into named parameters. https://code.djangoproject.com/ticket/27632
This is interesting, thank you for pointing it out. It's worth looking into whether the problem can be fixed the same way, both in the adapter and probably in everything else using server-side binding.
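Roughly the shape of the problem and of the Oracle-style fix, as a sketch (DSN and the `item` table are assumed; it relies on repeated named placeholders mapping to a single server-side parameter):

    import psycopg

    with psycopg.connect(DSN) as conn:
        cur = conn.cursor()

        # breaks under server-side binding: the two %s become $1 and $2, so
        # "price + $1" in SELECT is not matched by "GROUP BY price + $2"
        # cur.execute("SELECT price + %s FROM item GROUP BY price + %s", (10, 10))

        # the ticket's fix/hack: equal values share one named placeholder,
        # which maps to a single parameter on the server side
        cur.execute(
            "SELECT price + %(p0)s FROM item GROUP BY price + %(p0)s",
            {"p0": 10},
        )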
The example syntax for connection handling is perfect. The footgun of not releasing the connection (especially back to a pool at the end of the with block) is real. And the concepts are different (a transaction block vs. a normal with-a-file block).
    with connect(DSN) as conn:
        with conn.transaction():
            do_something()
            with conn.transaction():
                do_something_nested()

        with conn.transaction() as tx:
            do_something_else()
            # we were just testing and we don't really want to do this
            tx.rollback()

    # and here the connection is closed
Any chance of pushing some changes up to the PostgreSQL side? I'm thinking of a block of transactions with the params kept separate? I didn't look at it at all.
For what it's worth, using context managers for this kind of thing is pretty much standard nowadays, and not supporting this behavior feels jarring and archaic.
(Assuming you are only talking about Postgres libraries.)
Maybe it's somewhat obsolete now that Psycopg itself supports async. I think most people used asyncpg because it's "easier" and supports useful stuff like connection pools.
Main takeaways:
- Asyncio from the ground up (sketched below)
- Uses PQexecParams so parameters are bound server-side instead of being escaped and interpolated client-side
- Context manager and transaction API improvements
- Python-only fallback if the C extension fails to build
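A sketch of the async-first API, per the current psycopg docs (DSN assumed):

    import asyncio
    import psycopg

    async def main():
        # the async variants mirror the sync API shape
        async with await psycopg.AsyncConnection.connect(DSN) as aconn:
            async with aconn.cursor() as cur:
                await cur.execute("SELECT 1")
                print(await cur.fetchone())  # (1,)

    asyncio.run(main())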