ERPNext PostgreSQL Support

pythoncrockpot · October 17, 2020, 3:23pm

Are there any updates on ERPNext Postgres support? Is this work in progress?

kisg · October 17, 2020, 8:06pm

Hi,

the last thing I saw here on the forum was that Postgres support would need funding or a significant contribution due to the 1000s of custom queries that need to be checked and potentially modified for Postgres. I am not aware of anyone working on it.

I am very interested in getting ERPNext to work on Postgres, since this is the DB that I know (and like) the best. I have been using it since 1999 (version 6.5)…

Also, there are at least 2 distributed SQL DB implementations based on the PostgreSQL wire protocol (CockroachDB and YugabyteDB - which is also using Postgres code for a lot of its SQL implementation). This could allow ERPNext to scale to huge instances. Of course such scaling would require other changes as well, but at least the DB would not be the bottleneck.

I have been reviewing the Frappe Database layer implementation in the past few days, and I think before PostgreSQL support is added, the DB layer needs to be refactored in the following ways:

Use numeric ids and foreign keys

The first step would be to change the schema to have an internal numeric id for every document and use those ids for linking, and also introduce foreign key constraints to introduce referential integrity checks. Note that I don’t propose to change the role of the “name” column on the application level, it would still be a key of the document, but there would also be an artifical key for each document for easy linking.

This idea has been brought up numerous times (the last one was quite recently), and the general position of the Frappe core team is (in my understanding), that they are not against it on principle, but this is a complex change that would complicate the SQL queries, so this change would have to bring in other benefits (e.g. substantial performance improvement) apart from being just a different implementation. They don’t see the lack of referential integrity checks in the DB as a crucial issue, because referential integrity is checked on the application level anyways.

I believe that this refactoring can be done in a way that it will provide enough benefits in terms of performance and reliability that in the end the Frappe team will accept it. However, it will require a substantial effort to get there. The biggest issue is not even the Frappe Framework ORM, that can be easily managed, but the 1000s of custom queries all over the ERPNext source code, which brings us to our next point.

Removal of frappe.db.sql()

Currently frappe.db.sql() is used all over ERPNext to create custom queries into the database. I think this creates a huge maintainability problem, and also the main source of issues when someone tries to add support for a new DB engine.

To make it clear, I think custom queries are essential and we need to keep this functionality. I also think that direct SQL access to the DB is the best way to go, and would not like to see an “abstraction” on top of the ORM layer like JPQL in JPA. I only have issues with the frappe.db.sql() API itself.

I think the main source of the problem is that frappe.db.sql() makes creating custom queries just too easy. What I mean is that such queries can be done anywhere in the code with no explicit naming or ceremony. At the same time these queries break through the barrier of the ORM abstraction layer and create a very strong binding with the physical database schema.

Instead, my proposal is to switch to using a DAO pattern for the custom queries. Essentially every module / Frappe app would have a DAO class that implements the custom queries as normal Python methods. It would look something like this (pseudocode):

class AccountingDao: frappe.Dao
    # Initialization omitted

    def executeComplexQuery(param1, param2, ... paramN):
        if not self._complexQueryStatement:
          self._complexQueryStatement = getDB().prepareStatement("complex SQL with %1 %2 ... %N parameters");
        result = self._complexQueryStatement.execute(param1, param2, .... paramN);
        # Possible post-processing of query results omitted
        return result

There are numerous benefits to this approach:

the places where the ORM abstraction is broken are concentrated into a single place, so any change in the DB schema is easier to fix in the custom queries
The custom queries themselves can be unit tested using the Python API, so we can set up automated tests to warn if anything would break the custom queries.
If a query needs to be database specific, then a database specific subclass of the DAO class can be created, where only the problematic method is reimplemented.
the use of prepared statements will increase database performance, especially for complex queries (which is the reason someone writes a custom query anyways).
Standard BPF based profiling can be used to pinpoint slow queries.
…

With the custom queries fixed, now we can move to the next step:

Implement Postgres support for ERPNext

I expect that with the new structure and the unit tests created for custom queries it will be quite easy to find the queries that are indeed MariaDB specific and create the specific Postgres versions.
Also, going forward, any breakage in the Postgres backend should be caught by the DAO unit tests.

Optional: Introduce a Fluent API for SQL Queries

I also would see the benefit of introducing a Fluent API for the creation of SQL queries. Fluent APIs are essentially APIs based on the Builder pattern. The benefit of this would be the possibility to reuse common query parts without resorting to string concatenation, which is a recipe for bugs and chaos.

For example, a common issue with the referential integrity / numeric id proposal is, that when someone wants to query a doctype with a linked document’s name as the filter, one has to add a join / subquery to do so.
With a Fluent API this common case could be solved easily by creating a utility function that creates the necessary subquery / join on any query object that is passed in as the parameter.

Summary

As you can see this is a quite complex project that requires significant effort and funding. However, the benefits would be much greater to the community than just the support of PostgreSQL. The performance, reliability and maintainability benefits would be enjoyed by every user of ERPNext.

I would be very interested in hearing what the community things about this proposal, and if there are anyone who would be interested in participating (e.g. with funding or work).

I would be especially interested in hearing the Frappe Team’s opinion.

Best Regards,
Gergely

vrms · March 20, 2021, 8:57am

What you are saying sounds well thought through but I lack the technical understanding to judge your proposal in detail.

But I would say though that it would be a step to mature the project whether erpnext could be made PostgreSQL ready.

atc · April 12, 2022, 9:53pm

This is great. How much time do you think is involved in making both this change and rewriting the 1000~ queries?

MichaelPinkowski · April 13, 2022, 3:22pm

I think your answer starts here. Postgres support for ERPNext · Issue #24389 · frappe/erpnext · GitHub

I have seen a pretty steady stream of messages about the Postgres project coming through GitHub but I have no idea what the “progress bar” looks like.

ankush · June 17, 2022, 4:45pm

Here’s what progress bar looks like : Postgres support for ERPNext · GitHub

Though ToDo list is kinda incomplete / ever-growing as we discover more requirements.

credit:

This is largely thanks to @cpdeethree (on github) I am just reviewing / resolving issues as I find.
Also some previous work by @Verequies (github) is reused too.
Also slow and steady progress of DB-agnosticism achieved through the adoption of query builder in core: [Announcement] Introducing Frappe Query Builder