Primary keys and Foreign Keys

Hi,

Thank you for your great work and your dedication to Open Source!
We are in the process of evaluating ERPNext and currently migrating data from another ERP software.
We really want to make it work!

So, we have multiple companies (ideally reflected hierarchically but that’s a different topic),
each with their own customers/suppliers.

Looking at the database and source code, we are very surprised to

a) not find immutable IDs
b) not find foreign key constraints
c) not find composite primary keys given the company → customer/supplier → contact layers


Regarding IDs - varchar vs. numeric - Rushabh wrote in 2013:
“All tables have a unique “name” (which is actually id). This could be a numeric series or name (based on user preferences in masters like Customer, Supplier, Item)”

and in 2016:
“There are lots of benefits too, the biggest being that all foreign keys become human readable and we avoid using joins even for simple queries.
Its a design decision and there are pros and cons both ways, right now we have not faced major issues.”

Regarding the lack of foreign keys (in 2016):
“Because Foreign Keys automatically indexes the column in MariaDB and this creates performance issues as the data scales.”

When user “damian” suggested
“By changing your foreign keys to integers you’ll be able to have them indexed, foreign key constraints enabled and still have manageable index sizes.” (and also suggested the adoption of SQLAlchemy as ERPNext’s ORM)
Rushabh replied (2013):
“Yeah we have never looked into this recently - thanks for clarifying we will probably fix this at a later point. It will involve a major rewrite though!”


There are some negative, very practical implications of the above:

Re a) and b): If a company or contact name changes, URLs in historical emails/documents are broken and all existing relationships need to be adjusted. Relying on an API to do the latter is brittle - every new add-on will need to take this into account. A foreign key relationship in the database will guarantee referential integrity at all times, an integer primary key will have a cheap index and serve as a permanent identifier no matter which names change. Also we could not find checks which would confirm referential integrity by code. When we pump a historic invoice into the database for “Customer C”, that “Customer C” better exist!

Re c): Our company A might have a “Customer C Limited” and our company B might also have a “Customer C Limited” (in a different country). These are legal names, and with different users in Company A and B there is no real world reason to name them differently. A composite primary key would allow one “Customer C Limited” PER company, not across the whole database. Within two different customers we may have a Jane Smith (or Vikram Agarwal) again (Or Jane Miller marries and changes her name to Smith). Across all customers/suppliers of all companies we already have numerous real life examples of this problem. A composite key across the three layers would allow one contact per customer per company with the same name. However, that would require the current URLs to now have three components in order to be unique… which brings me back to the integer primary key of a) and b), solving ALSO this issue. There are reasons why this has become widespread practice.


A join is cheap but also necessary in all but the simplest of cases. A URL can be made human-readable by inserting an otherwise technically irrelevant string.
The price of risking referential integrity and problems with mutable primary keys are in our opinion not worth the benefits, and are a design decision that needs to be adjusted before we can move forward.

We love Open Source and we want ERPNext to be suitable for big companies.
I would like to find out:

  • Do you acknowledge that the mutable varchar primary key and lack of foreign keys is a fundamental problem?
  • If yes: Is solving this problem on the short term agenda? We want to help.
  • If no: Would you adopt a change to primary and foreign keys developed by us upstream?
  • Would a foundation membership change any of this?

Thank you for your help.

Kind regards,
Danny

3 Likes

@danny thanks for doing your research well on this topic :wink:

Answers:

Yes and no. Yes because we end up doing a lot of the database work on the ORM layer and we leave the system vulnerable to anyone writes directly to the database. In this case, we always recommend users to add data via the ORM or REST API. There are also helper methods to speed up the inserts / updates.

No: because, this makes things an order of magnitude simpler on the querying side (not having joins). Complex transactions like Invoices have close to 25 references, so adding 25 joins not only adds to complexity but also loss of performance.

No, not on the short term agenda.

Let me warn you, that this will be on massive rewrite. You may really want to identify use-cases why you want to do this rather than just a categorical idea that X is the best way of doing things. In my experience, every decision has a cost.

If you have executed it well and it seems like there is a reasonable performance improvement and the migration works effortlessly, then yes, we will accept it. There are no “religious” issues here. (The only religious issues we have are about how to name things, and we are strongly against fragmentation of the platform).

No. The foundation is just a stake holder in the project like any other user or developer. Merge decisions are based on merit only. The foundation can help you make new friends in the community though, and show your intent as a serious contributor :wink:

4 Likes