Testing ERPNext with 700,000 rows of data

How many hours of development are we talking about?

You can find my.cnf in /etc/mysql/

  • Fine Tuning

max_connections = 10
max_allowed_packet = 128M
sort_buffer_size = 64M
bulk_insert_buffer_size = 128M
tmp_table_size = 256M
max_heap_table_size = 256M

  • MyISAM

key_buffer_size = 512M
myisam_sort_buffer_size = 1G
read_buffer_size = 64M
read_rnd_buffer_size = 32M

query_cache_limit = 2M
query_cache_size = 256M

  • InnoDB
innodb_buffer_pool_size = 1G
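
For reference, here is one way to confirm that MariaDB actually picked up these values after a restart; a minimal Python sketch that assumes the pymysql package and placeholder local credentials (not part of the original post):

import pymysql

# connect with your own credentials; these values are placeholders
conn = pymysql.connect(host="localhost", user="root", password="secret")
with conn.cursor() as cur:
    # each variable below mirrors one of the my.cnf settings above
    for var in ("max_connections", "key_buffer_size", "innodb_buffer_pool_size"):
        cur.execute("SHOW VARIABLES LIKE %s", (var,))
        print(cur.fetchone())
conn.close()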

Hello

@rmehta We are implementing ERPNext for medium-sized organizations here in Bangladesh. We are not completely aware of its limitations. Exactly how large a data set can ERPNext handle? If it can be improved to handle large data, I can contribute too, within my limitations :slight_smile:. Is there any documentation about ERPNext’s capabilities?

Thanks

1 Like

Well, the only way to find out is to test.
I’m going to try 3 million rows in tabItem on my development laptop while tuning MySQL and see what happens. If it crashes or runs slowly, I will try it on a Xeon server with 16GB of RAM. If ERPNext runs smoothly with 10 million rows of data, it is probably safe enough for a large-scale implementation.

@cgpurbaugh I am not sure. We will have to add 2 fields for all Link Fields and add joins for default list views. Also, Link controls will be redesigned to have a label/value pair instead of just a value, plus a patch to migrate everything to the new system.
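
To make the idea concrete, here is a rough sketch of the kind of join a list view would need; the DocTypes chosen and the use of item_name as the label are my assumptions for illustration, not the actual patch:

import frappe

def sales_order_items_with_labels(limit=20):
    # hypothetical example: resolve the human-readable label (item_name)
    # for a Link field (item_code) with a join instead of a second query
    return frappe.db.sql(
        """
        SELECT soi.parent, soi.item_code, it.item_name AS item_label
        FROM `tabSales Order Item` soi
        LEFT JOIN `tabItem` it ON it.name = soi.item_code
        LIMIT %s
        """,
        (limit,),
        as_dict=True,
    )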

Seems like at least 2 weeks, but we can sponsor (50%) so maybe a week’s funding will help us get started.

@Coleteral do share your results!

For such a large DB, 1G seems too little. Can you make it at least as large as your dataset? (Don’t make it more than 70% of your RAM.)
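
As a back-of-the-envelope check, you can compare the on-disk dataset size against 70% of RAM. A sketch assuming pymysql, a Linux host, and a site database named "erpnext" (all placeholders):

import os
import pymysql

conn = pymysql.connect(host="localhost", user="root", password="secret")
with conn.cursor() as cur:
    # total data + index size of the site database, in bytes
    cur.execute(
        "SELECT COALESCE(SUM(data_length + index_length), 0) "
        "FROM information_schema.tables WHERE table_schema = %s",
        ("erpnext",),
    )
    dataset_bytes = int(cur.fetchone()[0])
conn.close()

total_ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")  # Linux only
suggested = min(dataset_bytes, int(total_ram * 0.7))  # the 70% ceiling above
print(f"dataset: {dataset_bytes / 2**30:.1f} GiB, "
      f"suggested innodb_buffer_pool_size <= {suggested / 2**30:.1f} GiB")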

Will do, pdvyas.
Thanks for the hint.

Any update? :slight_smile:

Hi All,

Any updates regarding the changes to Frappe to improve handling of large data sets? I have a project that requires importing 50 million records monthly, and I am wondering if ERPNext is a fit.

I would love to see that it is; what can I do to help make it so? As for funding, did we get enough, and how much is required?

regards
Hemant

1 Like

@rmehta I am highly interested in this discussion. I am about to use ERPNext for a client whose customer list will reach 3 million, with heavy transactions daily.
Could our amiable team let us know the current position?
Thanks.
Fred

Maybe I’ll share some of my experience.

I currently have a 2 million+ row Item master database
and around 3 million transactions across different modules.

VM specs:
4 Cores
8GB RAM (6GB dedicated to MariaDB)
Users: 10 simultaneous users on different modules

[Transactions]
When making a transaction, it takes 2-5 seconds to load an item from the Link field before you can select it.
But from other masters with just 2,000 rows, like Suppliers, it takes only 1-2 seconds.

[Generating Reports]
Generating a report (Script Report) covering a month takes about 10 seconds.
But the Financial Analytics reports take a minute, or sometimes fail with “Something went wrong”.

[Importing Data]
I imported my data directly via the database.
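
For anyone curious what importing "directly via the database" can look like, here is a heavily simplified pymysql sketch. The column list is a hypothetical minimum; real ERPNext tables require many more mandatory columns, and bypassing Frappe’s ORM skips all validation:

import pymysql
from datetime import datetime

# placeholder rows; a real import would stream these from the source file
rows = [(f"ITEM-{i:07d}", f"ITEM-{i:07d}", f"Test Item {i}") for i in range(1000)]
now = datetime.now()

conn = pymysql.connect(host="localhost", user="root",
                       password="secret", database="erpnext")
with conn.cursor() as cur:
    # executemany batches the statements; commit once at the end for speed
    cur.executemany(
        "INSERT INTO `tabItem` (name, item_code, item_name, creation, modified) "
        "VALUES (%s, %s, %s, %s, %s)",
        [(name, code, label, now, now) for name, code, label in rows],
    )
conn.commit()
conn.close()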

I tried borrowing some RAM from other VMs and noticed a great increase in performance.

The whole system runs smoothly, but when somebody generates a report for a full year, it slows down.

So for now I can say this:
ERPNext can handle heavy transaction volumes, as long as you have good hardware.

For the 50 million records, assuming they come from a different source, I am more concerned about how to convert that data into an ERPNext-readable format, and you have to do it monthly. :scream:
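
On the conversion point: the usual approach is to stream the source file and reshape each row into the columns ERPNext expects, rather than loading everything into memory. A sketch with made-up source and target column names:

import csv

# hypothetical mapping from the source system's columns to ERPNext fields
FIELD_MAP = {"cust_no": "customer_name", "terr": "territory"}

def convert(src_path, dst_path):
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=list(FIELD_MAP.values()))
        writer.writeheader()
        # streams row by row, so memory use stays flat even at 50 million rows
        for row in reader:
            writer.writerow({new: row[old] for old, new in FIELD_MAP.items()})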

2 Likes

@rmehta Do you have a rough estimate (in monetary terms) for such a transition?

The architecture with varchar primary keys is a good thing from the point of view of importing data into the system. We work with Talend and create the primary keys according to ERPNext’s naming conventions. This is important when tying dependent records together while designing imports.

Of course, database speed can be a concern with big databases. It is well known that MySQL is slow at inserting but fast at reading.
Also, integer indexes are generally faster than varchar indexes.

But I think that with a properly fitted machine for the database, things can be handled. SAP with Oracle is also very slow when Oracle runs on a 2GB / 1-core box.
Open-source users expect the software to run on cheap boxes; that is not the case when you need horsepower.
There must be big iron. Many cores and lots of RAM can only be replaced by MORE cores and RAM… :slight_smile:
Fast disks, and ideally SSDs, are also very important.

Because of this, I would begin testing by running the MySQL database server on a dedicated machine/VM, with the ERPNext server also on a separate machine.

I know that the Frappe team has put it all on one machine for simplicity of deployment and hosting. For small companies, or companies with a small to medium volume of transactions, this is entirely sufficient.

In short:

  1. As stated in the posts above, the database seems to be the bottleneck, so I would focus on index optimization and/or partitioning of data. Transaction data in particular can be partitioned, because old data is used only occasionally. MariaDB supports partitioning; see the sketch after this list.
  2. Then I would place MySQL on a separate big-iron machine/VM and test again against the monolithic system.
    A good connection between the ERPNext server and the MySQL server is also important: GBit network or better. This can be an issue when hosting in the cloud or a datacenter, where you have no influence on the peering.
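
Here is a minimal sketch of range partitioning in MariaDB, executed through pymysql. Note that the partition key must be part of every unique key, so this uses a standalone archive table with a composite primary key rather than a stock Frappe table (whose varchar name primary key would not allow this directly):

import pymysql

DDL = """
CREATE TABLE txn_archive (
    id BIGINT NOT NULL,
    posting_year INT NOT NULL,
    amount DECIMAL(18, 2),
    PRIMARY KEY (id, posting_year)  -- partition key must be in the PK
)
PARTITION BY RANGE (posting_year) (
    PARTITION p2015 VALUES LESS THAN (2016),
    PARTITION p2016 VALUES LESS THAN (2017),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
)
"""

conn = pymysql.connect(host="localhost", user="root",
                       password="secret", database="erpnext")
with conn.cursor() as cur:
    cur.execute(DDL)  # old partitions can later be dropped almost instantly
conn.commit()
conn.close()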

I hope this helps a little bit… :slight_smile:

2 Likes

We are using ERPNext for what we consider a small/medium-scale manufacturing concern, with plans to extend to another, much larger concern. Currently we have around half a million entries in 6 months, using long item names of 20-40 characters. We see transaction times of 1-2 minutes per transaction, e.g. when doing stock movements or processing work orders.

As this is a very old thread, I’m wondering whether this issue was resolved in later releases? Or is there any other solution?

1 Like

It appears that having an integer or a varchar primary key should not make any difference. See this link:

1 Like

Rather, it is the WIDTH (the length in bytes) of the key that makes a difference in performance:

> VARCHAR vs. INT doesn’t tell much. What matters is the access pattern.
>
> On absolute terms, a wider key will always be worse than a narrow key. The type carries absolutely no importance; it is the width that matters. When compared with INT though, few types can beat INT in narrowness, so INT usually wins that argument just by the fact that it is only 4 bytes wide.
>
> But what really matters is the choice of clustered key. Often confused with the primary key, the two represent different notions and are not required to overlap. Here is a more detailed discussion: Should I design a table with a primary key of varchar or int?
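
A quick bit of arithmetic shows why width matters: in InnoDB, every secondary index entry also carries a copy of the clustered (primary) key, so a wide varchar key is paid for once per secondary index per row. Rough numbers, assuming a single-byte charset and a made-up key format:

# rough illustration, not a benchmark
rows = 3_000_000            # e.g. the Item master discussed above
secondary_indexes = 5       # assumption: typical indexed columns per table

int_key = 4                               # an INT is 4 bytes
varchar_key = len("ITEM-0000012345") + 1  # value bytes + 1 length-prefix byte

for label, width in (("INT", int_key), ("VARCHAR", varchar_key)):
    # the primary key is copied into each secondary index entry
    overhead = rows * secondary_indexes * width
    print(f"{label:8s} key: ~{overhead / 2**20:.0f} MiB of key bytes in secondary indexes")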

@claily I assume you have dedicated 6GB of RAM by changing the value innodb_buffer_pool_size = 6G, which can be found under /etc/mysql/my.cnf? Right?

Yes.

Strange. I have set it to 18G, as I have 24GB of RAM installed. But the system is not utilizing the allowed RAM as it should to improve performance and reduce execution times.
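
One thing to check: InnoDB fills the buffer pool lazily, so an 18G pool will look mostly unused until the working set has actually been read from disk. A quick way to see both the configured size and how much of the pool currently holds data (again a sketch assuming pymysql and placeholder credentials):

import pymysql

conn = pymysql.connect(host="localhost", user="root", password="secret")
with conn.cursor() as cur:
    cur.execute("SHOW VARIABLES LIKE 'innodb_buffer_pool_size'")
    print(cur.fetchone())   # what the server was configured with
    cur.execute("SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_bytes_data'")
    print(cur.fetchone())   # how much of the pool currently holds data
conn.close()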