ERPNext Foundation ERPNext Cloud User Manual Blog Discuss Frappé* Donate

[FYI] Data import using LibreOffice version of .xlsx files


#1

I created an export file/template of data to import (in this case Sales Invoice to import legacy POS data).
I then added in the required data to the spreadsheet. When I tried to import it back, it took forever to get nowhere.
To debug/test it, I stripped it down to just 5 transactions, and imported it again (deleted the attachment and recreated it with new name to ensure no caching issues). It again took forever just to attach, and then got nowhere after some hours on the import.

Turns out there is a “funny” (unknown to me) in the conversion between Excel and LibreOffice’s .xlsx files. This is not an ERPNext problem, but I’m sure people on the ERPNext system often use the import feature, so I thought I’d share this.

During the data manipulation, I colour coded a few (entire) rows and a few (entire) columns to make it easier to see what was happening in mandatory/optional columns.
It seems (though I cannot confirm this) that between the .xlsx adjustments, Excel or LibreOffice Calc marks the furthest x & y points that contain any data - even if it is only a block colour/tag for visual clarity. This meant that the import file might just think you have a file 10 million rows x 20 thousand columns…which understandably takes quite a while to process.

I can’t say for certain what the best way to work around this is, but I did it this way…

  • Open the spreadsheet…
  • freeze the columns at about A10…
  • go to the last cell that has actual data in it…
    • hold down shift, and then click on the first data cell (A1)
    • this will mark the populated data neatly.
  • click the + to create a new sheet (eg. sheet2) within the file, and paste the data in that
  • delete the “infinitely large” sheet (since your data is now available in sheet2) and save the file

This can now be used as a more reliable import source.

Hope it helps someone.


Timeout error - generic
#2

I had similar problems with .xlsx files generrated from LibreOffice Calc (although I was NOT highlighting colors or any other additional attributes into the worksheet). I sitll noticed some difficulties I could not explain.

For me, I simply saved my files as .csv files and they worked just as fast as the Excel generated .xlsx files. Later I could also open those .csv files with LibreOffice Calc and re-save them as .xlsx and they would work just fine.

I always thought that maybe there was something wrong with the original .xlsx files that I downloaded as templates from ERPNext, but I never had trouble out of them in MS Excel.

I could never be sure what the issues was that caused them to import slow or sometimes fail or even timeout. For a long time I just made sure I always used .csv files to avoid the problems.

Thanks for the heads up notice on this as an issue. It sort of makes me feel better about my work-around practices.

BKM :grin:


#3

A short follow up…
After the kind assistance of BKM above and in some PM’s - the import is now working perfectly. The problem in importing is usually making sure that the right rows/data are in the right place.
The file I wanted to import contained ~6000 transactions. In MS .xlsx format it would not work. In Libre .xlsx format it was reading the data as infinitely large.
I had started to split the file into smaller chunks - as low as 200 rows - but it still took ~1 hour for a single file/subset.

As per the recommendation from BKM, I changed to CSV and the process is MUCH more streamlined. I tested with 3 files of ~2000 transactions each, and all worked fine. The uploads are very quick, and the processing is consistent and stable.

In summary, if you import/export data…here’s some of the lessons I learned

  • create an export template WITH data to ensure that you have all the right fields in all the right places
  • use CSV files rather than .xlsx files - you will save yourself a lot of trouble
    • date/time formats are critical, and if not set correctly, will cause a lot of grief too. Use a good text editor that supports regex to get the date/time formats into the required sequence
    • check that you CSV header is the latest one, or you’ll get messages telling you they’re wrong (I also didn’t believe these for several attempts, until I checked and saw that I had messed them up a bit in my pre-calc bits)
    • specifically, in my case, for readability, I changed from commas to semi-colons while creating the data files, and then forgot to change back. The process then import zero lines at blazing speed (since that’s what I told it to do :sunglasses:)
    • Read the screens and error logs - that’s what they’re there for