Upload Huge Data

I need to upload a huge XML file (~200 MB) to import data from another ERP application. I'm running into problems such as the browser crashing and the Python backend becoming unstable.

JavaScript:

frappe.pages['my-data-import'].on_page_show = function(wrapper) {
    var $form = $("form[id='frmFileUp']");

    $("#btn_read").click(function() {
      var input = $('div').find("[type='file']").get(0);

      if (input.files.length) {
        input.filedata = { "files_data": [] }; // initialize as an object holding an array

        $.each(input.files, function(key, value) {
          setupReader(value, input);
        });
      }
    });

    $("#btn_import").click(function() {
        var filedata = $('#select_files').prop('filedata');
        if (filedata) {
            frappe.call({
                method: "~~path~to~method~~",
                args: {
                    // pass the collected file contents, not an undefined global
                    "filedata": JSON.stringify(filedata),
                    "open_date": $('div').find('[name="exp_start_date"]').val()
                },
                callback: function(r) {
                    if (!r.exc) {
                        frappe.msgprint(__("Files uploaded"));
                    } else {
                        frappe.msgprint(__("Files not uploaded. <br />" + r.exc));
                    }
                }
            });
        }
    });
};

function setupReader(file, input) {
    var name = file.name;
    var reader = new FileReader();
    reader.onload = function(e) {
        // FileReader is asynchronous, so store the result on the input element;
        // a flag toggled right after the $.each loop would clear before reading finishes
        input.filedata.files_data.push({ "name": name, "content": reader.result });
    };
    reader.readAsText(file);
}

Python Method:

import json

import frappe
import xmltodict
from xmltodict import ParsingInterrupted

@frappe.whitelist()
def read_uploaded_file(filedata=None, open_date=None):
	if not filedata:
		return

	params = json.loads(frappe.form_dict.get("params") or '{}')

	if params.get("open_date"):
		open_date = params.get("open_date")

	# make the opening date available to the item_callback
	global opening_date
	opening_date = open_date

	try:
		# stream-parse the XML, calling `process` for every element at depth 5
		xmltodict.parse(filedata, item_depth=5, item_callback=process)
	except ParsingInterrupted:
		frappe.db.rollback()
		return {"messages": ["There was a problem importing"], "error": True}

	frappe.db.commit()
	return {"messages": "Import successful", "error": False}

Is there any way to achieve the above? I am not an expert in any of the above languages.

I think it’s better to write a server-side script … aka a custom Python app …


Upload Limitations

ERPNext restricts the amount of data you can upload in one file, though the limit may vary based on the type of data. It is usually safe to upload around 1,000 rows of a table in one go; if the system cannot accept the file, you will see an error.

Why is this? Uploading a lot of data can cause your system to crash, especially if there are other users doing things in parallel. Hence ERPNext restricts the number of “writes” you can process in one request.

See the manual:
Data Import Tool

Overwriting Data from Data Import Tool

So you should break your file into smaller chunks, or, if you have a little experience, you can write a server-side script to add the data to the database directly.


@JoEz Thanks, I am already doing that.

@ganas I know there are upload restrictions, and I have already gone through the Data Import Tool. I don’t want to use it; I'd rather process the XML at the backend and insert records there.

This data import is a one-time job, at the time of setup. Breaking the data into chunks and uploading them therefore seems interesting; any idea how I can go about it?

I gave breaking the data up in JS a shot, but this fires a lot of frappe.call requests. And even if it does, how do we call them in sequence, i.e. start the second call only after the first completes?

Well, I have managed to upload and process the data in batches (sequentially). I am still having some trouble at the back-end (I guess), and hope that someone can shed some light on it.

I have ~100,000 records in the XML file, consisting of Customers, Sales Invoices, Payment Receipts, Stock Items, etc. When I start importing the XML file in batches, after processing about 3,000 records (and inserting about 500 of them, as there are some checks that need to pass before a record is inserted), the system stops responding.

Any suggestions?

Try breaking the source file into multiple smaller files before loading, using split or any other OS-specific command available to you.

After that you can write some custom Python code to loop through the smaller files and load them.
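For example, a rough sketch of such a loop — the chunk file pattern and the import_chunk() helper are hypothetical, and note that naively splitting XML breaks well-formedness, so you may have to split on record boundaries first:

import glob

import frappe

def import_split_files():
	# iterate over the chunk files produced by `split` (hypothetical naming)
	for path in sorted(glob.glob("/path/to/chunks/data_part_*")):
		with open(path) as f:
			import_chunk(f.read())  # hypothetical per-chunk import routine
		frappe.db.commit()  # commit after each chunk to keep transactions small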

@Pawan I can’t understand why to split the files … as far as I know that is only needed when using the ERPNext import tool. But would it be possible to write a custom app or a custom Python script in order to import big files … am I wrong?

What I have done is

  1. Read the file in JavaScript using readAsText().
  2. Split the text so obtained into tokens/batches (of about 1,000 records per batch).
  3. Upload them one after the other (in a loop) to a custom Python function using frappe.call for further processing.
  4. The function processes each batch and calls db.commit() after each batch is successfully processed. (And of course JavaScript then uploads the next batch; the process goes on till EOF.)

Is this a correct approach?
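For illustration, a minimal sketch of the server side of steps 3 and 4 — the method name, DocType, and field mapping here are hypothetical:

import json

import frappe

@frappe.whitelist()
def import_batch(batch=None):
	# receives one batch of records from the JS loop via frappe.call
	records = json.loads(batch or "[]")
	inserted = 0
	for rec in records:
		if not rec.get("name"):  # hypothetical validation check
			continue
		frappe.get_doc({
			"doctype": "Customer",  # hypothetical target DocType
			"customer_name": rec["name"],
		}).insert(ignore_permissions=True)
		inserted += 1
	frappe.db.commit()  # commit once per batch, so a failure only loses the current batch
	return {"inserted": inserted}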

As long as your frappe.call is calling into the same code as the data import tool, I think this approach should work.

If you are facing a bottleneck at your DB, try committing more often, maybe every 100 rows, and see if it helps.
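Something along these lines inside the batch loop — insert_record() is a stand-in for your actual insert logic:

import frappe

def process_batch(records):
	for i, rec in enumerate(records, start=1):
		insert_record(rec)  # stand-in for your insert logic
		if i % 100 == 0:
			frappe.db.commit()  # flush every 100 rows
	frappe.db.commit()  # flush the remainder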

Also, on a different note, you can try increasing the max_file_size value in your site_config.json file.
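If I remember correctly the value is in bytes, so a site_config.json allowing uploads of roughly 200 MB might look like this:

{
	"max_file_size": 209715200
}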

Thanks

It’s now working :smile:. I just had to reduce the number of records per batch to 100.

Thanks for the support.


Good to know that… I hope you will share your code with others via Git so that they can learn from it.

Thanks,
Pawan


I think handling any data above 1 MB of text in JavaScript is a killer for the browser, or simply an anti-responsive approach.
I would upload it using the regular Frappe API instead (e.g. via a button inside the ListView menu):

JavaScript

frappe.listview_settings['My Doctype'] = {
    onload: function(me) {
        me.page.add_menu_item(__('Add records from file'), function() {
            var dialog = frappe.ui.get_upload_dialog({
                "args": {
                    "from_form": 1,
                    "folder": "home"
                },
                callback: function(r) {
                    frappe.call({
                        method: "my_module.my_module.doctype.my_doctype.my_doctype.my_function",
                        args: {
                            file_name: r.name
                        }
                    });
                }
            });
        });
    }
}

Python

import frappe
from frappe.utils.file_manager import get_file_path

@frappe.whitelist()
def my_function(file_name):
    # file_name is the ID in the File doctype - not the name of the uploaded file
    message = {}
    if file_name:
        file_doc = frappe.get_doc("File", file_name)
        file_path = get_file_path(file_doc.file_name)
        # here we can open the file and parse it
    else:
        message['error'] = "No file name given"

    return message

@Marcin_Walkow Thank you for the inputs. Yes, it is not recommended, and also a little difficult, to upload huge files in Frappe.

I used a rather “Direct” method to process the uploaded data.

Quote from one of my previous posts:

1. Read the file in JavaScript using readAsText().
2. Split the text so obtained into tokens/batches (of about 1,000 records per batch).
3. Upload them one after the other (in a loop) to a custom Python function using frappe.call for further processing.
4. The function processes each batch and calls db.commit() after each batch is successfully processed. (And of course JavaScript then uploads the next batch; the process goes on till EOF.)

@mehta_akshay:
You didn’t get me - instead of splitting files in JavaScript (handling even 1 MB may be a killer) I would try to upload the whole 200 MB with a status bar. Of course, you mostly have to pay attention to timeouts.

Maybe this will help?
You can upload CSV files through the terminal using bench import-csv.

https://frappe.github.io/frappe/user/en/guides/data/import-large-csv-file
