[Performance] 1 Sales Invoice Save makes ~700+ DB calls. ~1000+ DB calls on Submit

Have you joined the developer group on Telegram? Here is the link:


I have often encountered situations where I needed to call get_doc only to call its instance methods. Often, these instance methods didn’t interact with any of the child tables, and the docs were loaded in hooks, so they weren’t part of the response.

Also, child tables can often be very large, so lazy loading of child tables would be a helpful addition to get_doc. Lazy loading should be a logical decision rather than a configuration, which can be implemented with an additional argument called “lazy_load_children” passed to get_doc, like:

frappe.get_doc(doctype, docname, lazy_load_children=True)
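
To make the idea concrete, here is a minimal, framework-agnostic sketch of the lazy pattern (the LazyChildren class and its wiring are hypothetical, not an existing Frappe API, and the invoice name is made up): the child-table SELECT only fires when the rows are first iterated.

	import frappe

	class LazyChildren:
		"""Hypothetical wrapper: runs the child-table query only on first access."""
		def __init__(self, loader):
			self._loader = loader
			self._rows = None

		def __iter__(self):
			if self._rows is None:
				self._rows = self._loader()  # one SELECT, issued on demand
			return iter(self._rows)

	# the deferred query only runs if the caller actually touches the rows
	items = LazyChildren(lambda: frappe.get_all("Sales Invoice Item",
		filters={"parent": "SINV-00001"}, fields=["*"]))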

Based on SP, I think we can optimize ERPNext in a better way. The time may be less than 1 sec. :grinning:

Nice analysis @Sachin_Mane
Good work @Sachin_Mane @rmehta

I also worry about the potential for increased RAM requirements on the host platform. Memory is the single largest contributing factor to server costs. So for a system with 4 or 5 gunicorn workers and 12 live users logged in and processing sales, could there be an increased memory requirement of close to 500 MB?

Memory usually comes in 1 GB increments and costs an additional $6 to $10 per month on the server fees invoice.

Have I read this correctly? Is that a reasonable expectation of additional memory overhead?

BKM

If you don’t allocate more memory to the redis cache, that will just mean more cache misses, leading to performance similar to the current implementation.

You don’t have to increase the redis cache if you don’t want to - everything should still work the same as before (I believe - please correct me if I’m wrong).

The analogy is with the innodb_buffer_pool_size MariaDB variable you’ve most likely modified in other deployments. You can keep the value small, but more memory means better performance. It’ll be the same for this. Finding the recommended values will be the key - and everyone’s value proposition will be different. But until this is tested and someone reports the results, we won’t really know the recommendations.
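
To make the tuning concrete, here are illustrative values only - the paths and numbers are assumptions, so tune them to your workload and available RAM:

	# MariaDB (my.cnf): a larger buffer pool keeps more pages in memory
	[mysqld]
	innodb_buffer_pool_size = 1G

	# Redis cache (e.g. frappe-bench/config/redis_cache.conf): cap the cache size
	maxmemory 512mb
	maxmemory-policy allkeys-lru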


Lazy loading is a good idea. However, it may complicate the caching approach. With the document cache enabled, lazy loading won’t be needed.

Memory is cheap nowadays. You can choose your server configuration based on your current memory utilisation.

It is a tradeoff and, as @felix mentioned, it is up to you to decide how much memory should be allocated.


get_cached_doc and get_cached_value won’t implement the lazy loading param.

We use get_cached_doc for master data which is read multiple times, while lazy loading will be for documents which are read only once and whose child tables don’t need to be read. That way, we can still use instance methods (without having to manually create instances of document classes) without loading child table values (which can run into the hundreds in some cases). A sketch of the intended split is below.
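
A minimal sketch of that usage split (lazy_load_children is the argument proposed in this thread, not an existing API, and the document names are made up):

	import frappe

	# master data, read many times -> document cache, full doc
	item = frappe.get_cached_doc("Item", "ITEM-0001")

	# transactional doc, read once, child rows not needed -> lazy load
	si = frappe.get_doc("Sales Invoice", "SINV-00001", lazy_load_children=True)
	print(si.grand_total)  # parent fields still work without child rows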


Makes sense to me.

A design question: currently, when get_cached_doc and get_cached_value are called with a bad key, they raise a DoesNotExistError rather than returning None. Is there a reason it works this way? I would think you’d want the two methods to have symmetrical APIs.

Good point. get_cached_value should return None, while only get_cached_doc should raise DoesNotExistError.

In [2]: x = frappe.get_value("Item", "nope", "item_code")
In [3]: print(x)
None
In [4]: x = frappe.get_cached_value("Item", "nope", "item_code")
---------------------------------------------------------------------------
DoesNotExistError

get_cached_value and get_value do not match.

In [5]: frappe.get_doc("Item", "nope")
---------------------------------------------------------------------------
DoesNotExistError

In [6]: frappe.get_cached_doc("Item", "nope")
---------------------------------------------------------------------------
DoesNotExistError  

get_doc behaves consistently, though I still think returning None is probably better than raising DoesNotExistError.

1 Like

Well, to me the convention is not clear. None is the general nil or no-value case, whereas DoesNotExistError is raised when a doctype or an aspect of it fails to load_from_db.

Maybe this is too specific a use case, but here’s how I regularly check for an existing document before creating a new one. To me, the first option is more pythonic and more efficient.

x = frappe.get_doc("Item", "nope")
if x:
    return x
else:
    x = frappe.new_doc("Item")
    # set item attributes
    return x

As opposed to:

try:
    x = frappe.get_doc("Item", "nope")
except:   # or except DoesNotExistError:
    x = None
finally:
    if not x:
        x = frappe.new_doc("Item")
        # set item attributes
return x

Yes, to me the 1st is unambiguous and preferred.

I question the 2nd use case, since new_doc executes regardless, i.e. even when the Item already exists?

Remember that finally is a ‘clean up’ action…

see 8. Errors and Exceptions — Python 2.7.18 documentation

“Since the finally clause is always executed before leaving the try statement, whether an exception has occurred or not”
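
To see that concretely, here is a tiny self-contained example (plain Python, nothing Frappe-specific): the finally clause runs even when the except branch returns.

    def demo():
        try:
            raise ValueError("boom")
        except ValueError:
            return "handled"
        finally:
            print("finally runs either way")

    print(demo())
    # output:
    # finally runs either way
    # handled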

Yeah, my mistake. I have edited my post so that it’s correct. You still have to check for x. If you put the creation in the except portion, it would only handle the does-not-exist error, but you still want the pattern to cope with other validation errors, so I think finally is the better spot for it, to ensure it sits outside the try/except pattern.
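
As an aside, a third pattern avoids the exception handling entirely by checking existence first. A minimal sketch using frappe.db.exists (which returns the document name if found, else None); the helper name is made up:

    import frappe

    def get_or_create_item(item_code):
        if frappe.db.exists("Item", item_code):
            return frappe.get_doc("Item", item_code)
        item = frappe.new_doc("Item")
        item.item_code = item_code
        # set other item attributes here
        return item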

get_cached_value raises DoesNotExistError because internally it calls get_cached_doc().

I think this should be handled in get_cached_value to make the API the same as get_value.
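
A minimal sketch of that fix, assuming get_cached_value simply wraps get_cached_doc (the actual internals may differ):

    import frappe

    def get_cached_value(doctype, name, fieldname):
        try:
            doc = frappe.get_cached_doc(doctype, name)
        except frappe.DoesNotExistError:
            return None  # match frappe.get_value for missing docs
        if isinstance(fieldname, (list, tuple)):
            return [doc.get(f) for f in fieldname]
        return doc.get(fieldname)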

@Sachin_Mane
Currently, if a sales invoice has 51 items, saving the invoice makes 51 DB calls (inserts) on the tabSales Invoice Item child table, because the parent doc’s insert method calls db_insert once for each child table record. The extracted code is below:

	# children
	for d in self.get_all_children():
		d.db_insert()

The db_insert method handles only one record, but the insert into statement actually supports inserting multiple records in one call, like below:

        insert into table (columns) values(record1), (record2), ...
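
For example, three child rows could go in a single statement (columns and values shortened for illustration; the names are made up):

        insert into `tabSales Invoice Item` (`name`, `parent`, `item_code`, `qty`)
        values ('row-001', 'SINV-00001', 'ITEM-001', 1),
               ('row-002', 'SINV-00001', 'ITEM-002', 2),
               ('row-003', 'SINV-00001', 'ITEM-003', 5)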

Based on the above analysis, I propose to copy and adapt the db_insert method to support multiple records, like below (model/base_document.py):

def db_insert_multi(self, docs):
	"""INSERT multiple documents (with valid columns) in the database."""
	def insert_multi():
		# flush the current batch with a single multi-row INSERT
		if not doc_dict_for_insert:
			return
		columns = list(doc_dict_for_insert[0].keys())
		sql = """insert into `tab{doctype}`
				({columns}) values """.format(
				doctype=doctype,
				columns=", ".join("`" + c + "`" for c in columns))
		placeholders = []
		values = []
		for d in doc_dict_for_insert:
			placeholders.append("({0})".format(", ".join(["%s"] * len(columns))))
			values.extend(list(d.values()))
		frappe.db.sql(sql + ", ".join(placeholders), values)
		for doc in doc_obj_for_insert:
			doc.set("__islocal", False)

	if not docs:
		return

	doctype = docs[0].doctype
	doc_dict_for_insert = []
	doc_obj_for_insert = []
	count = 0
	for doc in docs:
		count += 1
		if not doc.name:
			# name will be set by document class in most cases
			set_new_name(doc)
		if not doc.creation:
			doc.creation = doc.modified = now()
			doc.created_by = doc.modified_by = frappe.session.user

		d = self.get_valid_dict(doc=doc)
		# flush when the doctype changes or the batch grows past 500 rows
		if doc.doctype != doctype or count > 500:
			insert_multi()
			doc_dict_for_insert = []
			doc_obj_for_insert = []
			count = 1
		doctype = doc.doctype
		doc_dict_for_insert.append(d)
		doc_obj_for_insert.append(doc)
	insert_multi()

The related get_valid_dict method needs to be adapted as well:

def get_valid_dict(self, sanitize=True, convert_dates_to_str=False, doc=None):
	doc = doc or self
	d = frappe._dict()
	for fieldname in doc.meta.get_valid_columns():
		d[fieldname] = doc.get(fieldname)

		# if no need for sanitization and value is None, continue
		if not sanitize and d[fieldname] is None:
			continue

		df = doc.meta.get_field(fieldname)
		if df:
			if df.fieldtype=="Check":
				if d[fieldname]==None:
					d[fieldname] = 0

				elif (not isinstance(d[fieldname], int) or d[fieldname] > 1):
					d[fieldname] = 1 if cint(d[fieldname]) else 0

			elif df.fieldtype=="Int" and not isinstance(d[fieldname], int):
				d[fieldname] = cint(d[fieldname])

			elif df.fieldtype in ("Currency", "Float", "Percent") and not isinstance(d[fieldname], float):
				d[fieldname] = flt(d[fieldname])

			elif df.fieldtype in ("Datetime", "Date", "Time") and d[fieldname]=="":
				d[fieldname] = None

			elif df.get("unique") and cstr(d[fieldname]).strip()=="":
				# unique empty field should be set to None
				d[fieldname] = None

			if isinstance(d[fieldname], list) and df.fieldtype != 'Table':
				frappe.throw(_('Value for {0} cannot be a list').format(_(df.label)))

			if convert_dates_to_str and isinstance(d[fieldname], (datetime.datetime, datetime.time, datetime.timedelta)):
				d[fieldname] = str(d[fieldname])

	return d

Finally, the call that inserts child table records in model/document.py needs to be changed as follows:

def insert(self, ignore_permissions=None, ignore_links=None, ignore_if_duplicate=False, ignore_mandatory=None):

	# children
	#for d in self.get_all_children():
	#	d.db_insert()

	self.db_insert_multi(self.get_all_children())

This change will significantly reduce the number of DB calls for docs with many child items, and hence improve performance.
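
A quick way to verify the reduction is to count the calls, e.g. by temporarily wrapping frappe.db.sql in the bench console. The wrapper below is a throwaway sketch for local experimentation, and the invoice name is made up:

	import frappe

	original_sql = frappe.db.sql
	calls = {"count": 0}

	def counting_sql(*args, **kwargs):
		calls["count"] += 1
		return original_sql(*args, **kwargs)

	frappe.db.sql = counting_sql
	try:
		doc = frappe.copy_doc(frappe.get_doc("Sales Invoice", "SINV-00001"))
		doc.insert()
	finally:
		frappe.db.sql = original_sql

	print("DB calls during insert:", calls["count"])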

Any ideas?


Let’s start a new thread for this. This thread has gone on too long. Closing.