Race condition failure when trying to use a doc immediately after submitting it

I have server-side code that processes an imported list of several thousand product items.

The process involves creating warehouse locations and then material receipt stock entries for each item.

It skips creating a warehouse location that already exists.

If I execute the code with stock entry creation enabled then it fails after the first successful stock entry, with:

frappe.exceptions.ValidationError:
  Warehouse Lethal Widget - LSSA is not linked to any account,
    please mention the account in the warehouse record
      or set default inventory account in company My Company.

On the other hand …

If (after reverting the database) I execute the code with stock entry creation disabled, then all the required warehouse locations get created correctly.

Then, if I enable stock entry creation and rerun the same script, all the required stock entries get created correctly.

Here’s the code:

def createStockEntryForLocation(item, numbers):
  LG("Create {} item stock entry for location :: {}.  ({})".format(len(numbers), item.legacy_code, NAME_COMPANY))
  
  whs_name = "{} - {}".format(item.legacy_code, ABBR_COMPANY)
  try:
    existing_warehouse = frappe.get_doc('Warehouse', whs_name)
    LG("Location '{}' already exists".format(existing_warehouse.name))
  except:
    new_warehouse = frappe.get_doc({
      'doctype': 'Warehouse',
      'warehouse_name': item.legacy_code,
      'parent_warehouse': "{} - {}".format('Illegal Widgets', ABBR_COMPANY),
      'company': NAME_COMPANY,
      'account': "{} - {}".format('1.1.5.06 - Contraband', ABBR_COMPANY),
      'warehouse_type': 'Illicit'
    })

    new_warehouse.save()
    new_warehouse.submit()

  aWarehouse = frappe.get_doc('Warehouse', whs_name)
  aWarehouse.reload()
  LG("Have location. Creating '{}' item Stock Entry for {}".format(numbers, aWarehouse.name))
  if True:
    createStockEntry(frappe._dict({
      'serial_numbers': numbers,
      'warehouse_name': aWarehouse.name
    }))

I’m guessing that scheduler tasks need to be run when a new warehouse is created, and so it’s necessary to wait before creating any dependent artifacts.

Is that how it works?

What’s the best way to work with this?

Hi Martin,

I’ve examined your code, and the ERPNext code here:

  • ../mybench/apps/erpnext/erpnext/controllers/stock_controller.py
  • Starting at line 60

I don’t think any background worker scheduling is happening. Nor do I see any asynchronous code that would likely cause a race condition.

It’s odd that your function always fails during the 2nd iteration. Feels like something isn’t being cleared or reset. I have not noticed a fault with what you’ve written, though.

Were it me, I would proceed like this:

  1. Open the file stock_controller.py and edit the get_gl_entries() function.
  2. Sprinkle some print(f"Debug: '{variable_name}'") statements in a few different places (see the sketch below the list).
  3. Re-run your code, and examine the output in console.
  4. Try to understand why the list ‘warehouse_with_no_account’ gets populated while your code runs.
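
By “sprinkle some prints” I mean something like this little helper (a sketch only; the name debug_dump is just an example, and exactly where to call it inside get_gl_entries() will depend on your ERPNext version):

import frappe

def debug_dump(label, value):
  # Tiny helper to call from inside get_gl_entries() (or anywhere else)
  # while your import script runs. Output goes to the bench console and
  # to frappe's log file.
  print(f"Debug: {label} = {value!r}")
  frappe.logger().info(f"Debug: {label} = {value!r}")

# The kind of call to drop into stock_controller.py, e.g. wherever the
# list from step 4 is built up:
#   debug_dump("warehouse_with_no_account", warehouse_with_no_account)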

My guess is this will illuminate the root cause and culprit.

Brian

I am always cheered to see a reply from you, and very grateful for your sincere efforts to get me over the humps I seem to keep running into.

So, I have yet to follow your suggestion: I decided the small reduction in execution time from doing all the work in a single pass, rather than two, just was not worth the effort.

On the other hand, it worries me a lot that there’s something going on that I can’t understand.

Pursuing a different task, I turned up a possibly related phenomenon that I also don’t understand.

Since I run the Material Receipt Stock Entry creation script enqueued, I wanted a progress report. I wrote a bash shell script with curl that polls a whitelisted function I created. The function simply grabs a row count with SELECT count(*) FROM `tabStock Entry` and writes it to its own log file, and the bash script checks the latest value with tail -n 1 count.log.
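
Roughly, the whitelisted function looks like this (a simplified sketch; the function name and the log file path here are placeholders, not my exact code):

import frappe

@frappe.whitelist()
def log_stock_entry_count():
  # Count the Stock Entry rows visible to this connection and append the
  # number to a log file that the bash script can tail.
  count = frappe.db.sql("SELECT count(*) FROM `tabStock Entry`")[0][0]
  with open("/tmp/count.log", "a") as fh:  # placeholder path
    fh.write(f"{count}\n")
  return count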

Shocker: the row count is 0 right up until the enqueued job terminates, then it suddenly shows 489 and the script exits.

So clearly there is something going on with MariaDB commit behaviour, which I assume is also what stops stock entries from finding warehouses created moments earlier, as I described in the original question.

Is frappe.db.commit() intended for use only with the other frappe.db.* methods?

If so, is there an equivalent “flush to permanent store” command for the frappe.get_doc group of methods?

You saw I did the following, right?

  try:
    existing_warehouse = frappe.get_doc('Warehouse', whs_name)
  except:
    new_warehouse = frappe.get_doc({
      :       :       :       :       : 
    new_warehouse.save()
    new_warehouse.submit()

  aWarehouse = frappe.get_doc('Warehouse', whs_name)
  aWarehouse.reload()

If save(), submit() and reload() don’t force a MariaDB commit, what’s going on?

Just retested…

… and yes

frappe.db.commit() is valid for DocType machinations, not just for those done with frappe.db.*

The commit() function definitely stands alone. It can be called anytime, anywhere. And it will send a “commit” statement to MariaDB.

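For example, this is the shape of the test (a minimal sketch; the ToDo doctype and the description text are just placeholders):

import frappe

# Create a document through the ORM ...
doc = frappe.get_doc({
  'doctype': 'ToDo',                 # placeholder doctype for the test
  'description': 'commit() test'
})
doc.insert()

# ... then commit. frappe.db.commit() commits this connection's current
# transaction, which includes the row written by doc.insert() above.
frappe.db.commit()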

Where things get strange is when we consider this:

  • How do we know which connection/transaction that commit is applied to?
  • Can the “commit” apply midway through another operation’s series of CRUD commands?

For scheduled, background jobs to be successful, they absolutely must adhere to certain rules:

  1. Never be impacted by ‘commits’ thrown by other processes.

  2. Assume nothing about the “state” of MariaDB.

    For example, assume the purpose of a job is to create records. It should first check whether such a record already exists (because a user might have created it before the job fired). If the record does exist, the job must decide whether to fail, to quietly catch the exception and continue, or perhaps to retry the insert with a different primary key.

    The solution will vary, depending on the job (a sketch of the “check first, then decide” pattern follows below).
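
A sketch of that “check first, then decide” pattern (every name here is a placeholder, not code from an actual job):

import frappe

def create_warehouse_job(warehouse_name, company, abbr):
  # Background-job sketch: create a Warehouse only if it does not already exist.
  full_name = "{} - {}".format(warehouse_name, abbr)

  # Rule 2: assume nothing about the current state of MariaDB -- check first.
  if frappe.db.exists('Warehouse', full_name):
    # Decide: fail, skip quietly, or retry with a different primary key.
    frappe.logger().info("{} already exists; skipping".format(full_name))
    return

  frappe.get_doc({
    'doctype': 'Warehouse',
    'warehouse_name': warehouse_name,
    'company': company
  }).insert()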

I have not yet done a deep dive into all the pitfalls mentioned above…
…but your example worries me.

I’d like to do some experiments, try to replicate your issue, and document some of this.

Are you saying that there is no isolation between different users’ sessions?

Like, all partial work by all users will be committed if any user calls frappe.db.commit() ??

No. I think it’s more likely that there is some isolation.
But also, sometimes not.

We should do more research and testing, to identify the specific scenarios.
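
One quick data point for that testing (a sketch; MariaDB/InnoDB defaults to REPEATABLE READ, and the isolation level a connection actually runs at shapes what it can see of other sessions’ work):

import frappe

# Ask MariaDB which isolation level the current connection is using.
# (@@tx_isolation works on the MariaDB versions ERPNext typically runs on;
# newer releases also accept @@transaction_isolation.)
level = frappe.db.sql("SELECT @@tx_isolation")[0][0]
print("Current isolation level: {}".format(level))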

In your except clause, you’re not doing an .insert()

Save and submit are for existing db entries: Document API
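
i.e. in the except branch, something along these lines (sketched from the code above; item, ABBR_COMPANY and NAME_COMPANY are the same variables as in the original function):

new_warehouse = frappe.get_doc({
  'doctype': 'Warehouse',
  'warehouse_name': item.legacy_code,
  'parent_warehouse': "{} - {}".format('Illegal Widgets', ABBR_COMPANY),
  'company': NAME_COMPANY,
  'account': "{} - {}".format('1.1.5.06 - Contraband', ABBR_COMPANY),
  'warehouse_type': 'Illicit'
})
new_warehouse.insert()  # insert() creates the row for a brand-new doc (replacing the save() call)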

I believe you are correct.

It may be a week before I can get back to that code to confirm, but I shall let you know.