Functions executed in the background often get a stale document from frappe.get_doc/reload

spoojary · September 23, 2017, 3:00pm

I have a function which processes the items in a child table in the background. While the user keeps appending new items to the child table, the background task processes them. This needs some kind of synchronization, which I struggled with earlier (Using locks in ERPNext for background tasks), but I managed to get the tasks to update the child table one at a time using class member locks. However, looking closely, I have observed that frappe.get_doc() followed by reload doesn’t return the latest object from the database. This is quite problematic, as saving the object in the background task will wipe out the data added by the user. Trying to understand why this is happening, I came across a thread which says ERPNext uses separate Redis instances for background and active threads. I don’t know much about the Redis instances yet, but this looks like a critical flaw which could result in data loss.

spoojary · October 3, 2017, 4:09pm

@rmehta, sorry for the explicit tagging. If my understanding is right, this could be quite problematic in my opinion, so I wanted to get a quick expert opinion.

rmehta · October 4, 2017, 3:33am

@spoojary I think you are doing it wrong. Trying to edit the same document in both foreground and background is the cause of the problem. Maybe spin off the child table as a separate table (not linked to the parent) so you minimize “collisions”.

spoojary · October 4, 2017, 4:28am

Well, I don’t think that is going to work. The reason for having background tasks is to run long-running tasks in the background. If you spin off the child table, you lose the ability to send information back to the foreground table, such as the status of the background task. Perhaps we could achieve that by changing the design, but it is unlikely to work for all cases.
The standard way to resolve this is to use locks (mutexes), which allow only one task at a time to edit the shared resource. However, that doesn’t help much here, because the background tasks have a completely different view of the objects handed off to them (a stale copy). Is there any specific reason why background tasks don’t share the same Redis instance? The documentation doesn’t say anything about this. Even if it did, I am fairly certain this design issue is going to make development more complicated in the future.
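
To make the stale-copy point concrete, here is a minimal sketch in plain Python. It does not use Frappe at all: `store`, `load`, `save` and `locked_update` are hypothetical stand-ins for the database and a document save, and the only assumption is that a save overwrites the whole child table. It shows that locking each read and each write separately still loses the user's row, while holding the lock across the whole read-modify-write does not.

```python
import copy
import threading

# Hypothetical in-memory stand-in for the database (not Frappe's API).
store = {"items": ["a"]}
lock = threading.Lock()

def load():
    """Return an independent copy, like a document loaded by a worker."""
    with lock:
        return copy.deepcopy(store)

def save(doc):
    """Overwrite the stored child table wholesale, as a full save would."""
    with lock:
        store["items"] = list(doc["items"])

def locked_update(mutate):
    """Read, modify and write while holding the lock the whole time."""
    with lock:
        doc = copy.deepcopy(store)
        mutate(doc)
        store["items"] = doc["items"]

def mark_first_processed(doc):
    doc["items"][0] = "a (processed)"

# Case 1: the background task works on a copy taken before the user's edit.
stale = load()                 # background task loads the document
fresh = load()                 # user loads it, appends a row, saves
fresh["items"].append("b")
save(fresh)
mark_first_processed(stale)    # background task edits its stale copy
save(stale)                    # and overwrites the user's row
lost = list(store["items"])

# Case 2: same sequence, but every change re-reads under the lock.
store["items"] = ["a"]
locked_update(lambda doc: doc["items"].append("b"))
locked_update(mark_first_processed)
kept = list(store["items"])

print(lost)  # ['a (processed)']
print(kept)  # ['a (processed)', 'b']
```

In other words, the lock only helps if the copy being saved was read while the lock was held.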

rmehta · October 4, 2017, 5:16am

You have to design systems that are conducive to parallel processing; there is no magic bullet AFAIK. I would still go with redesigning so that each piece runs independently, to reduce collisions.
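
As a rough illustration of what "each piece runs independently" could look like, here is a sketch using Python's built-in sqlite3 module rather than Frappe. The table `work_item` and the functions `add_item`, `process_pending` and `status_summary` are hypothetical names invented for this example. The assumption is that the foreground only inserts rows and the background only updates the rows it has processed, so neither side ever saves a whole parent document over the other.

```python
import sqlite3

# Hypothetical standalone table, keyed by the parent document's name.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE work_item ("
    " id INTEGER PRIMARY KEY,"
    " parent TEXT,"
    " payload TEXT,"
    " status TEXT DEFAULT 'Pending')"
)

def add_item(parent, payload):
    """Foreground: insert one row; existing rows are never rewritten."""
    cur = conn.execute(
        "INSERT INTO work_item (parent, payload) VALUES (?, ?)",
        (parent, payload),
    )
    conn.commit()
    return cur.lastrowid

def process_pending(parent):
    """Background: update only the rows it processed, one row at a time."""
    rows = conn.execute(
        "SELECT id FROM work_item WHERE parent = ? AND status = 'Pending'",
        (parent,),
    ).fetchall()
    for (row_id,) in rows:
        conn.execute(
            "UPDATE work_item SET status = 'Done' WHERE id = ?", (row_id,)
        )
    conn.commit()
    return len(rows)

def status_summary(parent):
    """Foreground: check progress with a query instead of shared state."""
    rows = conn.execute(
        "SELECT status, COUNT(*) FROM work_item WHERE parent = ? GROUP BY status",
        (parent,),
    ).fetchall()
    return dict(rows)

add_item("DOC-1", "a")
processed = process_pending("DOC-1")   # background handles the first row
add_item("DOC-1", "b")                 # user adds another row afterwards
summary = status_summary("DOC-1")
print(processed, summary)
```

The status of the background work is still visible to the foreground, but it is read with a query on the separate table instead of being written back into the parent document.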

spoojary · October 4, 2017, 5:24am

Fair enough, if you are saying that ERPNext currently doesn’t support parallel processing and that background processing shouldn’t modify documents that might be updated in the foreground.

Using the same Redis instance for both background and foreground tasks would avoid this problem. That is why I asked what issues are associated with that. I don’t know enough about the architecture yet to propose a solution. Understanding the problem is perhaps a step in the right direction.

rmehta · October 4, 2017, 5:38am

Redis just manages the queue for background jobs.

By definition background jobs are asynchronous and should not be designed with shared memory!