Should the testing framework be re-designed?

vrms · May 2, 2019, 8:32am

Continuing the discussion from What happened to the good old reports?:

… I’d like to propose a complete re-design of the testing framework, as it is not really usable from a testing perspective as mentioned https://github.com/frappe/frappe/issues/7126 and https://github.com/frappe/frappe/issues/7370

I’d like to propose moving to pytest as well, as it has so many extra features, is backwards compatible with unittest, and separating it out into a separate folder structure, so that the tests and the src code is not intermingled.

That way it can be built up side by side with the current testing, and not actually affect it. Moving to a XP/TDD ethos will really increase the quality of the code, as you mentioned, and a worthy task I think for the future of erpnext/frappe to really increase its uptake. I’m currently stuck with an ERP from a close sourced vendor which has exactly the same problems, because testing was not a first class citizen.

just a question from someone with ~~almost~~ 0 expertize in software testing …

why (as your issues imply) is it not good that test are running on test sites, where consequently the problem of leftovers in the db you are raising would not be relevant @cjpit ?

Is there any advantage of running tests on non-test instances? Or is there anything else beyond your proposal which I am just not getting?

cjpit · May 3, 2019, 2:33pm

@vrms Here is a good intro to why you would want to test and the types of tests.

First of all I know the frappe team have worked so hard on the system and, it is awesome. But my personal opinion, is that the thing it’s lacking at the moment, is quality. I’ve seen in the last month that I have been involved so far that the CI pipeline has broken, code has been commited, that then breaks tests, code commited that breaks things for all the users eg quotes etc this week on v11. This does indicate that there is a fundamental issue with the testing, since test cases, unit tests, integration tests etc should all ensure that these type of things don’t happen.

In regards to the frappe framework, it’s not so much running on a test site, or any site - its the fact that a test that fails, it leaves a whole lot of junk data in the database. This then means the next run of the tests you try and run always fail, because data already exists in the database , and it errors out on names already existing etc. - Then you are getting false failures on tests, that don’t have anything to do with the test failing for a reason of a bug etc.

This means the unit test itself has dependencies - a test is dependent, on another test working, for it to have the right inputs to test. Unit tests aren’t meant to be designed in that way. As a developer I actually run tests constantly while writing the code, and failing tests are part of building code.

That’s my main purpose for proposing some genuine discussion on how the tests have been designed, but unfortunately there hasn’t been much feedback so far.

Some other feedback,
Tests are also a form of living documentation, as they explain how the function or code was intended to be used. This is especially important to get other developers involved, and they know they can trust the code.

At the moment, the tests seem to be pretty weak.

For example I wanted to understand the Project and Task code. The first place I went was to have a look at the tests. I scratched my head for a while trying to understand tasks and “depends on” task. How is that different to “Parent Task” for example, aren’t they really the same thing?

I could see from the one test that was there, that a task could depend on another task. However, when you use the actual gui , and you use the tree view, the “depends on” field actually ends up getting used the opposite way. It shows the children that are dependant on the task your looking at. Since there was only one test, which
was correct? The spelling of the field “Depends on” doesn’t match what the system does, and the test doesn’t match the functionality.

So just giving some honest feedback, I think it needs some help.

I had started a separate project to do that, using pytest. It has a lot of great plugins/options/ease of writing tests etc.

I’ve just done a set of tests for the refactoring of the ldap functionality as an example. Writing tests does take discipline, for example I wrote more test code than I did for the ldap functionality, but, it makes you think about how you design the code, and can make things better, understanding what it does and that will only help any other person that comes along and tries to understand what I was attempting.

Does anyone else have any ideas?

max_morais_dmm · May 3, 2019, 4:27pm

My 2 cents here!

I’m the kind of guy that don’t believe that Unit Tests works, for that kind of software we run, like ERPNext.
My major argument is:

Unit tests runs on server, users really on browser to access the functionaly of ERPNext, so, Unit Tests can have passed, we can have 100% of coverage of the code, and a missing bracket in the JavaScript, can turn the system unacessible to anybody.

I believe that the realistical solution are integration tests, where we develop tests with the mentality of what the user plan to do, not what the code is supposed to do, also the environment of the tests, should relly on the browser.

Martin Fowler, have a nice discussion about integration tests.

Another, pro about integration tests, is that, we can ensure that an Sales Order is working, also we can ensure that a sales invoice is working, but without integration tests we cant ensure that they togheter wokrs, like in one bug I have seen here

Another point that we need to keep in mind is that.

Tests aren’t made for the developer
Tests aren’t made for ensure that the code works
Tests are made to ensure that the users are fully capable of really into the code you wrote to do what they want to do.

I already started in the past some experiments in that area, like here:

gist.github.com

https://gist.github.com/MaxMorais/7d8e8154abf576930331

0-README.md

# First how it works

[![ERPNext automated Setup](http://share.gifyoutube.com/m66j7g.gif)](https://youtu.be/JQh3Lgzr5UU)

# What is Salad

Salad, is test automation tool, designed on top of [Lettuce](http://lettuce.it), to do tests using files that contains the description of each task, these files are named `features`

## Setuping your environment

This file has been truncated. show original

00-erpnext-setup.feature

Feature: Ensuring that the instance works, and the UI loads fine
In order to make sure that ERPNext works
As a First Login and Setup of an User
I open the Instance URL using salad

Scenario: First Login
    Given I am logged in
    When I look around
    Then I should see "ERPNext" anywhere in the page

This file has been truncated. show original

steps.py

# -*- coding: utf-8 -*-
from lettuce import step, world

@step(r"go to the home page")
def go_to_home(step):
    step.given('I visit the url "http://localhost:8000"')

@step(u"go the the desk")
def go_to_desk(step):
    step.given('I visit the url "http://localhost:8000/desk"')

This file has been truncated. show original

and here

Integration test, can be described in a funcional maner, that allow users to describe what they want to receive, and us as developers we can do our work to ensure that descriptions will pass.

But for some initiative of redesign of testing framework be sucessfull, it’s not one engagement of us, developers, and project mainteiners, it need engagement of the whole community, but fundamentally users, because they are the target of any work we do.

cjpit · May 6, 2019, 8:09am

Thank you @max_morais_dmm for your comments, but I think you missed my point - I never said integration tests were not needed, I actually mentioned that integration tests are needed.

Also, Integration tests fall down, when you look at the changes for example done with changing the gui for version 12. They would need to have a complete rewrite.

However, if there were unit tests on all of the underlying api’s/docs/ their functionality (And there is definately some amount of tests in there, I just think that they need fleshing out more, and concentration upon, so that they actually bring value to the product) could be tested, and shown that oh, in the version 12 gui tests, it doesnt pass the sales order tests. However, you know, because the unit tests underneath, that the problem lies in the gui, nowhere else.

My main point of this discussion is how to build quality into the code, and allowing the system to be able to handle changes more easily.

Another point that we need to keep in mind is that.
Tests aren’t made for the developer
Tests aren’t made for ensure that the code works
Tests are made to ensure that the users are fully capable of really into the code you wrote to do what they want to do.

I disagree with all of the statements above, - As I explained in my previous post, but here is a good link to also show the advantages of unit testing, which is the opposite of everything you have mentioned here.

The results are that the end user gets a much more stable product. It is the responsibility of the developer, to either keep a clean house, and spend extra time, in writing unit tests, or they don’t, which then results in a fragile system that breaks when things change all the time, and then the user is the one that has to find the bugs, report it, get frustrated, wait for a fix etc.
All of that could have possibly be found, and fixed, early on when the code was being written. It’s really about the investment of time, and if it’s put early on in the development phase, it results in that time not being needed later on, which usually results in a lot more investment, to fix it.

I already started in the past some experiments in that area, like here:

Yeah that is similar to cypress/selenium - which frappe have started to use (Cypress tests that is), so they are already are on board with integration testing.

But for some initiative of redesign of testing framework be sucessfull, it’s not one engagement of us, developers, and project mainteiners, it need engagement of the whole community, but fundamentally users, because they are the target of any work we do.

I disagree strongly with this statement as well - It doesn’t fall upon the user at all. If I build you a house, but i leave bits of building materials lying around inside, a window that doesn’t close, and a hole in the roof - Who’s fault is it? The home owner just expects to be able to use their home as a normal person would. The builder did not finish their job and complete the job properly. Its not up to the home owner to even know how the building was built, or how they should fix the hole, they just know that a hole is there, which means they can’t live in the house.
I think of testing the same, it’s basically a duty of the developer to ensure what is done is of a high standard, for the future, and for the present.

Going back to the reason for this discussion, has anyone used pytest before? I’ve used it quite a lot for the last couple of years, and enjoy using it. The problem of database records being left in the database after a test are solved elegantly, with the below:

in conftest.py :

@pytest.fixture(scope='session')
def db_instance():
    # monkey patch the commit, so that records do not get saved to the database.
    frappe.connect(site=os.environ.get("site"))
    frappe.db.commit = MagicMock()
    yield frappe.db


@pytest.fixture()
def db_transaction(db_instance):
    # create a new transaction.
    db_instance.begin()
    yield db_instance
    db_instance.rollback()

Then I can just use this “db_transaction” parameter in whatever test I wish to write, and I have access to the db, and anything I do in that test will be reverted. As an example below, I test to ensure that there is a placeholder set in a field, and if it isn’t it should throw the specific error to the user.

def test_validation_ldap_search_string_needs_to_contain_a_placeholder_for_format(
db_transaction):
    with pytest.raises(ValidationError,
                       match="LDAP Search String needs to end with a placeholder, eg sAMAccountName={0}"):
        settings = get_doc("LDAP Settings")
        set_dummy_ldap_settings(settings)
        settings.ldap_search_string = 'notvalid'
        settings.save()

This means that for a whole test session that is run, the commiting to database is mocked out and will never be run. However for each test, you can create a new database transaction, and do your test. This means you can add records, and check them that they exist, were updated etc, and once the test has finished, all of those database changes will be rolled back. It does that using a form of dependency injection, which is one of the awesome features of pytest.

This is just one of the ways I was thinking it would be good to re-think and redesign the testing framework.

clarkej · May 6, 2019, 6:28pm

I appreciate and agree with your thoughts and concerns cjpit and max_morais_dmm thanks

Probably max_morais_dmm you refer to ‘business use case’ tests - that capture end user scenario context requirements?

My rough thoughts are -

Initially to start, users will assume and grant quality to an extent. But to sustain ERPNext growth and favour, such control can’t be left to chance; due care must be paid to ensure quality and stability, and that is for the user community at large to recognize and support test driven process practice discipline.

A PR submission requires a validation test Pull Request Checklist · frappe/erpnext Wiki · GitHub

The appropriate tests can detect and define breaking changes, to point to unforeseen impacts of a developer’s changes or a refactoring that violates a public API.

For an urgent hot fix, a test to reproduce and mark the error would help learners to see for themselves otherwise forgotten problem cases.

At release time, regression tests are especially valuable and efficient to identify and isolate problems with proposed fixes.

To promote quality among users, one good way is to report test results and test coverage Help with Writing Tests - Foundation and community invite to help with 'do nothing' pass tests - #2 by revant_one

rbarreca · July 27, 2020, 9:01pm

Hi all, I’m not sure if this is off-topic for this thread but I wanted to chime in here instead of adding a new post.

I’m a software developer evaluating ERPNext for an upcoming project. I’m poking around the codebase to get a sense for how stable and well-tested it is. So far I’m impressed with the project.

I see on a recent TravisCI build there were 800+ tests and a green build (although the code coverage badge says 0% erroneously). However, when reviewing many recent commits I don’t see many unit tests going in alongside bug fix commits. To me that’s concerning and makes me feel uneasy about investing more time into evaluating ERPNext.

If done well, tests can increase stability of course. But, more importantly IMHO, a code base that is testable is one that is well-designed (modular, single-responsibility principle, etc.) and less prone to accruing technical debt which is the bane of most projects like this.

I know this post is a bit old, but I’m curious to hear from those that contribute code to ERPNext if there is any action to adopt more test-driven development (TDD) practices. Is this still an identified area of improvement?

brian_pond · May 14, 2022, 7:12pm

First, I apologize for necro-ing this old thread. (I often cringe when I see this happen on the forums). However, a global search for the word “pytest” yielded only 3 results, ever. Two of them are this thread and a linked one. So, I thought the 4 of you might be interested.

Not so long ago, I felt the same way as Max: that Unit Tests didn’t “feel right” for ERPs. My experiences in the past 5 years have changed my opinion:

I started writing small, non-ERP programs. Many with formulas and results that could easily be validated with Unit Tests. I now have a broader and different perspective.
Interpreted programming languages like Python have burned me so many times. Linters simply aren’t sufficient. No matter how hard I try, bugs can make their way into Production.
ERPNext specifically has burned me. With an array of challenges like framework bugs, missing documentation, child vs. parent documents, API behavior vs. JS page behavior, generic “args” in functions, shared code (e.g. transactions.js), and dozens of others.

Today, I feel I must start developing Unit and Integration testing. Urgently.

^ This is exactly how I feel.

Last year, I tried using the built-in testing of Frappe and ERPNext. It did not go well. You see, coincidentally my Site name already started with the word “test”. Because it was a DEV/TEST environment. So when I typed bench run-tests, this happened:

My custom fields were destroyed by this code.
My Item Prices were destroyed by this code
As you all noted, a ton of “test” companies and data were created. But never rolled back, littering my environment.

These data changes were not immediately noticed. I discovered them by accident a day or two later, when all my Sales and Purchase Order prices where zero(0).

(imo, enabling ‘run-tests’ should probably be a flag in 'site_config.json', instead of making assumptions about Site names)

As some of you noted, 'pytest' is more robust than 'unittest'. I also want to perform tests on any pre-PROD environment, with automatic rollbacks. This saves time spent on making “fake” test environments using SQL restores.

Last night, frustrated again, I decided to just create my own App. Something that can display known tests. Run them together or individually. Leverage the power of pytest. Handle the rollbacks. Maintain test data in JSON files. Log the results to a DocType, so I can display them on a web page, etc.

As I continue working on this, I thought I should reply to this thread. Perhaps one of you has taken things a step further in the past 2-3 years. Or perhaps someone else on the forums wants to share their own experiences and ideas.

ankush · May 15, 2022, 6:48am

Few pointers:

We discard some existing data cause they can meddle with test results easily. (like item price which can affect a supposedly “unit” test.
Tests, because of the volume and dependencies they have will likely never run cleanly on a single site (such that site can be reused later)
Recommended development practice IMO is:
- create a separate app for custom developments
- Use FrappeTestCase (default now) which discards all state between multiple test cases. Test records from JSON and pre-test data is still committed to DB.
- create separate site for local test
- CI for isolated test runs (use github workflow: frappe/boilerplate.py at develop · frappe/frappe · GitHub )

Server-side tests are still insanely valuable for everyone involved:

Core-devs: They can ship changes confidently and write new ones relatively easily compared to e2e test which takes a significant amount of time to write, run and maintain.
Custom app devs: They can easily verify that whatever changes they’ve done can be tested in a new isolated environment. Tests can also verify that the app works with newer versions without manual testing.

brian_pond · May 15, 2022, 3:38pm

Thank you for the reply, @ankush. We disagree on many points. But I do appreciate the response.

I’d never heard of 'FrappeTestCase' until now. I did some investigation. It appears to be a very recent addition. This class was added to v13 around March 14th, immediately before the release of 13.23.0. So far, it is not documented online. (Unit Testing).

Outside of Frappe Technologies, I suspect few have heard of this.

What I Like:

This appears to be a step in the right direction!

What I Don’t Like:

It worries me that large changes, like this one, were quietly backported into v13 in a minor release.

When it comes to backporting new features, I feel that “opt out” should be the default behavior, instead of “opt in.”

This is the main reason my clients and I never use the 'bench update' command. I feel it’s too risky. Too many things are changed in-between Major Releases.

So, everywhere I work, the policy is to upgrade (manually) only 1-2 times per year. It takes us days (sometimes weeks) to review all the Frappe and ERPNext changes, and ensure they haven’t introduced unwanted behaviors. It’s hard work, but worth it. (We don’t have the time/staff to perform this process monthly or weekly. Nor do I see much of a benefit.)

Which of course, leads the conversation right back to Unit and Integration Tests.

ankush · May 16, 2022, 8:19am

It’s optional addition to testing. Doesn’t even affect anything in production This is in-line with semantic versioning.