Experienced Devs

A community for discussion amongst professional software developers.

Posts should be relevant to those well into their careers.

How much flakiness do you tolerate in end to end tests? (programming.dev)

submitted 1 year ago* (last edited 1 year ago) by kersplort@programming.dev to c/experienced_devs@programming.dev

End to end and smoke tests give a really valuable angle on what the app is doing and can warn you about failures before they happen. However, because they're working with a live app and a live database over a live network, they can introduce a lot of flakiness. Beyond just changes to the app, different data in the environment or other issues can cause a smoke test failure.

How do you handle the inherent flakiness of testing against a live app?

When do you run smokes? On every phoenix branch? Pre-prod? Prod only?

Who fixes the issues that the smokes find?

[–] Pantoffel@feddit.de 3 points 1 year ago* (last edited 1 year ago) (10 children)

I think/hope that the wording you used was a mistake.

End to end tests do not introduce flakiness, but uncover it.

Whenever we discover flakiness, we try to fix it immediately. When there is no time for the fix (which is the case more often than not), we create a ticket that vanishes in the backlog.

For a long time the company I currently work at had no end-to-end tests, only unit tests, for a lot of its code.

Thanks to a push from newcomers, we finally managed to add end-to-end tests to many more parts of the code. However, these are still not properly documented. Some end-to-end tests overlap, and some only cover a small part of one larger piece of functionality. That is why we often find bugs that we introduced ourselves: no end-to-end tests covered those parts.

We used to run end-to-end tests only nightly, on the whole product. They usually take an hour or more to complete, which is too long to run before each merge. However, we have them organized well enough that for sub-product A we can run only the sub-product A end-to-end tests before each merge, on the assumption that we only touched code affecting sub-product A. If the changes did affect other parts of the product, the nightly tests catch it. We have been doing this in my team for a long while now, but we only recently started establishing the procedure in the other teams of the company, too.
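A rough sketch of how that pre-merge selection could look in Python. The directory names, the suite paths and the `suites_for_changes` helper are all made up for illustration, and it assumes each sub-product lives under its own top-level directory. Files outside the mapping select nothing and are left to the nightly run, as described above.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from top-level source directory to E2E suite.
SUITES_BY_AREA = {
    "product_a": "e2e/product_a",
    "product_b": "e2e/product_b",
}


def suites_for_changes(changed_files):
    """Return the E2E suites to run before merge for the changed paths.

    Paths that map to no sub-product are ignored here; the assumption is
    that the full nightly run covers them.
    """
    suites = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if not parts:
            continue
        suite = SUITES_BY_AREA.get(parts[0])
        if suite is not None:
            suites.add(suite)
    return sorted(suites)


if __name__ == "__main__":
    print(suites_for_changes(["product_a/cart.py", "shared/util.py"]))
```

The trade-off is the one already mentioned: a change in shared code can break sub-product B without any pre-merge suite noticing, so this only works together with the nightly full run.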

[–] kersplort@programming.dev 2 points 1 year ago (6 children)

My experience with E2E testing is that the tools and methods necessary to test a complex app are flaky. Waits, checks for text or selectors and custom form field navigation all need careful balancing to make the test effective. On top of this, there is frequently a sequentiality to E2E tests that causes these failures to multiply in frequency, as you're at the mercy of not just the worst test, but the product of every test in sequence.
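To put a number on the "product of every test in sequence" point, here is a small Python sketch. The 99% per-step pass rate and the 20 steps are illustrative figures, not measurements, and the steps are assumed to fail independently of each other.

```python
import math


def chain_pass_probability(step_pass_rates):
    """Probability that every step in a sequential run passes,
    assuming the steps fail independently of each other."""
    return math.prod(step_pass_rates)


if __name__ == "__main__":
    # Illustrative: 20 sequential steps, each passing 99% of the time.
    rate = chain_pass_probability([0.99] * 20)
    print(f"{rate:.3f}")  # prints 0.818
```

So a chain of steps that each look quite reliable on their own still fails roughly one run in five under these assumptions.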

I agree that the tests don't make the app itself flaky, but I have found smokes inherently flaky in a way that unit and integration tests are not.

[–] minorninth@lemmy.world 1 points 1 year ago (2 children)

I'm a fan of randomizing the test order. That helps catch ordering issues early.

Also, it's usually valuable to make E2E tests as independent of each other as possible, so that one cannot affect another. Have each one spin up the whole system, even though it takes longer. Use more parallelism: dozens of VMs each running a fraction of the tests, rather than trying to get the sequential time down.

[–] petenu@feddit.uk 1 points 1 year ago* (last edited 1 year ago)

The problem with randomising the test order is that it compromises the reproducibility of results. If there are ordering issues, then your tests will sometimes fail and sometimes pass, but will developers look at that and think "ah there must be an ordering issue" or will they think "damn these flaky tests, guess I'd better rerun the pipeline"?
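A possible middle ground is to shuffle with a seed and record it, so that a failing order can be replayed exactly. A minimal Python sketch follows; the `ordered_tests` helper is hypothetical and not tied to any specific test runner.

```python
import random


def ordered_tests(test_ids, seed=None):
    """Shuffle tests with a known seed and return (seed, order).

    When no seed is given, a fresh one is drawn; log it with the run so
    a failing order can be reproduced by passing the same seed back in.
    """
    if seed is None:
        seed = random.SystemRandom().randrange(2**32)
    rng = random.Random(seed)
    order = sorted(test_ids)  # normalise input so the seed alone decides the order
    rng.shuffle(order)
    return seed, order


if __name__ == "__main__":
    seed, order = ordered_tests(["login", "checkout", "search", "profile"])
    print(f"test order seed: {seed}")
    print(order)
    # Replaying with the logged seed yields the same order.
    assert ordered_tests(["login", "checkout", "search", "profile"], seed)[1] == order
```

That doesn't stop anyone from simply rerunning the pipeline, but a failure that reproduces with the same seed and disappears with a different one is a strong hint that ordering is involved.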
