Teach the AI to Unit Test

19 February 2026

The Gemini AI will make some pretty good guesses about how a third-party API may work. It is good at searching the internet, but when an API has changed across versions, the mix of old and new docs and examples it finds can confuse it. In a dynamic language and environment, you won't spot these errors until runtime.

To combat the ambiguity and give the AI agent more power to solve its own problems, ask it to add some tests around the code that uses the API. (In my case, the API is the XTDB client API.) Once it has a way to execute the code through tests, it quickly starts figuring out where it has made mistakes, running its own experiments to observe errors, searching for fixes, and applying those fixes around the codebase. I follow the same pattern when I'm doing it by hand.
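
As a rough sketch of what such a test looks like in Spock: EventStore, its constructor, and its put/get methods below are hypothetical stand-ins for a thin wrapper over the client API, not the XTDB API itself. The point is that the test gives the agent something executable, so wrong guesses about the API fail in a test run instead of at runtime.

    import spock.lang.Specification

    // EventStore is a hypothetical wrapper around the third-party client API.
    // Inside it live the calls the AI had to guess at; this test lets it run
    // those guesses and see the errors for itself.
    class EventStoreSpec extends Specification {

        def "a stored record can be read back"() {
            given: "a store pointed at a local test instance"
            def store = new EventStore("http://localhost:3000")   // hypothetical constructor

            when: "we write through the code that calls the client API"
            store.put("event-1", [type: "signup", user: "alice"])

            then: "reading it back proves the API calls were wired up correctly"
            store.get("event-1").user == "alice"
        }
    }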

The tests also give you, the human, an easier entry point for evaluating the code the AI generated. If the tests look gnarly, you know to suggest refactorings that improve the architecture and make the code easier to test. When the AI has the tests passing, and the test code is easy enough to read, you can take a closer look at the application code to refine it and keep it maintainable too.


Browser Automation with Geb, Spock, and Groovy

22 October 2018

I recently gave a talk and demonstration, Browser Automation with Geb, Spock, and Groovy, at the Capital Area Software Engineers group in Harrisburg, PA. While explaining the whole stack of software, I showed how to:

  • Start a project in Gradle

  • Get the Geb and WebDriver dependencies in place

  • Get started with the Spock testing framework

  • Start up a browser for testing

  • Interact with the page content

  • Wait for asynchronous content

  • Abstract away page components into Geb Page classes

The slides and all the code are available in my geb-preso repo. It includes copies of the code I had prepared, the code we wrote live as a group, and my little toy Planning Poker JS app I was testing.
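
As a rough sketch of what the stack looks like, the Gradle side is just a couple of test dependencies (the versions here are illustrative, not necessarily the ones from the talk):

    // build.gradle -- illustrative versions
    plugins {
        id 'groovy'
    }

    repositories {
        mavenCentral()
    }

    dependencies {
        testImplementation 'org.gebish:geb-spock:2.3'
        testImplementation 'org.spockframework:spock-core:1.2-groovy-2.5'
        testImplementation 'org.seleniumhq.selenium:selenium-chrome-driver:3.141.59'
    }

A Spock spec can then drive the browser through a Geb Page class, including a wait for asynchronous content. The URL and selectors below are hypothetical stand-ins for the Planning Poker app, and the spec assumes a GebConfig.groovy (or Geb's defaults) that can start a local browser:

    import geb.Page
    import geb.spock.GebSpec

    // Hypothetical page object; the URL and selectors are stand-ins.
    class PokerPage extends Page {
        static url = "http://localhost:8080/"
        static at = { title.contains("Planning Poker") }
        static content = {
            cards { $("button.card") }
            result(required: false) { $("#result") }
        }
    }

    class PlanningPokerSpec extends GebSpec {

        def "picking a card eventually shows a result"() {
            when: "we open the app and click a card"
            to PokerPage
            cards.first().click()

            then: "the asynchronously rendered result appears"
            waitFor { result.displayed }
        }
    }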


Lifecycles of Test Data

10 November 2017

Managing data created by tests can be a bit of work, and the environments and frameworks we allow ourselves can either help or hinder that effort.

Complete Isolation

Building test data from scratch every time, in isolated storage, is best. That's why unit testing with mocks and in-memory data is so great, and why it's worth the trouble to use a fresh database for every test run. Grails helps us here by using an in-memory H2 database for the test environment.
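
That setup is roughly what the stock Grails 2-style configuration gives you (the exact URL options vary by version): the test environment gets a throwaway in-memory H2 database that is created and dropped around every run.

    // grails-app/conf/DataSource.groovy -- a sketch of the usual test-environment defaults
    environments {
        test {
            dataSource {
                driverClassName = "org.h2.Driver"
                dbCreate = "create-drop"   // build the schema fresh, drop it afterwards
                url = "jdbc:h2:mem:testDb;MVCC=TRUE;LOCK_TIMEOUT=10000"
            }
        }
    }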

Shared, But Immutable

If you must share test data, confirm its expected state before testing with it and NEVER modify it. Race conditions between threads changing data are bad, and races between developers or build servers changing the same data are even worse; it only takes two concurrent builds to hit them. It's embarrassing and limiting to have a project that can be built by only one person (or build server) at a time because any more would cause failures.
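
One way to enforce the "confirm its expected state" part is a guard in the spec's setup, so a mutated fixture fails fast instead of surfacing later as a confusing assertion. ReferenceCountry and the expected count below are hypothetical stand-ins for whatever shared, read-only data the suite relies on:

    import spock.lang.Specification

    class UsesSharedDataSpec extends Specification {

        // Hypothetical shared lookup table that no test is allowed to modify.
        static final int EXPECTED_COUNTRIES = 249

        def setup() {
            // Fail fast if another test, developer, or build has mutated the shared data.
            assert ReferenceCountry.count() == EXPECTED_COUNTRIES
        }

        // ... tests that read, but never write, the shared records ...
    }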

Shared Randomized Data

For the tests that must modify shared data, create that data with randomized identifiers, and only make changes to the data the test itself created. Try to clean it up at the end, but if you don't, that's mostly OK, because you'll never try to reuse it. Whether or not you clean up, you'll still need to write the tests to assume there's unexpected data present, in case a cleanup failed or another test suite is running concurrently. This is an opportunity for the tests to be clearer about what they really want to say. Instead of saying, "There are 2 records, the first is 'Joe' and the second is 'Bob'", maybe the test should say, "There's at least 1 record, the 'Bob64524' record I just added is somewhere in the list, and all the other records meet my search criteria."
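
As a sketch of that style in Spock (Person and its GORM-style finders are hypothetical stand-ins, as in a Grails integration test): the record name is randomized, the assertions tolerate whatever else is in the table, and cleanup is best effort.

    import spock.lang.Specification

    class SharedDataSearchSpec extends Specification {

        def "search finds the record this test created, regardless of what else exists"() {
            given: "a uniquely named record so concurrent runs cannot collide"
            def name = "Bob" + new Random().nextInt(100000)
            def created = new Person(name: name, active: true).save(flush: true)

            when:
            def results = Person.findAllByActive(true)

            then: "assert on what we created, not on exact counts or ordering"
            results.size() >= 1
            results*.name.contains(name)
            results.every { it.active }

            cleanup: "best effort; the assertions above do not depend on it"
            created?.delete(flush: true)
        }
    }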

Extra time spent being careful about your test data pays off in the end, when you're trying to work fast, need to rely on your tests, and don't have time to deal with sporadic failures.

