Tag Archives: Gource

On the Forgetting of Gource

Alright. This is the very last time I figure out, from scratch, how to get gource producing visualisations of my development activities. Today’s post is me leaving notes on this for next time, so I can cut straight to the chase.

A couple of times now, I’ve had occasion to want to give the people I write software for some insight into what I’ve been doing. The project I last completed was one such example. A lot of work was done under the hood to enable the code-base to have a replaceable user-interface, and possibly also a replaceable spatial database. The user-interface gained a few new features, but the lion’s share of it ‘looks’ exactly the same as it did when I started.

I can guarantee you though, that the source looks absolutely nothing like how it started out. Most of my time was spent taking a code-base “designed” to be a single-user desktop application, and turning it into something that would be relatively easy to make multi-user (it is, now) and optionally, web-enabled (one particular large data-set stops this from progressing just yet).

How then, do I convince the people paying me to tear up their code-base that even though the user-interface is the same, things are now radically different under the hood? Enter gource, and some discussion on how it visualises the git repository as the source-code evolves through the project.

Advice On Being a Research Software Developer

My current software development contract draws to a close. It’s the second in a row of appointments that involve writing software for different researchers, and the third such contract in my career. Allow me a moment, dear reader, to reflect on the experience, and attempt to describe what it’s like being a research software developer.

Of course, we only ever truly know an experience by slipping on the boots and walking the path, but by viewing it through the filter of my previous corporate life, perhaps the contrast with more typical environments might be enough to give you, interested reader, some idea of what to expect.

Before getting into the meat of it though, I’ve blogged about this contract a few times before. Let’s take a brief recap.

This contract saw the boss asking for a simple deployment, prompting my search for a single .NET assembly to house all others in a release. It had me making serious attempts to test with NUnit and Excel spreadsheets in a way that wouldn’t drive me mad. It had me rejig my underlying framework to better support mashups in terms of reading and writing simulation data with Excel spreadsheets and CSV files.

The project saw me cry tears of pain thanks to a 64-bit Windows upgrade badly misbehaving with my reliance on 32-bit libraries, both in terms of the Visual Studio debugger, and then NUnit. There was lots more fun that you’ll probably never learn about because I never thought to blog about it as it happened.

I confess that the above blog posts do little to draw out the experience of research programming. These issues are just as likely to pop up in a corporate or government software development experience as they are in a role devoted to writing software for researchers.

All, that is, except for one very small point: that the blog posts exist at all. Which turns out to be a very big point. Let me punctuate the point with its very own heading.

Expect Greater Degrees of Freedom and Fear

You’ll be working with researchers. The wonderful (and scary) thing about research is that you don’t really know where you’re going until you get there. Sure, you’ve got some ideas on what might be out there, but often, when you start, it’s not even remotely clear how to get from here to there.

Note that the structure you’re used to in corporate and government jobs probably doesn’t exist if you’re contributing to a science project. Unless it’s engineering flavoured, a researcher won’t have a clue what you’re on about if you attempt a deep discussion on requirements analysis, development methodology, testing frameworks, or other programming jargon.

The typical constraints that attempt to ensure quality are gone. Congratulations, you’re free. Free enough to hang yourself with all that extra rope. If you don’t start adding some of the more appropriate hard-earned skills back into the mix, it’s your funeral, cowboy/girl! You’re still being hired to produce quality, only now, you get to choose what that means.

Because researchers invest their time trying to learn something new, they actively grok and support the idea of trying new things, even if it may go nowhere.

In real terms then, yes, they’ll be relying on your expertise in programming (and dare I say it… computer science). However, more importantly, they’re counting on your capacity to jump in and just try stuff without necessarily knowing where it’s going, or even what you’re doing. Let’s ground this concept with a few examples:

  • The programming language used to construct a simulation prototype was inherently inefficient. A more efficient language with similar syntax to the original was chosen, allowing the researcher to retain an understanding of what the re-implementation does. The change in implementation was made in the hope that we’d get a goodly speed-up (Excel VB Script -> VB.NET).
    • A factor of 10 speed up (50 seconds down to 5 for a typical run) was far, far bigger than I’d hoped for, and satisfied the boss in terms of retaining an understanding of the code-base. Did I know initially? Not with much certainty. Just an educated guess there’d be “some” speedup based on what I understood of the technology space.
  • Dead ends, and changes of mind that “sound” small, but would kick the wind out of all the assumptions that were holding my code-base together happened pretty regularly. The worst for me was a need to let a simulation “run” stretch over more than one year. The entire code-base worked on the assumption that a run and a year were synonymous. Frequent backtracks to get a good answer to this were the order of the day.
    • Know that if you fall in love with the code above the end-goal, you’re going to have a very bad time of it.
    • Pick a source-code control technology that actively supports (or at the very least, minimises interference with) your capacity to try something new, and blow it all away if it turns out to be a flop. For me, Git is a gift from the coding-gods in terms of how easy it is to branch experiments off, merge the winners and torch the losers.
  • Simulating how species benefit from releases of water into a river connected to wetland rich in wildlife gets complex fast. The day it occurred to me that grokking simulated annealing is the very shallow end of the pond was a very sobering day indeed. The day my boss made it clear that I’d be expected to invent new approaches to the scarier-end of the code-base was very, very sobering.
    • For my computer-science friends, simulated annealing is a randomised way of finding a “good enough” answer to problems like the 0/1 knapsack problem without the exponential effort of a brute-force search for the best answer.
    • Don’t be afraid to engage in any and every activity you can think of to understand what’s there and what you can do with it. Re-implement it. Draw funky diagrams, be they mind-maps or UML. Doesn’t matter, so long as you’re continually leaning into the complexity (see later) until the penny finally drops.
  • Things occasionally get very sticky. I write down what I did to fix it in enough detail so that if it ever happens again, I can redo it without re-investing the effort I spent initially. So long as I stay away from the very hard boundary of discussing results before we have a publication, I’m free to do what I want with those notes. Hence the work blog entries.
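
For the curious, here is a minimal sketch of the simulated annealing idea from the list above, applied to a toy 0/1 knapsack. It is not the project’s code. The function names, the cooling schedule and the choice to score an over-weight knapsack as zero are all assumptions made purely for illustration.

```python
import math
import random


def knapsack_value(selection, values, weights, capacity):
    """Total value of the chosen items, or 0 if they exceed the capacity.

    Scoring an over-weight selection as 0 is an illustrative assumption.
    """
    total_weight = sum(w for w, chosen in zip(weights, selection) if chosen)
    if total_weight > capacity:
        return 0
    return sum(v for v, chosen in zip(values, selection) if chosen)


def anneal_knapsack(values, weights, capacity, steps=20000,
                    start_temp=10.0, cooling=0.9995, seed=0):
    """Simulated annealing sketch: returns (best_selection, best_value)."""
    rng = random.Random(seed)            # seeded, so a run can be repeated
    current = [False] * len(values)      # start with an empty knapsack
    current_value = 0
    best, best_value = current[:], current_value
    temp = start_temp

    for _ in range(steps):
        # Neighbour: flip one randomly chosen item in or out.
        candidate = current[:]
        i = rng.randrange(len(values))
        candidate[i] = not candidate[i]
        candidate_value = knapsack_value(candidate, values, weights, capacity)

        # Always accept an improvement; accept a worse move with a
        # probability that shrinks as the temperature drops.
        delta = candidate_value - current_value
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_value = candidate, candidate_value
            if current_value > best_value:
                best, best_value = current[:], current_value

        temp = max(temp * cooling, 1e-9)  # geometric cooling, never zero

    return best, best_value


if __name__ == "__main__":
    chosen, value = anneal_knapsack([60, 100, 120], [10, 20, 30], capacity=50)
    print(chosen, value)
```

Early on, while the temperature is high, the search happily wanders through worse answers; as it cools, it settles into the best neighbourhood it has found. There is no guarantee that the answer is optimal, which is exactly the “good enough” trade-off.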

Fear and Freedom, sitting in a tree. K-I-S-S-I-N-G! Yes, they’re into threesomes, but know that YOU are the optional party in any ménage à trois that eventuates. All I can offer you is some recycled wisdom:

Failing isn’t in the falling down, it’s in the staying down.

Lean Into the Complexity; Fear Doesn’t Banish It

If it’s really research, it’s cutting edge. If it’s in a domain we’ve been looking at for a while, it’s also guaranteed to be involved. There’s complexity here that you are unlikely to find in other software roles.

For me, that complexity is where my inner-critic starts up his magic “That’s it! THIS time you are going to choke!” chorus. It helps to enjoy this kind of fear, and recognise that this is you standing at the edge of what you’ve tested yourself against. If shouting “BOOGEY-WOOGEY” at the complexity-monster doesn’t see it bat even a single eye-stalk, how do we come to understand it?

Work with what you’ve got right now, and start yesterday. Research papers, prototypes, whatever you have on-hand. A key aim here is to build a vocabulary quickly on the research domain so you can start having meaningful exchanges with the expert(s) as soon as possible.

What I like doing right off the bat is pulling out the nearest mind-mapping tool (Freeplane is my current favourite) and going to town on whatever reading material I’ve been handed, pulling out what seems most important into a web of terms I can begin hanging my new knowledge on.

A Mind Map of the 50,000ft view of the project.

I also like starting a degree of simple refactoring work on the code. It’s a good excuse to get used to the code, and if it was cooked up by someone who doesn’t consider programming their primary passion, I can guarantee you’ll have a rich field of refactoring potential.

In my case, I had nearly all of the functionality sitting in a single method call, 657 lines long. In terms of “Bad Code Smells”, we’ve got ample examples of “Long Method”, “Duplicate Code” and “Dead Code”. Lots of simple refactoring wins just sitting there, waiting to be knocked off as you work on grokking what’s been written.

Exactly what you do to lean into the complexity comes down to a matter of personal preference. Abstracting away from any particular activity, what I’m really doing is remembering by “doing” something with the material. Just reading new, highly complex material doesn’t help me retain the knowledge.

It’s Agile, And Test-Driven, And Telling Doesn’t Help

You won’t be handed a pretty design document full of informative UML diagrams that allow you to chunk your understanding, or a cross-referenced requirements specification clearly identifying atomic, testable, unambiguous requirements. Nobody’s going to list acceptance criteria that, once implemented, guarantee that you’ve done the right thing.

I’m adamant now that when it comes to picking an appropriate development methodology for these kinds of projects, you’re facing something that needs to be very agile. But please, don’t take my word for it, read a fantastic discourse on the subject of choosing an appropriate development methodology for the nature of the work.

Also… good luck selling your researcher(s) on daily SCRUM meetings, the necessity of TDD, continuous integration, pair programming, or whatever else you’re standing on your religious pulpit about.

Instead, consider leaning into regular coffee catch-ups, where you can air issues before they become obstacles. Mention that you built some tests to make sure you didn’t break anything important with the latest experiment. Tell them you’ll be ready to hand them something that runs (not necessarily behaving though) whenever they ask for it. Ask to sit with them and walk through the code together when it looks appropriate.

EFlows Class Diagram

Get the point? You’re still doing the entire agile “thing”, but you’ve just gone meta! Pull out your Supers-cape emblazoned with a Mega-M, because you’re now doing it in a way that doesn’t bamboozle them with your impenetrable development-jargon.

I’ve mentioned source-control that supports experimentation. Another thing that I lean on heavily is my test-suite.

Let me be blatantly honest with you here. I doubt that extreme test-driven development approaches that attempt to close in on 100% code coverage are a good match for this kind of work. A large suite of test cases does indeed calcify a code-base, making it a real effort to radically change an approach if you also have to revisit all the related tests.

As a consequence, I’m not hung up on extensive unit-testing in this kind of role. Stuff that needs to always work, which is typically foundational, and unlikely to change anyway, gets unit-test love. I haven’t, however, bothered climbing very high up the stack with unit-tests that rely on mocks.

I pay far more attention to integration testing “key methods” through my reuse of NUnit. I allow the method of interest to call down into other real methods, limiting myself to only mock data driving “live” code. The tests then interrogate how well that key method is playing ball with its neighbours.

These integration tests are not the Unit Tests you are looking for!

As these algorithms are attempting to simulate nature, there’s a trend to inject a degree of randomness into the simulations, which makes things more difficult to test. Practice looking for loop and method invariants, and testing on them. The number of times I’ve saved myself from a very-bad-ending through an integration test suddenly complaining that an invariant has just been violated is now too large to count.
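
To make the invariant idea concrete, here is a small sketch, written in Python rather than NUnit. The toy water-balance model, its names and its 0 to 20% evaporation range are invented for illustration and are not the project’s simulation. The point is only that the checks hold whatever random numbers get drawn, and that a fixed seed makes any failure repeatable.

```python
import random


def simulate_wetland(initial_volume, inflows, seed):
    """Toy stochastic water balance (hypothetical model).

    Each step adds an inflow and loses a random fraction of the current
    volume to evaporation. Returns the volume after every step.
    """
    rng = random.Random(seed)
    volume = initial_volume
    history = []
    for inflow in inflows:
        evaporation = volume * rng.uniform(0.0, 0.2)
        volume = volume + inflow - evaporation
        history.append(volume)
    return history


def check_invariants(initial_volume, inflows, history):
    """Properties that must hold regardless of the random draws."""
    assert len(history) == len(inflows), "one volume per time step"
    supplied = initial_volume
    for inflow, volume in zip(inflows, history):
        supplied += inflow
        assert volume >= 0.0, "volume can never go negative"
        assert volume <= supplied, "volume can never exceed water supplied"


if __name__ == "__main__":
    inflows = [5.0, 0.0, 12.0, 3.0]
    # Exercise the live code over many seeds; a failing seed can be replayed.
    for seed in range(100):
        check_invariants(100.0, inflows, simulate_wetland(100.0, inflows, seed))
    print("invariants held for all seeds")
```

The test never asserts an exact output, because there isn’t one. It asserts the things that must stay true on every run, which is where a broken experiment tends to show itself first.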

Don’t be afraid to ask for budget to be allocated to tool support. For the most part, I favour freeware over commercial competitors where I can, but sometimes you only get what you need with cash (this current project sees me spending time with commercial products EnterpriseArchitect, and the RedGate Profiler suite).

Channel your Inner Scribe

Meticulous notes through the course of the project are an absolute necessity for me in complex domains. Sometimes, sharing them matters, because I acknowledge that I am not even remotely a domain expert for the systems being simulated. What I’m advocating here is a project log that you can easily share with your collaborators. It’s got to be relatively free-form, allowing you to attach pictures, photos, videos, etc.

I humbly submit to you that circa 2013, a private WordPress blog, used as a daily journal, and shared only with your collaborators hits the sweet-spot in documenting and sharing your learning with collaborators who aren’t necessarily tech-savvy.

Don’t be afraid to take a photo of the scribble on the whiteboard and write your own notes on what it all means. Don’t be afraid to have your domain expert read those notes and throw peanuts at them (or you). Actually, being afraid is perfectly fine. Allowing it to then stop you from producing reliable code to base research outcomes on is what you’re aiming to push past.

Finally, if you don’t like the idea that you’ll be spending a goodly amount of time writing/coding, then staring at the writing/coding, then your navel, then the writing/coding again, then back at your navel until the magic “ahah” moment drops, this may not be the software career for you.

Sell what you’ve Done, because nobody else will

You probably won’t be sitting beside other software developers, and suddenly launching into a full-on nerd-fest on that clever sub-linear loop you just concocted after your 4-hours-passing-in-a-moment with the digital-fairies. Neither will you have a sales-team handy to decode your technobabble.

Rattling on about O-notation, normalisation, refactoring and the fundamental limits of concurrency thanks to Amdahl’s law is not a way to win non-software friends and influence neuro-typical mindsets.

But… but… if you don’t point out your wins in a language that your audience gets, they’ll never know you had those wins. Drop that O-notation wall of jargon and say instead “Yes, that thing that took a minute to run now takes 5 seconds.” Non-programmers understand concrete time measurements, and will sing praises to the code-set when it’s set in that framework.

Draw your UML diagrams, but don’t get all teary when they don’t understand that little crows-foot means a 1-many relationship, and matching keys to navigate the relationship. They’ll appreciate that certain blobs in your bubble-mania are named things they recognise, and sometimes, might even comment that they got something out of seeing this visualisation of the code-base.

Sell only the bits in the diagram they recognise and only to the degree that doesn’t make their eyes glaze over. The rest of your bubble-mania is for your own navel-gazing. Do, however, expect them to re-use your bubble-mania with audiences who also have no idea what the crows-foot means. Don’t let this disconnect bother you.

Do find ways to help those around you visualise what you do all day. It’s an absolute crying shame when the CEO of a software company is so misguided on software development that they publicly go on record, calling their software developers “glorified administration staff”. Don’t laugh. It’s happened.

Help the researchers you’re working with to understand that this is more than “just typing” by handing them artifacts they might understand… You might try a pretty animation of how you changed the software source-code over time. At the very least, they might ask for a copy for the next dance-rave they’re hosting.

So there you have it. My advice on being a research software engineer condensed into five bullet-points:

  • Expect greater degrees of freedom and fear
  • Lean into the complexity; fear doesn’t banish it
  • It’s agile, and test-driven, and telling doesn’t help
  • Channel your inner scribe
  • Sell what you’ve done because nobody else will

Good luck! May your ground-shaking “ahah” moments be frequent and mind-blowing!