Tag Archives: GIT

On the Forgetting of Gource

Alright.  This is the very last time I figure out, from scratch, how to get gource producing visualisations of my development activities. Today’s post is me leaving notes on this for next time, so I can cut straight to the chase.

A couple of times now, I’ve had occasion to want to give the people I write software for some insight into what I’ve been doing. The project I last completed was one such example. A lot of work was done under the hood to enable the code-base to have a replaceable user-interface, and possibly also spatial database. The user-interface had a few new features, but the lion’s-share of it ‘looks’ exactly the same as it did when I started.

I can guarantee you though, that the source looks absolutely nothing like how it started out. Most of my time was spent in taking a code-base “designed” to be a single-user desktop application, and turning it into something that would be relatively easy to make multi-user (it is, now) and optionally, web-enabled (one particular large data-set stops this from progressing just yet).

How then, do I convince the people paying me to tear up their code-base that even though the user-interface is the same, things are now radically different under the hood? Enter gource, and some discussion on how it visualises the git repository as the source-code evolves through the project.
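For next time then, here’s the shape of the incantation. Fair warning: the exact flags below (resolution, playback speed, and the ffmpeg encoding settings) are typical choices rather than a record of what I actually ran back then.

```shell
# Run from the root of the git working copy you want visualised.
# gource renders the repository history; piping its PPM stream through
# ffmpeg turns it into a video you can hand to a client.
gource -1280x720 \
    --seconds-per-day 0.5 \
    --auto-skip-seconds 1 \
    --title "Project development history" \
    --output-ppm-stream - \
  | ffmpeg -y -f image2pipe -vcodec ppm -i - \
      -vcodec libx264 -preset fast -crf 20 gource.mp4
```

The pipe matters: without `--output-ppm-stream`, gource just opens a live OpenGL window, which is great for a demo but useless for emailing to the people paying you.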



Git Recovery with git-fsck: A Too-Short Tale

Beloved Reader.

Forgive me for I have softwared.

I’m on the tail-end, or possibly tale-end, of a project that was pretty rough as such things go. Not the toughest gig I’ve done, but no cake-walk either.

Anyone who’s professionally played in this space knows that Murphy’s law is drawn to tight-deadline software development projects like a wunch of salivating, button-eyed bankers racing to the reading of the last will and testament of Marley & Marley.

MWCIT Source Code Released Under BSD-3 License

It’s been a ‘swimming through molasses’ project all things considered, but today marks a milestone where I can finally drop down a gear. After clearing things with my employer, I’ve just placed the source-code to Release 1.0 of the ‘Murrumbidgee Wetlands Condition Indicator Tool’ under a BSD-3 license, and hosted it with GitHub.

The BSD-3 license was suggested by the University when I was sniffing around for options on how the project client could retain access to the source if they need it once my gig here is up. Turns out there are excellent reasons for BSD-3 (or more accurately, NOT a Creative Commons license, which was my first choice), so I was more than happy to settle on the suggestion.

Now, before you get all excited, Release 1.0 of the MWCIT isn’t much more than a button launcher driven by config-files:

Release 1.0 of the MWCIT, doing what it mostly does.

Still, if you’re a developer with a passing interest in a simple (but not trivial) example of a home-grown Model-View-Presenter (MVP) implementation, or you’re interested in how to coax NSubstitute into firing events out of a mock object, there might be something in it for you.


Advice On Being a Research Software Developer

My current software development contract draws to a close. It’s the second in a row of appointments that involve writing software for different researchers, and the third such contract in my career. Allow me a moment, dear reader, to reflect on the experience, and attempt to describe what it’s like being a research software developer.

Of course, we only ever truly know an experience by slipping on the boots and walking the path, but by viewing it through the filter of my previous corporate life, perhaps the contrast with more typical environments might be enough to give you, interested reader, some idea of what to expect.

Before getting into the meat of it though, know that I’ve blogged about this contract a few times before. Let’s take a brief recap.

This contract saw the boss asking for a simple deployment, prompting my search for a single .NET assembly to house all others in a release. It had me making serious attempts to test with NUnit and Excel spreadsheets in a way that wouldn’t drive me mad. It had me rejig my underlying framework to better support mashups in terms of reading and writing simulation data with Excel spreadsheets and CSV files.

The project saw me cry tears of pain thanks to a 64-bit Windows upgrade badly misbehaving with my reliance on 32-bit libraries, both in terms of the Visual Studio debugger, and then NUnit. There was lots more fun that you’ll probably never learn about because I never thought to blog about it as it happened.

Looking back over those blog posts, I confess that they do little to draw out the experience of research programming. The issues they cover are just as likely to pop up in a corporate or government software development role as they are in one devoted to writing software for researchers.

All that is, for one very small point. That the blog posts exist at all. Which turns out to be a very big point. Let’s punctuate the point with its very own heading.

Expect Greater Degrees of Freedom and Fear

You’ll be working with researchers. The wonderful (and scary) thing about research is that you don’t really know where you’re going until you get there. Sure, you’ve got some ideas on what might be out there, but often, when you start, it’s not even remotely clear how to get from here to there.

Note that the structure you’re used to in corporate and government jobs probably doesn’t exist if you’re contributing to a science project. Unless it’s engineering flavoured, a researcher won’t have a clue what you’re on about if you attempt a deep discussion on requirements analysis, development methodology, testing frameworks, or other programming jargon.

The typical constraints that attempt to ensure quality are gone. Congratulations, you’re free. Free enough to hang yourself with all that extra rope. If you don’t start adding some of the more appropriate hard-earned skills back into the mix, it’s your funeral, cowboy/girl! You’re still being hired to produce quality, only now, you get to choose what that means.

Because researchers invest their time trying to learn something new, they actively grok and support the idea of trying new things, even if it may go nowhere.

In real terms then, yes, they’ll be relying on your expertise in programming (and dare I say it… computer science). However, more importantly, they’re counting on your capacity to jump in and just try stuff without necessarily knowing where it’s going, or even what you’re doing. Let’s ground this concept with a few examples:

  • The programming language used to construct a simulation prototype was inherently inefficient. A more efficient language with similar syntax to the original was chosen, allowing the researcher to retain an understanding of what the re-implementation does. The change in implementation was taken in the hopes that we’d get a goodly speed-up (Excel VB Script -> VB.NET).
    • A factor of 10 speed up (50 seconds down to 5 for a typical run) was far, far bigger than I’d hoped for, and satisfied the boss in terms of retaining an understanding of the code-base. Did I know initially? Not with much certainty. Just an educated guess there’d be “some” speedup based on what I understood of the technology space.
  • Dead ends, and changes of mind that “sound” small, but would kick the wind out of all the assumptions that were holding my code-base together happened pretty regularly. The worst for me was a need to let a simulation “run” stretch over more than one year. The entire code-base worked on the assumption that a run and a year were synonymous. Frequent backtracks to get a good answer to this were the order of the day.
    • Know that if you fall in love with the code above the end-goal, you’re going to have a very bad time of it.
    • Pick a source-code control technology that actively supports (or at the very least, minimises interference with) your capacity to try something new, and blow it all away if it turns out to be a flop. For me, GIT is a gift from the coding-gods in terms of how easy it is to branch experiments off, merge the winners and torch the losers.
  • Simulating how species benefit from releases of water into a river connected to wetland rich in wildlife gets complex fast. The day it occurred to me that grokking simulated annealing is the very shallow end of the pond was a very sobering day indeed. The day my boss made it clear that I’d be expected to invent new approaches to the scarier-end of the code-base was very, very sobering.
    • For my computer-science friends, simulated annealing is a heuristic for finding a “good enough” answer to problems like the 0/1 knapsack problem without the exponential effort of a brute-force search for the best answer.
    • Don’t be afraid to engage in any and every activity you can think of to understand what’s there and what you can do with it. Re-implement it. Draw funky diagrams, be they mind-maps or UML. Doesn’t matter, so long as you’re continually leaning into the complexity (see later) until the penny finally drops.
  • Things occasionally get very sticky. I write down what I did to fix it in enough detail so if it ever happens again, I can redo it without re-investing the effort I spent initially. So long as I stay away from the very hard boundary of discussing results before we have a publication, I’m free to do what I want with those notes. Hence the work blog entries.
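The branch-experiments-off, merge-the-winners, torch-the-losers rhythm mentioned above is cheap to act on. A sketch (the branch name is illustrative, and I’m assuming a `master` main line):

```shell
# spin up a disposable branch for the experiment
git checkout -b experiment/multi-year-runs

# ...hack, commit, repeat until the idea proves itself (or doesn't)...

# winner: fold it back into the main line
git checkout master
git merge experiment/multi-year-runs

# loser: torch the branch and pretend it never happened
git branch -D experiment/multi-year-runs
```

The whole point is that the branch costs nothing to create and nothing to destroy, so there’s no excuse for experimenting directly on the main line.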

Fear and Freedom, sitting in a tree. K-I-S-S-I-N-G! Yes, they’re into threesomes, but know that YOU are the optional party in any ménage à trois that eventuates. All I can offer you is some recycled wisdom:

Failing isn’t in the falling down, it’s in the staying down.

Lean Into the Complexity; Fear Doesn’t Banish It

If it’s really research, it’s cutting edge. If it’s in a domain we’ve been looking at for a while, it’s also guaranteed to be involved. There’s complexity here that you are unlikely to find in other software roles.

For me, that complexity is where my inner-critic starts up his magic “That’s it! THIS time you are going to choke!” chorus. It helps to enjoy this kind of fear, and recognise that this is you standing at the edge of what you’ve tested yourself against. If shouting “BOOGEY-WOOGEY” at the complexity-monster doesn’t see it bat even a single eye-stalk, how do we come to understand it?

Work with what you’ve got right now, and start yesterday. Research papers, prototypes, whatever you have on-hand. A key aim here is to build a vocabulary quickly on the research domain so you can start having meaningful exchanges with the expert(s) as soon as possible.

What I like doing right off the bat is pulling out the nearest mind-mapping tool (Freeplane is my current favourite) and going to town on whatever reading material I’ve been handed, pulling out what seems most important into a web of terms I can begin hanging my new knowledge on.

A Mind Map of the 50,000ft view of the project.

I also like starting a degree of simple refactoring work on the code. It’s a good excuse to get used to the code, and if it was cooked up by someone who doesn’t consider programming their primary passion, I can guarantee you’ll have a rich field of refactoring potential.

In my case, I had nearly all of the functionality sitting in a single method call, 657 lines long. In terms of “Bad Code Smells“, we’ve got ample examples of “Long Method”, “Duplicate Code”, and “Dead Code”. Lots of simple refactoring wins just sitting there, waiting to be knocked off as you work on grokking what’s been written.

Exactly what you do to lean into the complexity comes down to a matter of personal preference. Abstracting away from any particular activity, I’m really engaged in remembering by “doing” something with the material. Just reading new, highly complex material doesn’t help me retain the knowledge.

It’s Agile, And Test-Driven, And Telling Doesn’t Help

You won’t be handed a pretty design document full of informative UML diagrams that allow you to chunk your understanding, or a cross-referenced requirements specification clearly identifying atomic, testable, unambiguous requirements. Nobody’s going to list acceptance criteria, that once implemented, guarantee that you’ve done the right thing.

I’m adamant now that when it comes to picking an appropriate development methodology for these kinds of projects, you’re facing something that needs to be very agile. But please, don’t take my word for it, read a fantastic discourse on the subject of choosing an appropriate development methodology for the nature of the work.

Also… good luck selling your researcher(s) on daily SCRUM meetings, the necessity of TDD, continuous integration, pair programming, or whatever else you’re standing on your religious pulpit about.

Instead, consider leaning into regular coffee catch-ups, where you can air issues before they become obstacles. Mention that you built some tests to make sure you didn’t break anything important with the latest experiment. Tell them you’ll be ready to hand them something that runs (not necessarily behaving though) whenever they ask for it. Ask to sit with them and walk through the code together when it looks appropriate.

EFlows Class Diagram

Get the point? You’re still doing the entire agile “thing”, but you’ve just gone meta! Pull out your Super-cape emblazoned with a Mega-M, because you’re now doing it in a way that doesn’t bamboozle them with your impenetrable development-jargon.

I’ve mentioned source-control that supports experimentation. Another thing that I lean on heavily is my test-suite.

Let me be blatantly honest with you here. I doubt that extreme test-driven development approaches that attempt to close in on 100% code coverage are a good match for this kind of work. A large suite of test cases does indeed calcify a code-base, making it a real effort to radically change an approach if you also have to revisit all the related tests.

As a consequence, I’m not hung up on extensive unit-testing in this kind of role. Stuff that needs to always work, which is typically foundational, and unlikely to change anyway, gets unit-test love. I haven’t, however, bothered climbing very high up the stack with unit-tests that rely on mocks.

I pay far more attention to integration testing “key methods” through my reuse of NUnit. I allow the method of interest to call down into other real methods, limiting myself to only mock data driving “live” code. The tests then interrogate how well that key method is playing ball with its neighbours.

These integration tests are not the Unit Tests you are looking for!

As these algorithms are attempting to simulate nature, there’s a trend to inject a degree of randomness into the simulations, which makes things more difficult to test. Practice looking for loop and method invariants, and testing on them. The number of times I’ve saved myself from a very-bad-ending through an integration test suddenly complaining that an invariant has just been violated is now too large to count.

Don’t be afraid to ask for budget to be allocated to tool support. For the most part, I favour freeware over commercial competitors where I can, but sometimes you only get what you need with cash (this current project sees me spending time with commercial products EnterpriseArchitect, and the RedGate Profiler suite).

Channel your Inner Scribe

Meticulous notes through the course of the project are an absolute necessity for me in complex domains. Sometimes, sharing them matters, because I acknowledge that I am not even remotely a domain expert for the systems being simulated. What I’m advocating here is a project log that you can easily share with your collaborators. It’s got to be relatively free-form, allowing you to attach pictures, photos, videos, etc.

I humbly submit to you that circa 2013, a private WordPress blog, used as a daily journal, and shared only with your collaborators hits the sweet-spot in documenting and sharing your learning with collaborators who aren’t necessarily tech-savvy.

Don’t be afraid to take a photo of the scribble on the whiteboard and write your own notes on what it all means. Don’t be afraid to have your domain expert read those notes and throw peanuts at them (or you). Actually, being afraid is perfectly fine. Allowing it to then stop you from producing reliable code to base research outcomes on is what you’re aiming to push past.

Finally, if you don’t like the idea that you’ll be spending a goodly amount of time writing/coding, then staring at the writing/coding, then your navel, then the writing/coding again, then back at your navel until the magic “ahah” moment drops, this may not be the software career for you.

Sell what you’ve Done, because nobody else will

You probably won’t be sitting beside other software developers, and suddenly launching into a full-on nerd-fest on that clever sub-linear loop you just concocted after your 4-hours-passing-in-a-moment with the digital-fairies. Neither will you have a sales-team handy to decode your technobabble.

Rattling on about O-notation, normalisation, refactoring and the fundamental limits of concurrency thanks to Amdahl’s law is not a way to win non-software friends and influence neuro-typical mindsets.

But… but… if you don’t point out your wins in a language that your audience gets, they’ll never know you had those wins. Drop that O-notation wall of jargon and say instead “Yes, that thing that took a minute to run now takes 5 seconds.” Non-programmers understand concrete time measurements, and will sing praises to the code-set when it’s set in that framework.

Draw your UML diagrams, but don’t get all teary when they don’t understand that little crows-foot means a 1-many relationship, and matching keys to navigate the relationship. They’ll appreciate that certain blobs in your bubble-mania are named things they recognise, and sometimes, might even comment that they got something out of seeing this visualisation of the code-base.

Sell only the bits in the diagram they recognise and only to the degree that doesn’t make their eyes glaze over. The rest of your bubble-mania is for your own navel-gazing. Do, however, expect them to re-use your bubble-mania with audiences who also have no idea what the crows-foot means. Don’t let this disconnect bother you.

Do find ways to help those around you visualise what you do all day. It’s an absolute crying shame when the CEO of a software company is so misguided on software development that they publicly go on record, calling their software developers “glorified administration staff”. Don’t laugh. It’s happened.

Help the researchers you’re working with to understand that this is more than “just typing” by handing them artifacts they might understand… You might try a pretty animation of how you changed the software source-code over time. At the very least, they might ask for a copy for the next dance-rave they’re hosting.

So there you have it. My advice on being a research software engineer condensed into five bullet-points:

  • Expect greater degrees of freedom and fear
  • Lean into the complexity; fear doesn’t banish it
  • It’s agile, and test-driven, and telling doesn’t help
  • Channel your inner scribe
  • Sell what you’ve done because nobody else will

Good luck! May your ground-shaking “ahah” moments be frequent and mind-blowing!

Torchlight 2 Toon Archiving: The Sequel

The kids left me alone last night, so I decided I’d goof around with my Torchlight  2 character archiving.  Recent reading around Git has convinced me that Git stores deltas of binary files once it packs its objects, so my last objection to using Git over Subversion for binary files has just been laid to rest.

Also, the original Perl script was a bit brain-dead in that it was good at automatically committing save files that had changed, but additions and deletions were still things I’d have to manually tell the repository about.

As these things tend to go, the lion’s share of the script was written in maybe the first 15 minutes.  The rest of the night was spent tweaking and testing the command template constants, destroying and re-creating Git archives until I had myself convinced that the script was automatically committing exact replicas of the save-game directory, regardless of additions, deletions and modifications.

The script turned out to be pretty short, and is included at the end of this blog post in all its perlesque glory.  Now that I have the trick of it though, the pattern is pretty much applicable to any save-game directory I might want to subject to version control.  The human-readable formula is basically:

  1. Delete all contents of the directory within your version control repository that contains the copied image of your save-game directory.
  2. Fully (recursively) copy the contents of that save-game directory into  the recently emptied archive directory.
  3. Ask the repository to identify all files that have been deleted from the  copied file set that it is currently archiving.  If there are any deleted files, stage these files to also be deleted in the repository on the next commit.
  4. Ask the repository to identify all new files that it is currently not archiving. If there are any new files, stage these  files to be added into the repository on the next commit.
  5. Have the repository apply, in a single commit, all deletions, additions and modifications identified.
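Rendered as plain shell (paths are placeholders; the Perl script below wraps these same commands), the formula looks like this:

```shell
SAVE_DIR="$HOME/path/to/savegames"   # placeholder: the game's save directory
ARCHIVE_DIR="toons"                  # placeholder: archive directory inside the git repo

rm -rf "$ARCHIVE_DIR" && mkdir -p "$ARCHIVE_DIR"    # 1. empty the archive copy
cp -r "$SAVE_DIR/." "$ARCHIVE_DIR"                  # 2. recursively copy the saves in
git ls-files --deleted "$ARCHIVE_DIR" \
  | xargs -r git rm --quiet                         # 3. stage deletions of missing files
git add --all "$ARCHIVE_DIR"                        # 4. stage new (and modified) files
git commit --quiet -m "save-game snapshot"          # 5. one commit for the lot
```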

# An archive utility for Torchlight 2 characters. Works by committing the current
# save-game contents to a GIT repository housed in the parent directory to the script.
# (c) 2013, Lindsay Bradford, released under the Creative Commons Attribution licence.
# http://creativecommons.org/licenses/by/3.0/
# The parent directory has two subdirectories, being "bin", and "toons".
#  The "bin" directory contains this script.
#  The "toons" directory contains the save-game content of the game.
# Usage:
#    archiveTLToons.pl <"optional commit message string">
# Modify the constants below to suit your own environment

package ArchiveTL2Toons;

use strict;

###### constants for directory locations and command templates below #####

use constant GAME_SAVE_DIR =>
	"/home/linds/.wine/drive_c/users/linds/My Documents/My Games/Runic Games/Torchlight 2/save/76561198044661040/.";

use constant ARCHIVE_TOON_DIR =>
	"../toons";    # relative to the "bin" directory this script runs from

use constant CLEAR_ARCHIVE_COMMAND =>
	sprintf "rm -rf '%s' && mkdir -p '%s'", ARCHIVE_TOON_DIR, ARCHIVE_TOON_DIR;

use constant COPY_COMMAND =>
	sprintf "cp -r '%s' '%s'", GAME_SAVE_DIR, ARCHIVE_TOON_DIR;

use constant STAGE_DELETIONS_COMMAND =>
	sprintf "git ls-files --deleted \"%s\" | xargs -r git rm --quiet", ARCHIVE_TOON_DIR;

use constant STAGE_ADDITIONS_COMMAND =>
	sprintf "git add --all \"%s\"", ARCHIVE_TOON_DIR;

use constant COMMIT_ARCHIVE_COMMAND => "git commit --quiet -m \"%s\"";

##### Methods below #####

# Bootstrap method.

&archiveTL2toons(defined $ARGV[0] ? $ARGV[0] : "");

# Archives the current set of Torchlight 2 toons by
# deleting the archive contents, taking a recursive
# copy of the save directory back into the directory
# and committing a snapshot of the copied content.

sub archiveTL2toons {
  my $commandLineComment = $_[0];

  if ($commandLineComment eq "") {
    $commandLineComment = "Commit of current save state.";
  }

  &runCommand(
    "Clearing archive content...",
    CLEAR_ARCHIVE_COMMAND
  );

  &runCommand(
    "Copying Torchlight 2 save game content to archive...",
    COPY_COMMAND
  );

  &snapshotArchive($commandLineComment);
}

# Commits a snapshot of the current content
# of the archive, assuming that the current content
# is exactly what the commit should contain.  Specifically:
#   * Any files missing from the archive are deleted in the commit
#   * Any new files found are automatically added with the commit
#   * All modified files are committed as-is.

sub snapshotArchive {
  my $commandLineComment = $_[0];

  &runCommand(
    "Staging removal of missing files from archive...",
    STAGE_DELETIONS_COMMAND
  );

  &runCommand(
    "Staging addition of untracked new files to archive...",
    STAGE_ADDITIONS_COMMAND
  );

  my $message = &getNowTimestamp . " | " . $commandLineComment;

  my $commitCommand = sprintf COMMIT_ARCHIVE_COMMAND, $message;

  &runCommand(
    "Committing staged snapshot of save-directory to archive...",
    $commitCommand
  );
}

# Generates a timestamp of the current time.

sub getNowTimestamp {
  my ($sec, $min, $hr, $day, $mon, $year) = localtime;
  return sprintf("%04d-%02d-%02d %02d:%02d",
        1900 + $year, $mon + 1, $day, $hr, $min);
}

# Simple method that prints the $message supplied,
# runs the $command specified, and prints any results
# the command generates.

sub runCommand {
  my ($message, $command) = @_;

  print "$message\n";

  my $result = `$command`;
  print $result;
}
On a final note, I installed EPIC for Eclipse to modify the script, so despite my intense dislike of its lack of automated refactoring support, I’m begrudgingly having to admit that it’s working better for me than doing it in a text editor.

Tales of the TeenyTyper #3 – The ActionBinder

In building the TeenyTyper, I decided that I wanted to avoid the usual window decorations (like a close button).  A special key combination of <CTRL-D> would instead be used to shut down the toy once we’d finished playing.

I’ve tried implementing  Swing’s KeyListener interface in the past, but I’ve experienced the occasional KeyListener event not firing when it should have.  Sure enough, my first attempts at using a KeyListener misbehaved as expected.  A quick trawl of StackOverflow, and I hit gold.   Seems that I’d be far better off using key bindings instead.

So, in essence, I’m binding a particular action to perform for a given Swing component every time the component catches a relevant keystroke event.  Below is the reusable library function I settled on to establish this relationship:

public static void bindKeyStrokeToAction(
                     JComponent component,
                     String keyStrokeLabel,
                     KeyStroke keyStroke,
                     AbstractAction actionToPerform) {

  InputMap inputMap = component.getInputMap(
      JComponent.WHEN_IN_FOCUSED_WINDOW);  // focus condition assumed; pick what suits your component

  inputMap.put(keyStroke, keyStrokeLabel);
  component.getActionMap().put(keyStrokeLabel, actionToPerform);
}
I’m not a big fan of that keystroke label requirement in the mapping. I have no need for the string itself once the mapping has been established. The string really could be anything for all I care (at least for this application).  To hide this string detail, and make the interface even easier to use, I implemented a wrapper to auto-generate a unique string mapping via Java’s UUID functionality like so:

public static void bindKeyStrokeToAction(
                     JComponent component,
                     KeyStroke keyStroke,
                     AbstractAction actionToPerform) {

  bindKeyStrokeToAction(
      component,
      UUID.randomUUID().toString(),  // auto-generated, unique, throwaway label
      keyStroke,
      actionToPerform);
}

So, implementing an action that shuts the TeenyTyper down gracefully when <CTRL-D> is pressed on the editor pane is achieved with the following call:

bindKeyStrokeToAction(
    editorPane,  // the TeenyTyper's editor component
    KeyStroke.getKeyStroke(
        KeyEvent.VK_D,
        KeyEvent.CTRL_DOWN_MASK), // <CTRL-D>
    new AbstractAction() {
      private static final long serialVersionUID = 1L;

      public void actionPerformed(ActionEvent arg0) {
        shutdownTeenyTyper();  // hypothetical helper: dispose frames and exit
      }
    });
And that’s it. No more KeyListener event misfire blues for yours-truly.

Keybind your actions long and prosper.

Tales of the TeenyTyper #2 – The git CPAP trap!

I currently class myself as a novice in distributed git repositories. In my extant work role, I’m using git for source version control, but I’m operating as a git project user group of one,  as my supervisor refuses to use any form of source control other than simply copying the entire source folder.

Being rather pragmatic about it all, I kept our git setup as simple as the situation needed.  I’ve gotten by just fine with a single repository on my work machine with a branch per developer that I can occasionally merge and redistribute from.

One of the things I’m learning out of my TeenyTyper experience is how to use distributed git repositories for my project via GitHub.  My early experiments have been making a mess of the commit history (no, don’t look!).

The problems started not long after I took my clone of the TeenyTyper repository hosted with GitHub.  I couldn’t push my recent commit back to GitHub, despite the fact that all I’d read had me believing that I should be able to.  I eventually worked around it by doing a merge of the remote repository into the local repository.  The merged version I could push successfully back to GitHub. This seemed to happen rather non-deterministically, and irked me because I shouldn’t actually need to merge these changes.

I went poring through the git manual this morning, and now I see what I was doing to make all that mess. It makes sense now that I see the process, but as a novice I initially missed what I was doing to cause the issues in the first place.

The essence of my problem was that I was doing the following sequence of activities:

  1. (change some repository managed files locally)
  2. git add <changed files>
  3. git commit
  4. git push <remote-repository> <branch>
  5. (some more changes forgotten in the first commit)
  6. git commit --amend
  7. git push <remote-repository> <branch>

The error returned looks something like this:

To git@github.com:user/repo.git
! [rejected] branchname -> branchname (non-fast-forward)
error: failed to push some refs to 'git@github.com:user/repo.git'

I see now that a “commit amend” does not modify an existing git commit object; it replaces that commit object with an entirely new one, based on the old. You can confirm this for yourself easily by looking at the commit hash before and after the amend.  The second push attempt fails because the remote still holds the original commit, which my rewritten local history no longer contains, so the push is no longer a simple fast-forward.
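A throwaway repository makes the replacement easy to watch (the hashes differ on every run, so I’ve deliberately not recorded any here; the identity config is a placeholder):

```shell
# scratch repository, just to watch --amend mint a new commit object
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email you@example.com   # placeholder identity for the scratch repo
git config user.name "You"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "initial work"
before=$(git rev-parse HEAD)

git commit -q --amend -m "initial work, reworded"
after=$(git rev-parse HEAD)

# the amended commit is a brand-new object, not the old one edited in place
[ "$before" != "$after" ] && echo "amend minted a new commit object"
```

If `before` had already been pushed, the remote still contains it, while your local history no longer does — hence the non-fast-forward rejection.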

My take-away lesson from all this is to self-censor any attempts at amending commits that have been sent off already to a remote repository. The convenience of tweaking a commit with an amend is just not worth the effort that I need to go through to get the repositories back on the same page.

It’s a trap, and it’s triggered by the sequence of (C)ommit, (P)ush, (A)mend, (P)ush git commands.  I’m now wishing they’d called a “commit amend” a “commit replace” to better mesh with my intuition on the  meaning of amend.  Ah well, we live and learn.

Push git commits long and prosper!