An Apology to ArcGIS Database Designers

Today, I reached a point where I was ready to try aligning changes in a spreadsheet with an ESRI Shapefile, so users could see which wetlands derive benefit from simulated water inundations, and now I’m very sorry.

I’ve had an issue with one of the key output worksheets since I first began this project. In fact, I tore it out of earlier versions of the simulation, as is assaulted my delicate data encoding sensibilities. The offending worksheet was arcane, and encoded a very sparsely populated matrix of data with a static number of rows, and variable number of columns. This also makes it it most inconvenient to treat like a database table, whose rows can vary but whose columns remain static.

A worksheet, most arcane!

A worksheet, most arcane!

Look above you reader, and feast your eyes on the worksheet of “I really don’t give a flying flock about your ability to understand me”. By staring at the code that produces it, I see that there’s a data value of 1 within a given cell of this worksheet, it’s to be interpreted as “For the Planning Unit in the far left-most column of the same row, there was a water inundation event for the run and season encoded in the column header for that cell”.

In my hubris and naivety, I tore it out in a thrilling moment of good-riddance to bad-data, and promised I’d do better when the time came.

The humble pie… it tastes like sawdust and tears on a fresh bed of denial.

It all began when I started reading the relevant GUI code, and realised that it was deliberately issuing SQL statements to the key ArcGIS dbf file to tear the dbf file to shreds, and rebuild it in nearly exactly the same image as that worksheet.

The arcanery stretches beyond Excel!

The arcanery stretches beyond Excel!

Now, this one is new to me.  I’ve never seen code that tears a database definition up, and rebuilds it to allow it to dynamically alter its columns on every single run of a simulation. In fact, before today I’d have thought someone who’d done this kind of thing needs to maybe get out of the basement and eat a good vindaloo with their favourite super-hero, followed by a nice long lie-down with a beloved stuffy.

Mordox loves his stuffy

Mordox loves his stuffy

I got suspicious, because after a week of messing with his code, I trust the guy who built the GUI to not build arcane for its own sake. I’ve vaguely known for a while that an ESRI shape file with geometry information needs to ship with an accompanying database file, holding extra attributes about those shapes. Today, I was forced to learn more.

I started asking questions of the ArcGIS users in the office, then of the Web around my suspicions. Specifically, I’d gotten worried that even though I could treat that database file like a pretty regular relational database, there was a ‘done way’ of using these files involving rows to perfectly align with ESRI planning units (geometric shapes).

The web-trawl delivered gold, and I’m telling you because I couldn’t find someone to effortlessly and quickly tell me.  In fact, I’m just going to quote ESRI directly (angle bracket text added to keep it straight in my head):

      There is a one-to-one relationship between geometry [shp] and attributes [dbf], which is based on record number. Attribute records in the dBASE [dbf] file must be in the same order as records in the main [shp] file.

A moment of stillness.. then… the downward lurch of assumptions being kicked out from under me faster than The Flash suffering for a guilty night of overindulgence with his favourite vindaloo.

Denial kicked me like a stuffy-hating mule who’s just found itself in a party full of furries. I went into bubble-mania mode to be absolutely certain I knew what I was dealing with at the domain level before I’m forced to reconsider how I encode it:

A simplified domain model

A simplified domain model

Why was it denial (you probably didn’t ask)?  I didn’t need to draw that diagram.  I know full well that there’s a many:many relationship between seasons and wetlands. If I’m being forced to store a many:many relationship in a format where one of the foreign keys is also necessarily the primary-unique, that removes my option to mess with the rows, leaving the only vector I can move along being the columns. Just… like.. it… is… now.

So… my heartfelt apologies ArcGIS database designers. I see now that here are times where you just can’t help getting all voodoo with the data. I take back all the ugly thoughts I had earlier this morning.

I need a curry and a cuddle.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s