Tag Archives: MS Access

The Database Design Goggles! They Do Nothing!

Today, I started waiting for feedback on a dry, dry, dry technical document I just sent on for the client to use as peanut target-practice. To entertain myself while I wait, I popped open a database sent to me as part of the same project, to determine how well I could integrate it into other packages.

Did I get my entertainment? Yes, I certainly did, though in a way reminiscent of how I was entertained by “Hedwig and the Angry Inch”. The entire time I was doing it, I watched with interest as a part of me recoiled in horror and another part delighted in the sheer creative chaos that abounded.

Now… this database isn’t mine, and I’m not sure the creators would appreciate what I’m about to do here… so, I’ma gonna just blur all incriminating detail that could possibly help identify them. Now… dear reader, if by some microscopically small chance, you ARE the creator of this abomination…


Also… thank you… today has been a blast from a problem-solver’s perspective. You’re keeping me in a job where I can confidently claim professional skills.

So, I started out where I always start when introducing myself to a new database. “Show me da structure!” I asked, and got something that Microsoft Access renders as this ER diagram:

The original ER Diagram for Database


Pictures are pretty, and hide the full horror behind abstraction. Let me help you get where I got:

  • Rampant redundancy abounds, in the form of poorly considered primary keys that allow duplicate rows.
  • There was absolutely zero database-enforced referential integrity between the tables. Reconstructing relationships by hand, manually comparing column data between tables, duly ensued.
  • Spurious SQL queries and tables full of junk data abounded. Three tables were obviously junk, and one isn’t related to any of the others in any way, shape or form whatsoever.
  • Fantastic Access VBA code told me what to expect when, on inspecting the first form, I saw the same user-error message misspell the word “empty” as “emty” over, and over, and over again. DRY violations abound!

So anyway, I needed to understand the database structure. I jumped into the laborious task of introducing referential integrity. Sadly, I could only go so far. Here’s a pretty ER diagram of the same database, cleaned up to the point where I at least now get how it all hangs together:
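The workhorse for reconstructing each candidate relationship was an unmatched-rows query. A sketch, with hypothetical table and column names (the real ones are blurred for the same reason as the screenshots): find every row in the child table whose would-be foreign key has no match in the parent, because those are exactly the rows that stop Access from enforcing the relationship.

```sql
-- Hypothetical names: list FlowRecord rows whose StationID matches
-- no Station row -- the "almost foreign key" junk blocking
-- referential integrity from being switched on.
SELECT FlowRecord.*
FROM FlowRecord LEFT JOIN Station
  ON FlowRecord.StationID = Station.StationID
WHERE Station.StationID IS NULL;
```

An empty result means the relationship can be enforced; anything else is the crap data I mentioned.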

ER Diagram of the Database with Referential Integrity Constraints


Database design peeps will notice that some of those relationships don’t have cardinality specified. I can guarantee you that in every single case, there’s a bunch of crap data in the foreign-key column that is close to, but not precisely, a foreign key to the primary key of another table.

Also, I noticed that for three of those satellite tables around the monster one in the middle, there was redundancy in the form of a column in the main table that munges up what’s evidently more accurate in the satellite tables.

What does that all mean? Any meaningful query you’re likely to want to do on this database will return crap. Yes peeps, we have full-on GIGO potential so huge it’s dizzying. I could integrate this database into another package, but without a serious scrub-down of the data, there’s not much point.

I’m SOOO done for today. To quote my daughter for the past few nights, “And that’s the end of the story! Goodnight!”

Trading Performance for Accuracy in MS Access

Asynch Access for Performance

I recently spent too much time chasing a very frustrating bug where every second attempt to write to a database was being silently lost. Through trial-by-fire, I finally convinced myself that I was seeing an asynchronous write to the database and/or a dirty cache read from the same rows. The key moment was my realisation that, so long as I left the screen alone long enough, the database would eventually contain the saved data from the last “save” button press, but that the screen would trigger a table refresh much sooner, capturing the database values as they were before that last save.

I found a chunk of code that was basically one GUI event handler triggering another. The first did a database write of about 14,000 records to a Microsoft Access database. The second read that same set of records back out of the database. Once I pulled the event handlers apart, it started singing again, so I went hunting to see if I could find evidence to support what I was seeing.
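In rough outline (all names here are mine, not the original code’s), the before and after looked something like this:

```vba
' Hypothetical reconstruction of the problem, not the original code.
' Before: the save handler chained straight into the refresh, so the
' re-read raced Jet's lazy write and captured stale values.
Private Sub btnSave_Click()
    WriteRecords      ' ~14,000 records; Jet writes them out lazily
    btnRefresh_Click  ' immediately re-reads the same rows -- too soon
End Sub

' After: pulled apart, the refresh only fires when the user asks for
' it, by which time the lazy write has made it to disk.
Private Sub btnRefresh_Click()
    ReadRecordsIntoScreen
End Sub
```

Decoupling the handlers doesn’t remove the underlying race, mind you; it just puts enough human-speed delay between write and read that the cache behaves.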

I finally found gold, straight out of a Microsoft knowledge-base article. The key phrase for me was:

Microsoft Jet has a read-cache that is updated every PageTimeout milliseconds (default is 5000ms = 5 seconds). It also has a lazy-write mechanism that operates on a separate thread to main processing and thus writes changes to disk asynchronously. These two mechanisms help boost performance, but in certain situations that require high concurrency, they may create problems.

Regaining Accuracy in Access

This morning, after some discussion of the tradeoffs inherent in this feature of Access (or more accurately, the JET Database Engine), my supervisor asked me to find a way to make the database writes “not lazy”, and its reads avoid coming from a dirty cache.

I found this useful Microsoft article on synchronising JET OLE DB reads and writes, describing our situation pretty much exactly. There was no discussion of the lazy-write settings from the first article, so I’m currently assuming that the read cache is somehow clever enough to combine what’s on disk with what’s still to be written out of memory when we do a refresh.
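As best I recall, the article’s suggested fix leans on the Jet and Replication Objects (JRO) library to force the cache to be flushed and refreshed on demand. A sketch against an already-open ADO connection, assuming a project reference to “Microsoft Jet and Replication Objects”:

```vba
' Assumes a reference to "Microsoft Jet and Replication Objects" (JRO)
' and an already-open ADO connection.
Dim jetEngine As New JRO.JetEngine

' Flushes Jet's lazy writes to disk and refreshes its read cache for
' this connection, so the next read sees what was actually saved.
jetEngine.RefreshCache connection
```

Called between the write and the read, this would close the very window our two chained event handlers kept falling into.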

Either way, in the software’s prototype state, we are relying on cross-checking the data by occasionally watching it fill/alter the database as the models run in these tight little manual calibration loops. The user doing the calibration is now very fast at this. They complete a “modify database, calibrate, repeat the loop” process in only a few seconds for simple calibration examples. We need to guarantee that they’re using what’s actually just been saved in their calibration run.

I tracked down the various properties for the JET 4.0 provider, and modified the two properties Jet OLEDB:User Commit Sync and Jet OLEDB:Implicit Commit Sync, setting them to ensure that changes are written synchronously.

Initially, I just tried setting them within the connection string through to the database, but no matter what I tried, I always got an error dialog back stating “Cannot find installable ISAM”. I trawled the web and learned that unless the connection string was exactly right, I’d get this as a side-effect. Even then, no matter how hard I stared at the string, it looked the way I thought it should.

I eventually gave up (see this Microsoft knowledge-base entry for why it didn’t work), and assigned the connection properties their required values immediately after opening the database connection using code like this:

   ' Force Jet to commit both user (explicit) and implicit transactions
   ' synchronously, instead of lazily writing them on a background thread.
   connection.Properties("Jet OLEDB:User Commit Sync").Value = True
   connection.Properties("Jet OLEDB:Implicit Commit Sync").Value = True

On the Suitability of Access 

When I first interviewed for the job, there was this moment when I found out that Access was being used as the Database engine. I mentioned that I’d heard it wasn’t up to the task of serious database applications. The supervisor responded with something like “I’ve never had a problem”. I’m now happily arguing that he always had a problem, but just wasn’t aware of it.

Unfortunately, even though it’s a problem, we’re not going to be able to solve it nicely by just swapping Access out for something more industrial as databases go. The most likely candidate is SQL Server, given the technology stack the application’s been built with.

On reflection though, there are some killer-app features of Access that, in this research environment, make it a very compelling choice for our database engine. The features that I think matter most are:

  • One database, one file.
  • A very feature-rich user-interface, allowing non-SQL experts to manipulate and view large data sets with relative ease.
  • Data munging is often as simple as a cut and paste operation between two databases.

I list these because ultimately, this role is about gluing environmental research models together within a software support platform.  The softer, human, side of that is that we’re gluing together data from various researchers and government  bodies responsible for managing environmental systems.

These people don’t necessarily know the first thing about the theory behind databases, what an SQL statement is, or why a primary key matters. But… they use Access anyway, because it turns out to be one of the best enabling technologies currently available for collaborative data-munging and querying by non-IT professionals. It also goes some way towards explaining for me why Excel seems a strong runner-up for this kind of data sharing.

So, given what I’ve seen in this role, and despite my own desire to completely retire Access (a desire I think I might just share with Microsoft), I just can’t see this project stepping away from its Access dependency any time soon.

Sidestep Access File Sharing Lock Count Limitations

In my current role, I have co-workers who are required to make extensive use of Microsoft Access databases. These co-workers are at a bit of a disadvantage though, in that they don’t know what it means to have a “relational” database. “Normalisation” is just a meaningless word to them. For the most part they get by regardless, and only call me in when they hit oddities. Such as yesterday’s issue.

A co-worker was recently given an updated set of river flow data spanning decades. In its eventual rightful home, we’re talking several tens of thousands of records that needed updating. We needed a relatively error-free, manual way of doing it that didn’t require her to resort to a lengthy SQL statement. We found a way, but Access choked on the number of records we wanted to update. Specifically, we hit a warning box in Access that said “File sharing lock count exceeded. Increase MaxLocksPerFile registry entry.”

Playing with Windows registry entries is a last-resort option for me, so we went looking for alternatives. Official Microsoft support on the issue lists two ways: one via the registry, another via Access as a temporary override. We tried this “override” option with a succession of larger and larger increases to the “maximum locks per file” limit, but we kept hitting the same error.
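For the record, the override we were applying amounts to a one-liner of DAO, runnable from VBA or the immediate window; the value below is an arbitrary example, not a recommendation:

```vba
' Per-session override of Jet's lock limit via DAO; lasts until the
' database engine shuts down, and doesn't touch the registry.
' 200000 is just an illustrative number.
DBEngine.SetOption dbMaxLocksPerFile, 200000
```

In our case, no amount of cranking this number up got us past the error, which is what sent me digging through the application options instead.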

Eventually, whilst trawling through application options for Access, I found an “Open databases by using record-level locking” check-box. I unchecked the box, and we found ourselves successfully copying the tens of thousands of rows of data from one table to the other. The checkbox itself switches between coarse-grained “memory page” locks and fine-grained “record-level” locks, so what I’ve really done by tweaking this flag is decrease the number of locks needed for an update, with the tradeoff that if some other process wants to update the data, it may be held up waiting for me to release a lock on a record that I’d finished updating (or never touched).

At the time, I wasn’t aware that Access allowed an “Exclusive” mode for opening a database. Theoretically, in exclusive mode, locks aren’t an issue, so I’d probably have had success taking that approach as well.
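For next time, a minimal DAO sketch of an exclusive open (the path is hypothetical):

```vba
' Open the database exclusively via DAO; the second argument (True)
' requests exclusive access, so no other process can hold competing
' locks while the bulk copy runs.
Dim db As DAO.Database
Set db = DBEngine.OpenDatabase("C:\data\flows.mdb", True)
```

The obvious tradeoff: nobody else can touch the file until I close it, which for a one-off bulk update is exactly what you want.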

Transferring a Microsoft Access Table definition between Databases

I’m currently in the process of taking a long-running experiment with code and sharpening it up into something that we can include in the code-base proper. An intermediate step involves converting data, stored in an Access database, from one format to another. When I was experimenting, I took a copy of the entire old database and hacked with wild abandon until I had something that looked like it would get us the gold.

The table definitions have changed enough that I don’t care to manually redo the changes in the old database when they’re just sitting there in the new, ready to go. So now I have a dilemma: I need both old and new table definitions in the same database for a time while I do the data conversion.

I initially thought it might be as easy as a “copy” of the table in one Access database and a “paste” into another. No dice (a shame; Access has surprised me in the past with its ease of use for data dictionary goofery).

I then thought I could export a few tables to a new intermediate Access database, and import them back into the old. No dice (it seems the intermediate database needs to already exist).

I THEN thought I might be able to export the tables to Excel, and then import them back into the old. That kinda worked, from a raw data point of view. But everything in the data dictionary except the field names (like whether they were keys, their data types, etc.) got clobbered. *pant pant pant*

Finally, I tried an export/import cycle via XML. The entire table definition came through quite nicely, thanks. *phew*

A limitation I need to remember for the future is that the XML export option is limited to one table at a time (circa Access 2000). I had to recreate my relationships on the import end, but for the task at hand there was only a single relationship, something I was happy to wear.
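For my own future reference, the same export/import cycle can be scripted from Access VBA rather than clicked through; a sketch, with hypothetical table and path names:

```vba
' Export one table's data plus its schema (the data dictionary) to
' a pair of XML/XSD files. Table name and paths are illustrative.
Application.ExportXML ObjectType:=acExportTable, DataSource:="FlowRecord", _
    DataTarget:="C:\temp\FlowRecord.xml", SchemaTarget:="C:\temp\FlowRecord.xsd"

' Then, with the old database open, import both structure and data.
Application.ImportXML DataSource:="C:\temp\FlowRecord.xml", _
    ImportOptions:=acStructureAndData
```

Still one table per call, so the single-table limitation applies here too; relationships come along for neither ride.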