Asynch Access for Performance
I recently spent too much time chasing a very frustrating bug where every 2nd attempt to write to a database was being silently lost. Through a trial-by-fire, I finally convinced myself that the I was seeing an asynchronous-write to the database and/or a dirty cache read from the same rows. The key moment here was my realisation that so long as I left the screen alone long enough, the database would eventually contain the save data from the last “save” button press, but that the screen would trigger a table refresh much sooner, capturing the database values as they were before that last save.
I found a chunk of code that was basically one GUI event handler triggering another. The first did a database write of about 14000 records to a Microsoft Access database. The second read that same set of records back out of the database. Once I pulled the event handlers apart it started singing again, so I went hunting to see if I could find evidence to support what I was seeing.
I finally found gold, straight out of a Microsoft knowlege-base article. The key phrase for me was:
Microsoft Jet has a read-cache that is updated every PageTimeout milliseconds (default is 5000ms = 5 seconds). It also has a lazy-write mechanism that operates on a separate thread to main processing and thus writes changes to disk asynchronously. These two mechanisms help boost performance, but in certain situations that require high concurrency, they may create problems.
Regaining Accuracy in Access
This morning, after some discussion on the tradeoffs inherent in this feature of Access (or more accurately, the JET Database Engine), my supervisor has asked me to find a way to make it the database writes “not lazy”, and its reads avoid coming from a dirty cache.
I found this useful Microsoft article on synhronising JET OLE DB reads and writes, describing our situation pretty-much exactly. There was no discussion on the lazy write settings from the first article, so I’m currently assuming that the read cache is somehow clever enough to combine what’s on disk with that’s still to be written out of memory when we do a refresh.
Either way, in the software’s prototype state, we are relying on cross-checking the data by occasionally watching it fill/alter the database as the models run in these tight little manual calibration loops. The user doing the calibration is now very fast at this. They complete a “modify database, calibrate, repeat the loop” process in only a few seconds for simple calibration examples. We need to guarantee that they’re using what’s actually just been saved in their calibration run.
I tracked down the various properties for the JET 4.0 provider, and modified the two properties Jet OLEDB:User Commit Sync and Jet OLEDB:Implicit Commit Sync, setting them to ensure that changes are written synchronously.
Initially, I just tried setting them within the connection string through to the database, but no matter what I tried, I always got an error dialog back stating “Cannot find installable ISAM”. I trawled the web and learned that unless the connection string were exactly right, I’d get this as a side-effect. Even then, no matter how hard I stared at the string, it looked the way I thought it should.
I eventually gave up (see this Microsoft knowledge-base entry for why it didn’t work), and assigned the connection properties their required values immediately after opening the database connection using code like this:
connection.Properties("Jet OLEDB:User Commit Sync").Value = True connection.Properties("Jet OLEDB:Implicit Commit Sync").Value = True
On the Suitability of Access
When I first interviewed for the job, there was this moment when I found out that Access was being used as the Database engine. I mentioned that I’d heard it wasn’t up to the task of serious database applications. The supervisor responded with something like “I’ve never had a problem”. I’m now happily arguing that he always had a problem, but just wasn’t aware of it.
Unfortunately, just because it’s a problem, we’re not just going to be able to solve it nicely by swapping Access out for something more industrial as databases go. The most likely candidate is SQL Server, given the technology stack the application’s been built with.
On reflection though, there are some killer-app features about Access that in this research environment, make it a very compelling choice for our database engine. The features that I think make it far more compelling are:
- One database, one file.
- A very feature-rich user-interface, allowing non-SQL experts to manipulate and view large data sets with relative ease.
- Data munging is often as simple as a cut and paste operation between two databases.
I list these because ultimately, this role is about gluing environmental research models together within a software support platform. The softer, human, side of that is that we’re gluing together data from various researchers and government bodies responsible for managing environmental systems.
These people don’t necessarily know the first thing about the theory behind databases, what an SQL statement is, and why a primary key matters. But… they use Access anyway, because it turns out to be one of the best enabling technologies for collaborative data-munging and querying by non-IT professionals currently available. It also goes some way towards answering for me why Excel seems a strong runner-up for this kind of data sharing.
So, given what I’ve seen in this role, and despite my own desire to completely retire Access (I desire I think I might just share with Microsoft), I just can’t see this project stepping away from its Access dependency any time soon.