I recently had a need to take Enumeration instances from within .NET and pretty them up for human consumption. The heart of the problem involved how to take CamelCase text and add spaces between each word-break denoted by a new upper-case letter.
I’m mostly following the Microsoft internal coding guidelines for my naming conventions. Enumerations should thus be mostly PascalCase/Upper Camel Case, but I’m not above just grabbing external libraries and gluing them into the utility, risking oddities that don’t match the guideline.
Given that I can’t predict ahead of time how well the enumeration sticks to a strict PascalCase naming scheme, I wanted a regular expression that would cater for a wider range of strings than ‘strict’ PascalCase. I learnt that a programmer can drive themselves crazy catering for a rich range of possible encodings, so I decided to draw the line at strict CamelCase along with PascalCase, ignoring non-word characters for the time being.
Now, all languages have their little quirks with how they implement regular expressions, and .NET is no exception. Thankfully, after a little digging around, I discovered a good launch-point out at StackExchange based on somebody wanting to do a very similar thing in PHP. Very little messing around with the expression’s syntax was required, which is always a pleasant thing. The final expression settled on was:
(?<=[a-z])(?=[A-Z])
Interpret the expression thusly:
Look for a position that forms a boundary between two characters in valid CamelCase: a lower-case character (a-z) on the left and an upper-case character (A-Z) on the right. On the left, (?<=[a-z]) is what’s called a zero-width positive look-behind assertion: it identifies the lower-case character without moving the pattern matcher along the string. On the right, (?=[A-Z]) is a zero-width positive look-ahead assertion: it identifies the spot where the new upper-case character begins without consuming it in the match. Because neither assertion consumes anything, the split is made between the two characters, so the upper-case character starts a new string.
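To see the zero-width behaviour for yourself, here is a minimal sketch that runs the expression over a few sample strings. The module name BoundaryDemo and the sample strings are made up for illustration; only Regex.Split and String.Join come from the framework.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BoundaryDemo
    Sub Main()
        ' Same pattern as in this post: a position preceded by a-z and followed by A-Z.
        Dim boundary As New Regex("(?<=[a-z])(?=[A-Z])")

        ' Both assertions are zero-width, so Split cuts between characters
        ' and every letter of the input survives in the output.
        ' The sample strings are invented for this demonstration.
        For Each sample As String In {"camelCase", "PascalCaseName", "lowercase", "HTMLParser"}
            Dim parts As String() = boundary.Split(sample)
            Console.WriteLine("{0} -> {1}", sample, String.Join(" | ", parts))
        Next
    End Sub
End Module
```

The first two samples split at each lower-to-upper boundary, while "lowercase" and "HTMLParser" come back in one piece because neither contains a lower-case letter immediately followed by an upper-case one. That is the line drawn earlier at strict CamelCase and PascalCase: runs of capitals are left alone.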
This blog post is essentially me saying to myself “Ok.. I can see that it works… but WHY does it work?” and deciding to scare whoever else out there likes the occasional good Regular Expression brain-twist.
A chunk of VB.NET code that makes PascalCase/CamelCase text into something more easily consumable by a human is below. The regular expression is created ahead of time outside the function for runtime efficiency.
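Imports System.Text.RegularExpressions

' Created once, outside the function, so it is not rebuilt on every call.
Private CamelCaseRegex As New Regex("(?<=[a-z])(?=[A-Z])")

' Inserts a space at every lower-case to upper-case boundary.
Public Function CamelCaseToHumanReadableString(ByVal inputString As String) As String
    Return String.Join(" ", CamelCaseRegex.Split(inputString))
End Function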
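Since the original itch was prettying up enumerations, here is a rough sketch of how the function might be called on enum values. The OrderStatus enumeration and the EnumLabelDemo module are hypothetical, invented purely for this example; the regex and function are repeated so the block stands on its own.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EnumLabelDemo
    ' Hypothetical enumeration, invented for this example.
    Enum OrderStatus
        AwaitingPayment
        ReadyToShip
        Delivered
    End Enum

    ' Created once so every call reuses the same Regex instance.
    Private ReadOnly CamelCaseRegex As New Regex("(?<=[a-z])(?=[A-Z])")

    Function CamelCaseToHumanReadableString(ByVal inputString As String) As String
        Return String.Join(" ", CamelCaseRegex.Split(inputString))
    End Function

    Sub Main()
        ' Enum.GetValues returns every member; ToString() yields its declared name.
        For Each status As OrderStatus In [Enum].GetValues(GetType(OrderStatus))
            Console.WriteLine(CamelCaseToHumanReadableString(status.ToString()))
        Next
    End Sub
End Module
```

Under these assumptions the loop prints "Awaiting Payment", "Ready To Ship" and "Delivered", one per line.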
May your CamelCase live long and prosper for your human consumers!