Sunday, September 12, 2004

Down Time Project

I strained some lower back muscles on Thursday, and since then I've been taking muscle relaxants and ibuprofen, sleeping a lot, and generally taking it easy. I took advantage of the down time to start writing the alternate regular expression parser I recently blogged about. The syntax has changed a little, but the spirit of the original idea is intact. The following are some samples of what it can generate.

Standard Regex:
^(([^<>()[\]\\.,;:@"]+(\.[^<>()[\]\\.,;:@"]+)*)|(".+"))
Verbose Regex:
define restricted ('<>()[]\\.,;:@"')
anchor begin
group(
    group(
        oneOrMore(notAny(restricted)) +
        zeroOrMore('.' + oneOrMore(notAny(restricted)))
    )
    or group('"' + oneOrMore(any) + '"')
)
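For illustration, the first standard expression can be tried with Python's `re` module. It looks like a pattern for the local part of an email address (the text before the @). The helper `is_local_part` and the sample strings are made up for this sketch. The expression has no closing `$` anchor, so the sketch uses `fullmatch` to check whole strings; `re.match` on its own accepts any string that merely starts with a valid local part.

```python
import re

# Standard form of the first example, copied verbatim from above.
LOCAL_PART = re.compile(r'^(([^<>()[\]\\.,;:@"]+(\.[^<>()[\]\\.,;:@"]+)*)|(".+"))')

def is_local_part(text):
    # fullmatch forces the whole string to match; the pattern itself only
    # anchors the beginning, so a plain match() would accept valid prefixes.
    return LOCAL_PART.fullmatch(text) is not None

print(is_local_part("john.smith"))       # dot-separated runs of allowed characters
print(is_local_part('"john smith"'))     # the quoted alternative
print(is_local_part("john..smith"))      # empty run between the dots
print(LOCAL_PART.match("john..smith"))   # match() still finds the "john" prefix
```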



Standard Regex:
^[a-zA-Z0-9](([_\.\-]?[a-zA-Z0-9]+)*)@([a-zA-Z0-9]+)(([\.\-]?[a-zA-Z0-9]+)*)\.([a-zA-Z]{2,})$
Verbose Regex:
define ALPHANUM (range('a','z','A','Z','0','9'))
define ALPHA (range('a','z','A','Z'))
anchor begin
ALPHANUM +
group(zeroOrMore(zeroOrOne(any('_.-')) + oneOrMore(ALPHANUM)))
+ '@' +
group(oneOrMore(ALPHANUM)) +
group(zeroOrMore(zeroOrOne(any('.-')) + oneOrMore(ALPHANUM))) +
'.' +
group(repeat(ALPHA,2, ))
anchor end
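Here is a Python sketch of how `range` and `repeat` might expand. It uses two hypothetical helpers, `char_range` (renamed so it doesn't shadow Python's built-in `range`) and `repeat`. It assumes `range` takes its bounds in (low, high) pairs. It also assumes the empty third argument in `repeat(ALPHA,2, )` means "no upper limit", which is what `{2,}` says in the standard form. The other pieces are spliced in by hand in the same order as the verbose lines, and the sample addresses are invented.

```python
import re

def char_range(*bounds):
    """range('a','z','A','Z') -> [a-zA-Z]; bounds come in (low, high) pairs."""
    if len(bounds) % 2:
        raise ValueError("range() needs an even number of bounds")
    pairs = zip(bounds[0::2], bounds[1::2])
    return "[" + "".join(f"{lo}-{hi}" for lo, hi in pairs) + "]"

def repeat(expr, low, high=None):
    """repeat(x, 2, ) -> x{2,}; a missing upper bound means 'no limit'."""
    upper = "" if high is None else str(high)
    return f"{expr}{{{low},{upper}}}"

ALPHANUM = char_range("a", "z", "A", "Z", "0", "9")  # [a-zA-Z0-9]
ALPHA = char_range("a", "z", "A", "Z")              # [a-zA-Z]

EMAIL = re.compile(
    "^" + ALPHANUM
    + "(([_\\.\\-]?" + ALPHANUM + "+)*)"   # optional _ . or - before each run
    + "@"
    + "(" + ALPHANUM + "+)"
    + "(([\\.\\-]?" + ALPHANUM + "+)*)"     # optional . or - before each run
    + "\\."
    + "(" + repeat(ALPHA, 2) + ")"          # final part: two or more letters
    + "$"
)

print(bool(EMAIL.match("john.smith@example.com")))   # accepted
print(bool(EMAIL.match("john@example.c")))           # one-letter ending rejected
```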


In each pair, the standard and verbose expressions are equivalent: running the verbose syntax through the parser produces the standard expression.
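The translation step can be pictured with a minimal recursive-descent translator in Python. This is a hypothetical sketch, not the parser described here. It covers only a subset of the verbose syntax: quoted literals, `+`, `or`, `group`, `oneOrMore`, `zeroOrMore`, `zeroOrOne`, `any`, `notAny`, and names from `define` passed in as a dictionary. `anchor`, `range`, and `repeat` are left out. It assumes quoted strings are literals, a bare `any` means any character, and a quantifier wraps a multi-part operand in a capturing group, as the first example's output suggests.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    regex: str                        # standard-syntax regex for this fragment
    atomic: bool                      # can a quantifier follow it without (...)?
    literal: "str | None" = None      # raw text when the fragment is a quoted string

# A token is a quoted string (with \ escapes), an identifier, or ( ) , +
TOKEN = re.compile(r"\s*(?:('(?:[^'\\]|\\.)*')|([A-Za-z_]\w*)|([(),+]))")
QUANTIFIERS = {"oneOrMore": "+", "zeroOrMore": "*", "zeroOrOne": "?"}
META = set(".^$*+?{}[]\\|()")

def escape(text):  # escape characters that are special outside [...]
    return "".join("\\" + c if c in META else c for c in text)

def char_class(chars, negate):  # inside [...] only \ ] ^ - need escaping
    body = "".join("\\" + c if c in "\\]^-" else c for c in chars)
    return "[" + ("^" if negate else "") + body + "]"

def tokenize(src):
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected input: {src[pos:]!r}")
        tokens.append(m.group(m.lastindex))  # whichever alternative matched
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, src, defines=None):
        self.toks, self.i = tokenize(src), 0
        self.defines = defines or {}

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'!r}, got {tok!r}")
        self.i += 1
        return tok

    def parse(self):
        node = self.alternation()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return node

    def _list(self, parse_part, separator, joiner):
        parts = [parse_part()]
        while self.peek() == separator:
            self.take()
            parts.append(parse_part())
        if len(parts) == 1:
            return parts[0]
        return Node(joiner.join(p.regex for p in parts), atomic=False)

    def alternation(self):  # sequence ('or' sequence)*
        return self._list(self.sequence, "or", "|")

    def sequence(self):  # term ('+' term)*
        return self._list(self.term, "+", "")

    def term(self):
        tok = self.take()
        if tok.startswith("'"):  # quoted literal; \x stands for x
            text = re.sub(r"\\(.)", r"\1", tok[1:-1])
            return Node(escape(text), atomic=len(text) == 1, literal=text)
        if self.peek() != "(":  # bare name: a define, or any = any character
            if tok in self.defines:
                return self.defines[tok]
            if tok == "any":
                return Node(".", atomic=True)
            raise SyntaxError(f"unknown name {tok!r}")
        self.take("(")
        arg = self.alternation()  # this sketch only handles one-argument calls
        self.take(")")
        return self.call(tok, arg)

    def call(self, name, arg):
        if name == "group":
            return Node("(" + arg.regex + ")", atomic=True)
        if name in QUANTIFIERS:  # wrap multi-part operands so the suffix covers all of it
            body = arg.regex if arg.atomic else "(" + arg.regex + ")"
            return Node(body + QUANTIFIERS[name], atomic=False)
        if name in ("any", "notAny") and arg.literal is not None:
            return Node(char_class(arg.literal, name == "notAny"), atomic=True)
        raise SyntaxError(f"unsupported call {name}(...)")

def to_regex(src, defines=None):
    return Parser(src, defines).parse().regex
```

For instance, `to_regex("oneOrMore(notAny('@')) + '@' + oneOrMore(any)")` returns `[^@]+@.+`. If `restricted` is parsed from its quoted string and passed in `defines`, translating the body of the first example reproduces the first standard expression without its leading `^`, because this sketch does not handle anchors.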



I haven’t added the named groups feature I talked about yet; I wanted to get the basic parser done first. I also need to expand my set of unit tests and flesh out the parser’s capabilities. I’ve seen some expressions that I can’t even read, so it’s hard to know how to parse them. If nothing else, this project is honing my regular expression skills.
