[ragel-users] short strings, including some which are 1-letter prefixes of other
Andrew Dalke
dalke at dalkescientific.com
Mon Dec 7 17:01:13 UTC 2009
Hi all,
I'm updating a parser I wrote a couple of years ago for a molecular format called SMILES. Molecules contain atoms and bonds, and each atom is written using its element's abbreviation (the element symbol).
Consider C and Cl, two such abbreviations where one is a prefix of the other. I had:
    is_raw_atom = (
        #
        'B' % raw_atom_B_5_action |
        'C' % raw_atom_C_6_action |
        'Cl' % raw_atom_Cl_17_action |
        ...
That worked for what I was doing before, but now I'm trying to get error handling to work. Suppose someone writes "CQ". I want raw_atom_C_6_action to run, and then an error.
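Ragel doesn't do that. It reports the error without ever running raw_atom_C_6_action, because a leaving action only fires on a transition out of the final state, and after the 'C' there is no transition on 'Q'.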
What I did in my current update (in addition to changing the action names) is this:
    aliphatic_organic = (
        'B' %is_aliphatic_B %err(is_aliphatic_B) |
        'C' %is_aliphatic_C %err(is_aliphatic_C) |
        'N' >is_aliphatic_N |
        ...
        'Cl' %is_aliphatic_Cl %err(is_aliphatic_Cl) |
        'Br' @is_aliphatic_Br |
        ...
    );
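To make the difference between the two versions concrete, here is a small hand-written C model, cut down to just C and Cl repeated with `+`. It is not Ragel-generated code and not the parser from this post; the state names, the `scan` and `record` helpers, the simplification that only another atom may follow an atom, and the assumption that a pending leaving action runs at end of input are all illustrative assumptions. With `final_err` false it behaves like the original `%` actions: a leaving action runs only on a transition out of a final state, so "CQ" fails with nothing recorded. With `final_err` true it adds a `%err`-style final-state error action, which runs the pending atom action before the error is reported.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical hand-written model (not Ragel-generated code) of:
 *   ( 'C' %C | 'Cl' %Cl )+                    when final_err is false
 *   ( 'C' %C %err(C) | 'Cl' %Cl %err(Cl) )+   when final_err is true
 */
enum state { START, AFTER_C, AFTER_CL };

/* The atom action pending in a final state, or NULL if the state is not final. */
static const char *pending(enum state cs) {
    switch (cs) {
    case AFTER_C:  return "C";
    case AFTER_CL: return "Cl";
    default:       return NULL;
    }
}

/* Appends an action name to the comma-separated trace. */
static void record(char *trace, size_t cap, const char *name) {
    if (trace[0] != '\0')
        strncat(trace, ",", cap - strlen(trace) - 1);
    strncat(trace, name, cap - strlen(trace) - 1);
}

/* Scans s and records which atom actions fire.
 * Returns -1 if s is accepted, else the index of the offending character. */
static int scan(const char *s, bool final_err, char *trace, size_t cap) {
    enum state cs = START;
    int i;
    trace[0] = '\0';
    for (i = 0; s[i] != '\0'; i++) {
        char c = s[i];
        if (cs == AFTER_C && c == 'l') {
            cs = AFTER_CL;              /* still inside "Cl": no action yet */
        } else if (c == 'C') {
            if (pending(cs) != NULL)    /* leaving a final state: the % action fires */
                record(trace, cap, pending(cs));
            cs = AFTER_C;
        } else {
            /* No transition on c, so the machine errors. A leaving action
             * alone never runs here; a final-state error action does. */
            if (final_err && pending(cs) != NULL)
                record(trace, cap, pending(cs));
            return i;
        }
    }
    if (pending(cs) == NULL)
        return i;                       /* empty input is rejected */
    record(trace, cap, pending(cs));    /* assumed: pending action runs at end of input */
    return -1;
}
```

For example, `scan("CQ", false, ...)` returns 1 with an empty trace, while `scan("CQ", true, ...)` returns 1 with the trace "C". The model keeps the leaving action alongside the error action because on valid input such as "CClC" it is the leaving action, not the error action, that records each atom when the next one starts.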
It works, but is it correct and proper Ragel? I did see the |* ... *| scanner construct, which is designed for cases like this, but I didn't want its backtracking.
Best regards,
Andrew
dalke at dalkescientific.com