'string' ranges
Paul
r.lp... at gmail.com
Fri Apr 6 07:25:10 UTC 2007
Hello all,
I want my Ragel state machine to process Unicode text encoded as UTF-8.
There are some Unicode ranges that I want to transition on, e.g.
range = [0x0ED0-0x0ED9];
but I don't know how to express this in a minimal way with an unsigned char
alphabet (i.e. I don't think it can be done directly in Ragel's expression
syntax).
My brain isn't in the best condition, but I have thought of two approaches:
1.) use a script to write out the set of UTF-8 byte strings in the range and
leave it to Ragel to minimise the states (or something like this)
2.) use Ragel's semantic conditions somehow (assemble the UTF-32 code point
and use an integer comparison)
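
As a rough sketch of approach 1.), a script does not need to write out every
string: it can split the code point range into a few sequences of byte ranges.
The Python below is only an illustration under stated assumptions. The names
encode, split_range and to_ragel are made up for this example, the range is
assumed not to include the surrogates U+D800-U+DFFF, and the generated
expression syntax should be checked against the Ragel guide before use.

```python
def encode(cp):
    """UTF-8 encode one code point by hand, returning a list of byte values."""
    if cp <= 0x7F:
        return [cp]
    if cp <= 0x7FF:
        return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]
    if cp <= 0xFFFF:
        return [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)]
    return [0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)]


def split_range(lo, hi):
    """Split code points lo..hi into sequences of (byte_lo, byte_hi) pairs.

    Assumption: the range does not contain surrogates (U+D800-U+DFFF).
    """
    if lo > hi:
        return []
    # Step 1: never let one piece span two different encoded lengths.
    for limit in (0x7F, 0x7FF, 0xFFFF):
        if lo <= limit < hi:
            return split_range(lo, limit) + split_range(limit + 1, hi)
    if hi <= 0x7F:
        return [[(lo, hi)]]  # plain ASCII: one single-byte range
    # Step 2: if the leading bits differ, the trailing continuation bytes
    # must cover their full 0x80..0xBF span; otherwise cut off the ragged end.
    for i in (1, 2, 3):
        mask = (1 << (6 * i)) - 1
        if (lo & ~mask) != (hi & ~mask):
            if (lo & mask) != 0:
                return split_range(lo, lo | mask) + split_range((lo | mask) + 1, hi)
            if (hi & mask) != mask:
                return split_range(lo, (hi & ~mask) - 1) + split_range(hi & ~mask, hi)
    # Step 3: the piece is now a simple per-byte range.
    return [list(zip(encode(lo), encode(hi)))]


def to_ragel(seqs):
    """Render the sequences as a Ragel-style expression (syntax is an assumption)."""
    def unit(a, b):
        return f"0x{a:02X}" if a == b else f"0x{a:02X}..0x{b:02X}"
    return " | ".join(
        "(" + " . ".join(unit(a, b) for a, b in seq) + ")" for seq in seqs
    )


if __name__ == "__main__":
    # The example range from this message; prints (0xE0 . 0xBB . 0x90..0x99)
    print(to_ragel(split_range(0x0ED0, 0x0ED9)))
```

For the example range the split gives a single sequence, because every code
point in U+0ED0..U+0ED9 shares the lead bytes 0xE0 0xBB and only the last byte
varies. Wider ranges produce several alternatives joined with "|".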
But before I attempt either, has anyone had to do anything similar? Or are
there any suggestions I could use?
Thanks, and have a good Easter
- Paul
More information about the ragel-users mailing list