[ragel-users] ragel hex codes on x86_64
_why
why at whytheluckystiff.net
Fri Jan 16 19:03:22 UTC 2009
Hello, Adrian and fellow accomplices.
I've been doing some UTF-8 scanning with Ragel, using hex codes (as
previously recommended on this list). This works fine if I generate
the state machine with Ragel on a 32-bit architecture. I can then
build the generated code on either 32-bit or 64-bit without trouble.
However, when I generate the machine on x86_64 (Ubuntu 8.10),
multi-byte UTF-8 characters aren't accepted by my expression. Again,
this isn't too alarming, since I'm able to produce a working machine by
generating everything using the 32-bit executable, but I figured you'd
want to hear about this.
I'm attaching a simple test case. If you want to see the full thing
in context: <http://github.com/why/potion>.
With fond feelings in the extreme,
_why
-------------- next part --------------
#include <stdio.h>
#include <string.h>

%%{
  machine utf8;

  utf8 = 0x09 | 0x0a | 0x0d | (0x20..0x7e) |
         (0xc2..0xdf) (0x80..0xbf) |
         (0xe0..0xef 0x80..0xbf 0x80..0xbf) |
         (0xf0..0xf4 0x80..0xbf 0x80..0xbf 0x80..0xbf);

  main := |*
    utf8 => { printf("TOKEN: %.*s\n", (int)(te - ts), ts); };
  *|;

  write data nofinal;
}%%

int main()
{
  int cs, act;
  char str[] = "naïve";
  char *p = str, *pe = str + strlen(str);
  char *ts, *te, *eof = 0;

  %% write init;
  %% write exec;

  return 0;
}