[ragel-users] How to convert [#x2070-#x218F] to Ragel grammar?
Iñaki Baz Castillo
ibc at aliax.net
Fri Nov 20 19:52:11 UTC 2009
On Friday, 20 November 2009, Adrian Thurston wrote:
> Okay, in that case then have a look at the unicode2ragel.rb script in
> the contrib directory.
Hmm, two points:
1) I'm not generating Ruby code, but C code (perhaps it doesn't matter and you
suggested that script for another reason).
2) I don't see that script in the 6.4 sources:
/usr/src/ragel-6.4$ ls -l contrib/
total 36
-rw-r--r-- 1 root root 8320 2009-07-08 02:03 Makefile
-rw-r--r-- 1 ibc ibc 34 2009-04-05 00:06 Makefile.am
-rw-r--r-- 1 ibc ibc 8280 2009-05-18 15:56 Makefile.in
-rw-r--r-- 1 ibc ibc 1386 2009-04-04 23:56 ragel.m4
-rw-r--r-- 1 ibc ibc 85 2009-04-04 23:56 ragel.make
> It can help with generating ragel definitions for
> you.
The fact is that I just want to create a Ragel grammar (matching single bytes)
from this original grammar:
NameStartChar ::= ":" | [A-Z] | "_" | [a-z] | [#xC0-#xD6] | [#xD8-#xF6] |
[#xF8-#x2FF] | [#x370-#x37D] | [#x37F-#x1FFF] |
[#x200C-#x200D] | [#x2070-#x218F] | [#x2C00-#x2FEF] |
[#x3001-#xD7FF] | [#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
[#x10000-#xEFFFF]
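To make the ranges concrete, here is a minimal sketch in C. It assumes the
#x values are Unicode code points, which is how the XML specification (where
this production comes from) defines that notation. The names cp_range,
name_start_ranges and is_name_start_char are hypothetical, chosen only for
this illustration. The check works on decoded code points, not on raw bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive code point ranges copied from the NameStartChar production.
 * Assumption: each #xN value is a Unicode code point. */
struct cp_range { uint32_t lo, hi; };

static const struct cp_range name_start_ranges[] = {
    { ':', ':' }, { 'A', 'Z' }, { '_', '_' }, { 'a', 'z' },
    { 0xC0, 0xD6 }, { 0xD8, 0xF6 }, { 0xF8, 0x2FF },
    { 0x370, 0x37D }, { 0x37F, 0x1FFF }, { 0x200C, 0x200D },
    { 0x2070, 0x218F }, { 0x2C00, 0x2FEF }, { 0x3001, 0xD7FF },
    { 0xF900, 0xFDCF }, { 0xFDF0, 0xFFFD }, { 0x10000, 0xEFFFF },
};

/* Returns 1 if cp falls inside any of the ranges above, 0 otherwise. */
int is_name_start_char(uint32_t cp)
{
    size_t n = sizeof name_start_ranges / sizeof name_start_ranges[0];
    for (size_t i = 0; i < n; i++) {
        if (cp >= name_start_ranges[i].lo && cp <= name_start_ranges[i].hi)
            return 1;
    }
    return 0;
}
```

A byte-oriented Ragel machine cannot use this table directly, since each
range first has to be expressed as UTF-8 byte sequences.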
Must I do something "exotic" to achieve it? I've already played with UTF-8 in
Ragel by using this grammar (from the SIP protocol ABNF):
UTF8_CONT = 0x80..0xbf;
UTF8_NONASCII = ( 0xc0..0xdf UTF8_CONT ) | ( 0xe0..0xef UTF8_CONT{2} ) |
( 0xf0..0xf7 UTF8_CONT{3} ) | ( 0xf8..0xfb UTF8_CONT{4} ) |
( 0xfc..0xfd UTF8_CONT{5} );
Unfortunately, I have no idea whether the former grammar ("NameStartChar")
is related to UTF-8 or not.
I could simplify this grammar (once I understand what it means), even if it
then allows some invalid bytes.
However, please let me ask again:
What is #x2FF? Is it "0x2F 0xF0"? Or "0x02 0xFF"? I need to know in order to
write a workaround.
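One way to explore the question is the sketch below. It assumes that #x2FF
denotes the Unicode code point U+02FF (as in the XML specification's
notation) and that the input is UTF-8 encoded. Under those assumptions the
value is neither of the two byte pairs above: the bytes on the wire are
whatever UTF-8 produces for that code point. The function name utf8_encode is
hypothetical and is not part of Ragel.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 into out (room for 4 bytes).
 * Returns the number of bytes written, or 0 if cp is a surrogate
 * (U+D800..U+DFFF) or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes: 11110xxx then 3 continuation bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* outside the Unicode range */
}
```

With this encoder, 0x2FF becomes the two bytes 0xCB 0xBF, 0x2070 becomes
0xE2 0x81 0xB0, and 0x218F becomes 0xE2 0x86 0x8F. A byte-level rule for
[#x2070-#x218F] would therefore cover 0xE2 0x81 0xB0..0xBF, then
0xE2 0x82..0x85 followed by 0x80..0xBF, then 0xE2 0x86 0x80..0x8F.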
Thanks a lot.
--
Iñaki Baz Castillo <ibc at aliax.net>