Multi-char terminators
Colin Fleming
colin.flem... at coreproc.com
Thu Oct 5 19:58:05 UTC 2006
Hi all,
As part of parsing XML, I have the following rules for CData sections:
    CDStart = '<![CDATA[';
    CDEnd = ']]>';
    CData = (Char* -- CDEnd) $each_char;
    CDSect = CDStart CData CDEnd;
where each_char is a simple action that stores fc in a buffer. The
problem is that the buffer always ends with ]], because the machine
cannot tell whether to leave CData until it sees the >, and by then
each_char has already run on both ] characters. I work around this
with the following:
    CDSect = CDStart CData CDEnd %trim_content;
where trim_content strips the last two characters from the buffer, but
it's a bit ugly. It also wouldn't work if the terminator were a
variable-length production, since the number of characters to strip
would no longer be fixed. Is there a general way to handle this case?
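To make the workaround concrete, here is a minimal C sketch of what the two action bodies might look like. The buffer (cdata_buf, cdata_len) and the helpers cdata_reset and collect_cdata are hypothetical; collect_cdata only replays by hand the calls the generated machine would make for one CDATA section, it is not Ragel output. Passing the terminator to trim_content makes the fixed count explicit (strlen("]]>") - 1), which is exactly the part that breaks down for a variable-length terminator.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical storage for the CDATA text; the real parser's buffer
   may differ. Overflow handling is kept minimal. */
#define CDATA_MAX 256

static char cdata_buf[CDATA_MAX];
static size_t cdata_len = 0;

/* Start a new CDATA section (e.g. from an action on CDStart). */
static void cdata_reset(void)
{
    cdata_len = 0;
    cdata_buf[0] = '\0';
}

/* Body of each_char: append the current character (fc in Ragel),
   silently dropping input once the buffer is full. */
static void each_char(char fc)
{
    if (cdata_len < CDATA_MAX - 1)
        cdata_buf[cdata_len++] = fc;
    cdata_buf[cdata_len] = '\0';
}

/* Body of trim_content: remove the terminator prefix that each_char
   already stored. For "]]>", both ']' are appended but the '>' is
   not, so strlen("]]>") - 1 == 2 characters are dropped. */
static void trim_content(const char *terminator)
{
    size_t extra = strlen(terminator) - 1;
    assert(extra <= cdata_len); /* the terminator matched, so it is buffered */
    cdata_len -= extra;
    cdata_buf[cdata_len] = '\0';
}

/* Hypothetical replay of "<![CDATA[" body "]]>": each_char runs on every
   body character and on both ']' of the terminator, then the trim runs.
   Assumes body itself does not contain "]]>". */
static const char *collect_cdata(const char *body)
{
    cdata_reset();
    for (const char *c = body; *c != '\0'; c++)
        each_char(*c);
    each_char(']');
    each_char(']');
    trim_content("]]>");
    return cdata_buf;
}
```

For example, collect_cdata("a]b") leaves a]b in the buffer: each_char stores a]b]], and the trim removes the final two characters. A body ending in ], such as x], also comes out intact, because only the last two characters are ever dropped.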
Cheers,
Colin