Problem with a scanner dropping the first character of an identifier.
Patrick O'Grady
patr... at baymotion.com
Tue Mar 20 22:22:58 UTC 2007
Hi, all--
I've been struggling with a little self-test fixture which uses Ragel to
scan some input. Here's the test program:
#include <stdio.h>

%%{
    machine scanner ;

    ids := |*
        identifier = [a-zA-Z_][a-zA-Z0-9_]* ;
        identifier
            => { printf("Got identifier: %.*s.\n", tokend - tokstart,
                        tokstart);
                 fret ;
               }
            ;
        (' '|'\n'|'\r')*
            => { fret; }
            ;
        any
            => { printf("Ignored.\n"); fret; }
            ;
    *| ;

    main := ( any %{ fhold; fcall ids; } )* ;
}%%

int main()
{
    unsigned cs ;
    char const * p ;
    char const * pe ;
    char const * tokstart ;
    char const * tokend ;
    unsigned act ;
    unsigned stack[100] ;
    unsigned top ;

    %%write data ;
    %%write init ;

    char const s[] = "Once upon a time." ;
    p = s ;
    pe = &(s[sizeof(s)]);

    %%write exec ;
    %% write eof ;

    return 0 ;
}
I'm compiling with Ragel 5.19/MSVC, and I get the following output:
Got identifier: nce.
Got identifier: upon.
Got identifier: a.
Got identifier: time.
Ignored.
Ignored.
Everything here is as expected, except the first identifier, which should be
"Once", not "nce". The scanner seems to have skipped over the first 'O'.
Two questions: is there a better way to get a list of all the tokens in the
input? And does anyone have any clues about this misbehavior? Thanks in
advance.
-patrick