[ragel-users] confused about scanning
Rob Harris
rob.harris at gmail.com
Fri Aug 5 13:13:25 UTC 2011
All, help. I've R'd TFM all week trying to figure this out, but am still
confused (so please pardon the potential n00bness.)
I have to parse a config file for an app I'm working on, whose format is
basically:
group MyGroup {
tcpclient( host: foo, port: 49152 );
udp( host: bar, port: 49152 ) > tcpserver( port: 11111 );
udp:foo:49152.nonblocking = true;
}
From what I've read on the Intertubes, it seems that the SOP for processing
this is to define a main := which will match a particular line of the text
and then, upon matching, call another machine to "scan" the message.
However, I'm not sure how to do that because it seems that regardless of
whether I define main as a matcher or a scanner, executing the parser always
seems to consume the text as it matches. For instance, when I parse the
group definition, I can simply match on the word "group" and then pass the
rest of the line (up to the {) in to the scanner and I can get 'MyGroup' out
relatively easily. However, when I try to parse the first encapsulated line,
I don't know whether I'm dealing with a string of the first line form or
third line form (or if the command is "chained" as in the second line) until
I've done a Kleene star match of the entire line (up to the ;) at which
point it seems that the parser has already consumed the entire line and when
I pass it into a scanner the pointers are already at the next line. Do I
need to store the starting pointer before the first main scan (and if so,
how?) and then how would I tell the downstream scanner where to start? I
thought of making a number of nested C++ "parser objects" but that just seems
inherently wrong.
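To make the pointer question concrete, here is a minimal hand-written C++ sketch of the two-pass idea I have in mind. It is not Ragel output and not something I have working inside my grammar. The names Stmt, StmtKind, classify and split_statements are made up for illustration, and the sketch assumes a ';' never appears inside a quoted string. The first pass saves the start pointer, runs to the ';', and only then hands the saved [begin, end) span to a second pass:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical statement kinds, one per line form in the sample config.
enum class StmtKind { Instantiation, Chain, Assignment };

// A statement remembered as a [begin, end) span over the original buffer.
struct Stmt {
    StmtKind kind;
    const char* begin;  // first non-whitespace character of the statement
    const char* end;    // one past the last character, excluding the ';'
};

// Second pass: classify a span that the first pass has already delimited.
// A '>' outside parentheses marks a chain, an '=' outside parentheses marks
// an assignment; anything else is treated as a plain instantiation.
StmtKind classify(const char* begin, const char* end) {
    assert(begin <= end);
    int depth = 0;
    for (const char* q = begin; q != end; ++q) {
        if (*q == '(') ++depth;
        else if (*q == ')') --depth;
        else if (depth == 0 && *q == '>') return StmtKind::Chain;
        else if (depth == 0 && *q == '=') return StmtKind::Assignment;
    }
    return StmtKind::Instantiation;
}

// First pass: walk the buffer, saving the start pointer of each statement
// before scanning forward to its terminating ';'.
// Assumption: ';' does not occur inside quoted strings.
std::vector<Stmt> split_statements(const char* p, const char* pe) {
    std::vector<Stmt> out;
    const char* start = nullptr;  // saved start of the current statement
    for (; p != pe; ++p) {
        if (start == nullptr) {
            // Skip leading whitespace between statements.
            if (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') continue;
            start = p;
        }
        if (*p == ';') {
            // The saved pointer lets the second pass re-read the same text.
            out.push_back({classify(start, p), start, p});
            start = nullptr;
        }
    }
    return out;
}
```

Calling split_statements on the three lines inside the braces of the sample group should give one span per statement, each tagged with its form. What I can't figure out is how to get the equivalent of that saved start pointer when both passes are Ragel machines.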
Below is what I've written so far--just enough to hopefully pass the first
two cases. Again, I don't know if I'm only a character or so off or if my
mindset is completely off. Any help would be appreciated.
--
Rob Harris
Technological Pragmatist
rob period harris shift-2 gmail decimal-point com
"The universe tends towards maximum irony." --Jamie Zawinski
%%{
machine sas_scanner;
ml_comment = '/*' ( any )* :>> '*/';
sl_comment = '//' [^\n]* '\n';
comment = ml_comment | sl_comment;
wspace = comment | space+ ;
integer = [0-9]*;
float = [0-9]* '.' [0-9]*;
identifier = [a-zA-Z][a-zA-Z0-9]*;
fqsm = [a-zA-Z] ( [a-zA-Z0-9:][a-zA-Z0-9_] )*;
sqstring = '\'' [^\n]* :>> '\'';
dqstring = '\"' [^\n]* :>> '\"';
strvalue = ( integer | float | identifier | sqstring | dqstring );
action DEBUG { fprintf( stderr, "state: %4d, char: %c\n", cs, *p ); }
action RESET { reset(); }
action CRLF { std::cout << std::endl << std::endl; }
action NAME { m_name.append( 1, fc ); }
action KEY { m_key.append( 1, fc ); }
action VAL { m_val.append( 1, fc ); }
action QKV
{
printf( "[%s]=>[%s]\n", m_key.c_str(), m_val.c_str());
m_kvMap[ m_key ] = m_val;
m_key.clear();
m_val.clear();
}
action SNAME { printf( "NAME: [%s]\n", m_name.c_str() ); }
kvpair = ( identifier space* ':' space* strvalue );
kvlist = ( space+ | kvpair | ',' space+ kvpair );
instantiation = ( identifier '(' kvlist* ')' );
instantiation_chain = (
instantiation $NAME ( space* '>' space* instantiation )*
) $NAME >RESET ';' @SNAME;
inst_chain_scanner :=
|*
space+;
identifier => { diff(); };
strvalue => { diff(); };
*|;
group_name = ( 'g' 'r' 'o' 'u' 'p' );
group_id = ( identifier - group_name ) @NAME;
group_line = ( group_name space+ group_id :>> space* '{' );
group_scanner :=
|*
space+ => { m_name.clear(); };
group_name;
group_id => { printf( ">> %s\n", m_name.c_str() ); };
'{' => { fret; };
*|;
main :=
|*
wspace+;
group_name => { fcall group_scanner; };
instantiation_chain => { fcall inst_chain_scanner; };
*|;
}%%