[ragel-users] confused about scanning
Kevin T. Ryan
kevin.t.ryan at gmail.com
Fri Aug 12 02:45:12 UTC 2011
I took what I think you were trying to accomplish (at least to some
degree) and tried to develop a state machine based on the
specifications as I understood them. I did this partially to help me
understand Ragel a little bit better, so I hope you don't mind that I
didn't use much of what you provided in your original email. Some
notes related to your email though:
- I don't think you want a scanner for what you are trying to
accomplish. A scanner spits out things like "number" or "string" or
"operator" without regard to how those things are put together. I
think you want something that understands the structure of what can
happen where. For example, udp:foo = true "sets" udp:foo equal to
true, whereas a scanner might just kick out "identifier" (for udp:foo),
"operator" or "equals" (for '='), and "keyword" or "identifier" (for
'true').
- I think you might be able to accomplish some of what you were
intending (even with a scanner) by using fgoto instead of fcall
(although I'm not entirely sure as I didn't fully grasp your code).
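To make the first note more concrete, here is a minimal hand-written sketch in plain C (no Ragel) of what a scanner-style pass produces for udp:foo = true. The token kinds and the names next_token and dump_tokens are made up for illustration and are not part of Ragel or of the original code. The point is only that the result is a flat list of tokens with no notion that the line is an assignment:

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/* Hypothetical token kinds; these names are illustrative only. */
enum tok_kind { TOK_IDENT, TOK_EQUALS, TOK_OTHER, TOK_END };

struct token {
    enum tok_kind kind;
    const char *text;   /* points into the input, not NUL-terminated */
    int len;
};

/* Return the next token starting at *pp and advance *pp past it. */
struct token next_token(const char **pp)
{
    const char *p;
    struct token t;

    assert(pp != NULL && *pp != NULL);
    p = *pp;
    while (*p == ' ' || *p == '\t')
        p++;                            /* skip blanks between tokens */

    t.text = p;
    if (*p == '\0') {
        t.kind = TOK_END;
    } else if (isalpha((unsigned char)*p)) {
        /* Assumption: ':' and '.' may appear inside a name (udp:foo). */
        while (isalnum((unsigned char)*p) || *p == ':' || *p == '.')
            p++;
        t.kind = TOK_IDENT;
    } else if (*p == '=') {
        p++;
        t.kind = TOK_EQUALS;
    } else {
        p++;
        t.kind = TOK_OTHER;
    }
    t.len = (int)(p - t.text);
    *pp = p;
    return t;
}

/* Print every token in 'input' and return how many were found. */
int dump_tokens(const char *input)
{
    static const char *names[] = { "identifier", "equals", "other" };
    int count = 0;
    struct token t;

    for (t = next_token(&input); t.kind != TOK_END; t = next_token(&input)) {
        printf("%s: %.*s\n", names[t.kind], t.len, t.text);
        count++;
    }
    return count;
}
```

Calling dump_tokens("udp:foo = true") reports an identifier, an equals sign, and another identifier. Nothing in that output says which name is being set to which value, and that is the part a structure-aware machine has to supply.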
The following is in C (not C++), but I think it should be easy to follow.
Note that the last section of my 'main' (checking for errors) was
very helpful in letting me know when I screwed something up (e.g.,
forgot a specific char in a machine). All I'm doing is printing
stuff out, but you could adapt it to your needs. Hope this helps (PS
- you may want to read
http://zedshaw.com/essays/ragel_state_charts.html - I also found it
very helpful in getting started with Ragel):
#include <stdio.h>
#include <string.h>

%%{
    machine sas_scanner;

    # 'start' marks where the current piece of text began. Most actions
    # print the text from 'start' up to the current character (fpc) and
    # then move 'start' just past the current character.
    action init {
        printf("group: ");
        start = fpc+1;
    }
    action args {
        /* "%.*s" expects an int length, so cast the pointer difference. */
        printf("call: %.*s\n", (int)(fpc-start), start);
        start = fpc+1;
    }
    action pr {
        printf("%.*s\n", (int)(fpc-start), start);
        start = fpc+1;
    }
    action kwd {
        printf(" %.*s = ", (int)(fpc-start), start);
        start = fpc+1;
    }
    action nl {
        printf("\n");
        start = fpc+1;
    }
    action reset { start = fpc; }
    action chain {
        printf("- Chained call -\n");
        start = fpc+1;
    }
    action prset {
        printf(" Set: %.*s ->", (int)(fpc-start), start);
        start = fpc+1;
    }

    main := (
        start: (
            "group " @init -> group_name
        ),
        group_name: (
            alpha+ -> group_name |
            " " @pr -> group_name |
            "{" -> details
        ),
        details: (
            '(' @args -> arguments |
            [:.] -> details |
            '>' @chain -> details |
            alpha+ -> details |
            '\n' @reset -> details |
            ';' @reset -> details |
            '=' @prset -> set |
            ' ' -> details |
            digit+ -> details |
            '}' -> final
        ),
        arguments: (
            ',' @pr -> arguments |
            alpha+ -> arguments |
            ':' @kwd -> arguments |
            ' ' -> arguments |
            digit+ -> arguments |
            ')' @pr -> details
        ),
        set: (
            alpha+ -> set |
            ' ' -> set |
            ';' @pr -> details
        )
    );
}%%

%% write data;

int main(void) {
    const char* to_parse =
        "group MyGroup {\n"
        " tcpclient( host: foo, port: 49152 );\n"
        " udp( host: bar, port: 49152 ) > tcpserver( port: 11111 );\n"
        " udp:foo:49152.nonblocking = true;\n"
        "}";
    int cs;
    const char* p = to_parse;
    const char* pe = to_parse + strlen(to_parse);
    const char* start = p;  /* beginning of the text the next action prints */

    %% write init;
    %% write exec;

    /* Report where the machine gave up, or that the input stopped early. */
    if (cs == sas_scanner_error) {
        printf("Error parsing @ %s\n", p);
    } else if (cs < sas_scanner_first_final) {
        printf("Input ended before the closing '}'\n");
    }
    return 0;
}
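
The actions above all rely on one trick: remember where a piece of text began ('start'), and when a delimiter arrives, print everything between 'start' and the current position. Here is a rough plain-C sketch of that same trick without Ragel, handling only an argument list such as "host: foo, port: 49152". The names print_args and skip_blanks are hypothetical helpers written for this illustration, and the blank trimming is an addition that the Ragel actions do not do:

```c
#include <assert.h>
#include <stdio.h>

/* Advance 's' past leading blanks, but never beyond 'end'. */
static const char *skip_blanks(const char *s, const char *end)
{
    while (s < end && *s == ' ')
        s++;
    return s;
}

/* Walk "key: value, key: value" and print each pair.
 * 'start' plays the same role as in the Ragel actions: it marks where
 * the current piece of text began. Returns the number of pairs printed. */
int print_args(const char *p)
{
    const char *start;
    int have_key = 0;
    int pairs = 0;

    assert(p != NULL);
    start = p;
    for (;; p++) {
        if (*p == ':') {
            /* A key just ended; same idea as the 'kwd' action. */
            start = skip_blanks(start, p);
            printf(" %.*s = ", (int)(p - start), start);
            start = p + 1;
            have_key = 1;
        } else if (*p == ',' || *p == '\0') {
            /* A value just ended; same idea as the 'pr' action. */
            if (have_key) {
                start = skip_blanks(start, p);
                printf("%.*s\n", (int)(p - start), start);
                pairs++;
                have_key = 0;
            }
            if (*p == '\0')
                break;
            start = p + 1;
        }
    }
    return pairs;
}
```

For example, print_args("host: foo, port: 49152") prints one line per pair (host = foo, then port = 49152) and returns 2. The Ragel version does the same bookkeeping, except that the state chart decides which delimiter is legal at each point.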
---------------------
Kevin T. Ryan
http://blog.gridmule.com/
On Fri, Aug 5, 2011 at 9:13 AM, 𝄆 Rob Harris 𝄇 <rob.harris at gmail.com> wrote:
>
> All, help. I've R'd TFM all week trying to figure this out, but am still
> confused (so please pardon the potential n00bness.)
>
> I have to parse a config file for an app I'm working on, whose format is
> basically of the format:
> group MyGroup {
> tcpclient( host: foo, port: 49152 );
> udp( host: bar, port: 49152 ) > tcpserver( port: 11111 );
> udp:foo:49152.nonblocking = true;
> }
>
> From what I've read on the Intertubes, it seems that the SOP for processing
> this is to define a main := which will match a particular line of the text
> and then upon matching call another machine to "scan" the message.
> However, I'm not sure how to do that because it seems that regardless of
> whether I define main as a matcher or a scanner, executing the parser always
> seems to consume the text as it matches. For instance, when I parse the
> group definition, I can simply match on the word "group" and then pass the
> rest of the line (up to the {) in to the scanner and I can get 'MyGroup' out
> relatively easily. However, when I try to parse the first encapsulated line,
> I don't know whether I'm dealing with a string of the first line form or
> third line form (or if the command is "chained" as in the second line) until
> I've done a kleene star match of the entire line (up to the ;) at which
> point it seems that the parser has already consumed the entire line and when
> I pass it into a scanner the pointers are already at the next line. Do I
> need to store the starting pointer before the first main scan (and if so,
> how?) and then how would I tell the downstream scanner where to start? I
> thought of making a number of nested C++ "parser objects" but that just seems
> inherently wrong.
>
> Below is what I've written so far--just enough to hopefully pass the first
> two cases. Again, I don't know if I'm only a character or so off or if my
> mindset is completely off. Any help would be appreciated.
>
> --
> Rob Harris
> Technological Pragmatist
> rob period harris shift-2 gmail decimal-point com
> "The universe tends towards maximum irony." --Jamie Zawinsky
>
> %%{
> machine sas_scanner;
> ml_comment = '/*' ( any )* :>> '*/';
> sl_comment = '//' [^\n]* '\n';
> comment = ml_comment | sl_comment;
> wspace = comment | space+ ;
> integer = [0-9]*;
> float = [0-9]* '.' [0-9]*;
> identifier = [a-zA-Z][a-zA-Z0-9]*;
> fqsm = [a-zA-Z] ( [a-zA-Z0-9:][a-zA-Z0-9_] )*;
> sqstring = '\'' [^\n]* :>> '\'';
> dqstring = '\"' [^\n]* :>> '\"';
> strvalue = ( integer | float | identifier | sqstring | dqstring );
> action DEBUG { fprintf( stderr, "state: %4d, char: %c\n", cs, *p ); }
> action RESET { reset(); }
> action CRLF { std::cout << std::endl << std::endl; }
> action NAME { m_name.append( 1, fc ); }
> action KEY { m_key.append( 1, fc ); }
> action VAL { m_val.append( 1, fc ); }
> action QKV
> {
> printf( "[%s]=>[%s]\n", m_key.c_str(), m_val.c_str());
> m_kvMap[ m_key ] = m_val;
> m_key.clear();
> m_val.clear();
> }
> action SNAME { printf( "NAME: [%s]\n", m_name.c_str() ); }
> kvpair = ( identifier space* ':' space* strvalue );
> kvlist = ( space+ | kvpair | ',' space+ kvpair );
> instantiation = ( identifier '(' kvlist* ')' );
>
> instantiation_chain = (
> instantiation $NAME ( space* '>' space* instantiation )*
> ) $NAME >RESET ';' @SNAME;
>
> inst_chain_scanner :=
> |*
> space+;
> identifier => { diff(); };
> strvalue => { diff(); };
> *|;
>
> group_name = ( 'g' 'r' 'o' 'u' 'p' );
> group_id = ( identifier - group_name ) @NAME;
> group_line = ( group_name space+ group_id :>> space* '{' );
>
> group_scanner :=
> |*
> space+ => { m_name.clear(); };
> group_name;
> group_id => { printf( ">> %s\n", m_name.c_str() ); };
> '{' => { fret; };
> *|;
>
> main :=
> |*
> wspace+;
> group_name => { fcall group_scanner; };
> instantiation_chain => { fcall inst_chain_scanner; };
> *|;
>
> _______________________________________________
> ragel-users mailing list
> ragel-users at complang.org
> http://www.complang.org/mailman/listinfo/ragel-users
>