Ragel pointers

Mon Aug 25 23:03:43 UTC 2008

So, I've been struggling on and off for some time with creating a
lightweight "humane text" parser with Ragel.  Here's what I've got.

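%%{
    machine simtex;

    # Count lines and notify the host on every "\n".
    newline = "\n" @{ curline++; newline(); };

    # A run of printable characters. Entering (>) records where it
    # starts; leaving (%) captures it, with fpc pointing at the
    # character that ended the run (the newline).
    text = (print -- newline)* >{ markTextStart(fpc); } %{ captureText(fpc); };

    # "#!" starts a comment; the rest of the line is matched by text.
    comment_to_eol = "#!" %{ fire(Token.Type.COMMENT); };

    heading = "%" {1,5} %{ fire(Token.Type.HEADING); };      # 1 to 5 '%'
    separator = "-" {5,} %{ fire(Token.Type.SEPARATOR); };   # 5 or more '-'

    # Leaving action (%) instead of finishing action (@): with @, every
    # '*' is a final transition, so SET would fire once per '*'.
    set = "*" {1,5} %{ fire(Token.Type.SET); };
    list = ([0-9]+ . ".") @{ fire(Token.Type.LIST); };        # e.g. "12."

    # Defined, but not yet referenced by main.
    uninterpolated_command_start = "$<{";
    uninterpolated_command_end = "}>$";

    stanzas = (comment_to_eol | heading | separator | set | list);

    main := (stanzas? . text . newline+)*;
}%%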
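The actions call host-side code that isn't shown here. Assuming a Java host (which `Token.Type.COMMENT` suggests) and Ragel's default `data` array, a minimal hypothetical version might look like the following. `SimtexHost`, `Token`, and the method bodies are placeholders that match the names the actions use, not a real API. The main point is that `fpc` is an index into `data`, so the entering and leaving actions of `text` bracket the substring `[start, fpc)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical host-side class for the simtex machine's actions. Token,
// fire(), markTextStart(), captureText(), newline() and curline are
// placeholders matching the names the actions use, not a real API.
class SimtexHost {

    static final class Token {
        enum Type { COMMENT, HEADING, SEPARATOR, SET, LIST, TEXT }

        final Type type;
        final String text; // only set for TEXT tokens
        final int line;

        Token(Type type, String text, int line) {
            this.type = type;
            this.text = text;
            this.line = line;
        }
    }

    // Ragel's Java output reads from a char[] named data by default,
    // so fpc in an action is an index into this array.
    final char[] data;
    int curline = 1;
    private int textStart = -1;
    final List<Token> tokens = new ArrayList<>();

    SimtexHost(String input) {
        this.data = input.toCharArray();
    }

    // Stanza actions: record which line-level marker was seen.
    void fire(Token.Type type) {
        tokens.add(new Token(type, null, curline));
    }

    // Entering action of text: p is the index of the run's first character.
    void markTextStart(int p) {
        textStart = p;
    }

    // Leaving action of text: p is the index just past the run (the
    // newline), so data[textStart, p) is the text itself. If no start
    // was recorded, treat the run as empty.
    void captureText(int p) {
        int start = textStart < 0 ? p : textStart;
        tokens.add(new Token(Token.Type.TEXT, new String(data, start, p - start), curline));
        textStart = -1;
    }

    // Called by the newline action after curline++; nothing else to do here.
    void newline() {
    }
}
```

Driving it by hand on `"#!hi\n"` with `fire(COMMENT)`, `markTextStart(2)` and `captureText(4)` yields a COMMENT token followed by a TEXT token `"hi"`.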

I've tried various permutations, with some success, but this is the
configuration that has worked best so far.  I'm attempting to parse
text in two passes: the first pass catches all line-based formatting
and the second catches any inter-line formatting.
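To make the two-pass split concrete, here it is sketched in plain Java with `java.util.regex` instead of Ragel, purely to show the shape of the two passes. `TwoPass`, `Kind`, `Line`, `pass1` and `pass2` are hypothetical names. Pass one classifies each line by the same leading markers the machine uses. Pass two scans the collected text for the `$<{ ... }>$` markers, which the machine defines but doesn't use yet and which can span lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical two-pass sketch in plain Java (no Ragel): pass one classifies
// each line by its leading marker, pass two scans the collected text for
// $<{ ... }>$ spans, which may cross line boundaries.
class TwoPass {

    enum Kind { COMMENT, HEADING, SEPARATOR, SET, LIST, PLAIN }

    static final class Line {
        final Kind kind;
        final String marker; // e.g. "%%" for a level-2 heading, "" for PLAIN
        final String text;   // the rest of the line after the marker

        Line(Kind kind, String marker, String text) {
            this.kind = kind;
            this.marker = marker;
            this.text = text;
        }
    }

    // Same markers as the simtex machine: "#!", 1-5 '%', 5+ '-', 1-5 '*', digits + '.'.
    private static final Pattern LINE = Pattern.compile(
        "(#!|%{1,5}|-{5,}|\\*{1,5}|[0-9]+\\.)?(.*)", Pattern.DOTALL);

    // Pass one: line-based formatting.
    static List<Line> pass1(String input) {
        List<Line> lines = new ArrayList<>();
        for (String raw : input.split("\n")) {
            Matcher m = LINE.matcher(raw);
            m.matches(); // always succeeds: the marker is optional and (.*) takes the rest
            String marker = m.group(1) == null ? "" : m.group(1);
            lines.add(new Line(kindOf(marker), marker, m.group(2)));
        }
        return lines;
    }

    private static Kind kindOf(String marker) {
        if (marker.isEmpty()) return Kind.PLAIN;
        switch (marker.charAt(0)) {
            case '#': return Kind.COMMENT;
            case '%': return Kind.HEADING;
            case '-': return Kind.SEPARATOR;
            case '*': return Kind.SET;
            default:  return Kind.LIST; // digits followed by '.'
        }
    }

    // Pass two: formatting that can span lines, here the $<{ ... }>$ markers.
    static List<String> pass2(List<Line> lines) {
        StringBuilder sb = new StringBuilder();
        for (Line l : lines) sb.append(l.text).append('\n');
        String all = sb.toString();

        List<String> commands = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = all.indexOf("$<{", from);
            if (start < 0) break;
            int end = all.indexOf("}>$", start + 3);
            if (end < 0) break; // unterminated: ignore the rest
            commands.add(all.substring(start + 3, end));
            from = end + 3;
        }
        return commands;
    }
}
```

On `"%% Title\n-----\n12. item\nsee $<{ls\n-l}>$ here"`, pass one yields HEADING, SEPARATOR, LIST, PLAIN, PLAIN, and pass two returns the single command `"ls\n-l"`. Keeping the passes separate means the line classifier never has to know about spans that cross line boundaries.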

At this point I'm mostly looking for a general explanation of best
practices for writing parsers like this in Ragel.  I've followed the
examples, and each time I come back to this hobby project I understand
a little more, but I figured I'd ask whether people have general tips
(or more full-fledged examples) for writing these sorts of parsers in
Ragel.

I've looked around for other examples of wiki-text parsers and the
like, but they're generally very complicated and very lengthy, which
seems to go against the elegance that Ragel embodies.

Anyway, thanks for any help you might be able to give,