Grammar testing proposal
Adrian Thurston
thurs... at cs.queensu.ca
Fri Sep 15 20:06:09 UTC 2006
Colin, great idea. One issue might be specifying language-independent
actions. This could get tough if in the future we support non-C-like
languages. For example, there was mention of supporting Ruby.
Perhaps TXL (http://www.txl.ca/) might be useful. It could be used to define
a mini toy language and to write transformations to the host languages.
Though I'm connected to that project so I'm biased in regard to it being
appropriate :)
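To make the concern concrete, here is a rough sketch in plain Java (not TXL, and not anything Ragel does today) of one action kept as a tiny expression tree and printed once per host language. Every name in it (ToyAction, Host, addDigit, and so on) is made up for illustration, and the Ruby spelling assumes the current character is available as an integer, which may not match what a real Ruby backend would do.

```java
// Hypothetical sketch: one language-neutral action rendered per host language.
final class ToyAction {
    enum Host { C_LIKE, RUBY }

    // A node of the toy expression tree; each node prints itself for a host.
    interface Expr { String render(Host host); }

    static Expr ref(String name) { return host -> name; }

    static Expr lit(int n) { return host -> Integer.toString(n); }

    // Character literals are where the hosts differ in this sketch:
    // C-like languages treat '0' as a number, Ruby needs an explicit .ord.
    static Expr chr(char c) {
        return host -> host == Host.RUBY ? "'" + c + "'.ord" : "'" + c + "'";
    }

    static Expr bin(Expr left, String op, Expr right) {
        return host -> "(" + left.render(host) + " " + op + " " + right.render(host) + ")";
    }

    // Assignment statement; only the C-like hosts get a trailing semicolon.
    static String assign(String target, Expr value, Host host) {
        String stmt = target + " = " + value.render(host);
        return host == Host.RUBY ? stmt : stmt + ";";
    }

    // The add_digit action from Colin's example: val = val * 10 + (fc - '0')
    static Expr addDigit() {
        return bin(bin(ref("val"), "*", lit(10)), "+", bin(ref("fc"), "-", chr('0')));
    }
}
```

Calling ToyAction.assign("val", ToyAction.addDigit(), ToyAction.Host.RUBY) gives the Ruby flavour of the statement, and Host.C_LIKE gives the one that C, C++ and Java could share. A real solution would need statements like fbreak as well, which is where a proper transformation tool would start to pay off.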
-Adrian
Colin Fleming wrote:
> Hi all,
>
> I've been thinking about various ways to test Ragel and the generated
> grammars, and here's what I've come up with. I'm really interested in any
> feedback. I'm currently developing a couple of grammars that I'm
> primarily interested in using with Java. The Java generation is still
> a bit experimental, so I'd like to be able to use acceptance tests
> that confirm that a) the grammar works as expected, b) the results are
> consistent across Java/C++/whatever, and c) that the results are also
> consistent across different code generation strategies.
>
> This last one is probably currently more useful to Adrian than anyone,
> but I'm probably going to reimplement rlcodegen in Java shortly, so it
> will be great for testing that as well as testing code generation
> implementations for any new languages, or new code generation
> strategies.
>
> So, I propose a parser class generator that will take a raw Ragel
> grammar and generate an rl file for whichever of the supported
> languages the user requests. This rl file will generate a basic
> parsing class, with the standard methods: init, execute, finish. The
> Ragel syntax would be slightly extended to specify features of the
> generated class, and these extensions stripped out when the rl file is
> written. This would probably be generally useful too, since I imagine
> a lot of people just want a support class that they can integrate into
> a larger project.
>
> The whole point of this thing is testing, so unit test data and
> expected values would be encoded in the source file. Either a test
> class or just the parser could be generated, or both.
>
> An example is worth a thousand words, so here goes:
>
> %%{
> # Variables for the generated class, initialised in init() method
> # public vars generate getters
> public int val = 0;
> private boolean neg = false;
>
> action see_neg {
> neg = true;
> }
>
> action add_digit {
> val = val * 10 + (fc - '0');
> }
>
> main :=
> ( '-'@see_neg | '+' )? ( digit @add_digit )+
> '\n' @{ fbreak; };
>
> test {
> input "1\n";
> output "1";
> }
>
> test {
> input "213 3213\n";
> output "unexpected char ' ' in input";
> failure;
> }
> }%%
>
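For reference, the class generated from the example above might behave roughly like the hand-written Java below. This is only a sketch and not Ragel output: the state numbers, the output() helper and the choice to apply the sign there are all assumptions, since the proposal only names init, execute and finish and a getter for val.

```java
// Hand-written sketch of the kind of class the proposal describes.
// NOT Ragel output; state numbers and helper names are invented.
final class IntParserSketch {
    private static final int START = 0, SIGNED = 1, DIGITS = 2, DONE = 3, ERROR = -1;

    private int val;       // "public int val" in the example, exposed via getVal()
    private boolean neg;   // set by the see_neg action
    private int state;
    private String error;  // failure message, or null if none so far

    // Reset the declared variables and the machine state.
    void init() {
        val = 0;
        neg = false;
        state = START;
        error = null;
    }

    // Feed input; stops at the newline (the fbreak) or at the first bad char.
    void execute(String data) {
        for (int p = 0; p < data.length() && state != DONE && state != ERROR; p++) {
            char fc = data.charAt(p);
            boolean digit = fc >= '0' && fc <= '9';
            if (state == START && (fc == '-' || fc == '+')) {
                if (fc == '-') neg = true;          // see_neg
                state = SIGNED;
            } else if (digit) {
                val = val * 10 + (fc - '0');        // add_digit
                state = DIGITS;
            } else if (state == DIGITS && fc == '\n') {
                state = DONE;
            } else {
                error = "unexpected char '" + fc + "' in input";
                state = ERROR;
            }
        }
    }

    // True if the machine reached its final state.
    boolean finish() { return state == DONE; }

    int getVal() { return val; }

    // What a generated test could compare against the "output" string.
    // Applying the sign here is an assumption; the proposal does not say.
    String output() {
        if (error != null) return error;
        return Integer.toString(neg ? -val : val);
    }
}
```

A generated test would then call init(), execute() with the input string, and finish(), and compare output() with the expected string, treating a false result from finish() as the "failure" case.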
> Obviously one concern here is overloading the Ragel syntax; maybe a
> prefix would be good to highlight the new keywords as preprocessor
> directives.
>
> A few more thoughts:
>
> It would be good to be able to specify variables of the alphabet type:
> public alphtype character;
>
> It would also be interesting to track the states the machine moves
> through on each run. The traces could then be compared to ensure that
> the different strategies behave identically.
>
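The state-tracking idea could look something like the Java sketch below: the same toy machine written twice, once table-driven and once as a switch, with each version recording the states it visits so the two traces can be compared. Both versions are hand-written stand-ins for different code generation strategies, and the state numbers and class name are invented, not what Ragel would assign.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: two implementations of one machine, each producing a trace.
final class TraceSketch {
    static final int ERR = -1, START = 0, SIGNED = 1, DIGITS = 2, DONE = 3;

    // Rows are states START, SIGNED, DIGITS; columns are the character
    // classes returned by classify(): sign, digit, newline, other.
    private static final int[][] TABLE = {
        {SIGNED, DIGITS, ERR,  ERR},
        {ERR,    DIGITS, ERR,  ERR},
        {ERR,    DIGITS, DONE, ERR},
    };

    private static int classify(char c) {
        if (c == '+' || c == '-') return 0;
        if (c >= '0' && c <= '9') return 1;
        if (c == '\n') return 2;
        return 3;
    }

    // Table-driven strategy: look up the next state in TABLE.
    static List<Integer> traceTable(String input) {
        List<Integer> trace = new ArrayList<>();
        int state = START;
        trace.add(state);
        for (int p = 0; p < input.length() && state != ERR && state != DONE; p++) {
            state = TABLE[state][classify(input.charAt(p))];
            trace.add(state);
        }
        return trace;
    }

    // Switch-driven strategy: the same transitions written out as code.
    static List<Integer> traceSwitch(String input) {
        List<Integer> trace = new ArrayList<>();
        int state = START;
        trace.add(state);
        for (int p = 0; p < input.length() && state != ERR && state != DONE; p++) {
            char c = input.charAt(p);
            boolean digit = c >= '0' && c <= '9';
            switch (state) {
                case START:
                    state = (c == '+' || c == '-') ? SIGNED : digit ? DIGITS : ERR;
                    break;
                case SIGNED:
                    state = digit ? DIGITS : ERR;
                    break;
                default: // DIGITS
                    state = digit ? DIGITS : c == '\n' ? DONE : ERR;
                    break;
            }
            trace.add(state);
        }
        return trace;
    }
}
```

A test would run the same input through both and require traceTable(input).equals(traceSwitch(input)). This only works if the strategies number their states the same way, which is an assumption in its own right.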
> I'm also not sure about having the test code in with the actual
> grammar, but I guess an include directive would make that easier.
>
> Any thoughts or ideas?
>
> Cheers,
> Colin
>
>