[ragel-users] Re: Inline scanner
Carlos Antunes
cmantu... at gmail.com
Thu Jul 5 03:08:36 UTC 2007
Hi Adrian!
Thanks for the idea and code!
I've now been able to reduce Ragel's memory usage to 330 MB with 24,212
states, and compilation takes roughly 2m45s. I'm still adding to the
grammar, so I don't know how things will progress.
In any case, is there any particular reason you are resisting a
"longest match with backtracking" feature? I ask because, as you know,
this is the default behavior in pretty much every regex library and
application out there. I still think it would be useful in Ragel, since
it would remove the need to match and then call "external" scanners,
which tends to break the continuity of the grammar.
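For readers unfamiliar with the term, the sketch below shows the general technique (often called "maximal munch") on a tiny hand-coded DFA: run the automaton as far as it will go, remember the last position where it was in a final state, then emit that token and restart right after it. The token set and the names `Tok`, `step`, `accepting` and `tokenize` are hypothetical and chosen only for illustration; this is not how Ragel implements anything. Ragel's scanner construct (`|* ... *|`), which Adrian's example uses for `ws_scan`, already applies this strategy inside a scanner; the question is about having it in ordinary machines.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token kinds for this illustration only.
enum class Tok { None, A, Abc, Other };

// Transition function of a tiny DFA recognising "a", "abc" and any other
// single character. Returns -1 when there is no transition.
int step(int state, char c) {
    switch (state) {
    case 0: return c == 'a' ? 1 : 4;
    case 1: return c == 'b' ? 2 : -1;
    case 2: return c == 'c' ? 3 : -1;
    default: return -1;  // states 3 and 4 have no outgoing transitions
    }
}

// Token accepted in each state (Tok::None for non-final states).
Tok accepting(int state) {
    switch (state) {
    case 1: return Tok::A;
    case 3: return Tok::Abc;
    case 4: return Tok::Other;
    default: return Tok::None;
    }
}

// Longest match with backtracking: advance while the DFA has transitions,
// recording the end of the longest accepted prefix, then emit it and resume
// scanning immediately after it (discarding any over-read characters).
std::vector<std::pair<Tok, std::string>> tokenize(const std::string &in) {
    std::vector<std::pair<Tok, std::string>> out;
    std::size_t start = 0;
    while (start < in.size()) {
        int state = 0;
        Tok lastTok = Tok::None;
        std::size_t lastEnd = start;
        for (std::size_t i = start; i < in.size(); ++i) {
            state = step(state, in[i]);
            if (state < 0) break;  // DFA is stuck: stop reading
            if (accepting(state) != Tok::None) {
                lastTok = accepting(state);
                lastEnd = i + 1;
            }
        }
        if (lastTok == Tok::None) {  // no pattern matches at `start`
            assert(false && "unreachable with this DFA");
            break;
        }
        out.emplace_back(lastTok, in.substr(start, lastEnd - start));
        start = lastEnd;  // backtrack to just after the last accepted token
    }
    return out;
}
```

On the input `"ababc"` the DFA reads `a`, `b`, then gets stuck on the second `a`; it backtracks to the last accepting position and emits `a`, then `b` as `Other`, and finally the full `abc`.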
Thanks!
Carlos
On 7/4/07, Adrian Thurston <thurs... at cs.queensu.ca> wrote:
> Hey Carlos, I think this does what you want. It moves the processing of
> whitespace out of the main machine and should reduce the number of states.
>
> When a whitespace character is seen, the main machine calls a scanner
> that consumes whitespace. When the whitespace scanner sees
> non-whitespace, it holds that character and returns. When it sees the
> end-of-header pattern ('\n' with no continuation), it holds the '\n' and
> returns. The held '\n' is then read by ws_end in the main machine, and
> the header terminates.
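To make the call/hold/return flow concrete, here is a plain C++ model of what `ws_scan` does, written without Ragel. It is a sketch that assumes the scanner behaves exactly as described above; `scan_ws`, `WsScan` and `WsResult` are hypothetical names. The returned `resume` index is the character the main machine reads after `fret`, i.e. the held character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names, for illustration only: not part of Ragel or the example.
enum class WsResult { Continue, EndOfHeader };

struct WsScan {
    std::size_t resume;  // index the main machine reads next (the held char)
    WsResult result;     // Continue: held a non-ws char; EndOfHeader: held '\n'
};

static bool is_blank(char c) { return c == ' ' || c == '\t'; }

// Models ws_scan: consume spaces/tabs and folded lines ("\r?\n" followed by
// spaces/tabs). Stop at the first non-whitespace character, or at a "\r?\n"
// that is not followed by a space/tab (end of header), holding that '\n'.
WsScan scan_ws(const std::string &s, std::size_t p) {
    assert(p <= s.size());
    for (;;) {
        if (p < s.size() && is_blank(s[p])) {  // [ \t]+
            ++p;
            continue;
        }
        std::size_t q = p;
        if (q < s.size() && s[q] == '\r') ++q;  // optional '\r'
        if (q < s.size() && s[q] == '\n') {
            if (q + 1 < s.size() && is_blank(s[q + 1])) {  // continuation line
                p = q + 1;
                continue;
            }
            return {q, WsResult::EndOfHeader};  // hold the '\n'
        }
        return {p, WsResult::Continue};  // hold the non-whitespace char
    }
}
```

With the test input from `main()`, calling `scan_ws` at the '\n' after "ljd" skips the folded line and resumes at the 'c' of "cont", while calling it at the '\n' after "cont" reports `EndOfHeader` and leaves that '\n' for `ws_end`.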
>
> Cheers,
> Adrian
>
> #include <iostream>
> #include <stdlib.h>
> #include <stdio.h>
> #include <string.h> // strlen
>
> using namespace std;
>
> %%{
> machine sipws;
> write data;
> }%%
>
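> void sipws( const char *str )
> {
> // pe points one past the terminating NUL, so the machine also sees the
> // 0 that "main := header+ 0;" expects at the end of the input.
> const char *p = str, *pe = str + strlen(str) + 1;
> int cs;
> int stack[1]; // one level of fcall/fret is all this machine needs
> int top, act;
> const char *tokstart, *tokend; // scanner token start/end pointers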
>
> %%{
> ws_scan := |*
> # Consume spaces.
> [ \t]+;
>
> # Consume line continuations
> '\r'? '\n' [ \t]+;
>
> # An end of header. Holds the \n so the end pattern can match.
> '\r'? '\n' => {
> cerr << "returning from ws (done) " << (p-str) << endl;
> fhold; fret;
> };
>
> # Any other character: hold it and return.
> any => {
> cerr << "returning from ws (cont)" << endl;
> fhold; fret;
> };
> *|;
>
> # A word is any non-whitespace.
> word = [^ \t\r\n]+;
>
> # Whitespace machine: holds the character and jumps to the whitespace
> # scanner for processing.
> ws = [ \t\r\n] @{
> cerr << "going to whitespace " << (p-str) << endl;
> fhold; fcall ws_scan;
> };
>
> # A newline immediately after coming back from the whitespace scanner
> # signifies the end of a header.
> ws_end = ws '\n';
>
> header = [a-z]+ ':' ws? word (ws word)* ws_end;
>
> main := header+ 0;
>
> # Initialize and execute.
> write init;
> write exec;
> }%%
>
> if ( cs < sipws_first_final )
> cerr << "sipws: there was an error at position " << (p-str) << endl;
> }
>
>
> #define BUFSIZE 1024
>
> int main()
> {
> sipws(
> "hr: asdf ljfa ljd\n"
> " cont\n"
> "new:asiei\n"
> );
> return 0;
> }
>
>
>
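A plain C++ model of the whole grammar can be handy for cross-checking the generated parser on edge cases (folded lines, CRLF endings, missing values) without running Ragel. The sketch below is a hand re-implementation under the assumption that the machine behaves as Adrian describes; it is not Ragel output. `accepts`, `scan_ws` and the helpers are hypothetical names, and the end of the `std::string` stands in for the trailing 0 that `main := header+ 0;` expects.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers: they model the Ragel grammar, they are not generated by it.
static bool is_ws(char c) { return c == ' ' || c == '\t' || c == '\r' || c == '\n'; }
static bool is_blank(char c) { return c == ' ' || c == '\t'; }

// Model of ws_scan: returns the index of the held character and sets
// end_of_header when that character is the '\n' that ends a header.
static std::size_t scan_ws(const std::string &s, std::size_t p, bool &end_of_header) {
    assert(p <= s.size());
    for (;;) {
        if (p < s.size() && is_blank(s[p])) { ++p; continue; }  // [ \t]+
        std::size_t q = p;
        if (q < s.size() && s[q] == '\r') ++q;
        if (q < s.size() && s[q] == '\n') {
            if (q + 1 < s.size() && is_blank(s[q + 1])) { p = q + 1; continue; }  // folded line
            end_of_header = true;
            return q;  // hold the '\n'
        }
        end_of_header = false;
        return p;  // hold the non-whitespace char
    }
}

// Models: header = [a-z]+ ':' ws? word (ws word)* ws_end;  main := header+ 0;
bool accepts(const std::string &s) {
    std::size_t p = 0;
    int headers = 0;
    while (p < s.size()) {
        std::size_t name = p;
        while (p < s.size() && s[p] >= 'a' && s[p] <= 'z') ++p;  // [a-z]+
        if (p == name || p >= s.size() || s[p] != ':') return false;
        ++p;  // ':'
        bool need_word = true;  // the first word is mandatory
        for (;;) {
            if (p < s.size() && is_ws(s[p])) {  // ws: "call" the scanner
                bool end_of_header = false;
                p = scan_ws(s, p, end_of_header);
                if (end_of_header) {
                    if (need_word) return false;  // no word before the end of the header
                    ++p;  // ws_end reads the held '\n'
                    break;
                }
            } else if (!need_word) {
                return false;  // input ended without an end-of-header newline
            }
            std::size_t w = p;
            while (p < s.size() && !is_ws(s[p])) ++p;  // word = [^ \t\r\n]+
            if (p == w) return false;
            need_word = false;
        }
        ++headers;
    }
    return headers > 0;
}
```

For example, `accepts("hr:\n")` is false because the header ends before any word, which should correspond to the error path in `sipws()` for the same input.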
--
"We hold [...] that all men are created equal; that they are
endowed [...] with certain inalienable rights; that among
these are life, liberty, and the pursuit of happiness"
-- Thomas Jefferson