[ragel-users] Detect keywords with a ragel scanner
thurston at complang.org
Mon Jul 18 01:49:24 UTC 2011
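Hi Talek, what you should do is include the tail items (whitespace, comments, ';') in the scanner as patterns of their own and add a generic pattern that covers any word. A scanner always takes the longest match, so 'unselect' goes to the generic pattern. If you specify 'select' ahead of the generic pattern, it will be matched in favour of the generic pattern only when the word is exactly 'select'.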
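To illustrate the idea outside of Ragel, here is a minimal plain-Ruby sketch that imitates the scanner's rule (longest match wins, ties go to the pattern listed first) with StringScanner. The names PATTERNS and find_selects and the simplified regexes are assumptions made for this illustration; they are not the grammar from the original message and not generated Ragel code.

```ruby
require 'strscan'

# Ordered token patterns (hypothetical, simplified). Earlier entries win ties,
# imitating a scanner that prefers the pattern listed first.
PATTERNS = [
  [:squoted,    /'[^']*'/],
  [:dquoted,    /"[^"]*"/],
  [:ml_comment, %r{/\*.*?\*/}m],
  [:sl_comment, /--[^\n]*/],
  [:select,     /select/],
  [:word,       /\w+/],   # generic word: longer than 'select' for 'unselect'
  [:other,      /./m]     # fallback: consume one character
].freeze

# Returns [[start, end_exclusive], ...] for every standalone 'select'.
def find_selects(text)
  scanner = StringScanner.new(text)
  found = []
  until scanner.eos?
    start = scanner.pos
    best_name = nil
    best_len = 0
    PATTERNS.each do |name, re|
      len = scanner.match?(re) # match length at the current position, or nil
      # Only a strictly longer match replaces the current best,
      # so equal-length matches stay with the earlier pattern.
      if len && len > best_len
        best_name = name
        best_len = len
      end
    end
    scanner.pos = start + best_len
    found << [start, scanner.pos] if best_name == :select
  end
  found
end

data = 'unselect 1 from dual;select 2 from dual;/*comment*/select 3 from dual;select'
find_selects(data).each { |s, e| puts "found at #{s}-#{e}" }
```

Here 'unselect' is consumed whole by the generic word pattern, so the 'select' inside it is never reported, and anything inside quotes or comments is swallowed by those patterns first.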
Adrian
-----Original Message-----
From: Alec Tica <alexandru.tica at gmail.com>
Sender: ragel-users-bounces at complang.org
Date: Fri, 15 Jul 2011 00:20:42
To: <ragel-users at complang.org>
Reply-To: ragel-users at complang.org
Subject: [ragel-users] Detect keywords with a ragel scanner
Hi,
I'm new to Ragel and I'm trying to figure out how to solve an
apparently very simple problem. Let's say I have the following
text:
"select 1 from dual;select 2 from dual;/*comment*/select 3 from dual;select"
I want to detect all "select" keywords using a scanner, taking word
boundaries into consideration. "select" is a keyword only if:
1. it starts at the very beginning of the text, or is preceded by
whitespace, a comment or a statement separator (;)
2. it ends at the very end of the text, or is followed by whitespace, a
comment or a statement separator (;)
3. it is not within quotes
4. it is not part of a comment
So far I have:
<code>
%%{
machine example;
action is_eof {
true if p == eof - 1
}
# eof
EOF = zlen when is_eof;
# strings
squoted_string = ['] ( (any - [''])** ) ['];
dquoted_string = '"' ( any )* :>> '"';
# comments
ml_comment = '/*' ( any )* :>> '*/';
sl_comment = '--' ( any )* :>> ('\n' | EOF);
comment = ml_comment | sl_comment;
tail = space | comment | ';' | EOF;
# keyword
select = 'select' . tail;
main := |*
squoted_string;
dquoted_string;
comment;
select => { puts "found at #{ts}-#{te}" };
any;
*|;
}%%
%% write data;
data = 'unselect 1 from dual;select 2 from dual;/*comment*/select 3 from dual;select'
# convert the provided string in a stream of chars
stream_data = data.unpack("c*") if(data.is_a?(String))
eof = stream_data.length
%% write init;
%% write exec;
</code>
Of course, the above scanner incorrectly matches the "select" inside
the word "unselect" in the data. In any case, I feel that I'm not on
the right track, so I'd like to ask for your advice.
Many thanks in advance!
--
talek
_______________________________________________
ragel-users mailing list
ragel-users at complang.org
http://www.complang.org/mailman/listinfo/ragel-users