[ragel-users] simple parser for #include statements
Adrian Thurston
thurston at colm.net
Wed Apr 25 18:24:18 UTC 2018
Correction: that should read the end of *every* pattern.
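
To illustrate why ending every pattern with dnl anchors the include rule, here is a rough Python emulation of a longest-match scanner. It is only a sketch, not Ragel output: it assumes dnl means "discard up to and including the next newline", and the names DNL, RULES and scan_includes are made up for this example.

```python
import re

# Assumption: the thread's 'dnl' machine discards up to and including
# the next newline.
DNL = r"[^\n]*\n"

# Rules in priority order. Every pattern ends with DNL, so every token
# ends just after a '\n' and the next token starts at the beginning of a line.
RULES = [
    ("include", re.compile(r'[ \t]*#[ \t]*include[ \t]*"([^"\n]*)"' + DNL)),
    ("line_comment", re.compile(r"[ \t]*//" + DNL)),
    ("block_comment", re.compile(r"[ \t]*/\*.*?\*/" + DNL, re.DOTALL)),
    ("default", re.compile(DNL)),
]


def scan_includes(text):
    """Return the quoted names of includes that start a line."""
    if not text.endswith("\n"):
        text += "\n"  # make sure the last line can be matched by DNL
    found = []
    pos = 0
    while pos < len(text):
        best_name, best = None, None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # A strictly longer match wins; on a tie the earlier rule is kept.
            if m and (best is None or m.end() > best.end()):
                best_name, best = name, m
        if best_name == "include":
            found.append(best.group(1))
        pos = best.end()  # the default rule always consumes at least '\n'
    return found


sample = (
    '#include "a.h"\n'
    'int x; #include "mid.h"\n'
    '/* start\n'
    '#include "d.h"\n'
    '*/\n'
    '#include "e.h"'
)
print(scan_includes(sample))
```

On the sample this picks up a.h and e.h only. The mid-line include is swallowed by the default rule, and the one inside the multi-line comment by the comment rule, because that match is longer. Note that a block comment opening partway through a line is not recognised here, since the default rule consumes that line first.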
On 2018-04-25 14:18, Adrian Thurston wrote:
> Oh I see. In that case you could use dnl as the default rule, but be
> sure to add it to the end of pattern. That would guarantee anchoring at
> the beginning of the line. A question then arises though, do you want to allow
> comments ahead of include statements?
>
> On 2018-04-25 11:37, Mark Olesen wrote:
>> Hi Adrian,
>>
>> Your explanation starts to make some sense. Using the 'any' machine
>> instead of my 'dnl' machine should give similar speed (the position of
>> looping and testing for '\n' has just shifted about a bit).
>>
>> However, if I rewrite it:
>>
>> %%{
>>     main := |*
>>         space*;
>>
>>         white* '#' white* 'include' white*
>>             (dquot dqarg >buffer %process dquot) dnl;
>>
>>         '//' dnl;              # 1-line comment
>>         '/*' any* :>> '*/';    # Multi-line comment
>>
>>         any                    # Discard
>>     *|;
>> }%%
>>
>> How do I ensure that the '#include' is properly anchored? This is what
>> I was attempting with the 'dnl' machine: an attempt to enforce
>> line-based processing, but combined with swallowing multi-line
>> comments.
>>
>> As a regex, I'd specify my match like this
>>
>> /^\s*#\s*include\s+"(.*?)".*$/
>>
>> For my ragel machine, should I be doing something different such as
>> having a begin-of-line state that I initialize into and reset every
>> time I cross a newline?
>> With vague hand waving:
>>
>> %%{
>>     main := |*
>>
>>         '#' white* 'include' white*
>>             (dquot dqarg >buffer %process dquot) dnl;
>>
>>         '//' dnl;              # 1-line comment
>>         '/*' any* :>> '*/';    # Multi-line comment
>>
>>         (space %isbol | any %notbol)    # Discard
>>     *|;
>> }%%
>>
>> Not that I really understand what I'd do next with this.
>>
>> Cheers,
>> /mark
>>
>>
>> On 04/25/18 15:45, Adrian Thurston wrote:
>>> Hi Mark,
>>>
>>> So the thing to remember here is that a scanner will always try for
>>> the longest match possible, and only in the case of matches of equal
>>> length will it choose the pattern that appears ahead of the others.
>>> So in this case the dnl at the end is taking precedence over the
>>> comment rules. It doesn't interfere with the include matching rule
>>> because it also has a dnl at the end.
>>>
>>> For the catch all you want to use just the any machine. It will go
>>> one char at a time and this may seem less efficient, but ragel does
>>> its best to optimize this.
>>>
>>> In regards to the slightly tighter machine that you mentioned, it
>>> would be interesting to see before and after grammars in full to see
>>> what's going on. On their own they produce the same machine, but in
>>> the context of something larger there might be something preventing
>>> it, or it could be a missed opportunity for optimization.
>>>
>>> -Adrian
>>
>> _______________________________________________
>> ragel-users mailing list
>> ragel-users at colm.net
>> http://www.colm.net/cgi-bin/mailman/listinfo/ragel-users