[ragel-users] Default actions that leave the machine
Murray Henderson
mail at murrayh.id.au
Wed Feb 2 22:17:46 UTC 2011
Hi Adrian,
Thanks for taking an interest :-).
As far as I can tell,
main = (
('HELLO ' $^parse_error) 'WORLD' |
any*
);
and
main = (
('HELLO ' $!parse_error) 'WORLD' |
any*
);
are equivalent to
main = any*;
Anyway, the real machine I am trying to build currently looks like this:
doctype_single_quoted_value = (
"'" ([^>]*)
>start_token_value
%end_token
:>> "'"
);
doctype_double_quoted_value = (
'"' ([^>]*)
>start_token_value
%end_token
:>> '"'
);
doctype_quoted_value = (doctype_single_quoted_value |
doctype_double_quoted_value);
doctype_name = (
space+ (any - ('>' | space))+
>start_token_doctype_name
%end_token
);
doctype_public = space+ 'PUBLIC' %token_doctype_public space+
doctype_quoted_value;
doctype_system = space+ 'SYSTEM' %token_doctype_system space+
doctype_quoted_value;
doctype = (
'<!DOCTYPE' %token_doctype space* (doctype_name doctype_public?
doctype_system?)? space* '>'
);
This machine looks about right (in the FSM diagram) except that it
doesn't handle malformed doctypes.
With the $^^ operator I described, I imagine the machine would look
like this (given a parse error action, pe):
doctype = (
'<!DOCTYPE' %token_doctype space* ((doctype_name doctype_public?
doctype_system?) $^^pe)? space* <: ([^>]+ >pe)? '>'
);
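The error-recovery tail of the machine above, space* <: ([^>]+ >pe)? '>', is meant to swallow any junk left before the closing '>' and report a single parse error for it. Below is a minimal hand-written C sketch of just that tail. It assumes NUL-terminated input, skip_doctype_tail is a hypothetical name, and it is an emulation of the intended behaviour, not code Ragel generates.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical hand-written sketch (not Ragel output) of the tail
       space* <: ([^>]+ >pe)? '>'
   Returns a pointer just past the closing '>', or NULL if the input ends
   first. *errors is incremented once if any junk precedes the '>'. */
static const char *skip_doctype_tail(const char *p, int *errors)
{
    /* space*: the left-guarded <: gives the left machine priority, so
       leading whitespace is consumed here rather than counted as junk.
       isspace() in the C locale matches Ragel's space class. */
    while (*p != '\0' && isspace((unsigned char)*p))
        p++;

    /* ([^>]+ >pe)?: the entering action fires once for the whole run. */
    if (*p != '\0' && *p != '>') {
        (*errors)++;
        while (*p != '\0' && *p != '>')
            p++;
    }

    /* '>': required to finish the doctype. */
    return (*p == '>') ? p + 1 : NULL;
}
```

For example, " junk here>" is accepted with one parse error, while "  >" is accepted with none because the leading whitespace goes to space* first.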
Additionally, I think I might be able to use that imaginary operator
to make whitespace optional (though with a parse error if the
whitespace is omitted):
e.g.:
omittable_space = space+ >^^pe;
doctype_public = omittable_space 'PUBLIC' %token_doctype_public
omittable_space doctype_quoted_value;
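The intended effect of omittable_space = space+ >^^pe; can be made concrete as follows: whitespace is consumed if present, and if it is missing the parse-error action fires and parsing carries on with the next machine instead of failing. Below is a minimal hand-written C sketch of that behaviour for the doctype_public rule. It assumes NUL-terminated input, skip_omittable_space and match_public_keyword are hypothetical names, and it emulates the proposed (not existing) >^^ operator rather than anything Ragel generates.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical emulation of the proposed  omittable_space = space+ >^^pe;
   Consumes any whitespace; if there is none, counts one parse error and
   leaves the machine without consuming anything, instead of failing. */
static const char *skip_omittable_space(const char *p, int *errors)
{
    if (!isspace((unsigned char)*p)) {
        (*errors)++;               /* pe: the whitespace was omitted */
        return p;
    }
    while (isspace((unsigned char)*p))
        p++;
    return p;
}

/* Hypothetical: omittable_space 'PUBLIC' omittable_space
   Returns a pointer to where doctype_quoted_value would start,
   or NULL if the PUBLIC keyword itself is missing. */
static const char *match_public_keyword(const char *p, int *errors)
{
    p = skip_omittable_space(p, errors);
    if (strncmp(p, "PUBLIC", 6) != 0)
        return NULL;
    return skip_omittable_space(p + 6, errors);
}
```

With this, " PUBLIC\"x\"" (space before the keyword only) costs one parse error and "PUBLIC\"x\"" (no whitespace at all) costs two, yet both still reach the quoted value.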
I will be using this machine inside multiple scanners, so goto-based
error recovery would be a pain. Default actions that transition to the
final state seem like a handy feature for any permissive parser
(although I realize I am doing something extreme).
I am still thinking about attempting to patch Ragel. It is much more
complicated than I thought it would be, but it can't hurt for me to give
it a crack.
Still absolutely nowhere near finished, but my work is progressing slowly ;-).
https://github.com/murrayh/html5rl/blob/master/html5_grammar.rl
Cheers,
Murray
On Tue, Feb 1, 2011 at 5:16 PM, Adrian Thurston <thurston at complang.org> wrote:
> Hi, does this do what you want?
>
> main = (
> ('HELLO ' $^parse_error) 'WORLD' |
> any*
> );
>
> I'm not sure how that fits into your overall plan. Try it out and we'll
> discuss further.
>
> Regards,
> Adrian
>
> On 11-01-31 03:50 PM, Murray Henderson wrote:
>>
>> Hello,
>>
>> Both local and global error actions transition to the error state. I
>> am using Ragel 6.5. I can try with 6.6 when I get home.
>>
>> I made a quick example (based on S. Geist's example):
>>
>> http://pastebin.com/06ihRxQg
>>
>> Example output:
>>
>> HELLO WORLD
>> read: HELLO WORLD
>> len: 12, state: 12
>> HELWORLD
>> parse error
>> read: HEL
>> len: 3, state: 0
>>
>>
>> Cheers,
>> Murray
>>
>>
>> On Tue, Feb 1, 2011 at 10:02 AM, Adrian Thurston <thurston at complang.org>
>> wrote:
>>>
>>> Local error actions don't. Sorry, I should have suggested just those.
>>>
>>> On 11-01-31 02:58 PM, Murray Henderson wrote:
>>>>
>>>> Hello,
>>>>
>>>> Local and global error actions transition to the error state.
>>>>
>>>> I want DEF to transition to the next machine (i.e. behave like a final
>>>> state), not the error state.
>>>>
>>>> The parser I am writing is permissive: all input must be accepted (I
>>>> never want to go to the error state).
>>>>
>>>> I do not wish to use manual goto recovery, because the parser is large
>>>> and complex; such manual tracking is a lot of work and error-prone.
>>>>
>>>> Cheers,
>>>> Murray
>>>>
>>>>
>>>>
>>>> On Tue, Feb 1, 2011 at 4:58 AM, Adrian Thurston
>>>> <adrian.thurston at esentire.com> wrote:
>>>>>
>>>>> Hi, have you looked at ragel's local and global error actions yet?
>>>>> These may do what you want.
>>>>>
>>>>> -Adrian
>>>>>
>>>>> On 11-01-26 08:08 PM, Murray Henderson wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I want to embed a default action into a machine that leaves the
>>>>>> machine (without using a manual jump inside the action).
>>>>>>
>>>>>> For simplicity's sake, I will call this operator $^^ (since it is
>>>>>> similar to the Local Error operator).
>>>>>>
>>>>>>
>>>>>> Example:
>>>>>>
>>>>>> action parse_error {}
>>>>>> helloworld = ('HELLO ' $^^parse_error) 'WORLD';
>>>>>>
>>>>>> Non-error inputs include:
>>>>>> HELLO WORLD
>>>>>> HELLOWORLD (parse_error action occurs on 'O' -> 'W' transition)
>>>>>> HELLWORLD (parse_error action occurs on 'L' -> 'W' transition)
>>>>>> HELWORLD (parse_error action occurs on 'L' -> 'W' transition)
>>>>>> HEWORLD (parse_error action occurs on 'E' -> 'W' transition)
>>>>>> HWORLD (parse_error action occurs on 'H' -> 'W' transition)
>>>>>> WORLD (parse_error action occurs on -> 'W' transition)
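The semantics listed above can be pinned down with a small hand-written C sketch. It emulates the proposed $^^ operator (which this thread treats as not yet existing in Ragel) for the helloworld example; it is not Ragel-generated code, match_helloworld and its error counter are hypothetical names, and the input is assumed to be a NUL-terminated string.

```c
#include <string.h>

/* Hypothetical hand-written emulation (not Ragel output) of the proposed
       helloworld = ('HELLO ' $^^parse_error) 'WORLD';
   Returns 1 if the whole input is accepted, 0 otherwise; *errors counts how
   many times parse_error would fire. */
static int match_helloworld(const char *p, int *errors)
{
    static const char prefix[] = "HELLO ";
    static const char suffix[] = "WORLD";
    size_t i = 0;

    *errors = 0;

    /* 'HELLO ': follow the prefix for as long as the input agrees. */
    while (prefix[i] != '\0' && p[i] == prefix[i])
        i++;
    p += i;

    /* $^^parse_error: on an unexpected character, run the action and
       leave the prefix machine without consuming that character. */
    if (prefix[i] != '\0')
        (*errors)++;

    /* 'WORLD': the next machine must still match as usual. */
    if (strncmp(p, suffix, sizeof suffix - 1) != 0)
        return 0;
    return p[sizeof suffix - 1] == '\0';
}
```

Each input in the list above, e.g. HELWORLD, is accepted with one parse_error, whereas HELLO WORD is still rejected: the default action only relaxes the 'HELLO ' machine, not 'WORLD'.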
>>>>>>
>>>>>>
>>>>>> I can simulate the above behavior with the '?' operator, but that is
>>>>>> laborious, and there are other ways of using $^^ that I suspect cannot
>>>>>> be simulated.
>>>>>>
>>>>>>
>>>>>> I want this operator because I am trying to make a liberal parser that
>>>>>> accepts all possible input (every state must have a default action).
>>>>>> I am creating an HTML5 parser that uses regular machines for
>>>>>> tokenizing, and scanners built from the regular machines for parsing.
>>>>>> Yes, I am mad.
>>>>>>
>>>>>> I cannot use manual jumps, because I don't want to jump out of the
>>>>>> scanners mid-token.
>>>>>>
>>>>>>
>>>>>> I am willing to try and add this operator into Ragel myself. I have
>>>>>> grabbed the source code and tracked my way to fsmap.cpp, where the new
>>>>>> operator would be added.
>>>>>>
>>>>>> Before I continue...
>>>>>> Is there already a way to achieve my desired behavior that I am not
>>>>>> aware of?
>>>>>> Would such an operator be worthwhile? Is it even possible?
>>>>>> Is there any knowledge that could be imparted that would help me make
>>>>>> a patch?
>>>>>>
>>>>>> If I do end up making a patch, for symmetry purposes I will make
>>>>>> global/local and start/any/final (etc.) versions of the operator.
>>>>>>
>>>>>> After a brief look through the source, it looks like I would need to
>>>>>> modify the FsmAp::fillGaps() function, passing a (separate object for
>>>>>> each?) final state into FsmAp::attachNewTrans() instead of NULL.
>>>>>>
>>>>>> Ragel is a wonderful program by the way, thank you for creating it.
>>>>>>
>>>>>> Cheers,
>>>>>> Murray
>>>>>>
>>>>>> _______________________________________________
>>>>>> ragel-users mailing list
>>>>>> ragel-users at complang.org
>>>>>> http://www.complang.org/mailman/listinfo/ragel-users
>>>>>>
>>>>>
>>>>