[ragel-users] Priority issues when doing a street name parser

Adrian Thurston thurston at complang.org
Thu Sep 24 01:53:25 UTC 2009


Hi William,

I think what you need is a traditional lexer. See section 6.3 of the manual.
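
As a rough illustration of the token-at-a-time idea, here is a minimal sketch in Python rather than Ragel. It is not a Ragel scanner and not taken from the manual. It only shows the logic of splitting the input into tokens first and classifying each token afterwards, instead of asking one regular expression to decide where the street name ends. The SUFFIXES and DIRECTIONS tables and the parse_address function are hypothetical stand-ins for the long suffix and direction lists in your grammar. The sketch also peels the direction and suffix off the right-hand end rather than scanning left to right, which is a simplification chosen to keep it short.

```python
# Hypothetical lookup tables; the real suffix and direction lists are much longer.
SUFFIXES = {
    "street": "Street", "st": "Street",
    "avenue": "Avenue", "ave": "Avenue",
    "road": "Road", "rd": "Road",
}
DIRECTIONS = {
    "n": "N", "north": "N", "s": "S", "south": "S",
    "e": "E", "east": "E", "w": "W", "west": "W",
    "nw": "NW", "ne": "NE", "sw": "SW", "se": "SE",
}


def parse_address(text):
    """Split a free-form address into number, name, suffix and direction."""
    tokens = text.split()  # whitespace-delimited tokens, never empty strings
    result = {"number": None, "name": None, "suffix": None, "direction": None}

    # A leading token that starts with a digit is the street number
    # (this also accepts a trailing letter, as in "5553A").
    if tokens and tokens[0][0].isdigit():
        result["number"] = tokens.pop(0)

    # Classify whole tokens from the right; always leave at least one
    # token behind so the street name is never empty.
    if len(tokens) > 1 and tokens[-1].lower() in DIRECTIONS:
        result["direction"] = DIRECTIONS[tokens.pop().lower()]
    if len(tokens) > 1 and tokens[-1].lower() in SUFFIXES:
        result["suffix"] = SUFFIXES[tokens.pop().lower()]

    # Whatever remains is the (possibly multi-word) street name.
    result["name"] = " ".join(tokens) or None
    return result
```

With these assumed tables, parse_address("Bella Vista Avenue NW") keeps "Bella Vista" as the name because "Avenue" and "NW" are classified as complete tokens and never fed back into the name.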

-Adrian

William Lachance wrote:
> Hi,
> 
> I'm trying to construct a parser for street addresses using Ragel.
> That is to say, a machine that will take a free form address like
> "5553 Barrington Street NW" and parse out the individual components
> (street number, name, suffix, direction). Everything was going
> swimmingly until I started to try to add support for street names with
> multiple tokens in them (e.g. "Bella Vista Avenue NW").
> 
> Right now my main machine looks like this:
> 
> streetNumber = (digit+ >getStartStr %endNumber);
> streetName = (alpha+ (space+ alpha+)*) >getStartStr %endName;
> suffixFull = space+ suffix;
> dirFull = space+ direction;
> main := (streetNumber alpha? space+)? streetName suffixFull? dirFull?;
> 
> The suffix and dir expressions are really long and boring
> concatenations like this:
> 
> directionWest = ("w"i|"west"i) >getStartStr %endDirWest;
> 
> Anyway, the problem with this simple regular expression is that it
> doesn't give up on parsing the streetName when it begins parsing the
> direction and suffix. So in the above example, it will correctly parse
> "Bella Vista", but then overwrite it with "Avenue", and later "NW". I
> thought that perhaps adding a few ":>>" operators (to stop the processing of
> the street name when suffixes and directions appear) would help:
> 
> main := (streetNumber alpha? space+)? streetName :>> suffixFull? :>> dirFull? 0;
> 
> Unfortunately, that seems to have the side effect of terminating
> parsing of the street name prematurely (bringing us back to square
> one).
> 
> It _seems_ like what I'm doing should be straightforward. Basically
> the rule should be: "keep on parsing the street until you find a token
> that unambiguously matches a suffix and/or direction; at that point,
> stop, only keeping the previous tokens". Surely there's a way of
> expressing that in Ragel?
> 



