[ragel-users] tuning/optimizing scanners
Adrian Thurston
thurs... at cs.queensu.ca
Fri Oct 5 16:13:07 UTC 2007
Hi Chuck,
The parsing methodology looks fine to me. There is no undue backtracking.
What version of Ragel are you using?
-Adrian
Chuck Remes wrote:
> I've written a log parsing tool using Ragel and Ruby. I'm using the
> scanner construct to perform the parsing, but things appear to be
> running very slowly. I fear I may have chosen the wrong methodology
> to parse the log. (And yes, I know Ruby isn't the quickest language
> out there...) :-)
>
> The log in question is a set of key/value pairs that look like this
> (this is one line):
>
> Oct 1 09:50:33.37204 [29193]: {market = ICE | type = order |
> order_id = 4 | buy = 1 | price = 80.83 | volume = 1 | date =
> 2007-10-01 | time = 09:50:33.37201 | metadata = {l={f=Quote|g=4|j=1|
> sid=8290182729}|ac=289182|cf=2881|ca= 289182}}
>
> I'm uninterested in the date and other data at the line start, so I
> throw it away. I primarily search for the key (e.g. 'market = ') and
> then fgoto another machine to parse the value. Upon hitting a pipe
> character, I fgoto main again and look for another key. I pasted in a
> section of the machine below to illustrate.
>
> Is this the correct approach? Is there a superior method for rapidly
> parsing long text strings? Be gentle with me... I'm new to this stuff.
>
> Unfortunately, each log record is a slightly different format (for a
> total of about 15 different formats). I also can't plan on the key/
> value pairs showing up in the same order every time.
>
> Any suggestions?
>
> ----------- snip here ---------------
> feedcode_name = [0-9a-zA-Z\-]+;
> numbers = [0-9]+;
>
> #####
> feedcode := |*
> spaces;
>
> '|' => { fgoto main; };
>
> feedcode_name => { temp[:feedcode] = data[tokstart..tokend-1]; };
> any => {puts "ERR: feedcode #{data[tokstart..tokend-1]}"};
> *|;
> #####
> volume := |*
> spaces;
>
> '|' => { fgoto main; };
>
> numbers => { temp[:quantity] = data[tokstart..tokend-1].to_i; };
> any => {puts "ERR: volume #{data[tokstart..tokend-1]}"};
> *|;
> #####
> main := |*
> 'module = ' => { fgoto module; };
>
> 'market = ' => { fgoto market; };
>
> 'feedcode = ' => { fgoto feedcode; };
>
> 'type = ' => { fgoto type; };
>
> 'order_id = ' => { fgoto order_id; };
>
> 'buy = ' => { fgoto activity; };
>
> 'price = ' => { fgoto price; };
>
> 'volume = ' => { fgoto volume; };
>
> 'date = ' => { fgoto date; };
>
> 'time = ' => { fgoto time; };
>
> ( numbers | letters | spaces | '\n' | '{' | '}' | other | any );
>
> *|;
>
>
>
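For anyone wanting a point of comparison when timing the scanner described above, here is a minimal plain-Ruby sketch of the same idea: find a known key followed by " = ", then take everything up to the next pipe as the value. It is not the Ragel machine from the post and is not offered as a replacement for it. The names KNOWN_KEYS, PAIR and parse_record are hypothetical, the key list is copied from the main machine in the quoted message, and the sketch assumes that scalar values never contain a pipe and that the nested metadata block can be ignored.

```ruby
# Hypothetical baseline for comparison; NOT the Ragel scanner from the post.
# KNOWN_KEYS mirrors the keys matched in the quoted `main` machine.
KNOWN_KEYS = %w[module market feedcode type order_id buy price volume date time].freeze

# "key = value", where the value runs up to the next pipe or end of line.
# Assumption: scalar values never contain '|'.
PAIR = /\b(#{KNOWN_KEYS.join('|')}) = ([^|\n]*)/

# Returns a Hash of symbol keys to stripped string values.
# Keys may appear in any order; unknown keys (e.g. metadata) are skipped.
def parse_record(line)
  line.scan(PAIR).each_with_object({}) do |(key, value), record|
    record[key.to_sym] = value.strip
  end
end
```

Calling parse_record on the sample line from the quoted message yields entries such as :market, :price and :time as strings; converting them to numbers is left to the caller. A value that is the last field before a closing brace would keep that brace, so a real tool would need to trim it.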