tuning/optimizing scanners
Chuck Remes
cremes.devl... at mac.com
Fri Oct 5 15:47:25 UTC 2007
I've written a log parsing tool using Ragel and Ruby. I'm using the
scanner construct to perform the parsing, but it runs very slowly.
I fear I may have chosen the wrong approach to parsing the log.
(And yes, I know Ruby isn't the quickest language out there...) :-)
The log in question is a set of key/value pairs that look like this
(this is one line):
Oct 1 09:50:33.37204 [29193]: {market = ICE | type = order |
order_id = 4 | buy = 1 | price = 80.83 | volume = 1 | date =
2007-10-01 | time = 09:50:33.37201 | metadata = {l={f=Quote|g=4|j=1|
sid=8290182729}|ac=289182|cf=2881|ca= 289182}}
I'm not interested in the date and other data at the start of the line,
so I throw it away. I search for a key (e.g. 'market = ') and then
fgoto another machine to parse the value. Upon hitting a pipe
character, I fgoto main again and look for the next key. I've pasted
a section of the machine below to illustrate.
Is this the correct approach? Is there a better method for quickly
parsing long text strings? Be gentle with me... I'm new to this stuff.
Unfortunately, each type of log record has a slightly different format
(about 15 formats in total). I also can't count on the key/value pairs
showing up in the same order every time.
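To make the record layout concrete, here is a minimal plain-Ruby sketch (not Ragel, and not the tool described in this post) that turns one line into a hash regardless of key order. The method name parse_record is hypothetical, and the sketch assumes that the record body starts at the first '{', that top-level pairs are separated by '|', and that braces are balanced so nested values such as metadata can be kept as raw strings.

```ruby
# Hypothetical helper, for illustration only: parse one log line into a Hash.
# Assumes "prefix {key = value | key = value | key = {nested}}" with balanced braces.
def parse_record(line)
  start = line.index('{')
  return {} if start.nil?

  fields = {}
  depth = 0
  buf = String.new

  # Store the buffered "key = value" text and reset the buffer.
  flush = lambda do
    key, value = buf.split('=', 2)
    fields[key.strip] = value.strip if key && value
    buf = String.new
  end

  line[start..-1].each_char do |ch|
    case ch
    when '{'
      depth += 1
      buf << ch if depth > 1          # keep braces that belong to nested values
    when '}'
      depth -= 1
      if depth.zero?                  # closing brace of the whole record
        flush.call
        break
      end
      buf << ch
    when '|'
      if depth == 1
        flush.call                    # top-level separator ends a pair
      else
        buf << ch                     # pipe inside a nested value
      end
    else
      buf << ch
    end
  end

  fields
end
```

Because the pairs land in a hash, the order in which keys appear on the line does not matter; nested values are left unparsed as strings.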
Any suggestions?
----------- snip here ---------------
feedcode_name = [0-9a-zA-Z\-]+;
numbers = [0-9]+;
#####
feedcode := |*
spaces;
'|' => { fgoto main; };
feedcode_name => { temp[:feedcode] = data[tokstart..tokend-1]; };
any => {puts "ERR: feedcode #{data[tokstart..tokend-1]}"};
*|;
#####
volume := |*
spaces;
'|' => { fgoto main; };
numbers => { temp[:quantity] = data[tokstart..tokend-1].to_i; };
any => {puts "ERR: volume #{data[tokstart..tokend-1]}"};
*|;
#####
main := |*
'module = ' => { fgoto module; };
'market = ' => { fgoto market; };
'feedcode = ' => { fgoto feedcode; };
'type = ' => { fgoto type; };
'order_id = ' => { fgoto order_id; };
'buy = ' => { fgoto activity; };
'price = ' => { fgoto price; };
'volume = ' => { fgoto volume; };
'date = ' => { fgoto date; };
'time = ' => { fgoto time; };
( numbers | letters | spaces | '\n' | '{' | '}' | other | any );
*|;
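The scanner actions above slice the matched token out of data with tokstart and tokend. As a small sketch of that slicing in plain Ruby, assuming tokend points one past the last character of the token (as it does in Ragel scanners), either an inclusive range ending at tokend-1 or an exclusive range ending at tokend gives the token text. The method name token_text is hypothetical.

```ruby
# Hypothetical helper, for illustration only.
# Assumes tokend is one past the last character of the token.
def token_text(data, tokstart, tokend)
  data[tokstart...tokend]   # exclusive range; same as data[tokstart..tokend-1]
end

data = "volume = 42 |"
tokstart = 9                # index of '4'
tokend = 11                 # one past '2'

token_text(data, tokstart, tokend)   # => "42"
data[tokstart..tokend]               # => "42 " (inclusive range picks up one extra character)
```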