speed vs. re2c?

Fri Oct 6 01:02:15 UTC 2006

Adrian,

Let me try to clarify what I'm talking about.  The traditional use of
re2c or Ragel is:

COMPILE TIME: c-compiler(ragel(regex)) -> binary that matches input
against regex

I am proposing:

COMPILE TIME: c-compiler(ragel) -> library that can generate
regex-matching code
RUN TIME: ragel-library(regex) -> machine code in memory that I can
jump to in order to match input against regex

An API for Ruby would look something like:

myparser = Ragel::Machine.new("number = (
    [0-9]+ $dgt ( '.' @dec [0-9]+ $dgt )?
    ( [eE] ( [+\-] $exp_sign )? [0-9]+ $exp )?
   ) %number;")

myparser.actions["dgt"] = Proc.new { |dgt| puts "DGT: #{dgt}" }

myparser.run(File.open("foo.txt"))

Specifically:

- I can use Ragel from an interpreted language, without having to
run every pattern through a C compiler (as Mongrel does for its HTTP
parser)

- I can write my actions in the target language

- it should be faster than an interpreted regex engine, like the one
I would get by saying: file.read =~ /blah/
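
To make the shape of that API concrete, here is a rough pure-Ruby
stand-in. It is only a sketch under some loud assumptions:
ToyNumberMachine is a made-up name, the state table is written by hand
for the number pattern above instead of being generated from the
pattern text, and nothing is compiled to machine code. All it shows is
how actions written as Ruby blocks could be attached by name and fired
as the machine runs.

```ruby
# Hypothetical stand-in for the proposed Ragel::Machine interface.
# The table below is hand-written for the "number" pattern; a real
# implementation would build it (or machine code) from the pattern.
class ToyNumberMachine
  attr_reader :actions

  DIGIT = ->(c) { c >= "0" && c <= "9" }
  E     = ->(c) { c == "e" || c == "E" }

  # state => list of [character test, action name, next state]
  TRANSITIONS = {
    start:    [[DIGIT, "dgt", :int]],
    int:      [[DIGIT, "dgt", :int],
               [->(c) { c == "." }, "dec", :dot],
               [E, nil, :e]],
    dot:      [[DIGIT, "dgt", :frac]],
    frac:     [[DIGIT, "dgt", :frac], [E, nil, :e]],
    e:        [[->(c) { c == "+" || c == "-" }, "exp_sign", :exp_sign],
               [DIGIT, "exp", :exp]],
    exp_sign: [[DIGIT, "exp", :exp]],
    exp:      [[DIGIT, "exp", :exp]]
  }.freeze

  # States in which the input so far is a complete number.
  FINAL = [:int, :frac, :exp].freeze

  def initialize
    @actions = {}
  end

  # Feeds each character through the table and calls the user's Ruby
  # block, if one is registered, for the action on the transition taken.
  # Accepts a String or an IO. Returns true if the whole input is a number.
  def run(input)
    state = :start
    text = String.new
    input.each_char do |c|
      _, action, nxt = TRANSITIONS[state].find { |t, _, _| t.call(c) }
      return false unless nxt
      actions[action]&.call(c)
      text << c
      state = nxt
    end
    return false unless FINAL.include?(state)
    actions["number"]&.call(text)
    true
  end
end

machine = ToyNumberMachine.new
machine.actions["dgt"]    = proc { |d| puts "DGT: #{d}" }
machine.actions["number"] = proc { |n| puts "NUMBER: #{n}" }
machine.run("3.14e-2")
```

Running it on "3.14e-2" fires dgt for 3, 1 and 4, then number once
with the whole match. The table lookup here is exactly the slow part
that run-time code generation would replace; the actions hash is the
piece that would stay the same.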

Is this clearer?

Thanks,
Josh