[ragel-users] Failed to convert URL parser regular expression to Ragel
徐亮
lxu4net at gmail.com
Mon Jan 9 11:02:11 UTC 2012
I have posted a question about this on Stack Overflow
<http://stackoverflow.com/questions/8784903/failed-to-convert-url-parser-regular-expression-to-ragel>.
I found a URL-parsing regular expression in RFC 2396 and RFC 3986:
^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
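As a reference for what each capture group should yield, here is a minimal sketch that applies the RFC expression with std::regex. The UriParts struct and the split_uri helper are hypothetical names introduced only for this illustration, and the pattern is written in the RFC's form without the backslashes before '/', which should be equivalent.

```cpp
#include <regex>
#include <string>

// Components of a URI reference, one field per capture group of interest.
// UriParts is a hypothetical type used only for this illustration.
struct UriParts {
    std::string scheme;     // group 2
    std::string authority;  // group 4
    std::string path;       // group 5
    std::string query;      // group 7
    std::string fragment;   // group 9
};

// split_uri is a hypothetical helper: it runs the RFC regular expression
// over the whole input and copies the numbered groups into UriParts.
// Groups that did not participate in the match come back as empty strings.
UriParts split_uri(const std::string& uri) {
    static const std::regex re(
        R"(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?)");
    UriParts parts;
    std::smatch m;
    if (std::regex_match(uri, m, re)) {
        parts.scheme    = m[2].str();
        parts.authority = m[4].str();
        parts.path      = m[5].str();
        parts.query     = m[7].str();
        parts.fragment  = m[9].str();
    }
    return parts;
}
```

With this sketch, split_uri("/pub/ietf/uri") leaves scheme and authority empty and puts the whole input in path, because the optional scheme and "//" authority groups simply do not participate in the match.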
I converted it to Ragel:
%%{
    # RFC 3986 URI Generic Syntax (January 2005)
    machine url_parser;

    action pchar {
        printf("%c", fc);
    }
    action scheme { printf("scheme\n"); }
    action scheme_end { printf("\nscheme_end\n"); }
    action authority { printf("authority\n"); }
    action authority_end { printf("\nauthority_end\n"); }
    action path { printf("path\n"); }
    action path_end { printf("\npath_end\n"); }
    action query { printf("query\n"); }
    action query_end { printf("\nquery_end\n"); }
    action fragment { printf("fragment\n"); }
    action fragment_end { printf("\nfragment_end\n"); }

    scheme = (any - [:/?#])+ >scheme $pchar %scheme_end ;
    authority = (any - [/?#])* >authority $pchar %authority_end ;
    path = (any - [?#])* >path $pchar %path_end ;
    query = (any - [#])* >query $pchar %query_end ;
    fragment = (any)* >fragment $pchar %fragment_end ;

    main := (( scheme ":" )?) <: (( "//" authority )?) <: path ( "?" query )? ( "#" fragment )?;
}%%
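#include <cstdio>
#include <cstdlib>
#include <string>

/** Data **/
%% write data;

int main(int argc, char **argv) {
    // The URI to parse is expected as the first argument.
    if (argc < 2) {
        fprintf(stderr, "usage: %s <uri>\n", argv[0]);
        return EXIT_FAILURE;
    }
    std::string str(argv[1]);
    char const* p = str.c_str();
    char const* pe = p + str.size();
    char const* eof = pe;  // the whole input is in a single buffer
    int cs = 0;
    %% write init;
    %% write exec;
    // The exit status is the offset at which the machine stopped.
    return p - str.c_str();
}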
It works when I pass an absolute URI:
liangxu@dev64:~$ ./uri_test "http://www.ics.uci.edu/pub/ietf/uri/?c=www&rot=1&e=%20%20"
scheme
http
scheme_end
authority
www.ics.uci.edu
authority_end
path
/pub/ietf/uri/
path_end
query
c=www&rot=1&e=%20%20
query_end
It also succeeds when I pass only an authority and a path:
liangxu@dev64:~$ ./uri_test "//www.ics.uci.edu/pub/ietf/uri/?c=www&rot=1&e=%20%20"
authority
www.ics.uci.edu
authority_end
path
/pub/ietf/uri/
path_end
query
c=www&rot=1&e=%20%20
query_end
But it fails when I pass only a path:
liangxu@dev64:~$ ./uri_test "/pub/ietf/uri"
What's wrong?
--
Liang Xu