[ragel-users] signed/unsigned portability issue
Adrian Thurston
thurston at complang.org
Sat Nov 2 14:55:50 UTC 2013
We need Ragel's internal data structures to match the signedness of the input
array, and sometimes you just need a signed type because you're parsing a
stream of integers.
A better option might be to default the C alphtype to unsigned char, if
that's the more common case.
-Adrian
On Thu, Oct 24, 2013 at 12:53:12PM -0700, William Ahern wrote:
> On Thu, Oct 24, 2013 at 08:52:17PM +0200, Peter van Dijk wrote:
> > Hello folks,
> >
> > we (PowerDNS) have a small Ragel parser for segmenting and unescaping DNS
> > TXT record data. Some time ago, we expanded the allowed inputs for this
> > parser to the full 8 bit 'extended ASCII' range (which Ragel calls
> > 'extend').
> >
> > This works well on most platforms - but it failed for us on Debian/s390x.
> >
> > After a lot of digging I found that char is unsigned on s390x, while it is
> > signed on amd64, i386 and many other platforms.
> >
> > I have added 'alphtype unsigned char' to our Ragel file. This makes the
> > parser work reliably on both amd64 and s390x (and, hopefully, many other
> > platforms).
> >
> > However, I feel something is wrong. It seems that on s390x, Ragel is
> > mostly confused about the type of char. It generates a parser that treats
> > extend as -128..127, but maps non-ASCII inputs in the 128..255 range. This
> > discrepancy feels like a Ragel issue to me.
> >
> > A much longer version of this story is at
> > https://www.evernote.com/shard/s344/sh/cb968134-4d58-4e46-8b5e-47366a129038/60fafaf56d5a350edf891cf82cefc66d
> >
> > My question: is this a Ragel bug? Regardless of yes/no, is what I did
> > (alphtype unsigned char) the best workaround?
>
> IMHO it would probably be better for Ragel to use unsigned char arithmetic
> for both char and unsigned char. Off the top of my head it even seems like
> Ragel should treat all input as unsigned.
>
> FWIW, I always use unsigned arithmetic, for Ragel and most everything else.
> Signed arithmetic is for mathematical formulas, not bit twiddling and string
> processing. At the very least, it quickly leads to undefined behavior,
> whereas signed->unsigned conversions in C are always well defined.
>
> Does anybody on the list actually use or depend on signed behavior in their
> machines?
>
>
> _______________________________________________
> ragel-users mailing list
> ragel-users at complang.org
> http://www.complang.org/mailman/listinfo/ragel-users