Fixing an edge case

Mon Jan 15 16:57:50 UTC 2007

Hi there.

I have come across a case that my "svn diff" parser doesn't handle, and
I'm hoping you can help me solve it.

The situation is this: it is possible (however unlikely) to "svn add"
an empty file to a repository (e.g. touch <file>; svn add <file>). If
you do this, the output from "svn diff" contains an incomplete header
for the diff and no hunk information. Thus, instead of something like:

Index: newfile
=====================

--- newfile (revision 0)
+++ newfile (revision 0)
@@ .... @@
content here

you get:

Index: newfile
=====================

This can occur in one of two places: either with another diff
following it, or as the last diff, in which case the output of
"svn diff" can end with:

=====================

which my parser previously treated as an incomplete diff. Now it has
to be treated as the diff of an empty file.
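To make the two shapes concrete outside Ragel, here is a rough Python sketch. It is not my parser: `split_svn_diff` is a hypothetical helper, and it assumes that each diff begins with an "Index:" line followed by a line of '=' characters, that a text diff body starts with '---', and that a binary diff body starts with 'C' (presumably svn's "Cannot display: ..." message). A header whose separator is followed by nothing, whether another "Index:" line or the end of input, is reported as an empty-file diff.

```python
from typing import List, Tuple


def split_svn_diff(text: str) -> List[Tuple[str, str]]:
    """Split "svn diff" output into (filename, kind) pairs.

    kind is 'text', 'binary' or 'empty'. Hypothetical helper: assumes each
    diff starts with "Index: <name>" followed by a line of '=' characters.
    """
    results: List[Tuple[str, str]] = []
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        if lines[i] == "":
            i += 1  # tolerate blank lines before the first diff
            continue
        if not lines[i].startswith("Index:"):
            raise ValueError(f"expected 'Index:' at line {i + 1}: {lines[i]!r}")
        name = lines[i][len("Index:"):].strip()
        i += 1
        # The header may be followed by blank lines before the separator.
        while i < len(lines) and lines[i] == "":
            i += 1
        if i >= len(lines) or set(lines[i]) != {"="}:
            raise ValueError(f"missing '=' separator for {name!r}")
        i += 1
        # The body runs until the next "Index:" line or the end of input.
        start = i
        while i < len(lines) and not lines[i].startswith("Index:"):
            i += 1
        body = [line for line in lines[start:i] if line != ""]
        if not body:
            kind = "empty"  # header and separator only: an empty file
        elif body[0].startswith("---"):
            kind = "text"
        elif body[0].startswith("C"):
            kind = "binary"  # e.g. "Cannot display: file marked as a binary type."
        else:
            raise ValueError(f"unrecognised diff body for {name!r}: {body[0]!r}")
        results.append((name, kind))
    return results
```

For instance, output ending in a bare header and separator, such as "Index: newfile\n=====\n", comes back as [("newfile", "empty")], and the same holds when another "Index:" line follows instead of the end of input.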

Here is my parser machine:

	nbsp = space - '\n' %count_line;
	lineChar = extend - '\n';
	
	diffLine = ( '\\' lineChar* '\n' %count_line ) | ( ' ' | '-' | '+' ) >mark lineChar* '\n' @add_line %count_line;
	
	separator = '='+ '\n' %count_line;
	
	oldFile = '---' lineChar+ '\n' %count_line;
	newFile = '+++' lineChar+ '\n' %count_line;
	
	range = ( '-' | '+' ) ( digit+ >mark %push ) ( ' ' %push_zero @{ fhold; } | ',' ( digit+ >mark %push ) );
	
	hunkHeader = '@@' nbsp* range nbsp+ range nbsp* '@@' '\n' @pop_hunk_spec %count_line;
	
	hunkBody = diffLine+;
	
	hunk = hunkHeader >enter_hunk hunkBody %exit_hunk %/exit_hunk;
	
	fileName = ( lineChar+ ) >mark %copy_to_filespec;
	
	fileSpec = "Index:" nbsp+ fileName '\n'+ @count_line;
	
	diffHeader = fileSpec separator;
	
	diffBody = hunk* %exit_diff %/exit_diff;
	
	binaryDiff = 'C' lineChar+ '\n' %count_line lineChar+ '\n' %binary_diff %count_line;
	
	textDiff = oldFile newFile diffBody;
	
	diff = diffHeader >enter_diff ( binaryDiff | textDiff );
	
	main := diff* $!error;

This machine relies on the fact that a diff ends either when another
diff starts (a line not beginning with ' ', '-' or '+') or at EOF, the
latter being caught with a %/ (EOF leaving) action.
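The end-of-diff rule can be modelled outside Ragel as a tiny line-driven tracker. This is a hypothetical Python sketch, not my parser: `DiffEventTracker`, `feed_line` and `finish` are illustrative names, and calling `finish()` stands in for the EOF that triggers the %/ actions. A hunk or diff is closed either by the first line that cannot continue it or by the explicit end of input.

```python
from typing import List


class DiffEventTracker:
    """Hypothetical line-level model of the enter/exit actions above.

    A hunk ends when a line that is not a body line arrives (like %exit_hunk)
    or at end of input (like %/exit_hunk); a diff ends the same way.
    """

    BODY_PREFIXES = (" ", "-", "+", "\\")

    def __init__(self) -> None:
        self.events: List[str] = []
        self.in_diff = False
        self.in_hunk = False

    def feed_line(self, line: str) -> None:
        if self.in_hunk and line.startswith(self.BODY_PREFIXES):
            return  # still inside the current hunk body
        if self.in_hunk:
            # Any other line leaves the hunk (the %exit_hunk case).
            self.events.append("exit_hunk")
            self.in_hunk = False
        if line.startswith("@@"):
            self.events.append("enter_hunk")
            self.in_hunk = True
        elif line.startswith("Index:"):
            if self.in_diff:
                # A new header leaves the previous diff (the %exit_diff case).
                self.events.append("exit_diff")
            self.events.append("enter_diff")
            self.in_diff = True
        # Separator and ---/+++ lines produce no events in this model.

    def finish(self) -> None:
        """End of input: close whatever is still open (the %/ cases)."""
        if self.in_hunk:
            self.events.append("exit_hunk")
            self.in_hunk = False
        if self.in_diff:
            self.events.append("exit_diff")
            self.in_diff = False
```

Feeding it two diffs, each with one hunk, and then calling finish() records exit_hunk and exit_diff once per diff: the first pair because a new "Index:" line arrived, the second pair only because input ended.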

My difficulty has been finding something that works both when a diff
follows an 'empty' diff and when an 'empty' diff comes at the end,
without generating spurious empty diffs all over the place in the
process!

For example, I can detect the former case with something like:

	emptyDiff = ( any - ( '-' | 'C' ) ) @{ fhold; } %empty_diff;
	
	diff = diffHeader >enter_diff ( binaryDiff | textDiff | emptyDiff );

The problem comes when I try to detect the latter case (an empty diff
at EOF); my first thought was something like:

	emptyDiff = ( ( ( any - ( '-' | 'C' ) ) @{ fhold; } ) | zlen ) %empty_diff;

but this kind of formulation (written from memory) ends up with a
machine that matches zlen for every diff (as you'd expect once you
graph it).
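To pin down the behaviour I am after, here is a minimal Python sketch. It is a hand-written line-level model, not Ragel output; the class name, the state names and the assumption that the '---' or 'C' line immediately follows the separator (as textDiff and binaryDiff require) are mine. The 'empty' decision is reachable from one state only, the one just after the separator: either the next line starts with something other than '-' or 'C', in which case that line is re-processed (the analogue of fhold), or input ends there. No zlen-style alternative is offered after every header.

```python
from typing import List, Tuple


class EmptyDiffDetector:
    """Hypothetical line-level model of one-line lookahead after the separator.

    Records (kind, filename) decisions where kind is 'text', 'binary' or
    'empty'. 'empty' can only be decided in the AFTER_SEP state.
    """

    def __init__(self) -> None:
        self.state = "BETWEEN"  # not inside any diff header
        self.name = ""
        self.decisions: List[Tuple[str, str]] = []

    def feed_line(self, line: str) -> None:
        if self.state == "AFTER_SEP":
            if line.startswith("-"):
                self.decisions.append(("text", self.name))
                self.state = "IN_BODY"
                return
            if line.startswith("C"):
                self.decisions.append(("binary", self.name))
                self.state = "IN_BODY"
                return
            # Neither '-' nor 'C': the pending file was empty. Do not consume
            # the line; fall through and process it afresh (like fhold).
            self.decisions.append(("empty", self.name))
            self.state = "BETWEEN"
        if line.startswith("Index:"):
            self.name = line[len("Index:"):].strip()
            self.state = "IN_HEADER"
        elif self.state == "IN_HEADER" and line and set(line) == {"="}:
            self.state = "AFTER_SEP"
        # Any other line (hunk lines, blank lines) leaves the state unchanged.

    def finish(self) -> None:
        """End of input: only a header waiting in AFTER_SEP becomes 'empty'."""
        if self.state == "AFTER_SEP":
            self.decisions.append(("empty", self.name))
        self.state = "BETWEEN"
```

Because the EOF check looks only at AFTER_SEP, a stream whose last diff has a body produces no empty decision at all, while a trailing header and separator produces exactly one.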

I've tinkered, but my inability to reason out the answer suggests I am
not as comfortable with this as I thought. My next step will be to
draw out what I think the machine should look like and work backwards
to the spec, but if anyone can guide my thinking or give me some
pointers, that would help immensely. Fixing this edge case is one of
the last things holding up a first public release of the tool.

Many thanks,

Matt

-- 
Matt Mower :: http://matt.blogs.it/