Edward Thomson [Fri, 19 Jun 2015 15:32:26 +0000 (08:32 -0700)]
diff: preserve original mode in the index
When updating the index during a diff, preserve the original mode,
which prevents us from dropping the mode to what we have interpreted
as on our system (e.g. what the working directory claims it to be,
which may be a lie on some systems).
index: make relative comparison use the checksum as well
This is used by the submodule in order to figure out if the index has
changed since it last read it. Using a timestamp is racy, so let's make
it use the checksum, just like we now do for reloading the index itself.
When ticking over one second, it can happen that the actual time ticks
over the same second between the time that we undermine our own race
protections and the time in which we perform the index update. Such
timing would make the time in the entries match the index's timestamp and
we would not have gained anything.
Ticking over five seconds makes it so that if real-time rolls over that
second, our index is still ahead. This is still suboptimal as we're
dealing with timing, but five seconds should be long enough for any
reasonable test runner to finish the tests.
index: use the checksum to check whether it's been modified
We currently use a timestamp to check whether an index file has been
modified since we last read it, but this is racy. If two updates happen
in the same second and we read after the first one, we won't detect the
second one.
Instead read the SHA-1 checksum of the file, which is stored in its last
20 bytes. This gives us a sure-fire way to detect whether the file has
changed since we last read it.
As we're now keeping track of it, expose an accessor to this data.
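As a rough illustration of the idea (not the actual libgit2 code), the
trailing checksum can be read and compared like this. The helper names
and CHECKSUM_SIZE are made up for this sketch, and it assumes a
POSIX-like stdio where seeking relative to the end of a binary file
works.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define CHECKSUM_SIZE 20 /* a SHA-1 digest is 20 bytes */

/*
 * Hypothetical helper: copy the last CHECKSUM_SIZE bytes of the file at
 * `path` into `out`. Returns 0 on success, -1 on error or if the file is
 * too short to carry a trailing checksum.
 */
static int read_trailing_checksum(unsigned char out[CHECKSUM_SIZE], const char *path)
{
	FILE *fp;
	int error = -1;

	assert(out != NULL && path != NULL);

	if ((fp = fopen(path, "rb")) == NULL)
		return -1;

	/* Seek to 20 bytes before the end, then read exactly 20 bytes */
	if (fseek(fp, -(long)CHECKSUM_SIZE, SEEK_END) == 0 &&
	    fread(out, 1, CHECKSUM_SIZE, fp) == CHECKSUM_SIZE)
		error = 0;

	fclose(fp);
	return error;
}

/*
 * Hypothetical helper: returns 1 if the file's trailing checksum differs
 * from the one we remembered, 0 if it is the same, -1 on error.
 */
static int file_changed_since(const unsigned char last_seen[CHECKSUM_SIZE], const char *path)
{
	unsigned char current[CHECKSUM_SIZE];

	if (read_trailing_checksum(current, path) < 0)
		return -1;

	return memcmp(current, last_seen, CHECKSUM_SIZE) != 0;
}
```

Unlike an mtime comparison, two writes within the same second still
produce different trailers as long as the contents differ.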
Marius Ungureanu [Fri, 19 Jun 2015 09:53:37 +0000 (12:53 +0300)]
Quote LIBSSH2_LIBRARIES call
Credits to @directhex
It is possible for PKG_CHECK_MODULES(LIBSSH2 libssh2) to set LIBSSH2_LIBRARIES to a string with more than one library in it - e.g. if your libssh2 was built against libgcrypt, it will be "ssh2;gcrypt".
Quoting the string is needed, or CHECK_LIBRARY_EXISTS will fail.
Edward Thomson [Tue, 16 Jun 2015 21:23:12 +0000 (17:23 -0400)]
checkout: allow workdir to contain checkout target
When checking out some file 'foo' that has been modified in the
working directory, allow the checkout to proceed (do not conflict)
if 'foo' is identical to the target of the checkout.
tests: tick the index when we count OID calculations
These tests want to test that we don't recalculate entries which match
the index already. This is however something we force when truncating
racily-clean entries.
Tick the index forward as we know that we don't perform the
modifications which the racily-clean code is trying to avoid.
crlf: tick the index forward to work around racy-git behaviour
In order to avoid racy-git, we zero out the file size for entries with
the same timestamp as the index (or during the initial checkout). This
is the case in a couple of crlf tests, as the code is fast enough to do
everything in the same second.
As we know that we do not perform the modification just after writing
out the index, which is what this is designed to work around, tick the
mtime of the index file such that it doesn't agree with the files
anymore, and we do not zero out these entries.
If a file entry has the same timestamp as the index itself, it is
considered racily-clean, as it may have been modified after the index
was written, but during the same second. We take extra steps to check
the contents, but this is just one part of avoiding races.
For files which do have changes but have not been updated in the index,
updating the on-disk index means updating its timestamp, which means we
would no longer recognise these entries as racy and we would trust the
timestamp to tell us whether they have changed.
In order to work around this, git zeroes out the file-size field in
entries with the same timestamp as the index in order to force the next
diff to check the contents. Do so in libgit2 as well.
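A minimal sketch of that rule, using a pared-down, hypothetical entry
struct rather than libgit2's real one. It follows the description above
literally (an entry is racy when its timestamp equals the index's); the
real check may be broader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified index entry; not libgit2's real struct */
struct entry {
	int64_t mtime_seconds;
	uint32_t file_size;
};

/* An entry is racily clean when its mtime matches the index's timestamp */
static int is_racily_clean(const struct entry *e, int64_t index_mtime)
{
	return e->mtime_seconds == index_mtime;
}

/*
 * Before writing the index out, zero the cached size of every racily
 * clean entry so that the next diff cannot trust the cached data and
 * has to look at the contents. Returns the number of entries touched.
 */
static size_t truncate_racily_clean(struct entry *entries, size_t n, int64_t index_mtime)
{
	size_t i, truncated = 0;

	assert(entries != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (is_racily_clean(&entries[i], index_mtime)) {
			entries[i].file_size = 0;
			truncated++;
		}
	}

	return truncated;
}
```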
We update the index and then immediately change the contents of the
file. This makes the diff think there are no changes, as the timestamp
of the file agrees with the cached data. This is however a bug, as the
file has obviously changed contents.
The test is a bit fragile, as it assumes that the index writing and the
following modification of the file happen in the same second, but it's
enough to show the issue.
Arguably all uses of readdir_r are unnecessary, but in this case
especially so, as the directory handle only exists within this function,
so we don't race with anybody.
Edward Thomson [Tue, 26 May 2015 00:03:59 +0000 (20:03 -0400)]
diff: introduce binary diff callbacks
Introduce a new binary diff callback to provide the actual binary
delta contents to callers. Create this data from the diff contents
(instead of directly from the ODB) to support binary diffs including
the workdir, not just things coming out of the ODB.
These tests were not being taken into consideration for the failure of
the test. They've been failing for a while now, but we hadn't noticed as
Travis was reporting the builds successful.
The read and write callbacks passed to SSLSetIOFuncs() have been
rewritten to match the implementation used on opensource.apple.com and
other open source projects like VLC.
This change also fixes a bug where the read callback could get into
an infinite loop when 0 bytes were read.
Some tools create multiple author fields. git is rather lax when parsing
them, although fsck does complain about them. This means that they exist
in the wild.
As it's not too taxing to check for them, and there shouldn't be a
noticeable slowdown when dealing with correct commits, add logic to skip
over these extra fields when parsing the commit.
object: correct the expected ID size in prefix lookup
We take in a possibly partial ID by taking a length and working off of
that to figure out whether to just look up the object or ask the
backends for a prefix lookup.
Unfortunately we've been checking the size against `GIT_OID_HEXSZ` which
is the size of a *string* containing a full ID, whereas we need to check
against the size we can have when it's a 20-byte array.
Change the checks and comment to use `GIT_OID_RAWSZ` which is the
correct size of a git_oid to have when full.
The way we currently do it depends on the subtlety of strlen vs sizeof
and the fact that .pack is one longer than .idx. Let's use a git_buf so
we can express the manipulation we want much more clearly.
merge: actually increment the counts, not the pointers
`merge_diff_list_count_candidates()` takes pointers to the source and
target counts, but when it comes time to increase them, we're increasing
the pointer, rather than the value it's pointing to.
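The difference is easy to miss, so here is a small stand-alone sketch.
The function below is a made-up stand-in, not the real
`merge_diff_list_count_candidates()`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a counting helper that reports its results
 * through out-pointers.
 */
static void count_candidate(size_t *src_count, size_t *tgt_count, int is_source)
{
	assert(src_count != NULL && tgt_count != NULL);

	/*
	 * The buggy form would be `src_count++`, which only advances the
	 * local copy of the pointer and leaves the caller's counter alone.
	 * Dereference first so that the pointed-to value is incremented.
	 */
	if (is_source)
		(*src_count)++;
	else
		(*tgt_count)++;
}
```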
Coverity complains about the git_rawobj ones because we use a loop in
which we keep remembering the old version, and we end up copying our
object as the base, so we want to have the data pointer be NULL.
We've been using `p_ftruncate()` to extend the packfile in order to mmap
it and write the new data into it. This works well in the general case,
but as truncation does not allocate space in the filesystem, the
filesystem must do so when we write data to the file.
The only way the OS has to indicate a failure to allocate space is via
SIGBUS, which normally means we tried to write outside the file. This
will crash every caller, as none of them expect to handle this signal.
Switch to using `p_lseek()` and `p_write()` to extend the file in a way
which tells the filesystem to allocate the space for the missing
data. We can then be sure that we have space to write into.
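One way to sketch the approach with plain POSIX calls is below. It
writes zeros over the whole new range, which is an assumption made for
this example and may well be more than the real change does. The
function names are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Append `extra` zero bytes to `fd`; returns 0 on success, -1 on failure */
static int extend_fd(int fd, size_t extra)
{
	char zeros[4096];

	assert(fd >= 0);
	memset(zeros, 0, sizeof(zeros));

	if (lseek(fd, 0, SEEK_END) < 0)
		return -1;

	while (extra > 0) {
		size_t chunk = extra < sizeof(zeros) ? extra : sizeof(zeros);
		ssize_t written = write(fd, zeros, chunk);

		/* A full disk shows up here as a return value, not a signal */
		if (written <= 0)
			return -1;

		extra -= (size_t)written;
	}

	return 0;
}

/* Convenience wrapper for experimenting: returns the new size, or -1 */
static off_t extend_path(const char *path, size_t extra)
{
	off_t size = -1;
	int fd = open(path, O_WRONLY | O_CREAT, 0644);

	if (fd < 0)
		return -1;

	if (extend_fd(fd, extra) == 0)
		size = lseek(fd, 0, SEEK_CUR);

	close(fd);
	return size;
}
```

The point of the design is that a failure to allocate is reported by
`write()` where the caller can handle it, before anything is mapped.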
clone: fall back to copying when linking does not work
We use heuristics to make a decent guess at when we can save time and
space by linking object files during a clone. Unfortunately checking the
device id isn't enough, as it would be the same for e.g. a bind mount,
but the OS still does not allow us to link between mounts of the same
filesystem.
If we fail to perform the links, fall back to copying the contents into
a new file as a last attempt.
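The fallback can be sketched with POSIX `link()` and stdio as follows.
Both helper names are made up for the example, and the sketch falls
back on any link failure without inspecting errno, which is a
simplification.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Copy the contents of `from` into a new file `to`; 0 on success, -1 on error */
static int copy_file(const char *from, const char *to)
{
	char buf[4096];
	size_t n;
	int error = 0;
	FILE *in, *out;

	assert(from != NULL && to != NULL);

	if ((in = fopen(from, "rb")) == NULL)
		return -1;

	if ((out = fopen(to, "wb")) == NULL) {
		fclose(in);
		return -1;
	}

	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		if (fwrite(buf, 1, n, out) != n) {
			error = -1;
			break;
		}
	}

	if (ferror(in))
		error = -1;

	fclose(in);

	/* fclose() flushes, so a late write error surfaces here */
	if (fclose(out) != 0)
		error = -1;

	return error;
}

/* Try a hard link first; if that fails for any reason, copy instead */
static int link_or_copy(const char *from, const char *to)
{
	if (link(from, to) == 0)
		return 0;

	return copy_file(from, to);
}
```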
A remote's URLs are now modified according to the url.*.insteadOf
and url.*.pushInsteadOf configurations. This allows a user to
replace URL prefixes by setting the corresponding keys. E.g.
"url.foo.insteadOf = bar" would replace the prefix "bar" with the
new prefix "foo".
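For illustration, the prefix replacement for a single rule might look
like the sketch below. The function name is hypothetical, and choosing
between several configured rules is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: if `url` starts with `instead_of`, return a newly
 * allocated string with that prefix swapped for `replacement`; otherwise
 * return an unmodified copy. The caller frees the result. Returns NULL
 * if allocation fails.
 */
static char *apply_insteadof(const char *url, const char *instead_of, const char *replacement)
{
	size_t prefix_len, repl_len, rest_len;
	char *out;

	assert(url != NULL && instead_of != NULL && replacement != NULL);

	prefix_len = strlen(instead_of);

	if (strncmp(url, instead_of, prefix_len) != 0) {
		/* No match: keep the whole URL and prepend nothing */
		replacement = "";
		prefix_len = 0;
	}

	repl_len = strlen(replacement);
	rest_len = strlen(url + prefix_len);

	if ((out = malloc(repl_len + rest_len + 1)) == NULL)
		return NULL;

	memcpy(out, replacement, repl_len);
	/* rest_len + 1 also copies the terminating NUL */
	memcpy(out + repl_len, url + prefix_len, rest_len + 1);

	return out;
}
```

With "url.foo.insteadOf = bar", calling
apply_insteadof("bar/repo.git", "bar", "foo") yields "foo/repo.git".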
Edward Thomson [Fri, 29 May 2015 20:56:38 +0000 (16:56 -0400)]
git__tolower: a tolower() that isn't dumb
Some brain damaged tolower() implementations appear to want to
take the locale into account, and this may require taking some
insanely aggressive lock on the locale and slowing down what should
be the most trivial of trivial calls for people who just want to
downcase ASCII.
Edward Thomson [Fri, 29 May 2015 20:07:51 +0000 (16:07 -0400)]
git__strcasecmp: treat input bytes as unsigned
Treat input bytes as unsigned before doing arithmetic on them,
lest we look at some non-ASCII byte (like a UTF-8 character) as a
negative value and perform the comparison incorrectly.
Edward Thomson [Thu, 28 May 2015 19:26:13 +0000 (15:26 -0400)]
Rename GIT_EMERGECONFLICT to GIT_ECONFLICT
We do not error on "merge conflicts"; on the contrary, merge conflicts
are a normal part of merging. We only error on "checkout conflicts",
where a change exists in the index or the working directory that would
otherwise be overwritten by performing the checkout.
This *may* happen during merge (after the production of the new index
that we're going to checkout) but it could happen during any checkout.