Vicent Marti [Mon, 28 Feb 2011 14:51:17 +0000 (16:51 +0200)]
Implement reference counting for git_objects
All `git_object` instances looked up from the repository are reference
counted. User is expected to use the new `git_object_close` when an
object is no longer needed to force freeing it.
Vicent Marti [Mon, 28 Feb 2011 10:46:13 +0000 (12:46 +0200)]
Fix searching in git_vector
We now store only one sorting callback that does entry comparison. This
is used when sorting the entries using a quicksort, and when looking for
a specific entry with the new search methods.
nulltoken [Mon, 28 Feb 2011 11:33:47 +0000 (12:33 +0100)]
Added copydir_recurs() to test_helpers.c
Test helper function which recursively copies the content of a
directory. This function has been tweaked to prevent stack overflows by
reusing the same path buffers on all recursive calls.
Vicent Marti [Thu, 24 Feb 2011 21:53:40 +0000 (23:53 +0200)]
Fix file renaming in MinGW
We now use MoveFileEx, which is not assured to be atomic but works for
always (both if the destination exists, or if it doesn't) and is
available in MinGW.
Since this is a Win32 API call, complaint about lost or overwritten files
should be forwarded at Steve Ballmer.
Vicent Marti [Thu, 24 Feb 2011 19:43:08 +0000 (21:43 +0200)]
Fix renaming of files in Win32
The `rename` call doesn't quite work on Win32: expects the destination
file to not exist. We're using a native Win32 call in those cases --
that should do the trick.
Vicent Marti [Tue, 22 Feb 2011 19:59:36 +0000 (21:59 +0200)]
Rewrite git_hashtable internals
The old hash table with chained buckets has been replaced by a new one
using Cuckoo hashing, which offers guaranteed constant lookup times.
This should improve speeds on most use cases, since hash tables in
libgit2 are usually used as caches where the objects are stored once and
queried several times.
The Cuckoo hash implementation is based off the one in the Basekit
library [1] for the IO language, but rewritten to support an arbritrary
number of hashes. We currently use 3 to maximize the usage of the nodes pool.
Vicent Marti [Mon, 21 Feb 2011 15:05:16 +0000 (17:05 +0200)]
Rewrite all file IO for more performance
The new `git_filebuf` structure provides atomic high-performance writes
to disk by using a write cache, and optionally a double-buffered scheme
through a worker thread (not enabled yet).
Writes can be done 3-layered, like in git.git (user code -> write cache
-> disk), or 2-layered, by writing directly on the cache. This makes
index writing considerably faster.
The `git_filebuf` structure contains all the old functionality of
`git_filelock` for atomic file writes and reads. The `git_filelock`
structure has been removed.
Additionally, the `git_filebuf` API allows to automatically hash (SHA1)
all the data as it is written to disk (hashing is done smartly on big
chunks to improve performance).
Vicent Marti [Fri, 18 Feb 2011 08:29:55 +0000 (10:29 +0200)]
Disable threaded index writing by default
The interlocking on the write threads was not being done properly (index
entries were sometimes written out of order). With proper interlocking,
the threaded write is only marginally faster on big index files, and
slower on the smaller ones because of the overhead when creating
threads.
The threaded index writing has been temporarily disabled; after more
accurate benchmarks, if might be possible to enable it again only when
writing very large index files (> 1000 entries).
Vicent Marti [Thu, 17 Feb 2011 19:32:00 +0000 (21:32 +0200)]
Improve the performance when writing Index files
In response to issue #60 (git_index_write really slow), the write_index
function has been rewritten to improve its performance -- it should now
be in par with the performance of git.git.
On top of that, if Posix Threads are available when compiling libgit2, a
new threaded writing system will be used (3 separate threads take care
of solving byte-endianness, hashing the contents of the index and
writing to disk, respectively). For very long Index files, this method
is up to 3x times faster than git.git.
Vicent Marti [Wed, 9 Feb 2011 17:49:02 +0000 (19:49 +0200)]
Internal changes on the backend system
The priority value for different backends has been removed from the
public `git_odb_backend` struct. We handle that internally. The priority
value is specified on the `git_odb_add_alternate`.
This is convenient because it allows us to poll a backend twice with
different priorities without having to instantiate it twice.
We also differentiate between main backends and alternates; alternates have
lower priority and cannot be written to.
These changes come with some unit tests to make sure that the backend
sorting is consistent.
The libgit2 version has been bumped to 0.4.0.
This commit changes the external API:
CHANGED:
struct git_odb_backend
No longer has a `priority` attribute; priority for the backend
in managed internally by the library.
git_odb_add_backend(git_odb *odb, git_odb_backend *backend, int priority)
Now takes an additional priority parameter, the priority that
will be given to the backend.
ADDED:
git_odb_add_alternate(git_odb *odb, git_odb_backend *backend, int priority)
Add a backend as an alternate. Alternate backends have always
lower priority than main backends, and writing is disabled on
them.
Vicent Marti [Wed, 9 Feb 2011 10:46:54 +0000 (12:46 +0200)]
Honor alternate entries in the ODB
The alternates file is now parsed, and the alternate ODB folders are
added as separate backends. This allows the library to efficiently query
the alternate folders.
Unfortunately previous commit was only a partial fix, because it broke
SQLite support on platforms w/o pkg-config, e.g. Windows. To be honest
I just forgot about messy Windows.
Now if there is no pkg-config, then user must provide two variables:
SQLITE3_INCLUDE_DIRS and SQLITE3_LIBRARIES if (s)he wants to use SQLite
backend. These variables are added to cmake-gui for her/his convenience
unless they are set by FindPkgConfig module.
FindPkgConfig obviously uses pkg-config's output for setting convenient
variables such as <PREFIX>_LIBRARIES or <PREFIX>_INCLUDE_DIRS. It also
sets <PREFIX>_FOUND to 1 if <PREFIX> module exists.
So why checking for SQLITE3_FOUND is better than (SQLITE3_LIBRARIES AND
SQLITE3_INCLUDE_DIRS)? Apart from obvious readability factor, latter
condition has strong assumption that both variables are filled with
appropriate paths, which is unjustifiable unless you add another
assumptions...
pkg-config by default strips -I/usr/include from Cflags and -L/usr/lib
from Libs if some environment variables are not set,
PKG_CONFIG_ALLOW_SYSTEM_CFLAGS and PKG_CONFIG_ALLOW_SYSTEM_LIBS
respectively. This behavior is sane, because it prevents polluting the
compilation and linking commands with superfluous entries.
In debian SQLITE3_INCLUDE_DIRS is empty for instance.
Remark for developers:
Always check commands invoked by CMake after changing CMakeLists.txt.
Vicent Marti [Mon, 7 Feb 2011 16:25:42 +0000 (18:25 +0200)]
Git trees are now always lazily sorted
Removed `git_tree_add_entry_unsorted`. Now the `git_tree_add_entry`
method doesn't sort the entries array by default; entries are only
sorted lazily when required. This is done automatically by the library
(the `git_tree_sort_entries` call has been removed).
This should improve performance. No point on sorting entries all the time, anyway.