From: Tero Koskinen on
Hi,

Another Monotone user and (occasional) developer here.

On 08/02/2010 07:21 PM, Marcelo Coraça de Freitas wrote:
> On 1 ago, 07:13, Ludovic Brenta<ludo...(a)ludovic-brenta.org> wrote:
>> This looks like the memory leak that was fixed in monotone version 0.27,
>> back in 2006. And the latest version, 0.48, has additional performance
>> improvements for very large trees.
>
> I am not quite sure we are talking about the same bug. This one was
> detected in 2009/2010, and Edward O'Callaghan from AuroraUX was in
> touch with one of the Monotone developers. I don't know exactly what
> happened, as I had to focus on other things.

I am not sure whether the poor clone/pull performance is an actual bug.

It could be related to the way Monotone is designed.

First, different sections/layers of Monotone sit behind abstract
interfaces. For example, you can swap the network protocol or the
internal change-set/database implementation for another one and
leave the other parts untouched. This might affect performance,
although in theory the impact should be small.
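As a rough illustration of that layered design, here is a hypothetical
Python sketch (not Monotone's actual code; every class and method name
here is invented):

```python
# Hypothetical sketch, not Monotone's actual code. Layers hide behind
# abstract interfaces, so one layer can be swapped without touching
# the others.
from abc import ABC, abstractmethod

class Storage(ABC):
    """Abstract database layer."""
    @abstractmethod
    def get(self, key: str) -> bytes: ...
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

class MemoryStorage(Storage):
    """One interchangeable implementation; a SQLite-backed one could
    replace it without changing any caller."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data[key]
    def put(self, key, data):
        self._data[key] = data

class SyncProtocol(ABC):
    """Abstract network layer."""
    @abstractmethod
    def send(self, payload: bytes) -> None: ...

class LoopbackProtocol(SyncProtocol):
    """Trivial 'protocol' that writes straight into a remote storage."""
    def __init__(self, remote: Storage):
        self.remote = remote
    def send(self, payload):
        self.remote.put("incoming", payload)

local = MemoryStorage()
local.put("rev1", b"file contents")
proto = LoopbackProtocol(MemoryStorage())
proto.send(local.get("rev1"))  # callers never see the concrete types
```

The indirection itself is cheap; the point is only that each layer can
be replaced independently, which is what the design trades a little
performance for.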

Second, by default, Monotone uses backward deltas for change sets.
This means that only the files in the latest revision are stored
in their full form; every other revision is recorded just as a
difference from its child revision. To build the 10th-newest
revision (head-9), you need to apply the deltas from head-1 to
head-9 on top of head, and to build the first revision you need
to apply all of the deltas. Recent versions of Monotone also
support forward deltas, but they need to be turned on explicitly
because they increase disk space usage.
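The cost of walking backward deltas can be sketched like this: a toy
Python model (hypothetical; Monotone's real delta format is different,
and `make_delta`/`apply_delta` are invented names):

```python
# Toy backward-delta store: only the head revision is kept in full;
# each older revision is kept as a delta that rebuilds it from its
# child. Not Monotone's actual format -- just the shape of the idea.
import difflib

def make_delta(child: str, parent: str):
    """Build a delta that reconstructs `parent` from `child`."""
    sm = difflib.SequenceMatcher(a=child, b=parent)
    ops = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))      # reuse bytes from child
        else:
            ops.append(("insert", parent[j1:j2]))  # literal new bytes
    return ops

def apply_delta(child: str, delta):
    out = []
    for op in delta:
        if op[0] == "copy":
            _, i1, i2 = op
            out.append(child[i1:i2])
        else:
            out.append(op[1])
    return "".join(out)

revs = ["v1 text", "v2 text more", "v3 final text"]
head = revs[-1]
# deltas[i] rebuilds revs[i] from its child revs[i+1]
deltas = [make_delta(child, parent)
          for parent, child in zip(revs, revs[1:])]

# Rebuilding an old revision means walking back from head and
# applying each delta in turn -- cost grows with history depth.
cur = head
for d in reversed(deltas):
    cur = apply_delta(cur, d)
assert cur == revs[0]
```

This is why a full clone, which needs every revision, pays the delta
chain repeatedly, while normal work near the head stays cheap.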

Third, before sending or receiving anything, Monotone calculates
SHA-1 checksums for the data. I am not totally sure, but I think
that when you request a full clone of a repository, Monotone
constructs each revision (using those backward deltas) from first
to latest and calculates checksums for them. This ensures data
integrity but also takes a fairly long time. There have been
experiments where the checksum calculation/checking was removed,
and they showed a performance boost, but the developers prefer to
keep those checks on. That checking has revealed data corruption
and disk failures on real systems before the operating system's
own mechanisms detected anything.
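The integrity check itself is simple to sketch in Python (hypothetical;
`receive` is an invented helper, not Monotone's API, and the revisions
are assumed to be already reconstructed):

```python
# Hash-and-verify sketch: the sender hashes every revision, and the
# receiver recomputes the digest and refuses data that does not match.
# This is the general idea, not Monotone's actual wire protocol.
import hashlib

revisions = [b"first revision", b"second revision", b"third revision"]

# Sender side: one SHA-1 digest per revision.
digests = {hashlib.sha1(rev).hexdigest(): rev for rev in revisions}

def receive(digest: str, payload: bytes) -> bytes:
    """Receiver side: accept the payload only if its digest matches."""
    if hashlib.sha1(payload).hexdigest() != digest:
        raise ValueError("data corrupted in transit or on disk")
    return payload

for digest, rev in digests.items():
    assert receive(digest, rev) == rev
```

Hashing every reconstructed revision is where the clone-time cost comes
from, but it is also exactly what catches silent disk corruption.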

Some references:
6 minute pull CPU time for AuroraUX with no checking:
http://colabti.org/irclogger/irclogger_log/monotone?date=2010-02-12#l7
(No idea what the actual wall-clock time was.)

20% increase in pull time for AuroraUX:
http://colabti.org/irclogger/irclogger_log/monotone?date=2010-02-09#l147
(This improvement is in 0.47 or 0.48)


-Tero
From: Randy Brukardt on
"Karel Miklav" <karel(a)lovetemple.net> wrote in message
news:i2ve9u$1qg2$1(a)adenine.netfront.net...
> deadlyhead wrote:
>> For those who like Monotone: is using a "real database" really that
>> much of advantage?
>
> Where the hell does this meme still draw its power from? I have not
> seen a real application with a bunch of files at its core for a long
> time, and I do not really believe you are unaware of the advantages
> of a database over the filesystem.

Well, let's see. Pretty much every Ada compiler uses the file system rather
than some sort of database at its core. Would you claim that GNAT is not a
"real application"??

Personally, I think there is way too much use of "real databases" for
programs that will find no real advantage in them. When you use an SQL
database as part of your program, you are introducing communications issues,
a second programming language (with all of the problems that mixed-language
programming entails), an additional large piece of software that needs
separate maintenance and patching (many attacks occur through databases),
and a huge waste of resources. The last point isn't as important as it once
was (I still think any program over 4 megabytes is amazingly bloated; given
that most computers have more than 1GB of memory, that probably isn't quite
true anymore), but the other factors still apply.

There clearly are applications that benefit from a database (as when there
is actually data that needs to be searched/indexed), but for most
applications (especially the smaller ones), the costs of a database outweigh
the benefits.

Randy.


From: nobody on
I like CM/Synergy. It was actually the first one I came in contact with,
and it has everything I need and then some.

One special thing that I miss in all the others I have tried is
task-based CM.
The main thing that gives me is the ability to check out files into a
task; when the fix or new function is finished, you just complete the
task and all the affected files are checked in for you.
You can naturally have many tasks active in parallel, and updating your
working project is based on which tasks have been completed.
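The workflow described above can be sketched roughly like this
(hypothetical Python; `Task`, `Repo`, and their methods are invented
for illustration, not CM/Synergy's actual interface):

```python
# Task-based CM sketch: files are checked out against a task, and
# completing the task checks in everything it touched in one step.
# All names here are invented, not CM/Synergy's real API.
class Task:
    def __init__(self, description: str):
        self.description = description
        self.checked_out = set()   # files this task has touched
        self.completed = False

class Repo:
    def __init__(self, files: dict):
        self.files = dict(files)   # name -> committed content
        self.history = []          # completed tasks, in order

    def checkout(self, task: Task, name: str) -> str:
        """Check a file out against a task and hand back its content."""
        task.checked_out.add(name)
        return self.files[name]

    def complete(self, task: Task, edits: dict) -> None:
        """Check in every file the task touched, as one unit."""
        for name in task.checked_out:
            self.files[name] = edits[name]
        task.completed = True
        self.history.append(task)

repo = Repo({"main.adb": "old", "util.adb": "old"})
fix = Task("fix overflow bug")
repo.checkout(fix, "main.adb")
repo.checkout(fix, "util.adb")
repo.complete(fix, {"main.adb": "new", "util.adb": "new"})
```

Because `repo.history` records whole tasks rather than individual file
versions, updating a working project can be driven by which tasks are
completed, which is the convenience being described.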