From: Mladen Gogala on
On Fri, 18 Jun 2010 12:10:05 -0700, Vladimir M. Zakharychev wrote:

> And in addition to what I just wrote consider this: to compute the
> hashes, you will inevitably need to fully read both LOBs. When doing
> byte-wise comparison, you stop reading and comparing at the first
> mismatch, which might be close to the beginning. So hashing and byte-
> wise comparison are equally I/O-intensive only in the worst case when
> both LOBs are of the same length and the very last byte is different, in
> other cases byte-wise comparison is cheaper.
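
For reference, a minimal early-exit byte-wise comparison over two
streams (class and method names made up for the sketch) might look
like this:

    import java.io.IOException;
    import java.io.InputStream;

    public class LobCompare {
        // Early-exit byte-wise comparison: stops at the first mismatch,
        // so both inputs are read completely only in the worst case of
        // equal streams or a difference at the very end. Wrap the
        // streams in BufferedInputStream for realistic performance.
        static boolean sameContent(InputStream a, InputStream b)
                throws IOException {
            int x, y;
            do {
                x = a.read();
                y = b.read();
                if (x != y) {
                    return false; // mismatch or length difference: stop here
                }
            } while (x != -1);
            return true; // both streams ended together, all bytes equal
        }
    }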

The trick is to store MD5 sums in the same table. That way, you can
compute the hash outside the database and store it together with the
document. Searches and comparisons then become very fast and easy. I
apologize for not elaborating properly. I do agree with everything you
said. Computing an MD5 hash on the fly would be suicidal.
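
As a minimal Java sketch of that approach, assuming a hypothetical
DOCS table with DOC_ID, BODY (BLOB) and MD5_HASH columns (names made
up for the example):

    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.sql.Connection;
    import java.sql.PreparedStatement;

    public class StoreWithHash {
        // Compute the MD5 sum of the document outside the database...
        static byte[] md5Of(Path file) throws Exception {
            MessageDigest md = MessageDigest.getInstance("MD5");
            try (InputStream in = Files.newInputStream(file)) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    md.update(buf, 0, n);
                }
            }
            return md.digest(); // 16-byte MD5 sum
        }

        // ...and store it in the same row as the document itself.
        static void store(Connection conn, long docId, Path file)
                throws Exception {
            byte[] hash = md5Of(file);
            String sql =
                "INSERT INTO docs (doc_id, body, md5_hash) VALUES (?, ?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql);
                 InputStream in = Files.newInputStream(file)) {
                ps.setLong(1, docId);
                ps.setBinaryStream(2, in, Files.size(file));
                ps.setBytes(3, hash);
                ps.executeUpdate();
            }
        }
    }

With the 16-byte hash stored next to the LOB, equality checks compare
hashes instead of LOB contents.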



--
http://mgogala.byethost5.com
From: Vladimir M. Zakharychev on
On Jun 19, 5:51 am, Mladen Gogala <gogala.mla...(a)gmail.com> wrote:
> On Fri, 18 Jun 2010 12:10:05 -0700, Vladimir M. Zakharychev wrote:
> > And in addition to what I just wrote consider this: to compute the
> > hashes, you will inevitably need to fully read both LOBs. When doing
> > byte-wise comparison, you stop reading and comparing at the first
> > mismatch, which might be close to the beginning. So hashing and byte-
> > wise comparison are equally I/O-intensive only in the worst case when
> > both LOBs are of the same length and the very last byte is different, in
> > other cases byte-wise comparison is cheaper.
>
> The trick is to store MD5 sums in the same table. That way, you can
> compute the hash outside the database and store it together with the
> document. Searches and comparisons then become very fast and easy. I
> apologize for not elaborating properly. I do agree with everything you
> said. Computing an MD5 hash on the fly would be suicidal.
>
> -- http://mgogala.byethost5.com

Indeed, if you need to compare a new or updated document against those
already in the database and make sure the new document is really new,
nothing can beat precomputed hashes: you only need to compute one new
hash, which you can even do on the fly while writing the document to
the database via, say, a checksumming stream writer, and then do a
very fast comparison against the set of precomputed hashes (a unique
index on the hash column will do that for you automatically on insert
or update.) But this is a different task than the OP's.
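
To illustrate the on-the-fly variant: a sketch using
java.security.DigestOutputStream as the checksumming stream writer,
against the same hypothetical DOCS table, this time with a unique
index on MD5_HASH so the database itself rejects duplicates on insert:

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.security.DigestOutputStream;
    import java.security.MessageDigest;
    import java.sql.Blob;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class InsertIfNew {
        // Streams the document into a BLOB while the checksumming stream
        // writer accumulates its MD5 as a side effect of the writes, then
        // lets the unique index on MD5_HASH reject duplicates on insert.
        // Returns true if the document was new, false if it was a duplicate.
        static boolean insertIfNew(Connection conn, long docId, InputStream doc)
                throws Exception {
            MessageDigest md = MessageDigest.getInstance("MD5");
            Blob body = conn.createBlob();
            try (OutputStream out =
                     new DigestOutputStream(body.setBinaryStream(1), md)) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = doc.read(buf)) != -1) {
                    out.write(buf, 0, n); // hash updated as bytes are written
                }
            }
            try (PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO docs (doc_id, body, md5_hash) VALUES (?, ?, ?)")) {
                ps.setLong(1, docId);
                ps.setBlob(2, body);
                ps.setBytes(3, md.digest());
                ps.executeUpdate();
                return true;
            } catch (SQLException e) {
                if (e.getErrorCode() == 1) { // ORA-00001: unique constraint violated
                    return false;
                }
                throw e;
            } finally {
                body.free();
            }
        }
    }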
Cryptographic hashes have many uses and I actually use them often:
for cases like the one above, or for integrity and authenticity checks
(though in the latter case the hashes themselves must be protected
from unauthorized modification - enter digital signatures.)
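
A minimal sketch of that last point, where java.security.Signature
hashes the document internally and signs the digest, so the hash can
no longer be swapped without detection (key handling simplified for
the example):

    import java.nio.charset.StandardCharsets;
    import java.security.KeyPair;
    import java.security.KeyPairGenerator;
    import java.security.Signature;

    public class SignedHashDemo {
        public static void main(String[] args) throws Exception {
            byte[] document =
                "document bytes".getBytes(StandardCharsets.UTF_8);

            // A real application would load the key pair from a keystore;
            // it is generated here only to keep the sketch self-contained.
            KeyPair keys =
                KeyPairGenerator.getInstance("RSA").generateKeyPair();

            // Sign: hash the document and encrypt the digest with the
            // private key, protecting the hash from modification.
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(keys.getPrivate());
            signer.update(document);
            byte[] signature = signer.sign();

            // Verify: recompute the digest and check it against the
            // signature using the public key.
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(keys.getPublic());
            verifier.update(document);
            System.out.println("authentic: " + verifier.verify(signature));
        }
    }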

Regards,
Vladimir M. Zakharychev
N-Networks, makers of Dynamic PSP(tm)
http://www.dynamicpsp.com