From: Floyd L. Davidson on
bugbear <bugbear(a)trim_papermule.co.uk_trim> wrote:
>Floyd L. Davidson wrote:
>> bugbear <bugbear(a)trim_papermule.co.uk_trim> wrote:
>>> Jim Townsend wrote:
>>>> Canon's early RAW images were 12 bit. The newer cameras
>>>> produce 14 bit CRW images. They are saved as 16 bit
>>>> images.
>>> Ah hah!
>> But when "saved as 16 bit images", the data is
>> re-encoded using 16 bit values. It is *never* padded
>> with zeros.
>>
>>>> There are 12 or 14 bits of actual image data and the
>>>> remaining bits are padded with zeros.
>>>> 8 bits = 255 (Binary 11111111)
>>>> An 8 bit color image is composed of three channels X 255
>>>> 16 bits = 65535 (Binary 1111111111111111)
>>>> A 16 bit color image is composed of three channels x 65535
>>>> DCraw might be able to process 16 bit images, but GIMP
>>>> cannot. GIMP converts 16 bit images to 8 bit images
>>>> when you open them. The value of 255 per channel in
>>>> GIMP's histogram is normal for an 8 bit image. I
>>>> don't know why you think this is high. (Open an 8 bit
>>>> JPEG in GIMP and you'll see the histogram still shows
>>>> 0-255 per channel).
>>> Because I was missing the info you gave me!
>>>
>>> I was expecting 12 (or maybe 14 bit) data, UNPADDED,
>>> which would have given me 4 (or maybe 6) bit data after crunching
>>> down to 8 bit in netpbm (which divides everything by 257)
>> That does not make sense. Regardless of conversion to
>> a 16-bit format, when encoded in an 8-bit file you would
>> have 8 bits, not 4 or maybe 6.
>
>I wasn't counting leading zeroes in my bit counts.

There are no leading zeros.

If you convert 8-bit data to 16-bit data, the highest
number is all 1's in _both_ cases! In the 8-bit data
the maximum value is represented by an unsigned integer
value of 0xff (255 decimal). In binary that is 8 bits
which are all 1's. In the 16-bit data the maximum value
is represented by an unsigned integer value of 0xffff
(65535 decimal). And in binary that is 15 bits which
are all 1's.

>*IF* you take my assumption that the true-raw data is 12 bit,
>(i.e. 16 bit with 4 leading zeroes) *AND*

That is not true though. 12-bit data is just that, data
encoded into 12 bit words. It is *NOT* 16 bit words
with 12 bits used, it is 12 bit words! When the data is
stored in a file, whether it is 8-bit, 12-bit, 14-bit or
16-bit, it is _streamed_ into the 8 bit octets that
computer files use. It is *not* zero padded.

When that data is transferred from the camera to a
computer it is done using 8-bit octets for the file
format. Each pair of 12-bit data words is sent as 3
octets, totalling 24 bits.
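A minimal Python sketch of that packing, for illustration only. It assumes big-endian bit order within each 3-octet group; that is an assumption, since each real raw format (NEF, CRW, ...) defines its own layout. The helper names pack12 and unpack12 are hypothetical.

```python
def pack12(samples):
    """Pack 12-bit samples (0..4095) two at a time into 3 octets.

    Assumed layout: first sample in the high 12 bits of a 24-bit
    big-endian group, second sample in the low 12 bits. No padding."""
    if len(samples) % 2:
        raise ValueError("need an even number of samples")
    out = bytearray()
    for a, b in zip(samples[0::2], samples[1::2]):
        if not (0 <= a < 4096 and 0 <= b < 4096):
            raise ValueError("sample out of 12-bit range")
        group = (a << 12) | b          # 24 bits holding two samples
        out += group.to_bytes(3, "big")
    return bytes(out)


def unpack12(data):
    """Inverse of pack12: split each 3-octet group into two 12-bit samples."""
    if len(data) % 3:
        raise ValueError("length must be a multiple of 3")
    samples = []
    for i in range(0, len(data), 3):
        group = int.from_bytes(data[i:i + 3], "big")
        samples.append(group >> 12)
        samples.append(group & 0xFFF)
    return samples


samples = [4095, 0, 1234, 2048]
packed = pack12(samples)
print(len(samples), "samples ->", len(packed), "bytes")  # 4 samples -> 6 bytes
assert unpack12(packed) == samples
```

Four 12-bit samples occupy 6 bytes, not the 8 bytes they would need if each were zero padded to 16 bits.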

>that dcraw preserves this convention *AND* you simply divide
>the 16 bit data by 257 (which *IS* what pamdepth does) you would *expect*
>the resulting file to be 8 bit with 4 leading zeroes, which
>most people would call 4 bit.

But what /dcraw/ does is convert the 12-bit sensor data
into an image format, using either 8-bit or 16-bit data
for the output. There are NO padding bits, and it is NOT
12-bit data at the output.

If you look at the source code for /pamdepth/ you will
discover that it does *not* simply divide by 257 (the
right number would be 256 anyway, not 257). (Granted
that what it does is very close to that!)

Regardless, dividing by 256 would *not* result in 8-bit
values stored with some number of leading zeros. If that
were the case, the size of the file would remain exactly
the same. But if you look at the results, you'll see
that /pamdepth/ changes a 16-bit file into an 8-bit file,
and the file size is reduced by just about 50% (not
exactly, because of the metadata overhead, which is
very small).
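For comparison, here are two ways to map a 16-bit value onto 8 bits, sketched in Python. Neither is copied from the /pamdepth/ source: scale_by_shift is plain division by 256, and scale_by_maxval is an assumed rounded-rescale formula. Both names are hypothetical. Either way each output sample occupies one byte, which is why the 8-bit file is about half the size.

```python
def scale_by_shift(v16):
    # Divide by 256 by dropping the low byte: floor(v16 / 256).
    return v16 >> 8


def scale_by_maxval(v16, old_max=65535, new_max=255):
    # Rounded integer rescale of [0, old_max] onto [0, new_max].
    # Because 65535 == 255 * 257, with the defaults this equals round(v16 / 257).
    # (An assumed formula, not pamdepth's actual code.)
    return (v16 * new_max + old_max // 2) // old_max


print(scale_by_shift(65535), scale_by_maxval(65535))  # 255 255
print(scale_by_shift(200), scale_by_maxval(200))      # 0 1
```

The two agree at the endpoints (0 and 65535) and differ only by rounding in between, which matches the "very close to that" remark above.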

>It appears that no less than two of my assumptions are wrong;
>The data (I think) in the CRW file is right padded with zero,
>not left, and in any case it appears that dcraw normalises to
>"full" 16 bit.

It is not zero padded at all.

>Given my assumptions, I claim that both my reasoning
>and conclusions were valid.
>
>Shame about the assumptions.

Here, let's look at some real-life examples. I can't use
a Canon raw file, but a Nikon one will do the same (or
you can point me at a suitable Canon raw file on the web
somewhere and I'll demonstrate that it works *exactly*
the same way).

I have a file, d2x_8696.nef, that /exiftool/ shows to be
a 12-bit uncompressed NEF formatted data file, and the
image size is 4320x2868, for 12,389,760 pixels. The
file's size on disk is 20,330,936 bytes. If it actually
holds 12.3 million 12-bit data points, stored as I said
above with no zero padding and each pair of 12-bit words
packed into 3 8-bit octets, each one takes 1.5 bytes.
So the file must be at least (1.5 * 12,389,760) bytes in
size, and anything extra is overhead for the thumbnail
and metadata. That works out to 18,584,640, which leaves
1,746,296 for the overhead. Those are very reasonable
numbers, which show that the 12-bit NEF data is
_clearly_ not zero padded in any way.
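The arithmetic above, redone in Python as a quick check. The width, height and file size are the figures quoted for d2x_8696.nef; the "padded to 16 bits" case is hypothetical, included only to compare against the real file size.

```python
width, height = 4320, 2868        # image size reported by exiftool
file_size = 20_330_936            # size of d2x_8696.nef on disk, in bytes
pixels = width * height           # one 12-bit sample per sensor site

packed_bytes = pixels * 12 // 8   # two 12-bit samples per 3 octets, no padding
padded_bytes = pixels * 16 // 8   # hypothetical: every sample padded to 16 bits

print(f"pixels:           {pixels:,}")
print(f"packed 12-bit:    {packed_bytes:,} bytes, "
      f"leaving {file_size - packed_bytes:,} for metadata")
print(f"padded to 16-bit: {padded_bytes:,} bytes (more than the whole file)")
```

The padded layout would need 24,779,520 bytes for the sensor data alone, which is larger than the entire 20,330,936-byte file, so padding is ruled out.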

Okay, then if I use /dcraw/ to convert the raw data to
PPM formatted image files in both 8-bit and 16-bit formats,
like this:

dcraw -4 -c d2x_8696.nef > 16bit.ppm
dcraw -c d2x_8696.nef > 8bit.ppm

I get files that are respectively 74.2MB for the 16-bit
file and 37.1MB for the 8-bit file. Since there are 3
channels in a PPM file (RGB), if we divide those numbers
by 3 we see how many bytes per channel there are.
74,200,915 / 3 is 24,733,638 total bytes for each channel
in the 16-bit file. If we divide by 2 (bytes per sample),
we get 12,366,819 pixels. Hmmm, you say... The NEF file
said it had 22,944 more sensor locations than that! And
indeed it does, but /dcraw/ generated a 4312x2868 image
in the PPM file, which is 12,366,816 pixels and should
take up 24,733,632 bytes per channel in a 16-bit file.
That leaves a very reasonable 19 bytes in total (about 6
per channel) for the overhead.

Run the numbers for the 8-bit file and it works out
exactly the same way. Clearly there is no zero padding
in either file.
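A small Python check of those PPM sizes. It assumes /dcraw/ writes a minimal binary (P6) header of the form "P6\n<width> <height>\n<maxval>\n" with no comment lines, and uses the PPM rule that samples take 1 byte when maxval is below 256 and 2 bytes otherwise. Under that assumption the totals match the 74,200,915-byte 16-bit file and the 37,100,465-byte 8-bit file exactly, i.e. 19 and 17 header bytes. ppm_size is a hypothetical helper name.

```python
def ppm_size(width, height, maxval):
    """Size in bytes of a binary (P6) PPM file.

    Assumes a minimal header "P6\\n<width> <height>\\n<maxval>\\n" with no
    comments; pixel data is 3 samples per pixel, 1 byte each if
    maxval < 256, otherwise 2 bytes each. No padding anywhere."""
    header = f"P6\n{width} {height}\n{maxval}\n".encode("ascii")
    bytes_per_sample = 1 if maxval < 256 else 2
    return len(header) + width * height * 3 * bytes_per_sample


print(ppm_size(4312, 2868, 65535))  # 74200915
print(ppm_size(4312, 2868, 255))    # 37100465
```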

Then, another 8-bit file can be generated from the 16-bit
output of /dcraw/, this time using /pamdepth/:

dcraw -4 -c d2x_8696.nef | pamdepth 255 > 8bitpam.ppm

And it produces a file of exactly the same size, 37,100,465
bytes, as /dcraw/ did for an 8-bit output file.

--
Floyd L. Davidson <http://www.apaflo.com/floyd_davidson>
Ukpeagvik (Barrow, Alaska) floyd(a)apaflo.com
From: bugbear on
Floyd L. Davidson wrote:
> bugbear <bugbear(a)trim_papermule.co.uk_trim> wrote:
>> Floyd L. Davidson wrote:
>>> bugbear <bugbear(a)trim_papermule.co.uk_trim> wrote:
>>>> Jim Townsend wrote:
>>>>> Canon's early RAW images were 12 bit. The newer cameras
>>>>> produce 14 bit CRW images. They are saved as 16 bit
>>>>> images.
>>>> Ah hah!
>>> But when "saved as 16 bit images", the data is
>>> re-encoded using 16 bit values. It is *never* padded
>>> with zeros.
>>>
>>>>> There are 12 or 14 bits of actual image data and the
>>>>> remaining bits are padded with zeros.
>>>>> 8 bits = 255 (Binary 11111111)
>>>>> An 8 bit color image is composed of three channels X 255
>>>>> 16 bits = 65535 (Binary 1111111111111111)
>>>>> A 16 bit color image is composed of three channels x 65535
>>>>> DCraw might be able to process 16 bit images, but GIMP
>>>>> cannot. GIMP converts 16 bit images to 8 bit images
>>>>> when you open them. The value of 255 per channel in
>>>>> GIMP's histogram is normal for an 8 bit image. I
>>>>> don't know why you think this is high. (Open an 8 bit
>>>>> JPEG in GIMP and you'll see the histogram still shows
>>>>> 0-255 per channel).
>>>> Because I was missing the info you gave me!
>>>>
>>>> I was expecting 12 (or maybe 14 bit) data, UNPADDED,
>>>> which would have given me 4 (or maybe 6) bit data after crunching
>>>> down to 8 bit in netpbm (which divides everything by 257)
>>> That does not make sense. Regardless of conversion to
>>> a 16-bit format, when encoded in an 8-bit file you would
>>> have 8 bits, not 4 or maybe 6.
>> I wasn't counting leading zeroes in my bit counts.
>
> There are no leading zeros.
>
> If you convert 8-bit data to 16-bit data, the highest
> number is all 1's in _both_ cases!

yes; this is achieved by multiplying by 257;
0 maps to 0, 255 maps to 65535, everything else
is in between. Close enough for government work.

Dividing reverses the process.
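The 257 relationship in a short Python sketch (function names hypothetical). Multiplying by 257 is the same as copying the byte into both halves of the 16-bit word, since 255 * 257 == 65535 == 0xffff. The inverse shown here rounds to the nearest 8-bit value, which is one reasonable choice, not necessarily what any particular tool does.

```python
def widen_8_to_16(v8):
    # 0 -> 0 and 255 -> 65535; identical to (v8 << 8) | v8 for 0 <= v8 <= 255.
    return v8 * 257


def narrow_16_to_8(v16):
    # Round v16 / 257 to the nearest integer (an assumed rounding choice).
    return (v16 + 128) // 257


assert all(widen_8_to_16(v) == (v << 8) | v for v in range(256))
assert all(narrow_16_to_8(widen_8_to_16(v)) == v for v in range(256))

# The maximum values are all 1's: 8 of them for 0xff, 16 for 0xffff.
print(bin(0xff).count("1"), bin(0xffff).count("1"))  # 8 16
```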

> In the 8-bit data
> the maximum value is represented by an unsigned integer
> value of 0xff (255 decimal). In binary that is 8 bits
> which are all 1's. In the 16-bit data the maximum value
> is represented by an unsigned integer value of 0xffff
> (65535 decimal). And in binary that is 15 bits which
> are all 1's.
>
>> *IF* you take my assumption that the true-raw data is 12 bit,
>> (i.e. 16 bit with 4 leading zeroes) *AND*
>
> That is not true though. 12-bit data is just that, data
> encoded into 12 bit words. It is *NOT* 16 bit words
> with 12 bits used, it is 12 bit words! When the data is
> stored in a file, whether it is 8-bit, 12-bit, 14-bit or
> 16-bit, it is _streamed_ into the 8 bit octets that
> computer files use. It is *not* zero padded.

Most programmers I know would call 16 bit data where all the values
are (in fact) <= 4095 either 12 bit data in a 16 bit file, or just 12 bit data.

They might draw a distinction between the *representation*
being 16 bit, and the *data* being 12 bit.

Certainly, if one anticipated using a lookup table
(and didn't want to waste memory), one would want 4096 entries,
not 65536.
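A sketch of such a table in Python. The gamma 1/2.2 curve is purely an assumed example, not dcraw's or any camera's actual tone curve; the point is only that 12-bit input needs 4096 entries, where 16-bit input would need 65536. lut and apply_lut are hypothetical names.

```python
# Hypothetical 12-bit -> 8-bit tone curve (gamma 1/2.2 is an assumption).
GAMMA = 1 / 2.2
lut = [round(255 * (i / 4095) ** GAMMA) for i in range(4096)]  # one entry per 12-bit value


def apply_lut(samples):
    # Each 12-bit sample indexes the table directly.
    return [lut[s] for s in samples]


print(len(lut))              # 4096
print(apply_lut([0, 4095]))  # [0, 255]
```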

No matter; I now know that the raw data is 10 bit,
with padding zeroes on the right, and (further) that
dcraw (in any case) also normalises into a 16 bit space,
no matter how far apart the samples are.

My faulty assumptions have been corrected,
and I know enough to proceed.

BugBear
From: Floyd L. Davidson on
bugbear <bugbear(a)trim_papermule.co.uk_trim> wrote:
>Floyd L. Davidson wrote:
>> bugbear <bugbear(a)trim_papermule.co.uk_trim> wrote:
>>> Floyd L. Davidson wrote:
>>>>> I was expecting 12 (or maybe 14 bit) data, UNPADDED,
>>>>> which would have given me 4 (or maybe 6) bit data after crunching
>>>>> down to 8 bit in netpbm (which divides everything by 257)
>>>> That does not make sense. Regardless of conversion to
>>>> a 16-bit format, when encoded in an 8-bit file you would
>>>> have 8 bits, not 4 or maybe 6.
>>> I wasn't counting leading zeroes in my bit counts.
>> There are no leading zeros.
>> If you convert 8-bit data to 16-bit data, the highest
>> number is all 1's in _both_ cases!
>
>yes; this is achieved by multiplying by 257;
>0 maps to 0, 255 maps to 65535, everything else
>is in between. Close enough for government work.
>
>Dividing reverses the process.

The point is that there is no zero padding going on, at
all.

>> In the 8-bit data
>> the maximum value is represented by an unsigned integer
>> value of 0xff (255 decimal). In binary that is 8 bits
>> which are all 1's. In the 16-bit data the maximum value
>> is represented by an unsigned integer value of 0xffff
>> (65535 decimal). And in binary that is 15 bits which
Typo alert: the "15" above should be 16.

>> are all 1's.
>>
>>> *IF* you take my assumption that the true-raw data is 12 bit,
>>> (i.e. 16 bit with 4 leading zeroes) *AND*
>> That is not true though. 12-bit data is just that, data
>> encoded into 12 bit words. It is *NOT* 16 bit words
>> with 12 bits used, it is 12 bit words! When the data is
>> stored in a file, whether it is 8-bit, 12-bit, 14-bit or
>> 16-bit, it is _streamed_ into the 8 bit octets that
>> computer files use. It is *not* zero padded.
>
>Most programmers I know would call 16 bit data where all the values
>are (in fact) <= 4095 either 12 bit data in a 16 bit file, or just 12 bit data.

That is not what we have though... It is NOT 16 bit
data at any point where the values are limited to a
maximum of 4095. No such format is used in the process.

>They might draw a distinction between the *representation*
>being 16 bit, and the *data* being 12 bit.

But we are *NOT* talking about 12-bit data in a 16-bit
format. The 12-bit data in the raw file is a stream of
12-bit values, one per sensor. There is no zero padding.
It is not 16 bit data in any way, shape or form.

If it were, the files would be significantly larger.

>Certainly, if one anticipated using a lookup table
>(and didn't want to waste memory), one would want 4096 entries,
>not 65536.

For the 12-bit raw file, yes. (For a 16-bit file it
requires 65536 elements, but the raw file is *not* a
16-bit file in any way.)

>No matter; I now know that the raw data is 10 bit,
>with padding zeroes on the right, and (further) that

It is *not* using zero padding. That is very clearly obvious
from the file sizes!

>dcraw (in any case) also normalises into a 16 bit space,
>no matter how far apart the samples are.

But /dcraw/ does not do that, ever.

You still are not recognizing that the raw input is a
single channel per sensor format (one 12-bit word per
sensor) which does not use zero padded data. If it is
10 bits per sensor and there are 5 million sensors, the
size of the file (minus overhead for metadata and a
thumbnail) is 6.25MB. That is (10 / 8 * Megapixels):
10 bits per sensor, not 16 with 6 of them being zeros
(which would require a 10+MB file size). If a 5 MP
camera has 12-bit data, the file size is 7.5MB +
overhead (12 / 8 * Megapixels).

The _output_ file is either TIFF or PPM; it is an RGB
format, and hence has 3 data channels for each pixel.
If it is an 8 bit depth format there will be 15,000,000
8 bit bytes in the file for a 5MP image. No zero
padding. If it is in 16 bit depth format, it will be
30,000,000 8 bit bytes, still with no zero padding.
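The same 5 MP arithmetic in Python, with headers and metadata excluded. raw_bytes and rgb_bytes are hypothetical helpers; the raw case assumes samples are bit-packed with no padding, as described above.

```python
MEGAPIXELS = 5_000_000  # the 5 MP sensor used in the example


def raw_bytes(bits_per_sample, sites=MEGAPIXELS):
    # One sample per sensor site, bit-packed with no zero padding.
    return sites * bits_per_sample // 8


def rgb_bytes(bytes_per_channel, pixels=MEGAPIXELS):
    # Three channels (R, G, B) per output pixel; header not counted.
    return pixels * 3 * bytes_per_channel


for bits in (10, 12, 16):
    print(f"raw, {bits}-bit packed: {raw_bytes(bits):,} bytes")
for depth in (1, 2):
    print(f"RGB, {8 * depth}-bit: {rgb_bytes(depth):,} bytes")
```

The 16-bit raw line is the hypothetical "padded" case; it is the one that would need 10MB.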

>My faulty assumptions have been corrected,
>and I know enough to proceed.

The faulty assumptions might be cleared up, but it looks
like the faulty conclusions are still hanging around.
