From: Brendan on
At a prior job, people were using std::string to hold arbitrary binary
data, as opposed to vector<char>. I've seen a few people in this group
pooh-pooh that notion, and I'd like to find out what they think the
pitfalls, if any, of using string to hold binary data are. I ask
because otherwise the code base was pretty high quality, and string
does offer a number of extra search-based member functions that vector
does not.

Additionally, is there any strong reason to use unsigned char as
opposed to char to hold binary data where the high order bit might be
set? Again, in practice I've mostly seen char used.

Thanks

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Lance Diduck on
On Apr 22, 1:43 pm, Brendan <catph...(a)catphive.net> wrote:
> At a prior job, people were using std::string to hold arbitrary binary
> data, as opposed to vector<char>. I've seen a few people in this group
> pooh-pooh that notion, and I'd like to find out what they think the
> pitfalls, if any, of using string to hold binary data are. I ask
> because otherwise the code base was pretty high quality, and string
> does offer a number of extra search-based member functions that vector
> does not.
>
> Additionally, is there any strong reason to use unsigned char as
> opposed to char to hold binary data where the high order bit might be
> set? Again, in practice I've mostly seen char used.
>
> Thanks
There is nothing wrong with using std::string to hold binary data.
Nothing in std::string assumes any particular text encoding; ipso
facto, std::string holds nothing but raw bytes.
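
A minimal sketch of what that looks like in practice. The helper names
make_blob and find_bytes are made up for illustration; the only thing
taken from the standard library is the (pointer, length) constructor
and the find member. The one trap worth knowing is that the
constructor taking just a const char* stops at the first '\0', so the
length has to be passed explicitly.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: build a string from raw bytes. The length is
// passed explicitly so embedded '\0' bytes are kept instead of being
// treated as a terminator.
inline std::string make_blob(const char* bytes, std::size_t n) {
    return std::string(bytes, n);
}

// Hypothetical helper: offset of the first occurrence of `needle`
// inside `blob`, or std::string::npos if it does not occur.
inline std::size_t find_bytes(const std::string& blob,
                              const std::string& needle) {
    return blob.find(needle);
}
```

With vector<char> the same search is still possible, but it goes
through the free algorithm std::search rather than a member function.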

There is a compelling reason to use unsigned -- when doing
comparisons, many processors must "sign extend" char data to
something the size of an int. On Intel this is the movsx instruction.
When the data is unsigned, the move is a "zero extend" (Intel movzx).
movzx is typically one cycle less than movsx. If you are not doing
compares, tests, or such on the binary data then it doesn't make a
difference. Also note that basic_string<unsigned char> will not play
nice with cout.
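
Apart from speed, the widening also changes results. Here is a small
sketch (function names are made up, and signed char is used on
purpose so the behavior does not depend on whether plain char happens
to be signed on your platform):

```cpp
// Hypothetical helper: compare a byte against 0x80 after the usual
// promotion to int. A signed byte with the high bit set is
// sign-extended to a negative int, so it never equals 0x80 (128).
inline bool equals_0x80_naive(signed char c) {
    int widened = c;  // sign extension happens here
    return widened == 0x80;
}

// Hypothetical helper: convert to unsigned char first, so the byte is
// zero-extended and compares as a value in the range 0..255.
inline bool equals_0x80_as_unsigned(signed char c) {
    return static_cast<unsigned char>(c) == 0x80;
}
```

For a byte holding the bit pattern 0x80 (that is, -128 as a signed
char) the first function returns false and the second returns true.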


Lance




From: Charles on
Lance Diduck wrote:
> If you are not doing compares,
> tests, or such on the binary data then it doesnt make a difference.

Brendan -

If you _are_ doing tests, then depending on the application you
should consider using the Boost dynamic_bitset library
(http://www.boost.org/doc/libs/1_35_0/libs/dynamic_bitset/dynamic_bitset.html).

--
Chuck



From: Marco Manfredini on
Brendan wrote:

> At a prior job, people were using std::string to hold arbitrary binary
> data, as opposed to vector<char>. I've seen a few people in this group
> pooh-pooh that notion, and I'd like to find out what they think the
> pitfalls, if any, of using string to hold binary data are. I ask
> because otherwise the code base was pretty high quality, and string
> does offer a number of extra search-based member functions that vector
> does not.

The standard specifies no complexity bounds for the string operations
(exception: swap). char_traits<> has bounds given, but that doesn't
help here. You may end up with an implementation which has, for
example, very fast inserts but slow replacements, or the other way
round.

>
> Additionally, is there any strong reason to use unsigned char as
> opposed to char to hold binary data where the high order bit might be
> set? Again, in practice I've mostly seen char used.

Predictable sorting order, maybe, or defined behavior on overflow?
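
To illustrate the sorting point, a small sketch (sorted_bytes is a
made-up name). The same three bit patterns 0x01, 0x7F and 0x80 come
out in a different order depending on whether the element type is
signed or unsigned, and with plain char which of the two you get is up
to the implementation.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: return a sorted copy of a byte sequence. The
// order follows operator< of the element type Byte.
template <class Byte>
std::vector<Byte> sorted_bytes(std::vector<Byte> v) {
    std::sort(v.begin(), v.end());
    return v;
}
```

With unsigned char, 0x80 sorts last (it is 128). With signed char the
same bit pattern is -128 and sorts first. Note that since C++11,
char_traits<char>::lt is specified to compare as unsigned char, so
std::string comparisons are predictable in this respect even where
plain char is signed.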

--
IYesNo yes=YesNoFactory.getFactoryInstance().YES;
yes.getDescription().equals(array[0].toUpperCase());
