From: Andrew on
I am working with some legacy code that is in the process of changing
to use std::vector<char> instead of a C-style char array. The C-style
char array is currently allocated using new char [n]. This array is
passed to various C string functions such as strstr, strncmp etc. I
need to do the same work but with a std::vector. I googled around for
a bit to see if I could find anyone who had already done this work but
my search revealed nothing. I wonder if some kind person could point
me in the right direction.

Now, I realise I could code it all myself but surely there must be
something out there where this has already been done. I would rather
build on the work of others than re-invent the wheel. And for
performance critical apps such as the one I am working on, it is
common advice to use std::vector<char> instead of std::string or C-
style char arrays. In the past I have often seen this advice given out
(it's even in More Effective STL) but without the utility functions to
back it up I can see people ignoring this advice.

FWIW, the app is reading in sections of a *huge* XML file. A buffer is
used to hold a fragment which is then parsed using the Xerces SAX
parser (thus it avoids creating a DOM object). I want the buffer to be
a std::vector<char> that sometimes expands to reach a new watermark. I
think I've got that bit working but the string compares fail because
they run off the end of the vector.

Regards,

Andrew Marlow

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Lance Diduck on

Premature optimization is the root of all evil

> Now, I realise I could code it all myself but surely there must be
> something out there where this has already been done.
Alexandrescu wrote and published "FlexString" back in 2001
https://devel.nuclex.org/external/svn/loki/trunk/native/include/loki/flex/



> it is
> common advice to use std::vector<char> instead of std::string or C-
> style char arrays. In the past I have often seen this advice given out
> (it's even in More Effective STL) but without the utility functions to
> back it up I can see people ignoring this advice.
#include <algorithm> has most (if not all) of the functions you are
looking for.
find, find_first_of, search, replace, replace_if, reverse, etc.
are all there. (There is no find_first_not_of, but find_if with a
predicate does the same job.)
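#include <algorithm>
#include <cassert>
#include <cstring>
#include <iterator>
#include <vector>

// the statements below go inside a function body
char phrase_raw[] = "C++ is my favorite language";
// sizeof includes the trailing 0, so the vector is null-terminated too
std::vector<char> phrase_v(phrase_raw, phrase_raw + sizeof(phrase_raw));
assert(std::equal(phrase_v.begin(), phrase_v.end(), phrase_raw)); // strcmp==0
assert(phrase_v.size() - 1 == strlen(phrase_raw));
assert(strcmp(&*phrase_v.begin(), phrase_raw) == 0);
// strchr equivalent: offset of the first 'g'
assert(std::distance(phrase_v.begin(),
                     std::find(phrase_v.begin(), phrase_v.end(), 'g'))
       == strchr(phrase_raw, 'g') - phrase_raw);
char srchphrase[] = "C++";
// strcspn equivalent: offset of the first char that occurs in srchphrase
// (strcspn returns a count, not a pointer)
assert(static_cast<std::size_t>(std::distance(phrase_v.begin(),
           std::find_first_of(phrase_v.begin(), phrase_v.end(),
                              srchphrase, srchphrase + sizeof(srchphrase) - 1)))
       == strcspn(phrase_raw, srchphrase));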

So it is indeed possible, but extremely tedious.
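
To cut down on the tedium, the iterator-based calls can be wrapped once
and reused. The following is only a rough sketch: find_in_buffer and
starts_with_at are names made up for this post, not part of any library.
They stand in for strstr and strncmp but take their bounds from the
vector, so they work on a buffer that is not null-terminated.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: strstr-like search that never reads past buf.end().
// Returns the offset of the first match, or buf.size() if there is none.
std::size_t find_in_buffer(const std::vector<char>& buf, const char* needle)
{
    const char* needle_end = needle + std::strlen(needle);
    std::vector<char>::const_iterator it =
        std::search(buf.begin(), buf.end(), needle, needle_end);
    return static_cast<std::size_t>(it - buf.begin());
}

// Hypothetical helper: true if the bytes starting at buf[pos] begin with
// prefix. Comparable to strncmp(&buf[pos], prefix, strlen(prefix)) == 0,
// but with a bounds check first.
bool starts_with_at(const std::vector<char>& buf, std::size_t pos,
                    const char* prefix)
{
    const std::size_t n = std::strlen(prefix);
    if (pos > buf.size() || buf.size() - pos < n)
        return false;  // not enough bytes left: no match, no overrun
    return std::equal(prefix, prefix + n, buf.begin() + pos);
}
```

Returning an offset rather than an iterator or pointer means the result
stays meaningful after the vector grows and reallocates.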
>
> FWIW, the app is reading in sections of a *huge* XML file. A buffer is
> used to hold a fragment which is then parsed using the Xerces SAX
> parser (thus it avoids creating a DOM object).
Apache SAX is not really a speed demon. There are a number of vendors
that have C++/XML code generators that are far faster. Here is one
open source version: http://www.codesynthesis.com/products/xsd/



Virtually all the reports of "slow strings" come from code compiled
with Sun WorkShop in MT mode. The Sun String implementation (purchased from RogueWave
and modified) used two heap allocations -- one for the guts and one
for the actual string data. This maximized binary compatibility when
upgrading (which was Sun's intent) but the implementation used one
global lock for both the heap allocs AND the string copies (this was a
COW implementation). This made some sense in the pre-multiprocessor
days, but was a disaster once SMP arrived. In the financial community,
this was further exacerbated since there was no small string
optimization (and financial data is swamped with little strings).
Compared to implementations like STLPort, which did have SSO, it looked
slow beyond comprehension.

I would profile to make sure that indeed vector<char> is really faster
than std::string. I just yesterday advised a workmate to consider
replacing vector<unsigned> with basic_string<unsigned> for the reason
that strings already assume their types have trivial ctors/dtors, and
so are not going through the uninitialized fill, looping to call all
the dtors that don't exist, etc. This gives the optimizer much less
code that it has to sift through and judge unnecessary. In fact my
profile the other day showed that resizing a vector to the same size
repeatedly was a hotspot, simply because the vector implementation had
to go three levels deep to figure out that it didn't need to do
anything at all, and the optimizer could not inline all that.
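
For a first rough measurement, a harness along these lines may be
enough. It is only a sketch: time_repeated_resize is a name invented
here, it uses C++11 <chrono>, and it times nothing but the "resize to
the same size" case mentioned above. A real profiler on the real
workload is still the better guide.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical harness: resize a container to the same size `reps` times
// and return the elapsed time in nanoseconds.
template <typename Container>
long long time_repeated_resize(std::size_t n, int reps)
{
    assert(n > 0);
    Container c;
    c.resize(n);
    volatile char sink = 0;  // volatile write keeps the loop from vanishing
    const std::chrono::steady_clock::time_point start =
        std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        c.resize(n);         // same size every time: logically a no-op
        sink = c[n - 1];
    }
    const std::chrono::steady_clock::time_point stop =
        std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration_cast<std::chrono::nanoseconds>(
               stop - start).count();
}
```

Compare time_repeated_resize<std::vector<char> >(4096, 1000000) with
time_repeated_resize<std::string>(4096, 1000000), built with the same
compiler and optimization flags as the production code. The numbers
depend entirely on the library implementation.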


Lance




From: Mathias Gaunard on
On 26 Feb, 22:29, Andrew <marlow.and...(a)googlemail.com> wrote:
> I am working with some legacy code that is in the process of changing
> to use std::vector<char> instead of a C-style char array. The C-style
> char array is currently allocated using new char [n]. This array is
> passed to various C string functions such as strstr, strncmp etc. I
> need to do the same work but with a std::vector.

You can keep using those functions as long as your vector is null-
terminated, since std::vector's storage is contiguous: pass &v[0]
where a char* is expected.
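
A minimal sketch of that approach, assuming the buffer holds no
embedded '\0' bytes (the C functions would stop at the first one).
ensure_terminated and strstr_offset are hypothetical names used only
for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: make sure the buffer ends with '\0' so that
// strstr, strncmp and friends stop inside the vector.
void ensure_terminated(std::vector<char>& buf)
{
    if (buf.empty() || buf.back() != '\0')
        buf.push_back('\0');
}

// Hypothetical helper: strstr on a terminated vector. Returns the match
// offset, or buf.size() when the needle is absent.
std::size_t strstr_offset(const std::vector<char>& buf, const char* needle)
{
    assert(!buf.empty() && buf.back() == '\0');
    const char* base = &buf[0];
    const char* hit = std::strstr(base, needle);
    return hit ? static_cast<std::size_t>(hit - base) : buf.size();
}
```

Note that push_back may reallocate, so any char* taken from the vector
before the terminator was appended (or before the buffer grew) must be
taken again.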

