From: bernard on
Dear experts,

I would like to create a custom string class to be used as the most
efficient key possible in a map .
Hash or tree, I'll have to bench to decide, so in fact I may have to
create two custom string classes :)
1�) one tailored for fast compare (less)
2�) one tailored for fast hash and equality.

However both will be based on the same principle of trading space for
speed following two ideas :

1�) small string optimization : an array of (local_size) data will be
stored directly in the object, resorting to dynamic allocation only if
needed
2�) storing 8 bits chars (ASCII) in wider types to enable vector
operations (As you can guess, I also intend to venture out of C++
standard
into stronger alignment specifications to use vector builtin
functions , but a first implementation could rely on the compiler
generated vectorization of standard C++)

Hence the subject :
template<unsigned char local_size, typename wide_type>
fast_string{...};

While I can't believe that I would be the first to implement such a
class, I could not find an example on the net, nor by searching this
newsgroup.
Would you have some advices/examples wrt to implementation
strat�gies ?
(0 termination or storing length, union of wide_type
local_data[local_size] and wide_type* far_data or always storing the
first chars in local data, etc.)

I'd also request advices on how best to handle the over stringent (wrt
standard C++) requirements of vector instructions if this could be
deemed not too off-topic for the list.

Best regards,

Bernard


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

From: Ulrich Eckhardt on
bernard wrote:
> I would like to create a custom string class to be used as the most
> efficient key possible in a map .
> Hash or tree, I'll have to bench to decide, so in fact I may have to
> create two custom string classes :)
> 1°) one tailored for fast compare (less)
> 2°) one tailored for fast hash and equality.
>
> However both will be based on the same principle of trading space for
> speed following two ideas :
>
> 1°) small string optimization : an array of (local_size) data will be
> stored directly in the object, resorting to dynamic allocation only if
> needed

Some implementations of std::string and std::wstring already do that, like
e.g. STLport.

> 2°) storing 8 bits chars (ASCII) in wider types to enable vector
> operations (As you can guess, I also intend to venture out of C++
> standard into stronger alignment specifications to use vector
> builtin functions , but a first implementation could rely on the
> compiler generated vectorization of standard C++)

ASCII is even just 7 bits. Anyhow, if you guarantee alignment suitable for
e.g. 32 bits and also that the allocated size is always terminated by a
zero 32 bit word, simple comparisons for equality can operate on such
chunks of four 8 bit chars. Hashing can probably benefit from this
approach, too.

> Would you have some advices/examples wrt to implementation
> stratégies ?
> (0 termination or storing length,

Both probably, you will not want to modify the string on calls to c_str(),
provided you even want to supply a way to get at a zero-terminated version.
Otherwise, you could simply sacrifice this legacy in favour of speed.
However, I wouldn't store the length but rather a past-the-end pointer.

> union of wide_type local_data[local_size] and wide_type* far_data
> or always storing the first chars in local data, etc.)

I'd always store beginning and end pointers and always use a contiguous
storage area, i.e. not parts local and parts detached.

> I'd also request advices on how best to handle the over stringent (wrt
> standard C++) requirements of vector instructions if this could be
> deemed not too off-topic for the list.

I'd look for example code in computationally heavy math libraries. Also, how
to best lay out the data in memory probably depends on which vector
instructions you're trying to support. Anyhow, that is the last step
probably, the first one is creating a set of benchmarks.

Good luck!

Uli

--
Sator Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932


[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]