From: Joseph M. Newcomer on
You are correct about the size_t. There was some trash talk about std::vector not being
able to handle more than INT_MAX elements, but as I read std::vector, it uses the STL
type size_type, which is 64 bits in the 64-bit compiler. Even the most superficial study
of the documentation reveals in under 30 seconds that

typedef size_t size_type;

is the definition of size_type (look at allocator::size_type)

so you are right on target about the failure to use the correct data type for the size; it
should be size_type (which is size_t, which is 64 bits long in the 64-bit world!)
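
For illustration (a minimal sketch, not part of the original exchange), this is what
holding the size and index in the vector's own size_type looks like; since size_type is
a typedef for size_t, it widens to 64 bits on a 64-bit build:

#include <vector>

typedef unsigned int uint32;

int main()
{
    std::vector<uint32> Data(100, 0);   // in the real test this could exceed 2^32 elements

    // std::vector<uint32>::size_type is a typedef for size_t (64 bits on a
    // 64-bit build), so hold sizes and indexes in that type, not int/uint32.
    std::vector<uint32>::size_type count = Data.size();
    for (std::vector<uint32>::size_type i = 0; i < count; ++i)
        Data[i] = static_cast<uint32>(i);

    return 0;
}
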
joe

On Wed, 24 Mar 2010 12:08:26 +0100, Oliver Regenfelder <oliver.regenfelder(a)gmx.at> wrote:

>Hello,
>
>Just some 64bit, C++ comments.
>
>Peter Olcott wrote:
>> #include <stdio.h>
>
>#include <cstdio>
>
>> #include <stdlib.h>
>
>#include <cstdlib>
>
>> #include <vector>
>> #include <time.h>
>
>#include <ctime>
>
>It is supposed to be C++ after all.
>
>> #define uint32 unsigned int
>
>typedef unsigned int uint32;
>
>> const uint32 repeat = 100;
>> const uint32 size = 524288000 / 4;
>> std::vector<uint32> Data;
>>
>>
>>
>> void Process() {
>> clock_t finish;
>> clock_t start = clock();
>> double duration;
>> uint32 num;
>> for (uint32 r = 0; r < repeat; r++)
>
>> for (uint32 i = 0; i < size; i++)
>> num = Data[num];
>
>This one is subtly bad. The proper type to index arrays is size_t,
>especially as you will be running this on 64-bit machines, and on
>Win64 int is still 32 bits as far as I remember. Therefore make
>it a habit of indexing arrays with size_t, because that array
>bigger than 4 GB will come and haunt you sooner or later.
>
>Also, you are thinking about running this on Windows, *nix, and Mac.
>In that case you can't rely on an int being able to index arrays of
>every possible size.
>
>> int main() {
>> printf("Size in bytes--->%d\n", size * 4);
>
>I would do:
> printf("Size in bytes--->%zu\n", static_cast<size_t>(size) * 4);
>
>(%zu to match the size_t argument.) Otherwise you might be upset when
>size is above 1G.
>
>> Data.reserve(size);
>> for (int N = 0; N < size; N++)
>> Data.push_back(rand() % size);
>
>Again use size_t for indexing.
>
>Best regards,
>
>Oliver
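
Pulling Oliver's suggestions together, here is a minimal corrected sketch of the test
program (not from the thread; %zu assumes a C99-style printf, older MSVC runtimes would
want %Iu instead):

#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <vector>

typedef unsigned int uint32;

const uint32 repeat = 100;
const size_t size = 524288000 / 4;
std::vector<uint32> Data;

void Process() {
    clock_t start = clock();
    uint32 num = 0;
    for (uint32 r = 0; r < repeat; r++)
        for (size_t i = 0; i < size; i++)       // size_t index, safe past 4 GB
            num = Data[num];
    double duration = (double)(clock() - start) / CLOCKS_PER_SEC;
    printf("%.2f seconds (last value %u)\n", duration, num);
}

int main() {
    printf("Size in bytes--->%zu\n", size * sizeof(uint32));
    Data.reserve(size);
    for (size_t N = 0; N < size; N++)
        Data.push_back(static_cast<uint32>(rand() % size));
    Process();
    return 0;
}
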
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
From: Peter Olcott on

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
message news:ohnlq5924q5tisi5vhkl46k8innq5vt7u0(a)4ax.com...
> See below...
> On Tue, 23 Mar 2010 09:16:34 -0500, "Peter Olcott"
> <NoSpam(a)OCR4Screen.com> wrote:
>
>>
>>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in
>>message
>>news:u8xPnamyKHA.2552(a)TK2MSFTNGP04.phx.gbl...
>>> Hector Santos wrote:
>>>
>>>> PS: I noticed the range of rand() % size is too narrow;
>>>> rand() is limited to RAND_MAX, which is 32K. Change that to:
>>>>
>>>> (rand()*rand()) % size
>>>>
>>>> to get a random range from 0 to size-1. I think that's
>>>> right; maybe Joe can give us a good random generator
>>>> here, but the above does seem to provide practical,
>>>> decent randomness for this task.
>>>
>>> Peter, using the above RNG seems to be a better test
>>> since it hits a wider spectrum. With the earlier one, it
>>> was only hitting ranges up to 32K.
>>>
>>> I also noticed that when the 32K RNG was used, a dynamic
>>> array was 1 to 6 faster than using std::vector. But when
>>> using the above RNG, they were both about the same. That
>>> is interesting.
>>>
>>> --
>>> HLS
>>
>>I made this adaptation and it slowed down by about 500%
>>due to a much smaller cache hit ratio. It still scaled up
>>to four cores with 1.5 GB each, and four concurrent
>>processes only took about 50% longer than a single process.
>>
> ****
> If you use multiple threads in a single process, or
> multiple processes with a shared
> memory segment implemented by a Memory Mapped File, you
> will reduce the footprint since
> either all threads (in one process) share the same memory
> footprint, or the multiple
> processes share largely the same memory footprint.

Yup. So if it's fast enough with multiple processes it will surely be
fast enough with multiple threads, wouldn't it?
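
As an aside on the rand() range issue quoted above: rand()*rand() does widen the range,
but the product is far from uniformly distributed, and on platforms where RAND_MAX is
larger than 32767 it can overflow. A minimal sketch (not from the thread; the WideRand
name is made up for illustration) of combining the bits of two rand() calls instead,
assuming the MSVC-style RAND_MAX of 32767:

#include <cstdlib>

typedef unsigned int uint32;

// Roughly uniform index in 0..size-1 built from ~30 random bits
// (two 15-bit rand() results); the small modulo bias is ignored here.
inline uint32 WideRand(uint32 size)
{
    uint32 hi = static_cast<uint32>(rand());   // 15 bits when RAND_MAX == 32767
    uint32 lo = static_cast<uint32>(rand());   // 15 more bits
    return ((hi << 15) | lo) % size;
}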

> ****
>>I will probably engineer my new technology to be able to
>>handle multiple threads, if all that I have to do is
>>implement the heuristics that I mentioned. Since my first
>>server will only have a single core, on this server it
>>will
>>only have a single thread.
>>
>>I still think that the FIFO queue is a good idea. Now I
>>will
>>have multiple requests and on multi-core machines multiple
>>servers.
>>
>>What is your best suggestion for how I can implement the
>>FIFO queue?
>>(1) I want it to be very fast
>>(2) I want it to be portable across Unix / Linux /
>>Windows,
>>and maybe even Mac OS X
>>(3) I want it to be as robust and fault tolerant as
>>possible.
>>
>>It may simply provide 32-bit hexadecimal integer names for
>>the input PNG files. These names roll over to the next
>>incremental number. Alternatively they may be HTTP
>>connection numbers. I have to learn about HTTP before I
>>will know what I am doing here. Since all customers
>>(including free trial customers) will have accounts with
>>valid email addresses, I can always email the results if
>>the connection is lost.
>>
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm
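
For reference on the shared-memory remark quoted above (threads in one process, or
processes sharing a memory-mapped file, share one footprint), here is a minimal
Windows-only sketch of a named, pagefile-backed mapping that several processes could
open by the same name; the mapping name "Local\OcrSharedData" is made up for
illustration:

#include <windows.h>
#include <cstdio>

int main()
{
    // A named mapping backed by the page file; every process that opens
    // the same name maps the same physical pages, so large shared data
    // occupies memory only once no matter how many processes use it.
    const DWORD size = 64 * 1024;
    HANDLE hMap = CreateFileMappingA(INVALID_HANDLE_VALUE, NULL, PAGE_READWRITE,
                                     0, size, "Local\\OcrSharedData");
    if (hMap == NULL) return 1;

    unsigned int *view = static_cast<unsigned int *>(
        MapViewOfFile(hMap, FILE_MAP_ALL_ACCESS, 0, 0, size));
    if (view == NULL) return 1;

    view[0] = 12345;               // visible to every process mapping the same name
    printf("%u\n", view[0]);

    UnmapViewOfFile(view);
    CloseHandle(hMap);
    return 0;
}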


From: Peter Olcott on

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
message news:dlnlq5pok9nbsc35uaedbot0m18btno5ti(a)4ax.com...
>A multithreaded FIFO queue would make more sense; less
>chance of priority inversion
> effects. An I/O Completion Port makes a wonderful
> multithreaded FIFO queue with
> practically no effort!
> joe

How ?
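
For reference, a minimal sketch (not from the thread) of the pattern Joe is describing:
an I/O completion port used purely as a thread-safe FIFO work queue, with
PostQueuedCompletionStatus as the producer side and GetQueuedCompletionStatus as the
blocking consumer side (Windows-only; the Job struct is made up for illustration):

#include <windows.h>
#include <cstdio>

// Hypothetical work item; in a real server this would carry the request data.
struct Job { int id; };

int main()
{
    // A completion port with no file handle attached behaves as a
    // thread-safe FIFO queue managed by the kernel.
    HANDLE queue = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);
    if (queue == NULL) return 1;

    // Producer side: post a work item (the pointer travels in the completion key).
    Job *job = new Job;
    job->id = 42;
    PostQueuedCompletionStatus(queue, 0, reinterpret_cast<ULONG_PTR>(job), NULL);

    // Consumer side (normally one of several worker threads): block until work arrives.
    DWORD bytes = 0;
    ULONG_PTR key = 0;
    OVERLAPPED *ov = NULL;
    if (GetQueuedCompletionStatus(queue, &bytes, &key, &ov, INFINITE)) {
        Job *j = reinterpret_cast<Job *>(key);
        printf("got job %d\n", j->id);
        delete j;
    }

    CloseHandle(queue);
    return 0;
}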

>
> On Tue, 23 Mar 2010 14:28:56 -0400, Hector Santos
> <sant9442(a)nospam.gmail.com> wrote:
>
>>Peter Olcott wrote:
>>
>>> I still think that the FIFO queue is a good idea. Now I
>>> will
>>> have multiple requests and on multi-core machines
>>> multiple
>>> servers.
>>
>>
>>IMO, it's just that it's an odd approach to load balancing.
>>You are integrating software components, like a web server
>>with a multi-thread-ready listening server, and you are
>>hampering it with single-threaded FIFO queuing. It
>>introduces other design considerations. Namely, you will
>>need to consider a store-and-forward concept for your
>>requests and delayed responses. But if your request
>>processing is very fast, maybe you don't need to worry
>>about it.
>>
>>In practice the "FIFO" would be at the socket level or
>>listening level
>>with concepts dealing with load balancing by restricting
>>and balancing
>>your connection with worker pools or simply letting it to
>>wait knowing
>>that processing won't be too long. Some servers have
>>guidelines for
>>waiting limits. For the WEB, I am not recall coming
>>across any
>>specific guideline other than a practical one per
>>implementation. The
>>point is you don't want the customers waiting too long -
>>but what is
>>"too long."
>>
>>> What is your best suggestion for how I can implement the
>>> FIFO queue?
>>> (1) I want it to be very fast
>>> (2) I want it to be portable across Unix / Linux /
>>> Windows,
>>> and maybe even Mac OS X
>>> (3) I want it to be as robust and fault tolerant as
>>> possible.
>>
>>
>>Any good collection class will do as long as you wrap it
>>with
>>synchronization. Example:
>>
>>
>>typedef struct _tagTSlaveData {
>> ... data per request.........
>>} TSlaveData;
>>
>>class CBucket : public std::list<TSlaveData>
>>{
>>public:
>> CBucket() { InitializeCriticalSection(&cs); }
>> ~CBucket() { DeleteCriticalSection(&cs); }
>>
>> void Add( const TSlaveData &o )
>> {
>> EnterCriticalSection(&cs);
>> insert(end(), o );
>> LeaveCriticalSection(&cs);
>> }
>>
>> BOOL Fetch(TSlaveData &o)
>> {
>> EnterCriticalSection(&cs);
>> BOOL res = !empty();
>> if (res) {
>> o = front();
>> pop_front();
>> }
>> LeaveCriticalSection(&cs);
>> return res;
>> }
>>private:
>> CRITICAL_SECTION cs;
>>} Bucket;
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm
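
Since portability across Unix/Linux/Windows was one of the stated requirements, here is
a minimal sketch of the same wrapped-queue idea written against C++11
std::mutex/std::condition_variable instead of a Windows CRITICAL_SECTION (an assumption:
a C++11-capable compiler; with pthreads the structure would be the same, and the
SlaveData fields are placeholders). Unlike the polling Fetch() above, this Fetch()
blocks until work arrives:

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

// Hypothetical request record; the fields are placeholders.
struct SlaveData {
    unsigned jid;
    std::string user;
};

class FifoQueue {
public:
    void Add(const SlaveData &o) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push(o);
        cv_.notify_one();                       // wake one waiting worker
    }
    SlaveData Fetch() {                         // blocks until an item is available
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        SlaveData o = q_.front();
        q_.pop();
        return o;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<SlaveData> q_;
};
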


From: Peter Olcott on

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
message news:snnlq5lfv1a59n003k5gkk8a6plb41ruj1(a)4ax.com...
> See below...
>
> On Tue, 23 Mar 2010 14:11:28 -0500, "Peter Olcott"
> <NoSpam(a)OCR4Screen.com> wrote:
>
>>Ah so this is the code that you were suggesting?
>>I won't be able to implement multi-threading until volume
>>grows out of what a single core processor can accomplish.
>>I was simply going to use MySQL for the inter-process
>>communication, building and maintaining my FIFO queue.
> ****
> Well, I can think of worse ways. For example, writing the
> data to a floppy disk. Or
> punching it to paper tape and asking the user to re-insert
> the paper tape. MySQL for
> interprocess communication? Get serious!

Can you think of any other portable way that this can be
done?
I would estimate that MySQL would actually keep the FIFO
queue resident in RAM cache.

> ****
>>
>>One other thing that you may be unaware of: std::vector
>>generally beats std::list even for list-based algorithms,
>>including such things as inserting in the middle of the
>>list. The reason for this may be that the expensive memory
>>allocation cost is amortized over more elements with a
>>std::vector, more than enough to pay for the cost of
>>reshuffling a few items. This would probably not work for
>>very long lists. Also, maybe there is some sort of
>>std::list::reserve() that would mitigate this cost.
> ****
> Actually, the vector/list tradeoff has been known since
> the late 1970s; look at the LISP
> machine papers from that era. Interesting that the old
> ideas keep getting rediscovered
> over and over again (doesn't anybody pay attention to
> history?)
> joe
> ****
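
For concreteness on the vector/list trade-off under discussion, a minimal timing sketch
(not from the thread) of repeated insertion into the middle of each container; on
typical hardware the contiguous vector often wins despite the O(n) element shuffle,
because the list pays an O(n) pointer walk plus an allocation per insert:

#include <cstdio>
#include <ctime>
#include <iterator>
#include <list>
#include <vector>

int main()
{
    const int count = 20000;

    clock_t t0 = clock();
    std::vector<int> v;
    for (int i = 0; i < count; ++i)
        v.insert(v.begin() + v.size() / 2, i);  // O(n) element shuffle, but contiguous
    clock_t t1 = clock();

    std::list<int> l;
    for (int i = 0; i < count; ++i) {
        std::list<int>::iterator it = l.begin();
        std::advance(it, l.size() / 2);         // O(n) pointer walk to the middle
        l.insert(it, i);                        // then O(1) insert of the new node
    }
    clock_t t2 = clock();

    printf("vector: %ld ticks, list: %ld ticks\n",
           (long)(t1 - t0), (long)(t2 - t1));
    return 0;
}
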
>>
>>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in
>>message
>>news:O5O%23XiryKHA.5936(a)TK2MSFTNGP04.phx.gbl...
>>> Example usage of the class below; I added an Add()
>>> override to make it easier to add elements for the
>>> specific TSlaveData fields:
>>>
>>> #include <windows.h>
>>> #include <conio.h>
>>> #include <list>
>>> #include <string>
>>> #include <iostream>
>>>
>>> using namespace std;
>>>
>>> const DWORD MAX_JOBS = 10;
>>>
>>> typedef struct _tagTSlaveData {
>>> DWORD jid; // job number
>>> char szUser[256];
>>> char szPwd[256];
>>> char szHost[256];
>>> } TSlaveData;
>>>
>>> class CBucket : public std::list<TSlaveData>
>>> {
>>> public:
>>> CBucket() { InitializeCriticalSection(&cs); }
>>> ~CBucket() { DeleteCriticalSection(&cs); }
>>>
>>> void Add( const TSlaveData &o )
>>> {
>>> EnterCriticalSection(&cs);
>>> insert(end(), o );
>>> LeaveCriticalSection(&cs);
>>> }
>>>
>>> void Add(const DWORD jid,
>>> const char *user,
>>> const char *pwd,
>>> const char *host)
>>> {
>>> TSlaveData sd = {0};
>>> sd.jid = jid;
>>>        strncpy(sd.szUser, user, sizeof(sd.szUser) - 1);
>>>        strncpy(sd.szPwd,  pwd,  sizeof(sd.szPwd)  - 1);
>>>        strncpy(sd.szHost, host, sizeof(sd.szHost) - 1);
>>> Add(sd);
>>> }
>>>
>>> BOOL Fetch(TSlaveData &o)
>>> {
>>> EnterCriticalSection(&cs);
>>> BOOL res = !empty();
>>> if (res) {
>>> o = front();
>>> pop_front();
>>> }
>>> LeaveCriticalSection(&cs);
>>> return res;
>>> }
>>> private:
>>> CRITICAL_SECTION cs;
>>> } Bucket;
>>>
>>>
>>> void FillBucket()
>>> {
>>>    for (DWORD i = 0; i < MAX_JOBS; i++)
>>> {
>>> Bucket.Add(i,"user","password", "host");
>>> }
>>> }
>>>
>>> //----------------------------------------------------------------
>>> // Main Thread
>>> //----------------------------------------------------------------
>>>
>>> int main(int argc, char *argv[])
>>> {
>>>
>>> FillBucket();
>>>    printf("Bucket Size: %u\n", static_cast<unsigned>(Bucket.size()));
>>> TSlaveData o = {0};
>>> while (Bucket.Fetch(o)) {
>>>       printf("%3lu | %s\n", o.jid, o.szUser);
>>> }
>>> return 0;
>>> }
>>>
>>> In your mongoose/OCR thingie, mongoose will Bucket.Add()
>>> and each spawned OCR thread will do a Bucket.Fetch().
>>>
>>> Do it right, and it ROCKS!
>>>
>>> --
>>> HLS
>>>
>>> Hector Santos wrote:
>>>
>>>> Peter Olcott wrote:
>>>>
>>>>> I still think that the FIFO queue is a good idea. Now
>>>>> I
>>>>> will have multiple requests and on multi-core machines
>>>>> multiple servers.
>>>>
>>>>
>>>> IMO, it's just that it's an odd approach to load
>>>> balancing. You are integrating software components,
>>>> like a web server with a multi-thread-ready listening
>>>> server, and you are hampering it with single-threaded
>>>> FIFO queuing. It introduces other design considerations.
>>>> Namely, you will need to consider a store-and-forward
>>>> concept for your requests and delayed responses. But if
>>>> your request processing is very fast, maybe you don't
>>>> need to worry about it.
>>>>
>>>> In practice the "FIFO" would be at the socket level or
>>>> listening level with concepts dealing with load
>>>> balancing
>>>> by restricting and balancing your connection with
>>>> worker
>>>> pools or simply letting it to wait knowing that
>>>> processing won't be too long. Some servers have
>>>> guidelines for waiting limits. For the WEB, I am not
>>>> recall coming across any specific guideline other than
>>>> a
>>>> practical one per implementation. The point is you
>>>> don't
>>>> want the customers waiting too long - but what is "too
>>>> long."
>>>>
>>>>> What is your best suggestion for how I can implement
>>>>> the
>>>>> FIFO queue?
>>>>> (1) I want it to be very fast
>>>>> (2) I want it to be portable across Unix / Linux /
>>>>> Windows, and maybe even Mac OS X
>>>>> (3) I want it to be as robust and fault tolerant as
>>>>> possible.
>>>>
>>>>
>>>> Any good collection class will do as long as you wrap
>>>> it
>>>> with synchronization. Example:
>>>>
>>>>
>>>> typedef struct _tagTSlaveData {
>>>> ... data per request.........
>>>> } TSlaveData;
>>>>
>>>> class CBucket : public std::list<TSlaveData>
>>>> {
>>>> public:
>>>> CBucket() { InitializeCriticalSection(&cs); }
>>>> ~CBucket() { DeleteCriticalSection(&cs); }
>>>>
>>>> void Add( const TSlaveData &o )
>>>> {
>>>> EnterCriticalSection(&cs);
>>>> insert(end(), o );
>>>> LeaveCriticalSection(&cs);
>>>> }
>>>>
>>>> BOOL Fetch(TSlaveData &o)
>>>> {
>>>> EnterCriticalSection(&cs);
>>>> BOOL res = !empty();
>>>> if (res) {
>>>> o = front();
>>>> pop_front();
>>>> }
>>>> LeaveCriticalSection(&cs);
>>>> return res;
>>>> }
>>>> private:
>>>> CRITICAL_SECTION cs;
>>>> } Bucket;
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> HLS
>>
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm


From: Peter Olcott on

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
message news:snnlq5lfv1a59n003k5gkk8a6plb41ruj1(a)4ax.com...
> See below...
>
> On Tue, 23 Mar 2010 14:11:28 -0500, "Peter Olcott"
> <NoSpam(a)OCR4Screen.com> wrote:
>
>>Ah so this is the code that you were suggesting?
>>I won't be able to implement multi-threading until volume
>>grows out of what a single core processor can accomplish.
>>I was simply going to use MySQL for the inter-process
>>communication, building and maintaining my FIFO queue.
> ****
> Well, I can think of worse ways. For example, writing the
> data to a floppy disk. Or
> punching it to paper tape and asking the user to re-insert
> the paper tape. MySQL for
> interprocess communication? Get serious!

This was suggested on the comp.programming,threads group
Cross Platform FIFO Queue
