Can extra processing threads help in this case? [MFC]


From: Joseph M. Newcomer on 6 Apr 2010 14:04

See below...
On Mon, 5 Apr 2010 21:32:44 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
>message news:e41lr59chafrfs27uakv7b8ob1iv9dqq2i(a)4ax.com...
>> See below...
>> On Mon, 5 Apr 2010 15:35:28 -0500, "Peter Olcott"
>> <NoSpam(a)OCR4Screen.com> wrote:
>>
>>>
>>>"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
>>>message news:ro9kr5lk8kad3anflhhcj0iecrvosf381n(a)4ax.com...
>>>> See below...
>>>> On Sat, 3 Apr 2010 18:27:00 -0500, "Peter Olcott"
>>>> <NoSpam(a)OCR4Screen.com> wrote:
>>>>
>>>
>>>>>I like to fully understand the underlying infrastructure
>>>>>before I am fully confident of a design. For example, I
>>>>>now
>>>>>know the underlying details of exactly how SQLite can
>>>>>fully
>>>>>recover from a power loss. Pretty simple stuff really.
>>>> *****
>>>> Ask if it is a fully-transacted database and what recovery
>>>> techniques are implemented in
>>>> it. Talk to a MySQL expert. Look into what a
>>>> rollback
>>>> of a transaction means. These
>>>> are specified for most databases (my experience in
>>>> looking
>>>> at these predates MySQL, so I
>>>> don't know what it does; I haven't looked at this
>>>> technology since 1985 or 1986)
>>>>
>>>> That's all the understanding you need. Intellectual
>>>> curiosity may suggest that you
>>>> understand how they implement this, but such
>>>> understanding
>>>> is not critical to the decision
>>>> process.
>>>
>>>No. I need a much deeper understanding to approximate an
>>>optimal mix of varied technologies. A transacted database
>>>only solves one aspect of one problem, it does not even
>>>solve every aspect of even this one problem.
>> ****
>> No, it does not handle the case where the disk melts down,
>> or the entire computer room
>> catches fire and every machine is destroyed either by heat
>> or by water damage.
>>
>
>Ah but, then you are ignoring the proposed aspect of my
>design that would handle all those things. What did you call
>it "mirrored transactions". I called it on-the-fly
>transaction-by-transaction offsite backup.
****
If you have a "proposed aspect" I presume you have examined the budget numbers for actual
dollars required to achieve this, and the complexity of making sure it works right.

I am not ignoring the issue, I'm asking if you have ignored the realities involved in
achieving it!

When a major New York investment firm mirrors their transactions with dedicated
fiber-optic links to two computer sites 50 miles away, they have looked at whether or not
this is cost-effective, hired a team of people to make it work, and given the flow of
billions of dollars a day through their computers, written this effort off as pocket
change, it isn't even a blip in their cost statement (which is rounded off to the nearest
million dollars). You are in a different situation.

Mirrored transactional file systems have real costs: in dollars, in time-to-install, in
complexity, in performance. If you have not evaluated these costs in detail, you may be
surprised at what they are. Proposing a solution is not the same as realizing a
cost-effective solution.
joe
****
>
>> How high an exponent do you think you have to support in
>> the 1 in 10**n probabilities?
>>
>> The simple fallback is: (a) don't charge for work not
>> delivered (b) in the case of any
>> failure, require the transaction be resubmitted [and see
>> (a)].
>
>Yes I like that Idea.
>
>>
>> If you need offsite storage for file backup, this may mean
>> that in the case of a disaster,
>> you lose all the income from the last backup to the time
>> of the disaster, and that tells
>> you how often you need to do offsite backups. If you lose
>> $50, this may be acceptable; if
>> you lose $500, this probably isn't.
>> joe
>
>I don't want to ever lose any data pertaining to customers
>adding money to their account. I don't want to have to rely
>on the payment processor keeping track of this. Maybe there
>are already mechanisms in place that can be completely
>relied upon for this.
****
If a customer can add $1 and you spend $5 making sure they don't lose it, have you won?

Or you can build a system in which the customers don't lose money but you might. If you do
the risk analysis and discover that by spending $20 you cannot lose more than $100, and
that loss is a low-probability event, the $20 is probably worth it. My deductible is $500,
so if I have to incur massive expense to avoid losses under $500, I probably won't do it.
OTOH, I maintain some expensive service policies on my equipment, and the last service
done under the policy would have cost me > $2000 had I not had it covered. Now that was
worth it. Risk/benefit analysis is important.
joe
****
>
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

From: Joseph M. Newcomer on 6 Apr 2010 15:26

See below...
On Mon, 5 Apr 2010 21:11:05 -0500, "Peter Olcott" <NoSpam(a)OCR4Screen.com> wrote:

>
>"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
>news:eVZ1n$R1KHA.5212(a)TK2MSFTNGP05.phx.gbl...
>> Hector Santos wrote:
>>
>>>>> If you believe a single thread guarantees transactional
>>>>> integrity, you are wrong.
>>>>> joe
>>>>
>>>> It does guarantee that:
>>>> (1) Read balance
>>>> (2) Deduct 10 cents from balance
>>>> (3) Write updated balance
>>>> don't have any intervening updates between steps.
>>>
>>> He's right Joe. For a Single Thread Access only SQLITE
>>> implementation, he can guarantee no write contention
>>> issues. Perfect for any application that doesn't have
>>> more than one thread. <grin>
>>>
>>> But what he doesn't understand is that when step 3 is running,
>>> step 1 is locked for any other thread. But he doesn't
>>> need that, it's a FIFO based accessor :)
>>
>>
>> You know Joe, With his low volume, he really doesn't need
>> any SQL engine at all!
>>
>
>I need the SQL engine to keep track of user accounts,
>including free user accounts. Every user must have an
>account. Free users will be restricted to Times New Roman 10
>point, and have the priority of these jobs only when no
>other jobs are pending.
****
Given Hector's previous description, where you only need a simple text file, I don't see
why you think you need a SQL engine to "keep track of user accounts". A simple text file,
whose name is the user account, will work just fine.

I see you just added more requirements to the mix: multiple queues with one queue being a
low-priority queue. The Magic Morphing Requirements strike again!
****
>
>> He can easily just write a single text FILE per request
>> and per USER account file system.
>
>Yes, that is the sort of thing that I have envisioned. The
>output text will be UTF-8.
>
>>
>> That will help with his target 100ms single transaction at
>> a time FIFO design need and he will have almost 100% crash
>> restart integrity!
>>
>> I would consider using an old school simple X12-like EDI
>> format for its transaction codes and user data fields and
>> he might be able to sell this idea for his B2B web service
>> considerations with traditional companies familiar and use
>> EDI!
>>
>> And whats good about using POTF (plain old text files), he
>> can leverage existing tools in all OSes:
>>
>> - He can edit the text files with NOTEPAD or vi.
>> - He can delete accounts with DEL * or rm *
>> - He can back it up using zip or cab or gz!
>> - He can search using dir or ls!
>>
>> Completely PORTABLE! SIMPLE! CHEAP! FAST! FAULT TOLERANCE!
>> NETWORK SHARABLE! ATOMIC FOR APPENDS! EXCLUSIVE, READ,
>> WRITE FILE LOCKING!
>>
>
>Yes that is the sort of system that I have been envisioning.
>I still have to have SQL to map the email address login ID
>to customer number.
****
No, it should be obvious that you do NOT need a SQL implementation to do this! In fact,
Hector's suggestion of a text file works perfectly! The "mapping" is handled by a very
sophisticated piece of technology called a "file directory".
****
>
>I have been envisioning the primary means of IPC, as a
>single binary file with fixed length records. I have also
>envisioned how to easily split this binary file so that it
>does not grow too large. For example automatically split it
>every day, and archive the older portion.
****
Why? Why not a directory full of files? I've done this; each file had a timestamp as its
name, and since the files were kept in alphabetical order, a simple wildcard search of *.*
revealed them to me in alphabetical order, which was also temporal order. I'm not sure
FindFirstFile/FindNextFile guarantees alphabetical order, but in the system I used, the
directory scanner did. And I handled the case of two submissions at the same time by (a)
opening the file with an exclusive file lock, so an attempt to create a file of the same
name failed and (b) always created the file with the "fail if file already exists" flag
set so if the file really did already exist, I'd get an error. At which point, I'd
generate a new file name (new timestamp) and that solved the problem.
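
The timestamp-as-filename scheme described above can be sketched in portable C++ as follows. This is only an illustration under stated assumptions: the function name `CreateUniqueRequestFile`, the `.req` extension, and the microsecond timestamp are hypothetical, and the C11 `"wx"` mode of `fopen` stands in for the "fail if file already exists" flag (with the Win32 API the equivalent would be `CreateFile` with `CREATE_NEW`). The name is zero-padded so that alphabetical order matches temporal order.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <string>

// Creates dir/<timestamp>.req, failing over to a new timestamp if the name
// is already taken. Returns the path, or an empty string if it gives up.
// The function name and the ".req" extension are hypothetical.
std::string CreateUniqueRequestFile(const std::string& dir, const std::string& payload)
{
    for (int attempt = 0; attempt < 100; ++attempt) {
        auto sinceEpoch = std::chrono::system_clock::now().time_since_epoch();
        long long us =
            std::chrono::duration_cast<std::chrono::microseconds>(sinceEpoch).count();

        // Fixed-width, zero-padded name: sorting by name sorts by time.
        char name[64];
        int n = std::snprintf(name, sizeof name, "%020lld.req", us);
        assert(n > 0 && n < static_cast<int>(sizeof name));

        std::string path = dir + "/" + name;

        // "wx" = create for writing, but fail if the file already exists.
        std::FILE* f = std::fopen(path.c_str(), "wx");
        if (!f) continue;            // name taken (or other error): retry with a new timestamp

        std::fputs(payload.c_str(), f);
        std::fclose(f);
        return path;
    }
    return std::string();
}
```

The retry limit of 100 is arbitrary; it only keeps the loop from spinning forever when the directory is missing or not writable.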

See what I mean about getting too low-level too quickly? You assumed that you NEEDED a
SQL engine to do something utterly trivial, that anyone else would do via any of a number
of alternative ways (such as: keep the list of users in memory at all times, look up in
memory using std::map, for example, and write to disk only when there is a change, such as
an account is added or deleted).
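
A minimal sketch of the std::map alternative mentioned above, assuming a plain text file with one "email customer-number" pair per line. The class name `UserDirectory`, the file layout, and the numbering rule are all hypothetical; the point is only that lookups never touch the disk and the file is rewritten only when an account is added or deleted.

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <string>
#include <utility>

// In-memory map from login email to customer number, persisted to a text
// file only when the set of accounts changes. All names are hypothetical.
class UserDirectory {
public:
    explicit UserDirectory(std::string path) : path_(std::move(path)) { Load(); }

    // Pure in-memory lookup; returns -1 for an unknown email.
    int Lookup(const std::string& email) const {
        auto it = users_.find(email);
        return it == users_.end() ? -1 : it->second;
    }

    // Adds an account (or returns the existing number) and saves the file.
    int Add(const std::string& email) {
        assert(email.find(' ') == std::string::npos);  // file format is space-separated
        auto it = users_.find(email);
        if (it != users_.end()) return it->second;
        int id = next_id_++;
        users_[email] = id;
        Save();
        return id;
    }

    // Deletes an account and saves the file; false if it did not exist.
    bool Remove(const std::string& email) {
        if (users_.erase(email) == 0) return false;
        Save();
        return true;
    }

private:
    void Load() {
        std::ifstream in(path_);
        std::string email;
        int id = 0;
        while (in >> email >> id) {
            users_[email] = id;
            if (id >= next_id_) next_id_ = id + 1;
        }
    }

    void Save() const {
        std::ofstream out(path_, std::ios::trunc);
        for (const auto& kv : users_) out << kv.first << ' ' << kv.second << '\n';
    }

    std::string path_;
    std::map<std::string, int> users_;
    int next_id_ = 1;
};
```

Two simplifications are worth naming: rewriting the whole file in place is not crash-safe (a real version would write a temporary file and rename it), and a customer number can be reused if the highest-numbered account is deleted before a restart.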

Your assumption that you need a file system to implement the FIFO is equally ill-founded;
if you simply say "any transaction in flight is lost on a crash" then the problem gets
much simpler. Since crashes are nominally extremely rare events (your own software is
more likely to be a cause of failure than the incoming power supply!) then the recovery is
equally trivial: the client re-submits the request!
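
If in-flight requests may be lost on a crash, the FIFO can live entirely in memory. The following is a minimal sketch using only the standard library, assuming requests are represented as strings and that the web-server threads and the OCR thread share one process; the class name `RequestQueue` is hypothetical.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// In-memory FIFO shared between producer threads (Push) and a consumer
// thread (Pop). Nothing is written to disk, so queued requests are lost
// if the process dies, which is the trade-off described above.
class RequestQueue {
public:
    void Push(std::string request) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(request));
        }
        ready_.notify_one();         // wake one waiting consumer
    }

    // Blocks until a request is available, then returns the oldest one.
    std::string Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty(); });
        assert(!queue_.empty());
        std::string request = std::move(queue_.front());
        queue_.pop();
        return request;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> queue_;
};
```

Because the consumer blocks on the condition variable, no polling is needed; first-in-first-out order comes from std::queue itself.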

You are over-engineering solutions to solve virtually non-existent problems in
overly-complex ways. Step back, study your requirements, and move forward again. Throw
out any implementation mechanism not required to meet the revised requirements.
joe

>
>> <grin>
>>
>> --
>> HLS
>
Joseph M. Newcomer [MVP]
email: newcomer(a)flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm

From: Peter Olcott on 6 Apr 2010 17:51

"Hector Santos" <sant9442(a)nospam.gmail.com> wrote in message
news:%23QIKZaa1KHA.220(a)TK2MSFTNGP06.phx.gbl...
> Peter Olcott wrote:
>
> > I would envision only using anything as heavy weight as
> > SQLite for just the financial aspect of the transaction.
>
> SQLITE is not "heavy weight," it's lightweight and only
> good for single accessor applications. Very popular for
> applications in configurations or user records, but only
> THEY have access and no one else.
>
> You can handle multiple access but at the expense of
> speed. The SQLITE people make no bones about that.
> SQLITE works because the target market don't have any sort
> of critical speed requirement and can afford the latency
> in DATAFILE sharing.
>
> SQLITE uses what is called a Reader/Writer Lock technique
> very common in synchronization of a common resource among
> threads

Compared to a simple file even SQLite is too heavy for the
transaction log because SQL has no concept of record number
that maps to a file offset. This means that one has to have
an index just to keep the records in append order. Also even
if you have the closest thing that SQL has to a record
number, you can't use this as a file byte offset for a seek.
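
To make the "record number maps to a file offset" idea concrete, here is a minimal sketch with fixed-length records, where record N starts at byte N * sizeof(record). The `LogRecord` layout and the three function names are hypothetical, and the record number returned by `AppendRecord` is only meaningful if there is a single appending writer.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical fixed-length transaction record (64 bytes).
struct LogRecord {
    std::uint32_t customer;
    std::uint32_t status;
    char          request[56];
};

// Byte offset of record number `index`.
static std::streamoff OffsetOf(long index) {
    return static_cast<std::streamoff>(index) *
           static_cast<std::streamoff>(sizeof(LogRecord));
}

// Appends a record and returns its record number, or -1 on failure.
long AppendRecord(const std::string& path, const LogRecord& rec) {
    std::ofstream out(path, std::ios::binary | std::ios::app);
    out.write(reinterpret_cast<const char*>(&rec), sizeof rec);
    out.flush();
    if (!out) return -1;
    std::streamoff end = out.tellp();
    return static_cast<long>(end / static_cast<std::streamoff>(sizeof rec)) - 1;
}

// Reads record number `index` with a single seek; no index structure needed.
bool ReadRecord(const std::string& path, long index, LogRecord& rec) {
    assert(index >= 0);
    std::ifstream in(path, std::ios::binary);
    in.seekg(OffsetOf(index));
    in.read(reinterpret_cast<char*>(&rec), sizeof rec);
    return in.gcount() == static_cast<std::streamsize>(sizeof rec);
}

// Overwrites record number `index` in place.
bool WriteRecord(const std::string& path, long index, const LogRecord& rec) {
    assert(index >= 0);
    std::fstream io(path, std::ios::binary | std::ios::in | std::ios::out);
    io.seekp(OffsetOf(index));
    io.write(reinterpret_cast<const char*>(&rec), sizeof rec);
    io.flush();
    return static_cast<bool>(io);
}
```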

>
> You can have many readers, but one writer.
> If readers are active, the writer must wait until no
> more readers
> if writers are active, the reader must wait until no
> more writers
>
> If you use OOPS with a class based ReaderWriter Class, it
> makes the programming easier:
>
> Get
> {
> CReader LOCK()
> get record
> }
>
>
> Put
> {
> CWriter LOCK()
> put record
> }
>
> The nice thing is that when you lose local scope, the
> destructor of the reader/writer lock will
> release/decrement the lock references.
>
> Now in Windows, thread synchronization is generally done
> using what are called Kernel Objects. They are SEMAPHORES; a
> MUTEX is a special type of semaphore.
>
> For unix, I am very rusty here, but it MIGHT still use the
> old school method which was also used in DOS using what I
> called "File Semaphores." In other words, a FILE is used
> to signify a LOCK.
>
> So one process will create a temporary file:
>
> process-id.LCK
>
> and other other processes will wait on that file
> disappearing and only the OWNER (creator of the lock) can
> release/delete it.
>
> As I understood it, pthreads was an augmented technology
> and library to allow unix based applications to begin
> using threads. I can't tell you the details but as I
> always understood it they all - WINDOWS and UNIX - are
> conceptually the same when it comes to common resource
> sharing models. In other words, you look for the same type
> of things in both.
>
> > The queue of HTTP requests would use a lighter weight
> > simple
> > file.
>
> For you, you can use a single log file or individual *.REQ
> files which might be better/easier using a File
> Notification event concept. Can't tell you about *nix, but
> for Windows:
>
> FindFirstChangeNotification()
> ReadDirectoryChangesW()
>
> The former might be available under *nix since its the
> older idea. The latter was introduced for NT 3.51 so its
> available for all NT based OSes. It is usually used with
> IOCP designs for scalability and performance.
>
> In fact, one can use ReadDirectoryChangesW() along with
> Interlocked Singly Linked Lists:
>
>
> http://msdn.microsoft.com/en-us/library/ms684121(v=VS.85).aspx
>
> to give you a highly optimized, high performance atomic
> FIFO concept. However, there is a note I see for 64bit
> operations.
>
> > I would use some sort of IPC to inform the OCR that a
> > request is available to eliminate the need for a polled
> > interface. The OCR process would retrieve its jobs from
> > this
> > simple file.
>
> See above.
>
>
> > According the Unix/Linux docs multiple threads could
> > append
> > to this file without causing corruption.
>
> So does windows. However, there could be a dependency on
> the storage device and file drivers.
>
> In general, as long as you open for append, write and
> close, and don't leave it open, don't use any file stat
> readings or seeking on your own, it works very nicely:

I need to have the file opened for append by one process,
opened for read/write for another process, can't I just keep
it open?
If I do close it like you suggest, will it being opened by
one process prevent it from being opened by another?

It seems like one process could append and another one
read/write without interfering with each other.

>
> FILE *fv = fopen("request.log","at");
> if (fv) {
>   fprintf(fv,"%s\n",whatever);
>   fclose(fv);
> }
>
> However, if you really wanted a guarantee, then you can
> use a critical section, a named kernel object (named so
> it can be shared among processes), or use sharing mode
> open file functions with a READ ONLY sharing attribute.
> Using CreateFile(), it would look like this:

It would be simpler to bypass the need of this and simply
delegate writing the transaction log file to a single
thread.
Also if the OS already guarantees that append is atomic why
slow things down unnecessarily?

>
> BOOL AppendRequest(const TYourData &data)
> {
>   HANDLE h = INVALID_HANDLE_VALUE;
>   const DWORD start   = GetTickCount();
>   const DWORD maxWait = 20*1000;  // 20 seconds max wait
>   while (1)
>   {
>     // FILE_SHARE_READ: others may read, but a second writer is refused
>     h = CreateFile("request.log",
>                    GENERIC_WRITE,
>                    FILE_SHARE_READ,
>                    NULL,
>                    OPEN_ALWAYS,
>                    FILE_ATTRIBUTE_NORMAL,
>                    NULL);
>     if (h != INVALID_HANDLE_VALUE) break;  // We got a good handle
>     DWORD err = GetLastError();
>     if (err != ERROR_ACCESS_DENIED && err != ERROR_SHARING_VIOLATION) {
>       return FALSE;
>     }
>     // unsigned subtraction stays correct across tick-count wraparound
>     DWORD waited = GetTickCount() - start;
>     if (waited > maxWait) {
>       SetLastError(err);  // make sure error is preserved
>       return FALSE;
>     }
>     _cprintf("- waiting: %lu ms\n", waited);
>     Sleep(50);
>   }
>   SetFilePointer(h,0,NULL,FILE_END);
>
>   DWORD dw = 0;
>   if (!WriteFile(h,(const void *)&data,sizeof(data),&dw,NULL)) {
>     // something unexpected happened
>     CloseHandle(h);
>     return FALSE;
>   }
>
>   CloseHandle(h);
>   return TRUE;
> }
>
>
> > If this is not the
> > case then a single thread could be invoked through some
> > sort
> > of FIFO, such as in Unix/Linux is implemented as a named
> > pipe, with each of the web server threads writing to the
> > FIFO.
>
> If that is all *nix has to offer, historically, using
> named pipes can be unreliable, especially under multiple
> threads.

There are several different types of IPC, I chose the named
pipe because it is inherently a FIFO queue.

>
> But since you continue to mix up your engineering designs
> and you need to get that straight, process vs threads, the
> decision will decide what to use.

The web server will be a process with one thread per HTTP
request. The OCR will be a process with at least one thread.
I may have multiple threads for differing priorities and
have the higher priority thread preempt the lower ones, such
that only one thread is running at a time.

>
> Lets say you listen and ultimately design a multi-thread
> ready EXE and you want to also allow multiple EXE to run,
> either on the same machine or another machine and want to
> keep this dumb FIFO design for your OCR, then by
> definition you need a FILE BASED sharing system.

The purpose of the named pipe is to report the event that
the transaction log has a new transaction available for
processing. I am also envisioning that another named pipe
will report the event that processing is completed on one
HTTP request.

>
> While there are methods to do cross machine MESSAGING,
> like named pipes, it is still fundamentally based on a
> file concept behind the scenes, they are just "special
> files".

The processes are on the same machine. Apparently this
"file" is not a "disk" file, everything occurs in memory.

>
> You need to trust my 30 years of designing server with
> HUGE IPC requirements. You can write your OWN "messaging
> queue" with ideas based on the above AppendRequest(), just
> change the file name to some shared resource location:
>
> \\SERVER_MACHINE\SharedFolder\request.log
>
> and you got your Intra and Inter Process communications,
> Local, Remote, Multi-threads, etc.!
>
> Of course, using an shared SQL database with tables like
> above to do the same thing.

More overhead.

>
> Your goal as a good "Software Engineer" is to outline the
> functional requirements and also use BLACK BOX
> interfacing. You could just outline this using an
> abstract OOPS class:
>
> class CRequestHandlerAbstract {
> public:
>   struct TYourData {   // declared first so the methods below can use it
>     ..fields...
>   };
>
>   virtual ~CRequestHandlerAbstract() {}
>   virtual bool Append(const TYourData &yd) = 0;
>   virtual bool GetNext(TYourData &yd) = 0;
>   virtual bool SetFileName(const char *sz) { sfn = sz; return true; }
>
> protected:
>   virtual bool OpenFile() = 0;
>   virtual bool CloseFile() = 0;
>   string sfn;
> };
>
> and that is all you basically need to know. The
> implementation of this abstract class will be for the
> specific method and OS you will be using. What doesn't
> change is your Web server and OCR. It will use the
> abstract methods as the interface points.

At this early stage of my learning process I also need to
get physical so that I better understand what kinds of
things are feasible, and the degree of difficulty in
implementing the various approaches.

>
>> Yes that is the sort of system that I have been
>> envisioning. I still have to have SQL to map the email
>> address login ID to customer number.
>
>
> That will depend on how you wish to define your customer
> number. If it's purely numeric and serial, i.e., starting at 1,
> then you can define in your SQL database table schema, an
> auto-increment id field which the SQL engine will
> auto-increment for you when you first create the user
> account with the INSERT command.

Yes, that is the idea.

>
> Example, a table "CUSTOMERS" in the database is created:
>
> CREATE TABLE customers (
> id int auto_increment,
> Name text,
> Email Text,
> Password text
> )
>
> When you create the account, the insert will look like
> this:
>
> INSERT INTO customers values
> (NULL,'Peter','pete(a)abc.com','some_hash_value')
>
> By using the NULL for the first ID field, SQL will
> automatically use the next ID number.
>
> In general, a typical SQL tables layout uses auto-increase
> ID fields as the primary or secondary key for each table,
> that allows you to not duplicate data. So you can have an
> SESSIONS table for currently logged in users:
>
> CREATE TABLE sessions (
> id int auto_increment, -- view it as your transaction session id
> cid int,
> StartTime DateTime,
> EndTime DateTime,
> ..
> ..
> )
>
> where the link is Customers.id = Sessions.cid.
>
> WARNING:
>
> One thing to remember is that DBA (Database Admins) value
> their work and are highly paid. Do not argue or dispute
> with them as you

I did non SQL database programming for a decade.

> normally do. They most certainly will not have the patience
> shown to you here. SQL setup is a HIGHLY complex subject,
> but it can be easy if you keep it simple. Don't get LOST
> with optimization until the need arises, but using common
> sense table designs should be a no-brainer upfront. Also,
> while there is a standard in the "SQL language" there are
> differences between SQL engines, like the above CREATE
> statements, they are generally slightly different for
> different SQL engines. So I advise you to use common SQL
> data types and avoid special definitions unless you made
> the final decision to stick with one vendor SQL engine.
>
> Yours is a standard design; all you will need at a minimum
> for tables are:
>
> customers customer table
> auto-increment primary key: cid
>
> products customer products limits, etc, table
> auto-increment primary key: pid
> secondary key: cid
>
> This would be a one to many table.
>
> customers.cid <---o products.cid
>
> select * from customers, products
> where customers.cid =
> products.cid
>
> You can use a JOIN here too which a
> DBA will
> tell you to do, but the above is the
> BASIC
> concept.
>
> sessions sessions management table
> can serve as session history log as
> well
>
> auto-increment primary key: sid
> secondary key: cid
>
> requests Your "FIFO"
> auto-increment primary key: rid
> secondary key: cid
> secondary key: sid
>
> Some DBAs might suggest combining tables, using or not
> using indices or secondary keys, etc. There is no
> real answer and it highly depends on the SQL engine when
> it comes to optimization. So DON'T get lost with it. You
> can ALWAYS create indices if need be.
>

I already know about third normal form, and canonical
synthesis. It's all probably moot on this simple little
database. The only index value will be user email address.

>> I have been envisioning the primary means of IPC, as a
>> single binary file with fixed length records. I have also
>> envisioned how to easily split this binary file so that
>> it does not grow too large. For example automatically
>> split it every day, and archive the older portion.
>
>
> Well, to do that you have no choice but to implement your
> own file sharing class as shown above. The concept is
> basically a Log Rotater.
> You can now update the CRequestHandlerAbstract class with
> one more method requirement:

I am not sure if that is true. One process appends to the
file. Another process uses pread() and pwrite() to read and
write to the file. These are supposed to be guaranteed to be
atomic, which I am taking to mean that the OS forces them to
occur sequentially.
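
A minimal sketch of that arrangement using POSIX calls from C++, assuming fixed-length records and one descriptor per role. The `Job` layout and function names are hypothetical. What POSIX specifies is that with O_APPEND the move to end-of-file and the write happen as one step, and that pread()/pwrite() use an explicit offset without touching the descriptor's file position; the sketch makes no stronger claim about what a concurrent reader may observe.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical fixed-length job record.
struct Job {
    int  id;
    int  done;
    char text[24];
};

// Producer side: every write() on this descriptor lands at the end of file.
int OpenForAppend(const char* path) {
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
}

// Consumer side: a separate descriptor opened WITHOUT O_APPEND, because on
// Linux pwrite() on an O_APPEND descriptor appends regardless of the offset.
int OpenForUpdate(const char* path) {
    return open(path, O_RDWR);
}

bool AppendJob(int appendFd, const Job& job) {
    return write(appendFd, &job, sizeof job) == static_cast<ssize_t>(sizeof job);
}

// Reads record `index` at its computed byte offset.
bool ReadJob(int updateFd, long index, Job& job) {
    assert(index >= 0);
    off_t offset = static_cast<off_t>(index) * static_cast<off_t>(sizeof job);
    return pread(updateFd, &job, sizeof job, offset) == static_cast<ssize_t>(sizeof job);
}

// Marks record `index` as processed by rewriting it in place.
bool MarkDone(int updateFd, long index) {
    Job job;
    if (!ReadJob(updateFd, index, job)) return false;
    job.done = 1;
    off_t offset = static_cast<off_t>(index) * static_cast<off_t>(sizeof job);
    return pwrite(updateFd, &job, sizeof job, offset) == static_cast<ssize_t>(sizeof job);
}
```

The read-modify-write in `MarkDone` is not atomic as a whole, so this only works if a single process ever updates a given record.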

>
> class CRequestHandlerAbstract {
> public:
>   struct TYourData {   // declared first so the methods below can use it
>     ..fields...
>   };
>
>   virtual ~CRequestHandlerAbstract() {}
>   virtual bool Append(const TYourData &yd) = 0;
>   virtual bool GetNext(TYourData &yd) = 0;
>   virtual bool SetFileName(const char *sz) { sfn = sz; return true; }
>
>   virtual bool RotateLog() = 0; // << NEW REQUIREMENT
>
> protected:
>   virtual bool OpenFile() = 0;
>   virtual bool CloseFile() = 0;
>   string sfn;
> };
>
> But you also achieve rotation if you use a special file
> naming nomenclature, this is called Log Periods. It could
> be based on today's date.
>
> "request-{yyyymmdd}.log"
>
> That will guarantee a daily log, or do it other periods:
>
> "request-{yyyy-mm}.log" monthly
> "request-{yyyy-ww}.log" week number
> "request-{yyyy-mm}.log" monthly
> "request-{yyyymmddhh}.log" hourly
>
> and so on, and you also couple it by size.
>
> This can be handle by adding a LogPeriod, FileNameFormat,
> MaxSize variables which the OpenFile() can use;
>
> class CRequestHandlerAbstract {
> public:
>   struct TYourData {   // declared first so the methods below can use it
>     ..fields...
>   };
>
>   virtual ~CRequestHandlerAbstract() {}
>   virtual bool Append(const TYourData &yd) = 0;
>   virtual bool GetNext(TYourData &yd) = 0;
>   virtual bool SetFileName(const char *sz) { sfn = sz; return true; }
>
>   virtual bool RotateLog() = 0; // << NEW REQUIREMENT
>
> protected:
>   virtual bool OpenFile() = 0;
>   virtual bool CloseFile() = 0;
>   CString sfn;
>
> public:
>   int LogPeriod;   // none, hourly, daily, weekly, monthly...
>   int MaxLogSize;
>   CString FileNameFormat;
> };
>
> and by using a template idea for the file name you can use
> string replacements very easily.
>
> SYSTEMTIME st;
> GetSystemTime(&st);
>
> CString logfn = FileNameFormat;
> if (logfn.Has("yyyy"))
>   logfn.Replace("yyyy",Int2Str(st.wYear));
> if (logfn.Has("mm"))
>   logfn.Replace("mm",Int2Str(st.wMonth));
> ... etc ...
>
> if (MaxLogSize > 0) {
>   DWORD fs = GetFileSizeByName(logfn,NULL);
>   if (fs != (DWORD)-1 && fs >= (DWORD)MaxLogSize) {
>     // Rename file with unique serial number
>     // "request-yyyymm-1.log"
>     // "request-yyyymm-2.log"
>     // etc.
>     // finding highest #.
>
>     RenameFileWithASerialNumberAppended(logfn);
>   }
> }
>
> etc.
>
> --
> HLS
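
The log-period naming idea quoted above can also be sketched with the standard library. Note the swap: instead of the quoted string-replacement approach on "yyyy"/"mm" tokens, this version uses std::strftime conversion specifiers (%Y, %m, %d, %H), so the format strings differ from the quoted ones. The function name `LogFileName` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Expands a strftime-style format into a log file name for the given time.
// Examples of formats:
//   "request-%Y%m%d.log"    daily
//   "request-%Y-%m.log"     monthly
//   "request-%Y%m%d%H.log"  hourly
// Returns an empty string if the result does not fit the buffer.
std::string LogFileName(const char* format, const std::tm& when)
{
    assert(format != nullptr);
    char buffer[128];
    std::size_t n = std::strftime(buffer, sizeof buffer, format, &when);
    return std::string(buffer, n);
}
```

Because the period is encoded in the name, rotation happens implicitly: the first append after midnight (or after the hour, or the month) simply opens a file with a new name.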

From: Peter Olcott on 6 Apr 2010 17:59

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
message news:2atmr51ml9kn4bb5l5j77h3lpiqtnlq8m3(a)4ax.com...
> See below...
> On Mon, 5 Apr 2010 21:32:44 -0500, "Peter Olcott"
> <NoSpam(a)OCR4Screen.com> wrote:
>
>>Ah but, then you are ignoring the proposed aspect of my
>>design that would handle all those things. What did you
>>call
>>it "mirrored transactions". I called it on-the-fly
>>transaction-by-transaction offsite backup.
> ****
> If you have a "proposed aspect" I presume you have
> examined the budget numbers for actual
> dollars required to achieve this, and the complexity of
> making sure it works right.
>
> I am not ignoring the issue, I'm asking if you have
> ignored the realities involved in
> achieving it!

I would simply re-implement some of the aspects of my web
application such that there is another web application on
another server that the first server can send its
transactions to.

>>I don't want to ever lose any data pertaining to customers
>>adding money to their account. I don't want to have to
>>rely
>>on the payment processor keeping track of this. Maybe
>>there
>>are already mechanisms in place that can be completely
>>relied upon for this.
> ****
> If a customer can add $1 and you spend $5 making sure they
> don't lose it, have you won?

If you don't make sure that you don't lose the customer's
money your reputation will put you out of business. If you
can't afford to make sure that you won't lose the customer's
money then you can't afford to go into business.

From: Peter Olcott on 6 Apr 2010 18:23

"Joseph M. Newcomer" <newcomer(a)flounder.com> wrote in
message news:032nr511e3rcqgvp2niteit9smtuf75n92(a)4ax.com...
> See below...
> On Mon, 5 Apr 2010 21:11:05 -0500, "Peter Olcott"
> <NoSpam(a)OCR4Screen.com> wrote:

>>I need the SQL engine to keep track of user accounts,
>>including free user accounts. Every user must have an
>>account. Free users will be restricted to Times New Roman
>>10
>>point, and have the priority of these jobs only when no
>>other jobs are pending.
> ****
> Given Hector's previous description, where you only need a
> simple text file, I don't see
> why you think you need a SQL engine to "keep track of user
> accounts". A simple text file,
> whose name is the user account, will work just fine.

This can get too messy too quickly given enough volume of
customers and transactions. Since try before you buy free
service will be a part of the mix, a large volume should be
planned for.

>
> I see you just added more requirements to the mix:
> multiple queues with one queue being a
> low-priority queue. The Magic Morphing Requirements
> strike again!
> ****

Explore the boundaries of possible solutions before
implementing any one of them.

I think that I could implement at least two different levels
of priority where the high level priority preempts the lower
ones, quickly saving their state, in about a week, once
everything else is done. There may be four different levels
of priority. I would not implement this initially.
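
As a rough illustration of the ordering part only, here is a sketch of a job queue with several priority levels that stays first-in-first-out within each level. It does not attempt preemption of a job that is already running or saving of its state. The names `Job`, `JobOrder`, and `JobScheduler`, and the convention that a larger number means higher priority, are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct Job {
    int           priority;   // larger value = more urgent (assumed convention)
    std::uint64_t sequence;   // arrival order, assigned by the scheduler
    std::string   request;
};

// Ordering for std::priority_queue: the "largest" element is served first,
// so higher priority wins, and within a priority the older job wins.
struct JobOrder {
    bool operator()(const Job& a, const Job& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.sequence > b.sequence;
    }
};

class JobScheduler {
public:
    void Submit(int priority, std::string request) {
        jobs_.push(Job{priority, next_sequence_++, std::move(request)});
    }

    bool Empty() const { return jobs_.empty(); }

    // Removes and returns the next job to run. Precondition: !Empty().
    Job Next() {
        assert(!jobs_.empty());
        Job job = jobs_.top();
        jobs_.pop();
        return job;
    }

private:
    std::priority_queue<Job, std::vector<Job>, JobOrder> jobs_;
    std::uint64_t next_sequence_ = 0;
};
```

With free jobs submitted at priority 0 and paid jobs at priority 1, a free job is picked only when no paid job is pending, which matches the rule stated earlier for free accounts.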

>>Yes that is the sort of system that I have been
>>envisioning.
>>I still have to have SQL to map the email address login ID
>>to customer number.
> ****
> No, it should be obvious that you do NOT need a SQL
> implementation to do this! In fact,
> Hector's suggestion of a text file works perfectly! The
> "mapping" is handled by a very
> sophisticated piece of technology called a "file
> directory".

I don't want to mess with hundreds of thousands of little
files it would be an administrative nightmare.

> ****
>>
>>I have been envisioning the primary means of IPC, as a
>>single binary file with fixed length records. I have also
>>envisioned how to easily split this binary file so that it
>>does not grow too large. For example automatically split
>>it
>>every day, and archive the older portion.
> ****
> Why? Why not a directory full of files? I've done this;
> each file had a timestamp as its
> name, and since the files were kept in alphabetical order,
> a simple wildcard search of *.*
> revealed them to me in alphabetical order, which was also
> temporal order. I'm not sure
> FindFirstFile/FindNextFile guarantees alphabetical order,
> but in the system I used, the

Far too slow compared with ISAM seek, also too messy.

> directory scanner did. And I handled the case of two
> submissions at the same time by (a)
> opening the file with an exclusive file lock, so an attempt
> to create a file of the same
> name failed and (b) always created the file with the "fail
> if file already exists" flag
> set so if the file really did already exist, I'd get an
> error. At which point, I'd
> generate a new file name (new timestamp) and that solved
> the problem.
>
> See what I mean about getting too low-level too quickly?
> You assumed that you NEEDED a

I go back and forth between levels of abstraction, and
concrete detail.

> SQL engine to do something utterly trivial, that anyone
> else would do via any of a number
> of alternative ways (such as: keep the list of users in
> memory at all times, look up in
> memory using std::map, for example, and write to disk only
> when there is a change, such as
> an account is added or deleted).

It's not quite so trivial when one adds the capability for
the customer to log in and get all of their recent
transactions and the associated output data.

>
> Your assumption that you need a file system to implement
> the FIFO is equally ill-founded;
> if you simply say "any transaction in flight is lost on a
> crash" then the problem gets
> much simpler. Since crashes are nominally extremely rare
> events (your own software is

Without some sort of FIFO how can first-in-first-out order
be otherwise enforced?

> more likely to be a cause of failure than the incoming
> power supply!) then the recovery is
> equally trivial: the client re-submits the request!

I want to provide the best solution within the specs. This
means never requiring the client to resubmit unless
absolutely necessary. As soon as the client receives an HTTP
acknowledgement that the request has been received, from
that point on the results are guaranteed. If the server
crashes then the results arrive soon after the server is
back up. If the server is incinerated, then this may take a
day or so.

I will design the best possible system, and then
incrementally implement its aspects. Some aspects may never
be implemented. Preemptive priority scheduling will not be
implemented immediately. On-the-fly offsite backup will not
be implemented immediately. Only the basic functions will be
implemented at first. I may not even provide a way for
customers to pay initially. Initially the whole service may
be free for everyone.

>
> You are over-engineering solutions to solve virtually
> non-existent problems in
> overly-complex ways. Step back, study your requirements,
> and move forward again. Throw
> out any implementation mechanism not required to meet the
> revised requirements.
> joe

Perhaps. It seems like Hector's advice is similar to what I
am proposing.

>
>>
>>> <grin>
>>>
>>> --
>>> HLS
>>
> Joseph M. Newcomer [MVP]
> email: newcomer(a)flounder.com
> Web: http://www.flounder.com
> MVP Tips: http://www.flounder.com/mvp_tips.htm
