From: Eric Sosman on
On 4/3/2010 10:25 PM, Peter Olcott wrote:
> [...]
> I will be receiving up to 100 web requests per second (that
> is the maximum capacity) and I want to begin processing them
> as soon as they arrive hopefully without polling for them.

They're all coming down a single pipe? Just read() the
pipe; you'll get the data as soon (almost) as it's ready.

If they're coming down many pipes/sockets/whatever, the
"classical Unix" solution is to open() a descriptor on each
source and then select() on the set of descriptors. Modern
Unices have other, less cumbersome ways to monitor multiple
sources. I'm not familiar with Linux' machinery specifically,
but look for things like poll() or /dev/poll, or maybe you'll
find a helpful "see also" on the select() man page.
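
For the many-descriptors case, a minimal sketch of the select() approach might look like the following. It assumes the descriptors are already open and that each is below FD_SETSIZE; wait_for_readable() is a made-up helper name, not a library call.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Block until at least one of the nfds descriptors in fds[] is readable.
 * Returns the first readable descriptor found, or -1 on error.
 * wait_for_readable() is a hypothetical helper name. */
int wait_for_readable(const int *fds, int nfds)
{
    fd_set readable;
    int maxfd = -1;

    assert(nfds > 0);
    FD_ZERO(&readable);
    for (int i = 0; i < nfds; i++) {
        FD_SET(fds[i], &readable);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    /* NULL timeout: sleep in the kernel until something is readable. */
    if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
        return -1;

    for (int i = 0; i < nfds; i++)
        if (FD_ISSET(fds[i], &readable))
            return fds[i];
    return -1;
}
```

The caller would then read() from whichever descriptor comes back and call the helper again.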

> If I have to I will poll every 10 ms, in a thread that
> sleeps for 10 ms.

You haven't explained your problem clearly enough for me
to tell whether this would be a good or a bad idea. Absent
special circumstances, though, it is usually better to wait
for something than to poll for it. You poll when there's
something else you want to do in the absence of input -- and
if you've got a separate thread that's just looking for input,
I can't see what else it would do in the in-between times.

--
Eric Sosman
esosman(a)ieee-dot-org.invalid
From: Peter Olcott on

"Moi" <root(a)invalid.address.org> wrote in message
news:981fe$4bb86a0b$5350c024$32454(a)cache100.multikabel.net...
> On Sat, 03 Apr 2010 21:25:52 -0500, Peter Olcott wrote:
>
>> "Eric Sosman" <esosman(a)ieee-dot-org.invalid> wrote in
>> message
>> news:hp8flb$oa9$1(a)news.eternal-september.org...
>>> On 4/3/2010 6:10 PM, Peter Olcott wrote:
>>>> [...]
>>>> So then it is not very good for event driven operations?
>>>> What I am looking for is some sort of callback mechanism
>>>> that can notify me when input is available.
>>>
>>> Most Unix things wait on "state" rather than on "events."
>>> This is usually viewed as a Good Thing, because (examples):
>>>
>>> - Suppose the event occurs a moment before anyone waits
>>>   for it. Lacking infinite buffer space, the kernel just
>>>   throws the event away -- and *then* somebody waits, and
>>>   waits, and waits, because the train has already left.
>>>
>>> - Suppose the event callback takes a little longer than
>>>   expected (gets a page fault, say), and another event
>>>   arrives while the first is still being processed. What
>>>   now? A second simultaneous callback? Buffer the event
>>>   (in that infinite memory) and call again later?
>>>
>>> That said, event-based disciplines do have their place,
>>> mostly in real-time programming. But for "ordinary user-land
>>> stuff," you'll probably find state friendlier than events.
>>
>> I will be receiving up to 100 web requests per second (that
>> is the maximum capacity) and I want to begin processing them
>> as soon as they arrive hopefully without polling for them.
>> If I have to I will poll every 10 ms, in a thread that
>> sleeps for 10 ms.
>
> No. If there is nothing to read, your process is blocked,
> either in read() or in select() / poll().
> There is no real difference, except that select/poll allow
> you to limit the time you are blocked so you can do other
> useful things in between the reads.
> read() blocks forever.
>
> Blocking does not cost CPU. Your process is just stuck in a
> system call that has not returned yet.
> So it will cost you a thread (which has nothing useful to do
> anyway).
>
> HTH,
> AvK
>

That would be OK. So I place the read in an infinite loop,
and when there is nothing to read, then the read never
returns so it does not cost CPU.
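
To make that concrete, here is a rough sketch of such a loop. One detail worth noting: read() does return when every write end of the pipe has been closed (it returns 0), so the loop needs an exit for that case. drain_pipe() is a made-up name, and the place where real work would happen is only marked with a comment.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Read from fd until end-of-file and return the total number of bytes
 * read, or -1 on error. drain_pipe() is a hypothetical helper name. */
long drain_pipe(int fd)
{
    char buf[4096];
    long total = 0;

    assert(fd >= 0);
    for (;;) {
        /* Sleeps inside the kernel until data arrives or the writer closes. */
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;          /* a real program would process buf here */
        } else if (n == 0) {
            return total;        /* all write ends closed: end-of-file */
        } else if (errno != EINTR) {
            return -1;           /* real error; EINTR just means retry */
        }
    }
}
```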


From: Peter Olcott on
So then are you saying that the previous respondent is
wrong?
I will only have at most two file descriptors that will be
waited on.

I really want to avoid checking to see if there is input
available a billion times per second in a tight loop. The
approach that I am considering is to check to see if input
is available 100 times per second in a tight loop that
sleeps for 10 ms using nanosleep().
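
For comparison, the 10 ms polling approach described above could be sketched roughly like this. It assumes the descriptor is a pipe (which reports EAGAIN when empty in non-blocking mode); poll_every_10ms() is a made-up name. Note that data arriving just after a check sits unread until the next wakeup, and the sleep is only a lower bound.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Poll fd every 10 ms until one read() succeeds or hits end-of-file.
 * Stores the byte count in *nread and returns how many times the loop
 * woke up and found nothing, or -1 on error.
 * poll_every_10ms() is a hypothetical helper name. */
int poll_every_10ms(int fd, char *buf, size_t len, ssize_t *nread)
{
    const struct timespec ten_ms = { 0, 10 * 1000 * 1000 };
    int empty_wakeups = 0;
    int flags = fcntl(fd, F_GETFL);

    assert(buf != NULL && nread != NULL);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    for (;;) {
        *nread = read(fd, buf, len);
        if (*nread >= 0)
            return empty_wakeups;
        if (errno == EINTR)
            continue;                /* interrupted: just try again */
        if (errno != EAGAIN)
            return -1;               /* real error */
        empty_wakeups++;             /* nothing there yet */
        nanosleep(&ten_ms, NULL);    /* sleeps at least 10 ms, maybe longer */
    }
}
```

Every empty wakeup is a system call that accomplished nothing, which is the cost the blocking approaches avoid.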


"David Schwartz" <davids(a)webmaster.com> wrote in message
news:23674491-74e5-4e29-a136-41656c8cb49d(a)g10g2000yqh.googlegroups.com...
On Apr 4, 3:29 am, Moi <r...(a)invalid.address.org> wrote:

> No. If there is nothing to read, your process is blocked,
> either in read() or in select() / poll().
> There is no real difference, except that select/poll allow
> you to limit the time you are blocked so you can do other
> useful things in between the reads.
> read() blocks forever.

Well, also with 'select' or 'poll', you can wait for more than
one file descriptor to become ready.

> Blocking does not cost CPU. Your process is just stuck in a
> system call that has not returned yet.

Well, it does have CPU overhead; it just scales with the number
of events you wait for rather than with how long you wait.

For example, if you call 'poll' to check for readiness of, say,
100 file descriptors, the kernel will have to put your process
on 100 wait queues, and then when any one of those descriptors
comes ready (or the time you specified runs out), it will have
to remove you from all 100 wait queues. This can cost
significant CPU.

DS


From: Peter Olcott on

"David Given" <dg(a)cowlark.com> wrote in message
news:hp9srr$77i$1(a)news.eternal-september.org...
> On 04/04/10 03:42, Peter Olcott wrote:
> [...]
>>> {
>>>     set_up_file_descriptors();
>>>     for (;;)
>>>     {
>>>         int fd = wait_for_file_descriptor_to_change_state();
>>>         do_something_with(fd);
>>>     }
>>> }
>>
>> That looks like it would eat up too much CPU time for my
>> CPU-intensive process.
>
> I think there's a misunderstanding in how this works ---
> wait_for_file_descriptor_to_change_state() *blocks* until a
> file descriptor changes state. That is, it uses no CPU
> whatsoever. This isn't a busy loop, no polling is done.
>
> (In general, this approach is the *most* efficient way of
> handling this --- this is the mechanism that the very fast
> webservers like thttpd use.)
>
> [...]
>> Perhaps I could wait for the file size to grow?
>
> I think you're going to have to go into more details about
> what your problem actually is. This is a pipe; it has no file
> size. What are you trying to do?

I will repeat this again. I am converting an OCR desktop
application into a web application. The way that I am doing
this is to make modifications to one of many existing open
source web servers. The web server will have multiple
threads that process all web requests. The OCR will have one
or more threads (maximum of one thread per CPU core).

I want to design the communication between the web server
and the OCR engine. If a pipe takes no CPU while it is
waiting for input within an infinite loop, then this will
suit my needs well.
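
One way to check that claim on your own machine is to measure the CPU time a process uses while it sits in a blocking read(). The sketch below is only an illustration under some assumptions: it uses fork() and a delayed child writer in place of the real web server, and cpu_seconds_while_blocked() is a made-up name.

```c
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() under strict -std modes */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Block in read() on a pipe while a child process waits delay_ms and then
 * writes one byte. Returns the CPU time, in seconds, that the calling
 * process consumed while it was blocked, or -1.0 on error.
 * cpu_seconds_while_blocked() is a hypothetical helper name. */
double cpu_seconds_while_blocked(int delay_ms)
{
    int p[2];
    char byte;
    struct timespec delay = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };

    assert(delay_ms >= 0);
    if (pipe(p) < 0)
        return -1.0;

    pid_t child = fork();
    if (child < 0) {
        close(p[0]);
        close(p[1]);
        return -1.0;
    }
    if (child == 0) {                    /* child: the delayed writer */
        close(p[0]);
        nanosleep(&delay, NULL);
        if (write(p[1], "x", 1) != 1)
            _exit(1);
        _exit(0);
    }

    close(p[1]);
    clock_t before = clock();            /* CPU time used so far, not wall time */
    ssize_t n = read(p[0], &byte, 1);    /* blocks until the child writes */
    clock_t after = clock();

    close(p[0]);
    waitpid(child, NULL, 0);
    return n == 1 ? (double)(after - before) / CLOCKS_PER_SEC : -1.0;
}
```

Calling it with a delay of a few hundred milliseconds should report a CPU time far smaller than the delay, since the waiting happens inside the kernel with the process off the run queue.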

>
> --
> dg(a)cowlark.com --- http://www.cowlark.com
>
> "In the beginning was the word.
> And the word was: Content-type: text/plain" --- Unknown sage


From: Peter Olcott on

"Jens Thoms Toerring" <jt(a)toerring.de> wrote in message
news:81rcnaFaipU1(a)mid.uni-berlin.de...
> In comp.unix.programmer Peter Olcott
> <NoSpam(a)ocr4screen.com> wrote:
>
>> "David Given" <dg(a)cowlark.com> wrote in message
>> news:hp8gt5$38u$1(a)news.eternal-september.org...
>> > On 03/04/10 23:10, Peter Olcott wrote:
>> > [...]
>> >> So then it is not very good for event driven operations?
>> >> What I am looking for is some sort of callback mechanism
>> >> that can notify me when input is available.
>> >
>> > Well, you can tell the kernel to send you a signal when the
>> > file descriptor changes state --- see fcntl(F_SETFL,
>> > O_ASYNC). Normally this is SIGIO but some Unixes (like
>> > Linux) allow you to specify any signal.
>> >
>> > But, as I said, in my experience it's not very reliable,
>> > it's certainly not portable, and it's not actually terribly
>> > useful given how little you can do in a signal handler.
>> >
>> > The normal approach for doing this sort of thing is to
>> > structure your program like this:
>> >
>> > {
>> >     set_up_file_descriptors();
>> >     for (;;)
>> >     {
>> >         int fd = wait_for_file_descriptor_to_change_state();
>> >         do_something_with(fd);
>> >     }
>> > }
>
>> That looks like it would eat up too much CPU time for my
>> CPU-intensive process.
>
> What would eat up CPU time with that approach? If you use
> poll() or select() for waiting for the file descriptor to
> change then the thread will use nearly no CPU time at all
> while waiting - all that's needed is to do the poll/select
> system call. While there's nothing to read your thread will
> sleep, doing absolutely nothing. So the response time is as
> short as the machine can manage while the CPU consumption is
> nearly zero. Perhaps the name of the poll() function is
> giving you a wrong idea - it doesn't poll in the sense that
> it would try to check things out again and again in a tight
> loop; instead the kernel will put your process to sleep and
> only wake it up again when the file becomes readable (unless
> you also requested a timeout). And the kernel doesn't have
> to check for this repeatedly since it also manages all
> writes to the file, so when it does a write to a file it just
> has to check if there are any processes waiting for the file
> to become readable and then mark these processes for
> rescheduling at the earliest possible moment.
>
>> > This gives you the event-driven model you're looking for,
>> > but in a much more controlled manner.
>
> Perhaps changing the name of the function in the example
> code from wait_for_file_descriptor_to_change_state() to
> something like wait_for_file_event() would make it even
> clearer.
>
> And if you have already some kind of event-driven program
> then you could make a thread out of the above function, where
> you have the "do_something_with(fd);" part send an event to
> the event loop. The CPU cost of that should be minimal,
> combined with the fastest possible response time.
>
>> A general approach that I know would work would be to do the
>> same sort of thing in a thread that sleeps for 10 ms on every
>> iteration. That way I get 10 ms response and eat very little
>> CPU.
>
> Well, you could make the file non-blocking and then do a loop
> in which you try to read and then sleep for the 10 ms, using
> e.g. nanosleep(). Or instead of making the file non-blocking
> you also could use poll/select for checking, with the timeout
> set so that the function returns immediately. But that will
> probably use more CPU time than the approach above since you
> are now actually polling in the classical sense of the word
> instead of just having the kernel notify you once something
> has changed about the file.
>
>> I am guessing that pthreads can sleep for 10 ms.
>
> In principle yes, but you shouldn't rely on it being exactly
> 10 ms. If you go to sleep you can only specify a lower bound;
> it may take longer if other tasks with higher priority
> require the CPU. And the time resolution depends on how your
> machine is configured; nowadays it's typically around a
> millisecond but be prepared for worse.
>
>> Perhaps I could wait for the file size to grow?
>
> Sounds to me like the most costly way to do it...

Yeah so now that I understand how pipes work a little
better, and that the function named poll() is a misnomer and
does not actually do polling, I can provide a much better
solution.
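
As a rough illustration of what poll() actually does, the sketch below waits for a descriptor with an upper bound on the wait and then reads from it. The return convention (-2 for timeout) and the name read_with_timeout() are assumptions made up for this example.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable, then read from it.
 * Returns the number of bytes read (0 at end-of-file), -1 on error,
 * or -2 if the timeout expired with nothing to read.
 * A negative timeout_ms waits indefinitely.
 * read_with_timeout() is a hypothetical helper name. */
ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    assert(buf != NULL);
    /* The process sleeps here; the kernel wakes it on data or timeout. */
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return -2;
    /* Readable, hung up, or in error: read() reports which one it was. */
    return read(fd, buf, len);
}
```

With a timeout of -1 this behaves like a plain blocking read(); a positive timeout only matters if the thread has something else to do in between.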

>
> Regards, Jens
> --
> \ Jens Thoms Toerring ___ jt(a)toerring.de
> \__________________________ http://toerring.de