Line-by-line processing when stdin is not a tty [Python]


From: Cameron Simpson on 11 Aug 2010 08:18

On 11Aug2010 10:32, Tim Harig <usernet(a)ilthio.net> wrote:
| On 2010-08-11, Wolfgang Rohdewald <wolfgang(a)rohdewald.de> wrote:
| > On Mittwoch 11 August 2010, Cameron Simpson wrote:
| >> Usually you either
| >> need an option on the upstream program to tell it to line
| >> buffer explicitly
| >
| > once cat had an option -u doing exactly that but nowadays
| > -u seems to be ignored
| >
| > http://www.opengroup.org/onlinepubs/009695399/utilities/cat.html
|
| I have to wonder why cat knows or cares. Since we are referring to
| a single directional pipe, there is no fear of creating any kind of
| race condition. In general, I would expect that the shell opens the
| pipe (pipe()), fork()s, closes its own 0 or 1 descriptor as appropriate
| for each child, copies (dup()) one of the file descriptors to the
| appropriate file descriptor for the child process, and exec()s to call
| the new process. Neither of the processes, in general, needs to know
| anything other than to write and read from their given descriptors.
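
The plumbing described in the quoted paragraph can be sketched in Python
with the os module. This is a rough illustration only, assuming a POSIX
system; run_pipeline and _exec_or_exit are hypothetical helpers, not how any
particular shell is implemented, and here each child (rather than the
parent) performs the dup2() onto its own descriptor 0 or 1.

```python
import os


def run_pipeline(producer_argv, consumer_argv):
    """Run `producer | consumer` and return both wait statuses.

    Hypothetical helper for illustration; POSIX only.
    """
    read_fd, write_fd = os.pipe()

    producer = os.fork()
    if producer == 0:
        # First child: the pipe's write end becomes descriptor 1 (stdout).
        os.dup2(write_fd, 1)
        os.close(read_fd)
        os.close(write_fd)
        _exec_or_exit(producer_argv)

    consumer = os.fork()
    if consumer == 0:
        # Second child: the pipe's read end becomes descriptor 0 (stdin).
        os.dup2(read_fd, 0)
        os.close(read_fd)
        os.close(write_fd)
        _exec_or_exit(consumer_argv)

    # The parent must close both ends, otherwise the consumer never sees EOF.
    os.close(read_fd)
    os.close(write_fd)
    _, producer_status = os.waitpid(producer, 0)
    _, consumer_status = os.waitpid(consumer, 0)
    return producer_status, consumer_status


def _exec_or_exit(argv):
    """Replace the current (child) process with argv, or exit with 127."""
    try:
        os.execvp(argv[0], argv)
    except OSError:
        os._exit(127)
```

Nothing in this setup tells either program how to buffer; that choice is
made inside each program after the exec.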

The buffering is a performance choice. Every write requires a context
switch from userspace to kernel space, and availability of data in the
pipe will wake up a downstream process blocked trying to read.

It is far more efficient to do as few such writes as possible, so where
interaction (as you point out) is one-way it's usually better to write
data in larger chunks. But when writing to a terminal, ostensibly for a
human to read, line buffering is generally better (for exactly the issue
the OP tripped over - humans expect stuff to happen as it occurs).
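
The trade-off above can be sketched in a few lines of Python. This is only
an illustration under stated assumptions: write_lines is a hypothetical
helper, and the rule "flush per line only for a terminal" mimics what stdio
typically does for standard output rather than quoting any particular
program.

```python
def write_lines(lines, stream, line_buffered=None):
    """Write each line to stream, flushing per line only when asked to.

    When line_buffered is None, decide from the stream itself: flush
    after every line for a terminal, otherwise flush once at the end.
    """
    if line_buffered is None:
        line_buffered = stream.isatty()
    for line in lines:
        stream.write(line + "\n")
        if line_buffered:
            # One flush per line: prompt output, at the cost of more writes.
            stream.flush()
    # Final flush so nothing is left behind in the buffer.
    stream.flush()
```

Calling write_lines(lines, sys.stdout, line_buffered=True) is the kind of
explicit switch an upstream program would need to offer for its output to
reach a pipe one line at a time.
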
--
Cameron Simpson <cs(a)zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

From: Tim Harig on 11 Aug 2010 08:35

On 2010-08-11, Cameron Simpson <cs(a)zip.com.au> wrote:
> On 11Aug2010 10:32, Tim Harig <usernet(a)ilthio.net> wrote:
>| On 2010-08-11, Wolfgang Rohdewald <wolfgang(a)rohdewald.de> wrote:
>| > On Mittwoch 11 August 2010, Cameron Simpson wrote:
>| >> Usually you either
>| >> need an option on the upstream program to tell it to line
>| >> buffer explicitly
>| >
>| > once cat had an option -u doing exactly that but nowadays
>| > -u seems to be ignored
>| >
>| > http://www.opengroup.org/onlinepubs/009695399/utilities/cat.html
>|
>| I have to wonder why cat knows or cares. Since we are referring to
>| a single directional pipe, there is no fear of creating any kind of
>| race condition. In general, I would expect that the shell opens the
>| pipe (pipe()), fork()s, closes its own 0 or 1 descriptor as appropriate
>| for each child, copies (dup()) one of the file descriptors to the
>| appropriate file descriptor for the child process, and exec()s to call
>| the new process. Neither of the processes, in general, needs to know
>| anything other than to write and read from their given descriptors.
>
> The buffering is a performance choice. Every write requires a context
> switch from userspace to kernel space, and availability of data in the
> pipe will wake up a downstream process blocked trying to read.
>
> It is far more efficient to do as few such writes as possible, so where
> interaction (as you point out) is one-way it's usually better to write
> data in larger chunks. But when writing to a terminal, ostensibly for a
> human to read, line buffering is generally better (for exactly the issue
> the OP tripped over - humans expect stuff to happen as it occurs).

Right, I don't question the optimization. I question whether the
intelligence that performs that optimization should be placed within cat or
whether it should be placed within the shell. It seems to me that the
shell has a better idea of how the command is being used and can therefore
make a better decision about whether or not buffering is appropriate.

From: Grant Edwards on 11 Aug 2010 10:13

On 2010-08-11, Tim Harig <usernet(a)ilthio.net> wrote:
> On 2010-08-11, RG <rNOSPAMon(a)flownet.com> wrote:
>> When stdin is not a tty, Python seems to buffer all the input through
>> EOF before processing any of it:
>>
>> [ron@mickey:~]$ cat | python
>> print 123
>> print 456 <hit ctrl-D here>
>> 123
>> 456
>>
>> Is there a way to get Python to process input line-by-line the way it
>> does when stdin is a TTY even when stdin is not a TTY?
>
> It would be much better to know the overall purpose of what you are trying
> to achieve. There may be better ways (e.g., sockets) depending on what you
> are trying to do. Knowing your target platform would also be helpful.
>
> For the Python interpreter itself, you can get interactive behavior by
> invoking it with the -i option.

If you're talking about unbuffered stdin/stdout, the option is -u.

I don't really see how the -i option is relevant -- it causes the
interpreter to go into interactive mode after running the script.

> If you want to handle stdin a single line at a time from inside of your
> program, you can access it using sys.stdin.readline().

That doesn't have any effect on stdin buffering.

--
Grant Edwards               grant.b.edwards        Yow! ... I want to perform
                                  at               cranial activities with
                              gmail.com            Tuesday Weld!!

From: Peter Otten on 11 Aug 2010 10:49

Grant Edwards wrote:

> On 2010-08-11, Tim Harig <usernet(a)ilthio.net> wrote:
>> On 2010-08-11, RG <rNOSPAMon(a)flownet.com> wrote:
>>> When stdin is not a tty, Python seems to buffer all the input through
>>> EOF before processing any of it:
>>>
>>> [ron@mickey:~]$ cat | python
>>> print 123
>>> print 456 <hit ctrl-D here>
>>> 123
>>> 456
>>>
>>> Is there a way to get Python to process input line-by-line the way it
>>> does when stdin is a TTY even when stdin is not a TTY?
>>
>> It would be much better to know the overall purpose of what you are
>> trying to achieve. There may be better ways (e.g., sockets) depending on
>> what you are trying to do. Knowing your target platform would also be
>> helpful.
>>
>> For the Python interpreter itself, you can get interactive behavior by
>> invoking it with the -i option.
>
> If you're talking about unbuffered stdin/stdout, the option is -u.
>
> I don't really see how the -i option is relevant -- it causes the
> interpreter to go into interactive mode after running the script.

I'd say the following looks like what the OP was asking for:

$ cat | python -i -c'import sys; sys.ps1=""'
print sys.stdin.isatty()
False
print 1
1
print 2
2

(Whether it's useful is yet another question)

>> If you want to handle stdin a single line at a time from inside of your
>> program, you can access it using sys.stdin.readline().
>
> That doesn't have any effect on stdin buffering.

"for line in stream"-style file iteration uses an internal buffer that is
not affected by the -u option; stream.readline() doesn't use this
optimization.
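
A minimal sketch of the readline()-based loop, for comparison with plain
file iteration. The names each_line and echo_lines are hypothetical; the
read-ahead behaviour being avoided is the one described above for the
Python 2 interpreters in this thread, and other versions may behave
differently.

```python
def each_line(stream):
    """Yield lines from stream using readline() rather than file iteration."""
    # readline() returns the empty string only at end of file,
    # so iter() with a "" sentinel stops exactly at EOF.
    for line in iter(stream.readline, ""):
        yield line


def echo_lines(instream, outstream):
    """Echo each line as soon as it has been read."""
    for line in each_line(instream):
        outstream.write("got: " + line)
        # Flush explicitly so the reply is not held back in an output buffer.
        outstream.flush()
```

Calling echo_lines(sys.stdin, sys.stdout) from a script then handles input
one line at a time, whether or not stdin is a tty.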

Peter

From: Grant Edwards on 11 Aug 2010 11:01

On 2010-08-11, Peter Otten <__peter__(a)web.de> wrote:
> Grant Edwards wrote:

>>> If you want to handle stdin a single line at a time from inside of
>>> your program, you can access it using sys.stdin.readline().
>>
>> That doesn't have any effect on stdin buffering.
>
> "for line in stream"-style file iteration uses an internal buffer that is
> not affected by the -u option; stream.readline() doesn't use this
> optimization.

You're right. Why didn't I know that?

Using "for line in sys.stdin" does its own buffering.

In my tests using sys.stdin.readline() worked as the OP desired either
with or without -u, either with or without cat. IOW, "cat" isn't
buffering output on my system (or if it is, it's line-buffering).
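
A check along these lines can be sketched as follows. This is an
illustration, not the test actually run above: first_reply is a hypothetical
helper, the child is started with the same interpreter that runs the parent
(sys.executable), and a pipe created by subprocess stands in for "cat |".

```python
import subprocess
import sys

# Child program: answer every line as soon as readline() returns it.
CHILD = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    sys.stdout.write('got ' + line)\n"
    "    sys.stdout.flush()\n"
)


def first_reply(text):
    """Send one line to the child and return its answer before sending EOF."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        bufsize=1,
    )
    child.stdin.write(text + "\n")
    child.stdin.flush()
    # The child's stdin is still open here, so a reply at this point
    # means the line was processed without waiting for EOF.
    reply = child.stdout.readline()
    child.stdin.close()
    child.wait()
    child.stdout.close()
    return reply
```

If the child waited for EOF before answering, the readline() in the parent
would block instead of returning.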

--
Grant Edwards               grant.b.edwards        Yow! I don't know WHY I
                                  at               said that ... I think it
                              gmail.com            came from the FILLINGS in
                                                   my rear molars ...
