From: David Schwartz on
On Jan 3, 5:50 am, Jef Driesen <jefdrie...(a)hotmail.com.invalid> wrote:

> What socat does is a simple select() loop, waiting until data arrives on
> one of the two pseudo terminals. When data is available on one side, it
> is read(), and then write() back to the other one (and vice versa of
> course). I would think the overhead is quite small.

> Or are you referring to the communication protocol, where the sending
> side needs to wait every time until the receiving side requests a packet?
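
For reference, the relay Jef describes boils down to something like
this (a minimal, hypothetical sketch in C; not socat's actual source):

#include <sys/select.h>
#include <unistd.h>

/* Copy whatever is readable on 'from' over to 'to'. */
static void relay(int from, int to)
{
        char buf[8192];
        ssize_t n = read(from, buf, sizeof(buf));
        if (n > 0)
                write(to, buf, n);  /* assumes the write completes in one call */
}

/* Shuttle data between the two pty masters until select() fails. */
int relay_loop(int pty_a, int pty_b)
{
        for (;;) {
                fd_set rfds;
                FD_ZERO(&rfds);
                FD_SET(pty_a, &rfds);
                FD_SET(pty_b, &rfds);

                int maxfd = pty_a > pty_b ? pty_a : pty_b;
                if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
                        return -1;

                if (FD_ISSET(pty_a, &rfds))
                        relay(pty_a, pty_b);
                if (FD_ISSET(pty_b, &rfds))
                        relay(pty_b, pty_a);
        }
}

The real thing adds error handling and short-write handling, but the
shape is the same: every hop through the relay costs at least one
select() wakeup plus a read() and a write().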

You have to put it all together.

Sending side writes some data to the pty slave. It then has to go to
the pty master. Then socat must get a 'select' hit. Then socat does an
extra 'select' for some reason. Then socat does a read. Then socat
does a write to the other pty master. Then the data goes to the other
pty slave. Then the receiver wakes up. Then the receiver calls 'recv'.
And at this point, we're about 1/3 (!) done with the sending process
because there's a backflow on all of these and the sender passes a
flush down the line.

Then, when all that's finished, we're half done. The receiver has to
send an ACK, which also has to follow the line. The receiver also does
a flush.

Then, you'd think we'd be ready to send the next block, but you'd be
wrong. The sender does an extra flush after receiving the ACK in case
there was some line noise that might corrupt the beginning of the
block.

It's a worst case scenario all around. It just plain shouldn't be
done.
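
In sender-side terms, the per-block sequence above is roughly the
following (a hedged sketch only; this is not sx's actual source, and
the specific tcdrain()/tcflush() calls are my reading of what the
'flush' steps are):

#include <termios.h>
#include <unistd.h>

/* fd is the serial/pty file descriptor, block is one framed packet. */
static int send_block(int fd, const char *block, size_t len)
{
        char ack;

        if (write(fd, block, len) != (ssize_t)len)
                return -1;
        tcdrain(fd);                    /* the flush "down the line":
                                           wait until it has really left */

        if (read(fd, &ack, 1) != 1)     /* wait for the receiver's reply */
                return -1;

        tcflush(fd, TCIFLUSH);          /* the extra flush: discard any
                                           line noise before the next block */
        return ack == '\6' ? 0 : -1;    /* 0x06 = ACK */
}

Every one of those waits has to thread its way through socat and both
ptys before the next block can start.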

DS
From: Jef Driesen on
On 4/01/2010 8:39, David Schwartz wrote:
> On Jan 3, 5:50 am, Jef Driesen<jefdrie...(a)hotmail.com.invalid> wrote:
>
>> What socat does is a simple select() loop, waiting until data arrives on
>> one of the two pseudo terminals. When data is available on one side, it
>> is read(), and then write() back to the other one (and vice versa of
>> course). I would think the overhead is quite small.
>
>> Or are you referring to the communication protocol, where the sending
>> side needs to wait every time until the receiving side requests a packet?
>
> You have to put it all together.
>
> Sending side writes some data to the pty slave. It then has to go to
> the pty master. Then socat must get a 'select' hit. Then socat does an
> extra 'select' for some reason. Then socat does a read. Then socat
> does a write to the other pty master. Then the data goes to the other
> pty slave. Then the receiver wakes up. Then the receiver calls 'recv'.
> And at this point, we're about 1/3 (!) done with the sending process
> because there's a backflow on all of these and the sender passes a
> flush down the line.
>
> Then, when all that's finished, we're half done. The receiver has to
> send an ACK, which also has to follow the line. The receiver also does
> a flush.
>
> Then, you'd think we'd be ready to send the next block, but you'd be
> wrong. The sender does an extra flush after receiving the ACK in case
> there was some line noise that might corrupt the beginning of the
> block.
>
> It's a worst case scenario all around. It just plain shouldn't be
> done.

I didn't think about the master<->slave flow.

What do you mean by 'backflow' and 'flush'?

I perfectly understand that there is some performance penalty to pay
for the socat setup, but I expected it to be smaller, especially
because other people seem to get much better performance. Factors like
10x and even 100x faster are a big difference.
From: David Schwartz on
On Jan 4, 7:11 am, Jef Driesen <jefdrie...(a)hotmail.com.invalid> wrote:

> I didn't think about the master<->slave flow.

I think that's what's causing the disaster.

> What do you mean by 'backflow' and 'flush'?

When sx sends the data, the control flow has to get back to sx before
it can do anything else. That requires 'unthreading' your way back
through the path you just followed to get the data from sx to rx.

> I perfectly understand that there is some performance penalty to pay
> for the socat setup, but I expected it to be smaller, especially
> because other people seem to get much better performance. Factors like
> 10x and even 100x faster are a big difference.

Here's a high-level view from my machine. The fields are:
1) Relative time operation complete. (In seconds.)
2) Relative time operation started. (In seconds.)
3) Program: SO=socat, SX=sx, RX=rx
4) System call, parameters, = return value

.094439 .094431 SX read(3, "15: DATA..."..., 128) = 128

.094494 .094474 SX write(1, "\1G\27015: DATA..."..., 132) = 132
.095243 .093526 SO select(6, [3 5], [], [], NULL) = 1 (in [3])
.095280 .095271 SO read(3, "\1G\27015: DATA..."..., 8192) = 132

.095324 .095314 SO write(5, "\1G\27015: DATA..."..., 132) = 132
.095372 .095362 SO select(6, [3 5], [5], [], NULL) = 1 (out [5])
.096358 .092610 RX read(0, "\1G\27015: DATA..."..., 8192) = 132

.096442 .096431 RX write(1, "\6", 1) = 1
.097352 .095410 SO select(6, [3 5], [], [], NULL) = 1 (in [5])
.097400 .097391 SO read(5, "\6", 8192) = 1

.097439 .097429 SO write(3, "\6", 1) = 1
.098364 .094614 SX read(0, "\6", 128) = 1

If you notice, the main delay seems to be between when the data is
written to the pty slave and when the process 'select'ing on the pty
master wakes up.
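
That delay is easy to see in isolation. A hypothetical probe (not part
of the setup above): open a pty pair, write one byte to the slave, and
time how long a select() on the master takes to wake up:

#include <pty.h>                /* openpty(); link with -lutil */
#include <stdio.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

int main(void)
{
        int master, slave;
        struct timeval start, end;
        fd_set rfds;

        if (openpty(&master, &slave, NULL, NULL, NULL) < 0) {
                perror("openpty");
                return 1;
        }

        gettimeofday(&start, NULL);
        (void)write(slave, "x", 1);     /* data goes in on the slave side */

        FD_ZERO(&rfds);
        FD_SET(master, &rfds);
        select(master + 1, &rfds, NULL, NULL, NULL);  /* wait for it to
                                                         reach the master */
        gettimeofday(&end, NULL);

        printf("slave->master wakeup: %ld us\n",
               (end.tv_sec - start.tv_sec) * 1000000L
               + (end.tv_usec - start.tv_usec));
        return 0;
}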

DS
From: David Schwartz on
On Jan 4, 7:19 am, David Schwartz <dav...(a)webmaster.com> wrote:

> If you notice, the main delay seems to be between when the data is
> written to the pty slave and when the process 'select'ing on the pty
> master wakes up.

Which, by the way, means you were probably right. This seems to be a
performance issue with the pty subsystem, possibly interacting with
the scheduler.

DS
From: David Schwartz on
Aha!

/**
 * tty_schedule_flip - push characters to ldisc
 * @tty: tty to push from
 *
 * Takes any pending buffers and transfers their ownership to the
 * ldisc side of the queue. It then schedules those characters for
 * processing by the line discipline.
 *
 * Locking: Takes tty->buf.lock
 */

void tty_schedule_flip(struct tty_struct *tty)
{
        unsigned long flags;
        spin_lock_irqsave(&tty->buf.lock, flags);
        if (tty->buf.tail != NULL)
                tty->buf.tail->commit = tty->buf.tail->used;
        spin_unlock_irqrestore(&tty->buf.lock, flags);
        schedule_delayed_work(&tty->buf.work, 1);
}
EXPORT_SYMBOL(tty_schedule_flip);

This is the problem. This code specifically asks that the wakeup be
delayed one jiffy. Changing the "1" to a "0" should eliminate the
problem, though performance might be worse for "bulk data" cases.
(Consider a program that 'dribbles' bytes into the pty.)
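
In patch form, the experiment is just this one-liner against the
function quoted above (untested):

 	spin_unlock_irqrestore(&tty->buf.lock, flags);
-	schedule_delayed_work(&tty->buf.work, 1);
+	schedule_delayed_work(&tty->buf.work, 0);
 }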

My bet is the coder assumed that ttys would always be much slower than
the tick rate and so it made sense to accumulate characters rather
than scheduling a consumer more than once in a single tick.

DS