From: victor Yankee on
Would someone be able to point me in the right direction ?

We are looking at implementing a socket server using TCP/IP socket
streams on Solaris 10. One of the requirements is that we need to
only send TCP ACK messages once our application has flushed the
payload to non-volatile memory.

We were looking at using the MSG_PEEK flag to read the data of the
socket however this call does not stop the TCP stack from acking more
packets until our RCVBUF is full. This means that if our application
dies we could be losing data that is in our RCVBUF.
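
To make the MSG_PEEK behaviour concrete, here is a minimal sketch in Python over a loopback TCP connection. The payload and variable names are made up for illustration and nothing here is specific to Solaris. It only shows that a peek copies bytes out of the receive buffer without removing them; by the time either recv() call runs, the kernel has already accepted the data on the application's behalf.

```python
import socket

# Loopback TCP pair standing in for the real client and server.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
listener.listen(1)
client = socket.create_connection(listener.getsockname())
server, _ = listener.accept()

payload = b"record-1"             # illustrative payload
client.sendall(payload)           # returns once the local kernel has taken the bytes

# MSG_PEEK copies bytes out but leaves them in the receive buffer.
peeked = server.recv(len(payload), socket.MSG_PEEK)
# A normal recv() sees the same bytes again and only now removes them.
consumed = server.recv(len(payload))

print(peeked, consumed)

for s in (client, server, listener):
    s.close()
```

Both calls return the same bytes, so peeking gives the application a look at the data but no control over when the stack acknowledges it.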

Do you know if there is any other method that we could use to do this,
or do we need to use RAW sockets and/or modify the kernel TCP/IP
stack?

cheers,
Victor
From: Ian Collins on
On 04/15/10 06:01 PM, victor Yankee wrote:
> Would someone be able to point me in the right direction ?
>
> We are looking at implementing a socket server using TCP/IP socket
> streams on Solaris 10. One of the requirements is that we need to
> only send TCP ACK messages once our application has flushed the
> payload to non-volatile memory.

That sounds out of the scope of TCP and you should be using a higher
layer protocol to send an acknowledgement.

What happens if more data is to be sent than the advertised size of the
receive window?

> We were looking at using the MSG_PEEK flag to read the data of the
> socket however this call does not stop the TCP stack from acking more
> packets until our RCVBUF is full. This means that if our application
> dies we could be losing data that is in our RCVBUF.
>
> Do you know if there is any other method that we could use to do this
> or do we need to use RAW sockets and or modify the kernel TCP/IP
> stack ?

Your problem is similar to NFS and the solution is a protocol on top of TCP!

--
Ian Collins
From: victor Yankee on
On Apr 15, 4:09 pm, Ian Collins <ian-n...(a)hotmail.com> wrote:
> On 04/15/10 06:01 PM, victor Yankee wrote:
>
> > Would someone be able to point me in the right direction ?
>
> > We are looking at implementing a socket server using TCP/IP socket
> > streams on Solaris 10.  One of the requirements is that we need to
> > only send TCP ACK messages once our application has flushed the
> > payload to non-volatile memory.
>
> That sounds out of the scope of TCP and you should be using a higher
> layer protocol to send an acknowledgement.
>
> What happens if more data is to be sent than the advertised size of the
> receive window?
>
> > We were looking at using the MSG_PEEK flag to read the data of the
> > socket however this call does not stop the TCP stack from acking more
> > packets until our RCVBUF is full. This means that if our application
> > dies we could be losing data that is in our RCVBUF.
>
> > Do you know if there is any other method that we could use to do this
> > or do we need to use RAW sockets and or modify the kernel TCP/IP
> > stack ?
>
> Your problem is similar to NFS and the solution is a protocol on top of TCP!
>
> --
> Ian Collins



Hi Ian,

Unfortunately we are not able to use a higher level protocol to ack
the messages since the client application is out of our control. The
client just connects and streams the data to the socket. We are
expected to only ack the packets once they have been cached to non-
volatile memory.
The only method for telling the client that we have cached the payload
is through the TCP ack message.

cheers,
vic

From: Casper H.S. Dik on
victor Yankee <vyankee1(a)gmail.com> writes:

>Would someone be able to point me in the right direction ?

>We are looking at implementing a socket server using TCP/IP socket
>streams on Solaris 10. One of the requirements is that we need to
>only send TCP ACK messages once our application has flushed the
>payload to non-volatile memory.

Not possible in the Solaris 10 implementation; I would be *very* surprised
if any TCP implementation allows this.

>We were looking at using the MSG_PEEK flag to read the data of the
>socket however this call does not stop the TCP stack from acking more
>packets until our RCVBUF is full. This means that if our application
>dies we could be losing data that is in our RCVBUF.

Indeed. Using ACKs is the wrong mechanism. You will probably notice
that with delayed ACKs as you want, the protocol will run 10-100 times slower.

>Do you know if there is any other method that we could use to do this
>or do we need to use RAW sockets and or modify the kernel TCP/IP
>stack ?

You can't make this work with TCP/IP; you will need to change your
protocol and send a message when the data is written. I.e., don't
use just TCP but add a protocol on top of it.
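
A minimal sketch of what such a protocol on top of TCP could look like, in Python. Everything here is an assumption made for illustration: the 4-byte length prefix, the one-byte reply 0x06, the file name and the helper names are not from this thread or from any existing protocol. The point is only the ordering: the reply is sent after fsync() returns, so the client learns about durability from the application, not from a TCP ACK.

```python
import os
import socket
import struct
import tempfile
import threading

def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before full record arrived")
        buf += chunk
    return buf

def serve_one(conn, path):
    """Receive one length-prefixed record, persist it, then acknowledge."""
    (length,) = struct.unpack("!I", recv_exact(conn, 4))
    record = recv_exact(conn, length)
    with open(path, "ab") as f:
        f.write(record)
        f.flush()
        os.fsync(f.fileno())      # ask the OS to push the data to stable storage
    conn.sendall(b"\x06")         # hypothetical 1-byte application ACK, sent only after fsync

# Demo over loopback.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
path = os.path.join(tempfile.mkdtemp(), "records.log")

def server_thread():
    conn, _ = listener.accept()
    with conn:
        serve_one(conn, path)

t = threading.Thread(target=server_thread)
t.start()

client = socket.create_connection(listener.getsockname())
record = b"hello, durable world"
client.sendall(struct.pack("!I", len(record)) + record)
ack = recv_exact(client, 1)       # client blocks here until the record is on disk
client.close()
t.join()
listener.close()
```

This does require the client to wait for and understand the reply, which is exactly the change to the protocol being suggested.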

If you change your TCP/IP implementation, it won't be TCP/IP and other
users of that (non) TCP/IP stack will likely break.


Casper
--
Expressed in this posting are my opinions. They are in no way related
to opinions held by my employer, Sun Microsystems.
Statements on Sun products included here are not gospel and may
be fiction rather than truth.
From: Ersek, Laszlo on
On Wed, 14 Apr 2010, victor Yankee wrote:

> On Apr 15, 4:09 pm, Ian Collins <ian-n...(a)hotmail.com> wrote:
>> On 04/15/10 06:01 PM, victor Yankee wrote:
>>
>>> We are looking at implementing a socket server using TCP/IP socket
>>> streams on Solaris 10.  One of the requirements is that we need to
>>> only send TCP ACK messages once our application has flushed the
>>> payload to non-volatile memory.
>>
>> That sounds out of the scope of TCP and you should be using a higher
>> layer protocol to send an acknowledgement.
>>
>> What happens if more data is to be sent than the advertised size of the
>> receive window?
>>
>
> Unfortunately we are not able to use a higher level protocol to ack the
> messages since the client application is out of our control. The client
> just connects and streams the data to the socket. We are expected to
> only ack the packets once they have been cached to non-volatile memory.

> The only method for telling the client that we have cached the payload
> is through the TCP ack message.

Some wild guessing:

- Write a TUN/TAP driver (for Solaris?) so you can not just capture stuff
on the ethernet / IP level, but you can make its propagation dependent on
saving it first somewhere. I'm not sure if you'd need to try to
reconstruct, on the fly, the exact byte stream seen by the server; it
might suffice if you write a replay program for the dump format (which
does the reassembly, reordering etc).

- If Solaris 10 supports STREAMS based TCP/IP, try to push a module of
your own creation between, well, TCP and IP; then see the previous
paragraph.

- Add a netfilter rule (if Solaris 10 supports anything like that) which
queues the IP packet to a userspace program for approval only following
storage.

- Insert a Linux box in front of the Solaris box, and do the previous
paragraph (iptables / NFQUEUE): make sure any TCP segment leaves for the
Solaris box only after you've saved it somewhere. See
<http://netfilter.org/projects/libnetfilter_queue/index.html>:

----v----
Main Features

* receiving queued packets from the kernel nfnetlink_queue subsystem
* issuing verdicts and/or reinjecting altered packets to the kernel
nfnetlink_queue subsystem
----^----

Issue an ACCEPT verdict only after saving the packet.
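
A rough sketch of the "save first, then approve" ordering, in Python. It deliberately does not call libnetfilter_queue: the function name, the journal format (4-byte length prefix per packet) and the "ACCEPT" / "DROP" strings are placeholders for illustration, and the packet bytes are fake. In a real queue handler the returned value would be mapped onto the library's verdict call.

```python
import os
import struct
import tempfile

def verdict_after_save(packet: bytes, journal) -> str:
    """Append one packet to the journal, sync it, and only then approve it."""
    try:
        journal.write(struct.pack("!I", len(packet)) + packet)
        journal.flush()
        os.fsync(journal.fileno())   # packet is on stable storage before approval
    except OSError:
        # Could not persist: refuse the packet so it never reaches the server.
        return "DROP"
    return "ACCEPT"

# Demo with a fake packet and a temporary journal file.
packet = b"\x45\x00fake-ip-packet"
with tempfile.TemporaryFile() as journal:
    result = verdict_after_save(packet, journal)
    journal.seek(0)
    saved = journal.read()

print(result, len(saved))
```

Dropping on a failed write relies on ordinary TCP behaviour: a segment that never reaches the Solaris box is never ACKed, so the sender retransmits it.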

Cheers,
lacos