From: Chris Friesen on
On 05/12/2010 01:33 PM, Rainer Weikusat wrote:
> Chris Friesen <cbf123(a)mail.usask.ca> writes:
>> On 05/10/2010 01:01 PM, yirgster wrote:
>>
>>> But, from a "legal" standpoint,
>>>
>>> (1) this isn't required behavior by posix, that is, that fsync(fd)
>>> sync all mmap'd(fd) memory too.
>>
>> Contrary to Rainer, I think it actually might be implied by posix, and
>> that's why the various OS's have changed their behaviour.
>>
>> The posix language reads "all data for the open file descriptor named by
>> fildes is to be transferred to the storage device associated with the
>> file described by fildes." Arguably, memory ranges mmap'd from a file
>> is "data for the open file descriptor".
>
> The situation isn't that simple, eg, it is legal to close a file
> descriptor after it was used to establish a memory mapping and to
> continue using the mapping. Assuming that the file is later reopened,
> is whatever the existing memory mapping contains necessarily 'data for
> the new file descriptor' (or only if the implementation happens to
> have a unified cache)?

I agree that the wording is a bit unclear, but they left it that way on
purpose. From the posix rationale:

"The fsync() function is intended to force a physical write of data from
the buffer cache, and to assure that after a system crash or other
failure that all data up to the time of the fsync() call is recorded on
the disk. Since the concepts of "buffer cache", "system crash",
"physical write", and "non-volatile storage" are not defined here, the
wording has to be more abstract."

Based on the above, I see no reason to treat data modified via memory
mappings any different than data written by a write() syscall.

That said, if _POSIX_SYNCHRONIZED_IO is not defined, the spec explicitly
allows a null implementation of fsync()...but it must be documented in
the conformance document.
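
A minimal sketch of the defensive approach, assuming a POSIX system: modify
the file through a MAP_SHARED mapping, then call both msync(MS_SYNC) and
fsync() so the result does not depend on how the fsync() wording is read.
The helper names store_and_flush() and synchronized_io_support() are
hypothetical, not anything from the standard.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: reports the run-time value of the Synchronized
 * Input and Output option; -1 means the option is not supported. */
long synchronized_io_support(void)
{
    return sysconf(_SC_SYNCHRONIZED_IO);
}

/* Hypothetical helper: stores msg into the file at path through a
 * MAP_SHARED mapping, then flushes with msync(MS_SYNC) followed by
 * fsync(). Returns 0 on success, -1 on any failure. */
int store_and_flush(const char *path, const char *msg)
{
    assert(path != NULL && msg != NULL);
    size_t len = strlen(msg);
    if (len == 0)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    memcpy(p, msg, len);              /* change made through the mapping */

    int rc = 0;
    if (msync(p, len, MS_SYNC) != 0)  /* flush the mapped range */
        rc = -1;
    if (fsync(fd) != 0)               /* flush data written via the fd */
        rc = -1;

    munmap(p, len);
    close(fd);
    return rc;
}
```

Calling both is redundant on systems where fsync() already covers mapped
pages, but it avoids relying on that behaviour.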

Chris
From: yirgster on
On May 12, 12:33 pm, Rainer Weikusat <rweiku...(a)mssgmbh.com> wrote:
> Chris Friesen <cbf...(a)mail.usask.ca> writes:
> > On 05/10/2010 01:01 PM, yirgster wrote:
>
> >> But, from a "legal" standpoint,
>
> >> (1) this isn't required behavior by posix, that is, that fsync(fd)
> >> sync all mmap'd(fd) memory too.
>
> > Contrary to Rainer, I think it actually might be implied by posix, and
> > that's why the various OS's have changed their behaviour.
>
> > The posix language reads "all data for the open file descriptor named by
> > fildes is to be transferred to the storage device associated with the
> > file described by fildes."  Arguably, memory ranges mmap'd from a file
> > is "data for the open file descriptor".
>
> The situation isn't that simple, eg, it is legal to close a file
> descriptor after it was used to establish a memory mapping and to
> continue using the mapping. Assuming that the file is later reopened,
> is whatever the existing memory mapping contains necessarily 'data for
> the new file descriptor' (or only if the implementation happens to
> have a unified cache)?

With msync(MS_SYNC), the data would have had to make it out to disk, so it
will be seen by any open and file access that follows.

I've assumed all along that we've been talking about mmap(...
MAP_SHARED ...)
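
The scenario Rainer describes can be sketched as follows, assuming a
MAP_SHARED mapping: the descriptor is closed right after mmap(), the mapping
is modified, msync(MS_SYNC) is called, and the file is then reopened and
read. The function name mapping_outlives_fd() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if a store made through a MAP_SHARED
 * mapping (after its descriptor was closed) is seen by read() on a newly
 * opened descriptor once msync(MS_SYNC) has returned, 0 if not, and -1
 * on error. */
int mapping_outlives_fd(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, "aaaa", 4) != 4) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, 4, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                    /* legal: the mapping remains valid */
    if (p == MAP_FAILED)
        return -1;

    memcpy(p, "bbbb", 4);         /* modify with no descriptor open */
    if (msync(p, 4, MS_SYNC) != 0) {
        munmap(p, 4);
        return -1;
    }

    int fd2 = open(path, O_RDONLY);
    if (fd2 < 0) {
        munmap(p, 4);
        return -1;
    }
    char buf[4];
    ssize_t n = read(fd2, buf, sizeof buf);

    close(fd2);
    munmap(p, 4);
    unlink(path);
    if (n != 4)
        return -1;
    return memcmp(buf, "bbbb", 4) == 0 ? 1 : 0;
}
```

Note that no fsync() is involved here at all; the only flush is the
msync(MS_SYNC) on the mapped range.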
From: Ersek, Laszlo on
On Wed, 12 May 2010, Rainer Weikusat wrote:

> Chris Friesen <cbf123(a)mail.usask.ca> writes:
>> On 05/10/2010 01:01 PM, yirgster wrote:
>>
>>> But, from a "legal" standpoint,
>>>
>>> (1) this isn't required behavior by posix, that is, that fsync(fd)
>>> sync all mmap'd(fd) memory too.
>>
>> Contrary to Rainer, I think it actually might be implied by posix, and
>> that's why the various OS's have changed their behaviour.
>>
>> The posix language reads "all data for the open file descriptor named
>> by fildes is to be transferred to the storage device associated with
>> the file described by fildes." Arguably, memory ranges mmap'd from a
>> file is "data for the open file descriptor".
>
> The situation isn't that simple, eg, it is legal to close a file
> descriptor after it was used to establish a memory mapping and to
> continue using the mapping. Assuming that the file is later reopened, is
> whatever the existing memory mapping contains necessarily 'data for the
> new file descriptor' (or only if the implementation happens to have a
> unified cache)?

I don't think so.

POSIX very carefully distinguishes file descriptor from file description
from file. The language quoted above is "all data for the open file
descriptor". Ie. the distinction is made on the most specific (least
shared) level. If you dup()licate a file descriptor, you get a new
descriptor referring to the same open file description [0] [1]. But
fsync() only needs to synchronize changes made through the exact file
descriptor that is passed to it.
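
To illustrate the descriptor/description distinction, here is a small
sketch showing that two descriptors obtained via dup() share one open file
description, observable through the shared file offset. The function name
dup_shares_offset() is hypothetical. It says nothing about what fsync()
flushes; it only shows the sharing that dup() creates.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if a write() through fd advances the
 * offset seen through a dup()licated descriptor, 0 if not, -1 on error. */
int dup_shares_offset(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    int fd2 = dup(fd);            /* new descriptor, same description */
    if (fd2 < 0) {
        close(fd);
        unlink(path);
        return -1;
    }

    int result = -1;
    if (write(fd, "abc", 3) == 3) {
        /* Query the offset through the *other* descriptor. */
        off_t off = lseek(fd2, 0, SEEK_CUR);
        result = (off == 3);
    }

    close(fd2);
    close(fd);
    unlink(path);
    return result;
}
```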

If the spec went a single level deeper, ie. to file description, that
would require an fsync() call issued by process A to synchronize changes
made by process B with write(), for which B used a descriptor that it
inherited from A through a series of fork()s and exec()s, or one that it
received over a UNIX domain socket with SCM_RIGHTS.

(Btw, I found only one mention of "SCM_RIGHTS" in SUSv4 [2], and it only
"Indicates that the data array contains the access rights to be sent or
received." The Linux manual is more specific [3]: it not only mentions
that the "access rights" are file descriptors, but it also states that
SCM_RIGHTS is effectively a cross-process dup().)
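
A rough sketch of the "cross-process dup()" behaviour, kept inside a single
process for brevity by passing the descriptor over a socketpair(). It
assumes the CMSG_* macros as found on Linux and the BSDs; the names
send_fd(), recv_fd() and scm_rights_shares_offset() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_ctl {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Sends fd over sock as SCM_RIGHTS ancillary data, with one data byte. */
static int send_fd(int sock, int fd)
{
    char byte = 'x';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receives one descriptor from sock; returns it, or -1 on failure. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;

    memset(&ctl, 0, sizeof ctl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof ctl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof fd);
    return fd;
}

/* Returns 1 if the received descriptor shares the file offset with the
 * one that was sent (same open file description), 0 if not, -1 on error. */
int scm_rights_shares_offset(const char *path)
{
    assert(path != NULL);
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    int result = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd >= 0) {
        if (send_fd(sv[0], fd) == 0) {
            int fd2 = recv_fd(sv[1]);
            if (fd2 >= 0) {
                if (write(fd, "abc", 3) == 3)
                    result = (lseek(fd2, 0, SEEK_CUR) == 3);
                close(fd2);
            }
        }
        close(fd);
        unlink(path);
    }
    close(sv[0]);
    close(sv[1]);
    return result;
}
```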

Therefore, it seems to me, once you close a file descriptor, you may lose
any opportunity to fsync() the changes made through it.

I can't imagine that fsync() -- being permitted to ignore any changes made
through a different file descriptor -- would be *required* to care about
modifications performed through something that is not even a file
description.

In closing, if you don't mind, I'll quote myself; it seems relevant to
some extent.

----v----
Date: Fri, 2 Apr 2010 20:58:22 +0200
From: "Ersek, Laszlo" <lacos(a)caesar.elte.hu>
Newsgroups: comp.programming.threads, comp.unix.programmer,
comp.os.linux.development.system, comp.os.linux.development.apps
Subject: Re: IPC based on name pipe FIFO and transaction log file
Message-ID: <Pine.LNX.4.64.1004021950500.19039(a)login01.caesar.elte.hu>

[snip]

Would anybody please validate the following table?

+-------------+----------------------------------------------------------------+
| change made | change visible via                                             |
| through     +----------------------------+-------------+---------------------+
|             | MAP_SHARED                 | MAP_PRIVATE | read()              |
+-------------+----------------------------+-------------+---------------------+
| MAP_SHARED  | yes                        | unspecified | depends on MS_SYNC, |
|             |                            |             | MS_ASYNC, or normal |
|             |                            |             | system activity     |
+-------------+----------------------------+-------------+---------------------+
| MAP_PRIVATE | no                         | no          | no                  |
+-------------+----------------------------+-------------+---------------------+
| write()     | depends on MS_INVALIDATE,  | unspecified | yes                 |
|             | or the system's read/write |             |                     |
|             | consistency                |             |                     |
+-------------+----------------------------+-------------+---------------------+

----^----
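
As a small check of one cell of the table (MAP_PRIVATE row, read() column),
the following sketch stores through a MAP_PRIVATE mapping and then reads the
file with pread(). The function name private_store_visible_via_read() is
hypothetical; the other cells are not exercised here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if a store made through a MAP_PRIVATE
 * mapping shows up in the file as seen by pread(), 0 if it does not,
 * and -1 on error. */
int private_store_visible_via_read(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, "aaaa", 4) != 4) {
        close(fd);
        unlink(path);
        return -1;
    }

    char *p = mmap(NULL, 4, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        unlink(path);
        return -1;
    }

    memcpy(p, "bbbb", 4);         /* private copy-on-write modification */

    char buf[4];
    ssize_t n = pread(fd, buf, sizeof buf, 0);

    munmap(p, 4);
    close(fd);
    unlink(path);
    if (n != 4)
        return -1;
    return memcmp(buf, "bbbb", 4) == 0 ? 1 : 0;
}
```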

Cheers,
lacos

[0] http://www.opengroup.org/onlinepubs/9699919799/functions/dup.html
[1] http://www.opengroup.org/onlinepubs/9699919799/functions/fcntl.html
[2] http://www.opengroup.org/onlinepubs/9699919799/basedefs/sys_socket.h.html
[3] http://www.kernel.org/doc/man-pages/online/pages/man7/unix.7.html
From: yirgster on
On May 12, 12:53 pm, Chris Friesen <cbf...(a)mail.usask.ca> wrote:
> On 05/12/2010 01:33 PM, Rainer Weikusat wrote:
>
>
>
> > Chris Friesen <cbf...(a)mail.usask.ca> writes:
> >> On 05/10/2010 01:01 PM, yirgster wrote:
>
> >>> But, from a "legal" standpoint,
>
> >>> (1) this isn't required behavior by posix, that is, that fsync(fd)
> >>> sync all mmap'd(fd) memory too.
>
> >> Contrary to Rainer, I think it actually might be implied by posix, and
> >> that's why the various OS's have changed their behaviour.
>
> >> The posix language reads "all data for the open file descriptor named by
> >> fildes is to be transferred to the storage device associated with the
> >> file described by fildes."  Arguably, memory ranges mmap'd from a file
> >> is "data for the open file descriptor".
>
> > The situation isn't that simple, eg, it is legal to close a file
> > descriptor after it was used to establish a memory mapping and to
> > continue using the mapping. Assuming that the file is later reopened,
> > is whatever the existing memory mapping contains necessarily 'data for
> > the new file descriptor' (or only if the implementation happens to
> > have a unified cache)?
>
> I agree that the wording is a bit unclear, but they left it that way on
> purpose.  From the posix rationale:
>
> "The fsync() function is intended to force a physical write of data from
> the buffer cache, and to assure that after a system crash or other
> failure that all data up to the time of the fsync() call is recorded on
> the disk. Since the concepts of "buffer cache", "system crash",
> "physical write", and "non-volatile storage" are not defined here, the
> wording has to be more abstract."
>
> Based on the above, I see no reason to treat data modified via memory
> mappings any different than data written by a write() syscall.
>
> That said, if _POSIX_SYNCHRONIZED_IO is not defined, the spec explicitly
> allows a null implementation of fsync()...but it must be documented in
> the conformance document.
>
> Chris

I still don't think it's proven since "all data up to the time of
fsync()" seems conditioned on the preceding phrase "physical write of
data from the buffer cache." So, we're back to the unified buffer
cache issue.
From: yirgster on
lacos writes:

> [ snip ]
> POSIX very carefully distinguishes file descriptor from file description
> from file. The language quoted above is "all data for the open file
> descriptor". Ie. the distinction is made on the most specific (least
> shared) level. If you dup()licate a file descriptor, you get a new
> descriptor referring to the same open file description [0] [1]. But
> fsync() only needs to synchronize changes made through the exact file
> descriptor that is passed to it.

I agree with this reading. You know, looking at the discussion this
issue has engendered, and assuming yours is an absolutely correct
reading based on the writing (as I think it is), it still should have
been more explicitly clarified in the doc, e.g., "It doesn't apply to
other fd's even in the same process." I mean, the purpose is
understanding and clarity, no? Not Talmudic scholarship.

> [snip socket stuff -- I have no idea]

> Therefore, it seems to me, once you close a file descriptor, you may lose
> any opportunity to fsync() the changes made through it.

Yes, I agree with this too.

> I can't imagine that fsync() -- being permitted to ignore any changes made
> through a different file descriptor -- would be *required* to care about
> modifications performed through something that is not even a file
> description.

Sounds correct. But it's not relevant to the question of whether fsync(fd)
implies syncing memory mmap()'d from that same fd.

> In closing, if you don't mind, I'll quote myself; it seems relevant to
> some extent.

I rarely mind advertisements for myself. Even from others. I do it all
the time.

> Would anybody please validate the following table?

Validate your table? I am sufficiently trustworthy (forget
knowledgeable)?

> +-------------+----------------------------------------------------------------+
> | change made | change visible via                                             |
> | through     +----------------------------+-------------+---------------------+
> |             | MAP_SHARED                 | MAP_PRIVATE | read()              |
> +-------------+----------------------------+-------------+---------------------+
> | MAP_SHARED  | yes                        | unspecified | depends on MS_SYNC, |
> |             |                            |             | MS_ASYNC, or normal |
> |             |                            |             | system activity     |
> +-------------+----------------------------+-------------+---------------------+
> | MAP_PRIVATE | no                         | no          | no                  |
> +-------------+----------------------------+-------------+---------------------+
> | write()     | depends on MS_INVALIDATE,  | unspecified | yes                 |
> |             | or the system's read/write |             |                     |
> |             | consistency                |             |                     |
> +-------------+----------------------------+-------------+---------------------+

Well, I'm not sure I understand your table completely. But here goes:

Under MAP_PRIVATE, 2nd row, I don't understand the qualifications. It
simply seems to me: unspecified. From the mmap() page: "It is
unspecified whether modifications to the underlying object done after
the MAP_PRIVATE mapping is established are visible through the
MAP_PRIVATE mapping." So what would MS_SYNC, MS_ASYNC, have to do with
it?

MS_INVALIDATE: there's a reality problem here, I believe. From reading
other posts on this subject from around 2002-2004, it's basically a
no-op in some of the os's (linux? - I can't look at the source now.)
Also, it would be pretty hard to test, no? Isn't it the same race
condition as the one between, say, the store buffers and memory cache
consistency, discussed recently in the threads group?

Speaking of reality (but why should this interfere with our thinking),
I know of--i.e., have actually seen, I'm not talking theoretically--a
case in which linux (at that time at least) did not in one instance meet
the posix spec. I keep thinking it was in zeroing out the last page of
the file correctly. But this seems too obvious. Whatever it was, it
worked properly on Solaris, AIX, and HP. I saw it.

Well, got to show some motion at work. Hope you're not so unfortunate.