From: Davide Libenzi on
On Tue, 22 Sep 2009, Andreas Gruenbacher wrote:

> The fatal flaw of syscall interception is race conditions: you look up a
> pathname in your interception layer; then when you call into the proper
> syscall, the kernel again looks up the same pathname. There is no way to
> guarantee that you end up at the same object in both lookups. The security
> and fsnotify hooks are placed in the appropriate spots to avoid exactly that.
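
The double lookup described in the quote above can be sketched from userspace. This is only an illustration: the helper names (check_then_use, demo) are hypothetical, and a symlink swap stands in for whatever a racing process might do between the interception layer's lookup and the kernel's own.

```python
import os
import tempfile


def check_then_use(path, between=lambda: None):
    """Resolve `path` twice and report whether both lookups hit the same object."""
    checked = os.stat(path)          # lookup 1: the "interception layer" inspects the path
    between()                        # window in which another process may act
    fd = os.open(path, os.O_RDONLY)  # lookup 2: the "real syscall" resolves the path again
    try:
        used = os.fstat(fd)
    finally:
        os.close(fd)
    return (checked.st_dev, checked.st_ino) == (used.st_dev, used.st_ino)


def demo():
    with tempfile.TemporaryDirectory() as d:
        vetted = os.path.join(d, "vetted")
        other = os.path.join(d, "other")
        link = os.path.join(d, "link")
        for name in (vetted, other):
            with open(name, "w") as f:
                f.write(name)
        os.symlink(vetted, link)

        def swap():
            # Re-point the symlink between the two lookups.
            os.unlink(link)
            os.symlink(other, link)

        quiet = check_then_use(link)                # nobody interferes
        raced = check_then_use(link, between=swap)  # path changes in the window
        return quiet, raced
```

Without interference both lookups agree; with the swap, the object that was checked is not the object that gets opened.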

Fatal? You mean that, for this corner case, which the anti-malware industry
has lived with for so long (on Linux and Windows), you're prepared to push
all the logic that is currently implemented in their modules into the
kernel?
This includes process whitelisting, path whitelisting, caches, userspace
access API definition, and so on? On top of providing a generally more
limited interception.
Why don't we instead offer a lower and broader level of interception,
letting the users decide in their modules whether such a fatal flaw needs
to be addressed or not?
They get a broader interception layer, with the option to decide whether or
not to address certain scenarios, and we get less code inside the kernel.
A win/win situation, if you ask me.


- Davide


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo(a)vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
From: Eric Paris on
On Tue, 2009-09-22 at 17:31 +0200, Andreas Gruenbacher wrote:
> On Tuesday, 22 September 2009 16:51:39 Davide Libenzi wrote:
> > On Tue, 22 Sep 2009, Jamie Lokier wrote:
> > > I don't mind at all if fanotify is replaced by a general purpose "take
> > > over the system call table" solution ...
> >
> > That was not what I meant ;)
> > You'd register/unregister as syscall interceptor, receiving syscall number
> > and parameters, you'd be able to return status/error codes directly, and
> > you'd have the ability to eventually change the parameters. All this
> > should be pretty trivial code, and at the same time give full syscall
> > visibility to the modules.
>
> The fatal flaw of syscall interception is race conditions:

That's not the fatal flaw. The fatal flaw is that I am not going to
write 90% of a rootkit and make it easy to use. Not going to happen.
There's a reason we went to the trouble to mark the syscall table RO, we
don't export it, and we don't want people playing with it. It clearly
would have been the quickest, easiest, and fastest way to make
anti-virus companies happy, but it doesn't really solve a good problem
and it leaves all of us in a worse position than we are today. Easy !=
Good.

-Eric

From: Jamie Lokier on
Eric Paris wrote:
> That's not the fatal flaw. The fatal flaw is that I am not going to
> write 90% of a rootkit and make it easy to use.

I hate to point out the obvious, but fanotify's ability to intercept
every file access and rewrite the file before the access proceeds is
also 90% of a rootkit...

But fortunately both fanotify and syscall rewriting require root in
the first place.

I think that makes the rootkit argument moot. As long as fanotify
doesn't have a non-root flavour... which really would be handy for
rootkits :-)

> Easy != Good.

I agree.

-- Jamie
From: Eric Paris on
On Mon, 2009-09-21 at 22:04 +0200, Andreas Gruenbacher wrote:
> On Saturday, 19 September 2009 5:04:31 Eric Paris wrote:
> > Let me start by saying I am agreeing I should pursue subtree
> > notification. It's what I think everyone really wants. It's a great
> > idea, and I think you might have a simple way to get close. Clearly
> > these are avenues I'm willing and hoping to pursue. Also I say it
> > again, I believe the interface as proposed (except maybe some of my
> > exclusion stuff) is flexible enough to implement any of these ideas.
> > Does anyone disagree?
>
> It does seem flexible enough. However, the current interface assumes "global"
> listeners (the mask argument of fanotify_init):
>
> int fanotify_init(int flags, int f_flags, __u64 mask,
> unsigned int priority);
>
> Once subtree support is added, this parameter becomes obsolete. That's pretty
> broken for a syscall yet to be introduced.

Absolutely not obsolete. Subtree notification cannot do fscking all
notification.

> > BUT to solve one of the main problems fanotify is intending to solve it
> > needs a way to be the 'fscking all notifier.' It needs to be the whole
> > damn system.
>
> Think of a system after boot, with a single global namespace. Whatever you
> access by filename is reachable from the namespace root. At this point,
> nothing more global exists. A listener can watch the mount points of
> interest, and everything's fine.

This is true: if there is only one namespace, subtree notification works
the same as global notification.

> What's a bit more tricky is to ensure that this listener will continue to
> receive all events from whatever else is mounted anywhere, irrespective of
> namespaces. I think we can get there.

Let's say I want the subtree under / to get every event on the system. A
process comes along and clones the namespace. Then let's say that
process mounts something inside its new namespace. There is absolutely
no path between my / and that new mount. How can subtree checking
possibly find and indicate it wants notification about this mount? I
don't see how subtree checking could do it. There can be completely
disjoint trees with no overlap.

mount -t tmpfs none /to_umount
clone namespace
mount -t tmpfs none /to_umount/private
pivot_root /to_umount/private
Something else umounts /to_umount

That process is in a completely detached namespace, right?

Heck, there could be operations on files that aren't in ANY namespace.

a = open("/path/to/dir/", O_RDONLY);
umount -l /path/to/
openat(a, "filename", O_RDONLY);

I don't see how subtree notification can possibly solve the global
notification problem.
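
The open/openat sequence above can be approximated without root. In this sketch a rename stands in for `umount -l` (an assumption made only so the example runs unprivileged; it does not detach anything from a namespace), and the function name is hypothetical. It shows only that a lookup relative to an already-open directory descriptor does not go through the original pathname.

```python
import os
import tempfile


def open_via_dirfd_after_rename():
    with tempfile.TemporaryDirectory() as top:
        old = os.path.join(top, "dir")
        os.mkdir(old)
        with open(os.path.join(old, "filename"), "w") as f:
            f.write("payload")

        a = os.open(old, os.O_RDONLY)  # a = open("/path/to/dir/")
        try:
            # Stand-in for `umount -l /path/to/`: the old pathname stops resolving.
            os.rename(old, os.path.join(top, "moved"))
            path_still_valid = os.path.exists(os.path.join(old, "filename"))

            # openat(a, "filename"): resolved relative to the descriptor, not the path.
            fd = os.open("filename", os.O_RDONLY, dir_fd=a)
            try:
                data = os.read(fd, 100).decode()
            finally:
                os.close(fd)
        finally:
            os.close(a)
        return path_still_valid, data
```

The old pathname no longer exists, yet the file is still reachable through the descriptor, which is the kind of access a purely path-based watch would have to account for.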

I've been thinking that checking CAP_SYS_RAWIO as well as CAP_SYS_ADMIN
might be reasonable when trying to use a global listener. If you have
CAP_SYS_RAWIO, I sorta feel like you can break out of a namespace anyway,
right?
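
As a rough userspace model of that double capability check (not the kernel code, which would use its own in-kernel helpers), the policy amounts to requiring two bits in the effective capability mask. The function names here are hypothetical; the bit numbers are those of CAP_SYS_RAWIO and CAP_SYS_ADMIN in <linux/capability.h>, and the mask format is the hex CapEff field of /proc/<pid>/status.

```python
CAP_SYS_RAWIO = 17  # bit numbers from <linux/capability.h>
CAP_SYS_ADMIN = 21


def has_caps(cap_eff_hex, *caps):
    """True if every capability bit in `caps` is set in the hex mask."""
    mask = int(cap_eff_hex, 16)
    return all(mask & (1 << cap) for cap in caps)


def may_use_global_listener(cap_eff_hex):
    # Hypothetical policy from the message above: require both capabilities.
    return has_caps(cap_eff_hex, CAP_SYS_RAWIO, CAP_SYS_ADMIN)


def current_cap_eff(status_path="/proc/self/status"):
    """Read this process's effective capability mask (Linux only)."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise RuntimeError("CapEff not found in " + status_path)
```

On a Linux system, may_use_global_listener(current_cap_eff()) would report whether the calling process holds both capabilities.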

-Eric

From: Andreas Gruenbacher on
On Tuesday, 22 September 2009 23:06:16 Eric Paris wrote:
> This is true: if there is only one namespace, subtree notification works
> the same as global notification.
>
> [...]
>
> I don't see how subtree notification can possibly solve the global
> notification problem.

What I'm thinking of is something like this: A listener registers interest in "/",
recursively. The kernel sets a FSNOTIFY_WATCH_RECURSIVE flag on "/" and each
mount point below. Afterwards when something is mounted anywhere, same
namespace or not, the kernel sets the new mount's FSNOTIFY_WATCH_RECURSIVE
flag if the parent mount has this flag set.

(Of course we need per fsnotify_group flags and not global ones, but this
doesn't change the principle.)
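
The propagation rule described above can be modelled in a few lines. This is a toy sketch, not kernel code: the Mount class and its methods are hypothetical, and the per-mount set of group names stands in for per-fsnotify_group FSNOTIFY_WATCH_RECURSIVE flags.

```python
class Mount:
    """Toy mount: tracks which groups have the recursive-watch flag set on it."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.watching = set()
        if parent is not None:
            parent.children.append(self)
            # New mounts inherit the flag from the parent mount at mount time.
            self.watching |= parent.watching

    def watch_recursive(self, group):
        # Registration: flag this mount and every mount point below it.
        self.watching.add(group)
        for child in self.children:
            child.watch_recursive(group)


def demo():
    root = Mount("/")
    home = Mount("/home", root)
    root.watch_recursive("av")            # listener registers interest in "/"
    late = Mount("/mnt/usb", root)        # mounted after registration
    nested = Mount("/mnt/usb/sub", late)  # mounted below an inheriting mount
    detached = Mount("private")           # no parent mount, so nothing to inherit
    return ["av" in m.watching for m in (root, home, late, nested, detached)]
```

In this model every mount created under a flagged parent picks up the flag, while a mount with no flagged parent does not.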

Andreas