JavaScript code mangler / scrambler / ... khm, more than obfuscator... :) [JavaScript]

Prev: Code Guidelines
Next: JavaScript code mangler / scrambler / ... khm, more than obfuscator... :)

From: Scott Sauyet on 29 Dec 2009 17:49

On Dec 29, 12:46 pm, Thomas 'PointedEars' Lahn <PointedE...(a)web.de>
wrote:
> Scott Sauyet wrote:
>> Thomas 'PointedEars' Lahn wrote:
>> Yes, we all know how to read Wikipedia...
>
> Apparently not.

True, it's pretty clear that at least one person in this thread
doesn't properly understand the implications of Rice's theorem! :-)

>> You didn't respond to Lasse Nielsen's last post, but, if you're right
>> that Rice's theorem says that you cannot programatically convert a
>> function to another of identical behavior, how do explain the
>> existence of compilers?
>
> A compiler transforms source code into machine-dependent code; IOW, it
> translates one higher-level language into another, lower-level language.
> That is a concept fundamentally different from that which Rice's theorem
> addresses.

Right, but the form of obfuscation that Lasse and I are suggesting is
closer to that than to a transformation at the algorithmic level. No
one is saying that we have to know that a particular function computes
square roots in order to transform it. The point is that no matter
what a function does, we can transform it in various syntactic manners
that do not change its behavior. I still don't think it's worthwhile
doing, and I think anything that could be done like this is as subject
to reversal as is, say, minification, but you seem to be missing the
point that the only comprehension of the program required for this
particular transformation is to recognize function declarations. We
don't have to in any way understand them.

>>> I find it rather curious that nobody appears to have recognized that
>>> even though the proof for Rice's theorem in the Wikipedia is based on
>>> the very same trivial property that the OP used in their question (the
>>> squaring function), and I pointed that out in my answer.
>
>> That means only that we can't algorithmically recognize any and all
>> sets of statements that represent squaring functions.
>
> It means that no *program* can do that.

No, it's broader than that. It means there is no algorithm that can
do this (assuming there is no brute-force mechanism that can test all
possible inputs.) A program can be thought of a representation of an
algorithm in a particular language; Rice's theorem is stronger in that
it's formal algorithms rejected.

>> It doesn't imply that we can't transform one squaring function into
>> another.
>
> It does imply that a *program* cannot do that.

Note the word "function". I defy you to find a Javascript squaring
*function*, f, for which the Y-combinator applied to a function
returning f will not also yield a squaring function defined on the
same domain.

>> [ description of Y-combinator applied to a particular function ]
>>
> What you are still missing is that the "Y-combinator" was applied by *you*
> here, an (arguably) intelligent being applying informed reasoning; _not_ a
> program implementing an algorithm. For a simple implementation of the
> example used in the proof for Rice's theorem, a program implementing an
> algorithm would need to recognize whether f() in
>
> function a(j)
> {
> while (--j) b(j);
> }
>
> function f(x)
> {
> a(i);
> return x * x;
> }
>
> was an implementation of the squaring function or not in order to compute
> or select a suitable replacement for obfuscation.

No, I contend that without analyzing the behavior of 'a' or 'f' above,
these definitions will have the same formal behavior, for any given
values of 'b', 'i', and 'x':

var a = (function(f) {
return (function(g) {
return g(g);
})(function(h) {
return function() {
return f(h(h)).apply(null, arguments);
};
});
})(function() {
return function(j) {while (--j) b(j);}
});

var f = (function(f) {
return (function(g) {
return g(g);
})(function(h) {
return function() {
return f(h(h)).apply(null, arguments);
};
});
})(function() {
return function(x) {
a(i);
return x * x;
}
});

I am not about to try to write a Javascript parser to prove that this
can be done syntactically, but the algorithm is simple:

For every function definition in the input with name ~n~, parameter
list ~p~, and body ~b~, replace that definition in the output with

var ~name~ = (function(f) {
return (function(g) {
return g(g);
})(function(h) {
return function() {
return f(h(h)).apply(null, arguments);
};
});
})(function() {
return function(~p~) {
~b~
}
});

I simply *don't care* that one of these functions putatively computes
the square of its input.

> It cannot do that
> because it depends on the pre-call value of `i' (and the behavior of b(j)),
> whether a(i) halts, and if it halts, if execution even reaches the `return'
> statement in f(), *regardless* that it is there. And the program cannot
> recognize whether a(i), i.e. the while-loop in it, halts, let alone
> whether, if it halts, the `return' statement in f() is ever reached.

Can you find values of 'i', 'b', and 'x' for which the formal behavior
of my replacement code differs from your original? I know that it
will be less performant, but besides that, can you find differences?

>> We're pretty far afield from the OP's goals, though.
>
> Obfuscation was part of the goal, and that has been attained to a certain
> degree. However, it has necessarily _not_ been attained *through a
> program*, of course, and so missing the other part, of course.

No one has written a program, it's true. But we've described
algorithms which could be implemented by someone who knows how to
write a Javascript parser/transformer.

You're an (arguably :-) ) intelligent person. Can you tell me what's
wrong with the following argument:

| Removing extraneous white space in a program is something a human
| can do, but this sort of minification cannot be done by a program
| because of restrictions described in the Wikipedia article on
| Rice's Theorem.
|
| Rice's Theorm means in a nutshell that you cannot devise a general
| function that computes another function that computes the same
| result for the same input (a trivial property) as a third
| function.
|
| Of course an intelligent being, using the approach of informed
| reasoning (trial and error), would be able to do that to a certain
| extent, but an algorithm-driven machine usually cannot. This is
| directly connected to the equally undecidable halting problem
| which states that a machine cannot decide if another machine (or
| itself) holds on a given input (that would be a trivial property,
| too).
|
| That does not apply to obfuscation as a whole (so code *can* be
| made less readable, if that would be desirable to begin with), but
| ISTM it does apply to your approach at minification.

Or do you believe that minimizers can't work?

Cheers,

-- Scott

From: Thomas 'PointedEars' Lahn on 29 Dec 2009 18:25

Scott Sauyet wrote:

> Thomas 'PointedEars' Lahn wrote:
>> Scott Sauyet wrote:
>>> Thomas 'PointedEars' Lahn wrote:
>>> Yes, we all know how to read Wikipedia...
>> Apparently not.
>
> True, it's pretty clear that at least one person in this thread
> doesn't properly understand the implications of Rice's theorem! :-)

I am counting at least two.

>>> You didn't respond to Lasse Nielsen's last post, but, if you're right
>>> that Rice's theorem says that you cannot programatically convert a
>>> function to another of identical behavior, how do explain the
>>> existence of compilers?
>>
>> A compiler transforms source code into machine-dependent code; IOW, it
>> translates one higher-level language into another, lower-level language.
>> That is a concept fundamentally different from that which Rice's theorem
>> addresses.
>
> Right, but the form of obfuscation that Lasse and I are suggesting is
> closer to that than to a transformation at the algorithmic level. [...]

IOW, it was a red herring.

>>> It doesn't imply that we can't transform one squaring function into
>>> another.
>> It does imply that a *program* cannot do that.
>
> Note the word "function". I defy you to find a Javascript squaring
> *function*, f, for which the Y-combinator applied to a function
> returning f will not also yield a squaring function defined on the
> same domain.

You are still missing the point. Finding the squaring function
is what the program needs to do (not me), and which it can not.

> [...] I contend that without analyzing the behavior of 'a' or 'f' above,
> these definitions will have the same formal behavior, for any given
> values of 'b', 'i', and 'x':
>
> var a = (function(f) {
> return (function(g) {
> return g(g);
> })(function(h) {
> return function() {
> return f(h(h)).apply(null, arguments);
> };
> });
> })(function() {
> return function(j) {while (--j) b(j);}
> });
>
>
> var f = (function(f) {
> return (function(g) {
> return g(g);
> })(function(h) {
> return function() {
> return f(h(h)).apply(null, arguments);
> };
> });
> })(function() {
> return function(x) {
> a(i);
> return x * x;
> }
> });

Thank you, I rest my case. The "Y-combined" variant will break in various
implementations that do not provide Function.prototype.apply() to begin
with. IOW, it does _not_ have the same formal behavior.

>>> We're pretty far afield from the OP's goals, though.
>>
>> Obfuscation was part of the goal, and that has been attained to a
>> certain degree. However, it has necessarily _not_ been attained
>> *through a program*, of course, and so missing the other part, of
>> course.
>
> No one has written a program, it's true. But we've described
> algorithms which could be implemented by someone who knows how to
> write a Javascript parser/transformer.

No.

> You're an (arguably :-) ) intelligent person. Can you tell me what's
> wrong with the following argument: [...]

Since you now have begun making up arguments in order to make arguments,
we better end this discussion.

PointedEars
--
Use any version of Microsoft Frontpage to create your site.
(This won't prevent people from viewing your source, but no one
will want to steal it.)
-- from <http://www.vortex-webdesign.com/help/hidesource.htm> (404-comp.)

From: Scott Sauyet on 29 Dec 2009 20:52

On Dec 29, 6:25 pm, Thomas 'PointedEars' Lahn <PointedE...(a)web.de>
wrote:
> Scott Sauyet wrote:
>> Thomas 'PointedEars' Lahn wrote:
>>> Scott Sauyet wrote:
>>>> Thomas 'PointedEars' Lahn wrote:

>>> A compiler transforms source code into machine-dependent code; IOW, it
>>> translates one higher-level language into another, lower-level language..
>>> That is a concept fundamentally different from that which Rice's theorem
>>> addresses.
>
>> Right, but the form of obfuscation that Lasse and I are suggesting is
>> closer to that than to a transformation at the algorithmic level. [...]
>
> IOW, it was a red herring.

No, it was something that could help towards the OP's goals. A
transformation of one program into another with the same behavior but
with syntax harder for the casual reader to understand.

>>>> It doesn't imply that we can't transform one squaring function into
>>>> another.
>>> It does imply that a *program* cannot do that.
>
>> Note the word "function". I defy you to find a Javascript squaring
>> *function*, f, for which the Y-combinator applied to a function
>> returning f will not also yield a squaring function defined on the
>> same domain.
>
> You are still missing the point. Finding the squaring function
> is what the program needs to do (not me), and which it can not.

Did you really not read my last response? This is purely syntactic.

Why should the obfuscator locate the squaring function? It will
transform all functions, or perhaps a random sample of them to make it
more difficult to reverse. It doesn't need to care what the functions
mean.

>> [...] I contend that without analyzing the behavior of 'a' or 'f' above,
>> these definitions will have the same formal behavior, for any given
>> values of 'b', 'i', and 'x':
>
>> var a = (function(f) {
>> return (function(g) {
>> return g(g);
>> })(function(h) {
>> return function() {
>> return f(h(h)).apply(null, arguments);
>> };
>> });
>> })(function() {
>> return function(j) {while (--j) b(j);}
>> });
>
>> var f = (function(f) {
>> return (function(g) {
>> return g(g);
>> })(function(h) {
>> return function() {
>> return f(h(h)).apply(null, arguments);
>> };
>> });
>> })(function() {
>> return function(x) {
>> a(i);
>> return x * x;
>> }
>> });
>
> Thank you, I rest my case. The "Y-combined" variant will break in various
> implementations that do not provide Function.prototype.apply() to begin
> with. IOW, it does _not_ have the same formal behavior.

The prosecution declines to give a closing argument. With the
defense's lack of a reasonable case, there is no need.

Is that really your argument? Really?

So in versions that don't match ES3 15.3.4.3, this technique won't
work? You know what else? It won't work if you translate your code
into Swahili either. So what? And that's not even an essential
portion of the Y-combinator. There are versions around that do not
use Function.prototype.appy. [1] [2]

Still, in the vastly restricted environment of those implementations
that do manage to correctly implement Function.prototype.apply, can
you point to a difference in behavior?

>>> Obfuscation was part of the goal, and that has been attained to a
>>> certain degree. However, it has necessarily _not_ been attained
>>> *through a program*, of course, and so missing the other part, of
>>> course.
>
>> No one has written a program, it's true. But we've described
>> algorithms which could be implemented by someone who knows how to
>> write a Javascript parser/transformer.
>
> No.

Ah, what a witty riposte! What intellectual fortitude it took to
devise such a clever reply! What a fantastic debater my opponent has
shown himself to be!

Can you explain what is wrong with the algorithm I applied? Or are
you still going to claim that I'm cheating by relying on
Function.prototype.apply?

>> You're an (arguably :-) ) intelligent person. Can you tell me what's
>> wrong with the following argument: [...]
>
> Since you now have begun making up arguments in order to make arguments,
> we better end this discussion.

Well, if you wish, of course. But I'm sure everyone reading this
thread recognizes the argument you deleted as a simple variation of
your argument that Rice's theorem implies that you can't create
transformations of source code that retain the formal behavior of the
original code but are reasonably obfuscated. Can you point out how
the arguments differ or confirm that you don't believe minifiers can
work?

-- Scott

[1] http://thraxil.org/users/anders/posts/2008/09/15/My-Own-Javascript-Y-Combinator/
[2] http://matt.might.net/articles/implementation-of-recursive-fixed-point-y-combinator-in-javascript-for-memoization/

From: Ira Baxter on 5 Jan 2010 11:21

"Doug Miller" <spambait(a)milmac.com> wrote in message
news:hgqmtm$d77$2(a)news.eternal-september.org...
> In article <bphrnmob.fsf(a)gmail.com>, Lasse Reichstein Nielsen
> <lrn.unread(a)gmail.com> wrote:
>>spambait(a)milmac.com (Doug Miller) writes:
>>
>>> In article
>> <21855fc0-131e-4af2-ab92-9e9827eb16c3(a)o28g2000yqh.googlegroups.com>,
>> Andrew
>> <anzenews(a)volja.net> wrote:

> Yes, I understand that. What both you and the OP overlook is that the
> issue is
> still the same: the entire source code is still clearly visible, and its
> operation is obvious to anyone who wishes to take the time to explore it.

"Obvious"? "take the time"? You can reduce the problem of code understanding
to the Turing Tarpit, which is in effect provably impossible.
You should read the following paper on how to make extremely
difficult obfuscation:

http://math.auckland.ac.nz/bitstream/2292/4398/1/0674154.pdf

(and even real programs written by people without obfuscation can
be extremely hard to understand, espeically if they are buggy).

Now whether it is worth it to do this for all Javascript codes, or even
some of it, is another matter. People place different values on
their work.

>>The code will obviously be less efficient (it does more work to get the
>>same result), but it's also harder to read an reason about. Not
>>impossible,
>>only harder.

Not impossible, but hard enough so you need an arbitrarily large amount
of theorem proving to analyze it. That might not be impossible in
the abstract, but its damn hard to distinguish the two.

--
Ira Baxter, CTO
www.semanticdesigns.com

From: Andrew on 6 Jan 2010 07:46

On Dec 29 2009, 6:46 pm, Thomas 'PointedEars' Lahn
<PointedE...(a)web.de> wrote:
> Scott Sauyet wrote:
> > You didn't respond to Lasse Nielsen's last post, but, if you're right
> > that Rice's theorem says that you cannot programatically convert a
> > function to another of identical behavior, how do explain the
> > existence of compilers?
>
> A compiler transforms source code into machine-dependent code; IOW, it
> translates one higher-level language into another, lower-level language.
> That is a concept fundamentally different from that which Rice's theorem
> addresses.

Ok, then, let's use JS minimizers and obfuscators as an example
(instead of compilers). They transform source code so that it is
written in another way, yet it performs the same as it did.
Not that they are useful for this case, but they are instant proof
that such transformations are possible - and by automated tools too.

> > The thread has given a pretty strong consensus that those goals are at
> > best very difficult and, even if attainable, are not worth pursuing.

+1. :)

Thank you all for answering, it's been an interesting discussion.
Happy coding!

First | Prev | Next | Last
Pages: 1 2 3 4 5 6 7 8 9 10
Prev: Code Guidelines
Next: JavaScript code mangler / scrambler / ... khm, more than obfuscator... :)