From: Oliver Wong on
Not going to address everything brought up in this thread; just a few points
here and there...

"topmind" <topmind(a)technologist.com> wrote in message
news:1137464398.350811.55480(a)g14g2000cwa.googlegroups.com...
> Patrick May wrote:
>> "topmind" <topmind(a)technologist.com> writes:
>> > "Security" is mostly just massive ACL tables.
>>
>> That is profoundly . . . naive. I strongly urge you to read
>> everything you can find by Bruce Schneier, join the cryptography
>> mailing list run by Perry Metzger, and not say another word about
>> security until you understand why your statement is so deeply
>> embarrassing to you. For a quick, very small taste of why ACL tables
>> don't even begin to scratch the surface of the problem, read
>> http://www.isi.edu/gost/brian/security/kerberos.html.
>
> I see no mention of ACL's there. If you have a specific case of ACL's
> crashing and burning, post it here and let's take a look at it. (Note
> that there are a lot of variations of ACL's, so a flaw in one kind is
> not necessarily a general flaw in ACL concepts.)
>

I'm assuming by "ACL", you mean "Access Control Lists". ACL is one way
to solve the problem of "authorization". That is, given that you know this
person "A" belongs to group "B", does that person have the right to access
resource "C"?

The central topic of that page on Kerberos seems to be "authentication";
that is, how to find out that the person speaking to you is person "A" in
the first place. "authentication" and "authorization" are occasionally
confused, but they are well-defined, distinct topics in security. As far as
my understanding of ACLs goes, they do not address authentication at all.

There are other topics in computer security, too, that ACLs don't
address: e.g. public-key cryptography, webs of trust, encryption, hashing,
non-repudiation, anonymity, pseudo-random number generation, etc. I would
argue that none of these topics are "sub-topics" of ACL, so it would seem
that there's a lot more to security than just massive ACL tables.


>> > For lack of a better metric, I propose lines of code (LOC) for now
>> > as our metric for complexity.
>>
>> This is yet another suggestion that shows you don't know much
>> about the topic you're discussing. Lines of code is not a good
>> measurement of anything. Do some research.
>
> Present some evidence. It is not my job to make YOUR points for you. If
> your evidence is "over there", then go "over there", grab it, and bring
> it back.

Rather than "I'm right and you're wrong", it might be better to focus on
"This statement is true, and that one is false". I think it's generally
agreed that LOC is a poor metric for software complexity. If Patrick May
wishes to convince topmind of this point, he may cite some documents which
support his claim. If he doesn't particularly care to convince topmind, then
he won't bother to post documents. Either way, I think Patrick May is
confident of his claim. Similarly, if topmind wishes to convince Patrick
May, (s)he too could post some documents. Or maybe (s)he won't bother.
topmind sounds pretty confident too.

I suspect most people who are reading this thread are fairly confident
about their opinion on LOC too, but if they are reasonable people, they
would be willing to consider evidence which contradicts their opinions. If such
people haven't yet formed an educated opinion, those people may wish to
start their investigations at
http://en.wikipedia.org/wiki/Source_lines_of_code

>> > Can you make an argument for something being lots of LOC without
>> > being "complex" (beyond learning the domain)? If not, then you are
>> > probably stuck using LOC here.
>>
>> Some of the largest programs I've seen, in terms of lines of
>> code, are for generating reports. Getting just the right information
>> from a large data set and laying it out precisely as requested can be
>> time consuming, but it's not particularly challenging.
>
> Well, these are rather non-specific words you are using. Perhaps such
> tasks bore you, but boredom as a measure of complexity is fraught with
> problems.

I don't think Patrick May is proposing using boredom as a measure of
complexity here. Rather, he is implicitly answering your question of "Can
you make an argument for something being lots of LOC without being complex?"
The answer is yes, and the argument takes the form of a counter-example:
"Here is something which is lots of LOC, without being complex."

- Oliver


From: Mikito Harakiri on
Oliver Wong wrote:
> Perhaps I completely misunderstood Patrick May, but it seems he is
> talking about the complexity of developing real-world programs, and not of
> arbitrary text strings, which is why I said "I suspect this has very little
> to do with 'Software Complexity' as Patrick May means it."

The notion of complexity is elusive. Emphasising "real world" in "I'm
studying complexity of real-world programs (as opposed to complexity of
TM programs?)" brings nothing new to the table.

> Let's assume that it's the binary representation for a program for some
> particular platform, which implies that it describes an actual valid
> program. If so, then we have defined the "language" we are interested in:
> the language is the set of all finite strings which are binary
> representations for a program in the given platform.
>
> Presumably, there exist some binary strings which are NOT legal
> representations of programs, and so binary may not be the optimal (in terms
> of size of strings) representation for this program. Also, the platform
> might have some peculiarities that lead to all legal representations having
> the same prefix (as a header, for example).
>
> We can perform a measure, then, of how much information is present in
> this string for that language, and that measure can be called the complexity
> of the string (and hence the program). Traditionally, though, this measure
> is called the information entropy.

OK. Consider the language {01, 0101, 010101, 01010101, ...}. What is
the "information entropy" of the string 0101?

> Let's say you randomly generate the sequence "100101", and then
> immediately afterwards, I look at your sequence, and then generate my own
> ("non-random") sequence: "100101". Does that mean that my sequence is "less
> complex" than the random one? Even though they are exactly identical?

This paragraph doesn't make any sense. A random sequence is defined as
one that can withstand all possible statistical tests. A random
sequence is considered to be more complex than a sequence which is not
random. In fact, a random sequence can't be defined by a program which
is shorter than the sequence itself. Unfortunately, this definition of
randomness doesn't apply to finite sequences.

From: topmind on
Oliver Wong wrote:
> Not going to address everything brought up in this thread; just a few points
> here and there...
>
> "topmind" <topmind(a)technologist.com> wrote in message
> news:1137464398.350811.55480(a)g14g2000cwa.googlegroups.com...
> > Patrick May wrote:
> >> "topmind" <topmind(a)technologist.com> writes:
> >> > "Security" is mostly just massive ACL tables.
> >>
> >> That is profoundly . . . naive. I strongly urge you to read
> >> everything you can find by Bruce Schneier, join the cryptography
> >> mailing list run by Perry Metzger, and not say another word about
> >> security until you understand why your statement is so deeply
> >> embarrassing to you. For a quick, very small taste of why ACL tables
> >> don't even begin to scratch the surface of the problem, read
> >> http://www.isi.edu/gost/brian/security/kerberos.html.
> >
> > I see no mention of ACL's there. If you have a specific case of ACL's
> > crashing and burning, post it here and let's take a look at it. (Note
> > that there are a lot of variations of ACL's, so a flaw in one kind is
> > not necessarily a general flaw in ACL concepts.)
> >
>
> I'm assuming by "ACL", you mean "Access Control Lists". ACL is one way
> to solve the problem of "authorization". That is, given that you know this
> person "A" belongs to group "B", does that person have the right to access
> resource "C"?
>
> The central topic of that page on Kerberos seems to be "authentication";
> that is, how to find out that the person speaking to you is person "A" in
> the first place. "authentication" and "authorization" are occasionally
> confused, but they are well-defined, distinct topics in security. As far as
> my understanding of ACLs goes, they do not address authentication at all.

That appears to be more of a network issue than an application issue.
(LDAP, I would note, is actually a kind of special-purpose network
database, IIRC.)

>
> There are other topics in computer security, too, that ACLs don't
> address: e.g. public-key cryptography, webs of trust, encryption, hashing,
> non-repudiation, anonymity, pseudo-random number generation, etc. I would
> argue that none of these topics are "sub-topics" of ACL, so it would seem
> that there's a lot more to security than just massive ACL tables.

P. May was not clear. I am generally focusing on applications, not
network implementation. If he wants to argue that RDBMS are no good for
systems software, I probably will not challenge it at this time.
Networks and systems software tend to have to stick to standard
protocols and be optimized for resources. On the other hand, biz apps
live with constantly changing requirements (changing protocols), and
machine resources are generally more plentiful (relative to systems software).
Perhaps this is where RDBMS belong: nimble but hardware-hogging.

>
>
> >> > For lack of a better metric, I propose lines of code (LOC) for now
> >> > as our metric for complexity.
> >>
> >> This is yet another suggestion that shows you don't know much
> >> about the topic you're discussing. Lines of code is not a good
> >> measurement of anything. Do some research.
> >
> > Present some evidence. It is not my job to make YOUR points for you. If
> > your evidence is "over there", then go "over there", grab it, and bring
> > it back.
>
> Rather than "I'm right and you're wrong", it might be better to focus on
> "This statement is true, and that one is false". I think it's generally
> agreed that LOC is a poor metric for software complexity. If Patrick May
> wishes to convince topmind of this point, he may cite some documents which
> support his claim. If he doesn't particularly care to convince topmind, then
> he won't bother to post documents. Either way, I think Patrick May is
> confident of his claim. Similarly, if topmind wishes to convince Patrick
> May, (s)he too could post some documents. Or maybe (s)he won't bother.
> topmind sounds pretty confident too.


I never claimed that code bulk (such as LOC) is a great metric for
"complexity". I am only saying it is the only one we have to work with
until a better one is suggested.

Note that it is easier to pad code to make it longer if it is being
used as an incentive tool, but it is more difficult to make it shorter. As
any writer will tell you, brevity can be tricky. However, short code is
sometimes tough for many to read, and measuring readability is another
black art, putting us back to square zero.
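
A toy illustration (my own, not from anyone's codebase) of how easy the
padding is: both functions below compute the same sum, but if LOC is the
incentive, the second one "scores" several times higher.

    def total_brief(prices):
        return sum(prices)

    def total_padded(prices):
        # Initialize an accumulator.
        result = 0
        # Walk the list one element at a time.
        for price in prices:
            # Add the current element to the accumulator.
            result = result + price
        # Hand the accumulated value back to the caller.
        return result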


>
> I suspect most people who are reading this thread are fairly confident
> about their opinion on LOC too, but if they are reasonable people, they
> would be willing to consider evidence which contradicts their opinions. If such
> people haven't yet formed an educated opinion, those people may wish to
> start their investigations at
> http://en.wikipedia.org/wiki/Source_lines_of_code
>
> >> > Can you make an argument for something being lots of LOC without
> >> > being "complex" (beyond learning the domain)? If not, then you are
> >> > probably stuck using LOC here.
> >>
> >> Some of the largest programs I've seen, in terms of lines of
> >> code, are for generating reports. Getting just the right information
> >> from a large data set and laying it out precisely as requested can be
> >> time consuming, but it's not particularly challenging.
> >
> > Well, these are rather non-specific words you are using. Perhaps such
> > tasks bore you, but boredom as a measure of complexity is fraught with
> > problems.
>
> I don't think Patrick May is proposing using boredom as a measure of
> complexity here. Rather, he is implicitly answering your question of "Can
> you make an argument for something being lots of LOC without being complex?"
> The answer is yes, and the argument takes the form of a counter-example:
> "Here is something which is lots of LOC, without being complex."

But unless he can justify his "less complex" claim with something other
than anecdotes, it won't get us anywhere.

I've seen code that I thought was complex until I understood it. Rocket
science code may look simple to a rocket scientist (domain expert), for
example. CRUD apps involve domains that usually require less formal
education to pick up, so perhaps that just creates the illusion of
simplicity.

I'll propose another rough metric of complexity: Automatability. If
something is easy to automate, then it is less complex. Despite its
appearance, CRUD has resisted automation beyond a certain point. It is
fraught with the "80/20" rule where you need to be able to put
exceptions (variances, not errors) to whatever the framework provides.
For example, maybe 80% of all screen fields map one-to-one to DB
columns. However, the framework must deal with the 20% that don't
directly map. Thus, you cannot directly tie the schema to the screen
without a back-door. Perhaps you can use the schema as a starting
point, but the screen will need custom fiddling in the end.
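
A rough sketch (a hypothetical framework of my own, just to make the
80/20 point concrete): the schema drives the default field mapping, but
the framework still needs a back-door for fields that don't map
one-to-one to columns.

    schema = ["id", "name", "unit_price", "qty"]

    # The ~20% that can't map directly: here, a computed screen field
    # that corresponds to no single DB column.
    overrides = {
        "total": lambda row: row["unit_price"] * row["qty"],
    }

    def render_screen(row):
        fields = {col: row[col] for col in schema}   # the easy 80%
        for name, compute in overrides.items():      # the custom 20%
            fields[name] = compute(row)
        return fields

    print(render_screen({"id": 1, "name": "widget",
                         "unit_price": 2.5, "qty": 4}))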

So, now we have 2 rough candidate metrics for complexity:

* Code bulk (such as Lines-of-Code)
* Automatability


>
> - Oliver


-T-

From: Oliver Wong on

"Mikito Harakiri" <mikharakiri_nospaum(a)yahoo.com> wrote in message
news:1137627620.209931.125540(a)g44g2000cwa.googlegroups.com...
> Oliver Wong wrote:
>> Perhaps I completely misunderstood Patrick May, but it seems he is
>> talking about the complexity of developing real-world programs, and not
>> of
>> arbitrary text strings, which is why I said "I suspect this has very
>> little
>> to do with 'Software Complexity' as Patrick May means it."
>
> The notion of complexity is elusive. Emphasising "real world" in "I'm
> studying complexity of real-world programs (as opposed to complexity of
> TM programs?)" brings nothing new to the table.

I'm not sure what you mean by "brings nothing new to the table"; if "the
table" is a metaphor for this discussion, then I am trying to prevent
off-topic things from entering the table!

Let's say you take the binary representation of "real-world program A",
and then calculate its Kolmogorov complexity and find out it's 5. Then you
take the binary representation of "real-world program B", calculate its
Kolmogorov complexity and find out it's 7. Does that mean we needed more skilled
programmers to develop program B than program A? Does it mean that the
source code for program A is easier to understand than that of program B?
What about if one was compiled in such a way that the compiler unrolled
loops and padded variables to align them with high-speed memory access
boundaries, and the other was not?

This is why I say "Kolmogorov complexity" is probably not the same thing
as "Software Complexity" in the sense that it was used earlier in this
thread.

>
>> Let's assume that it's the binary representation for a program for
>> some
>> particular platform, which implies that it describes an actual valid
>> program. If so, then we have defined the "language" we are interested in:
>> the language is the set of all finite strings which are binary
>> representations for a program in the given platform.
>>
>> Presumably, there exist some binary strings which are NOT legal
>> representations of programs, and so binary may not be the optimal (in
>> terms
>> of size of strings) representation for this program. Also, the platform
>> might have some peculiarities that lead to all legal representations
>> having
>> the same prefix (as a header, for example).
>>
>> We can perform a measure, then, of how much information is present in
>> this string for that language, and that measure can be called the
>> complexity
>> of the string (and hence the program). Traditionally, though, this
>> measure
>> is called the information entropy.
>
> OK. Consider the language {01, 0101, 010101, 01010101, ...}. What is
> the "information entropy" of the string 0101?

The alphabet of this language is {0,1,e} where e signifies the end of
the string. The entropy of each character in the string is H(x)
= -SUM[i=1..n]{p(i)log2(p(i))}, where i represents each possible character
that may appear next (n is the number of possible characters), and p(i) is
the probability of i occurring next.

Since we have 3 characters in this alphabet, n = 3, and the summation
will always be of the form:

p('0')log2(p('0'))
+ p('1')log2(p('1'))
+ p('e')log2(p('e'))

The first character in this language is always '0', so we have p('0') = 1,
p('1') = 0 and p('e') = 0. The summation gives us a value of 0, and we take
its negative, which is still 0. So the first character has zero
informational entropy. That is, whether or not we see that character, we
still have the same amount of information. In other words, we knew we were
going to see a '0' anyway.

Similarly, the second character is always '1', and so there's still zero
information there.

The third character could either be '0' or 'e'. I don't know what the
probability distribution is, so let's just assume uniform distribution (it
makes the math easier). The summation is:

p('0')log2(p('0'))
+ p('1')log2(p('1'))
+ p('e')log2(p('e'))

=

0.5 * -1
+ 0
+ 0.5 * -1

=

-1

And we take the negation, to give us 1. So the next character, whatever it
is, will provide us with 1 bit of information. Let's say it was '0'. Then we
know the next character must be '1'. After that, it might be '0' or 'e'
again, and provides us with 1 bit of information again.

So I'd say the overall informational entropy of your string is 2 bits.
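
Here's the same calculation as a small Python sketch (my own, under the
same uniform-distribution assumption as above): position 0 is forced to
'0', odd positions are forced to '1', and every later even position is a
50/50 choice between '0' and the end marker 'e', worth 1 bit each.

    import math

    def entropy(probabilities):
        # Shannon entropy in bits: H = -sum(p * log2(p)); p == 0 terms
        # contribute nothing, so they are skipped.
        return -sum(p * math.log2(p) for p in probabilities if p > 0)

    def string_entropy(s):
        # Total information carried by a string of the language
        # {01, 0101, 010101, ...}, including its implicit end marker.
        total = 0.0
        for position, _ in enumerate(s + "e"):
            if position > 0 and position % 2 == 0:
                total += entropy([0.5, 0.5])   # '0' or 'e': 1 bit
            # Forced positions have entropy([1.0]) == 0: nothing added.
        return total

    print(string_entropy("0101"))   # 2.0 bits, matching the above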

>
>> Let's say you randomly generate the sequence "100101", and then
>> immediately afterwards, I look at your sequence, and then generate my own
>> ("non-random") sequence: "100101". Does that mean that my sequence is
>> "less
>> complex" than the random one? Even though they are exactly identical?
>
> This paragraph doesn't make any sense. A random sequence is defined as
> one that can withstand all possible statistical tests.

I haven't heard of this definition before. What does it mean for a
sequence to "withstand" a test? Is a test a boolean function, and returning
'false' implies not-withstanding?

> A random
> sequence is considered to be more complex than a sequence which is not
> random. In fact, a random sequence can't be defined by a program which
> is shorter than the sequence itself. Unfortunately, this definition of
> randomness doesn't apply to finite sequences.

The definition of a random sequence not being definable with a program
shorter than the sequence itself refers to Chaitin-Kolmogorov randomness,
and it is not the same as statistical randomness, so you probably
shouldn't be mixing the two in the above paragraph like that.
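
To illustrate the Chaitin-Kolmogorov side (a toy example of mine, not a
definition): the 64-character sequence below is highly non-random in that
sense, because an expression far shorter than the sequence produces it.

    sequence = "01" * 32       # 64 characters of output
    program  = '"01" * 32'     # the 9-character expression that made them
    print(len(sequence), len(program))   # 64 9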

The "above paragraph" which did not make sense to you was meant to show
you that I do not understand what you mean by "it is more challenging to
generate a random sequence than a non-random sequence". I still do not know
what you mean by "more challenging". How do you measure challenge levels?

- Oliver


From: frebe on
H. S. Lahman wrote:
> >>SQL represents a
> >>solution to persistence access that is designed around a particular
> >>model of persistence itself.
> > Since when is the relational model a "model of persistence"? Can you
> > provide any pointer showing that the relational model is supposed to be
> > a "persistence model"?
> I didn't say that. The RDM is a model of static data so it can be
> applied to UML Class Diagrams as well. Note that I was careful to say
> that SQL is a solution to persistence /access/ when the data is
> represented in RDB form. As you have pointed out elsewhere one could
> create a RAM-based RDB and use SQL to access it with no persistence.

You agree that the relational model is not only about persistence? But
why is the SQL language limited to only the persistence features of the
relational model?

> >>Try using SQL vs. flat files if you think it is independent of the
> >>actual storage mechanism. (Actually, you probably could if the flat
> >>files happened to be normalized to the RDM, but the SQL engine would be
> >>a doozy and would have to be tailored locally to the files.) SQL
> >>implements the RDB view of persistence and only the RDB view.
> > Yes, the files need to be normalized to the RDM, but why do you draw the
> > conclusion that SQL needs an RDB?
> SQL requires the data to be in tables and tuples with embedded identity.

SQL does not require the data to be in tables. The data may reside in
flat files or RAM structures. Just because the SQL language uses the
keyword "table" does not actually mean that it needs to be backed by
a physical table. SQL is an interface, remember?
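
A quick demonstration of that point using Python's standard library (my
choice of tool, not Fredrik's): SQLite here runs entirely in RAM, so the
"table" in the SQL below never touches disk at all.

    import sqlite3

    conn = sqlite3.connect(":memory:")   # nothing is persisted anywhere
    conn.execute("CREATE TABLE emp (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO emp VALUES (1, 'Fredrik')")
    for (name,) in conn.execute("SELECT name FROM emp WHERE id = 1"):
        print(name)                      # Fredrik
    conn.close()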

> The RDM, when applied in a broader context than Codd, does not require
> that. SQL also assumes a very specific paradigm for navigating table
> relationships.

I think you are trying to hijack relational theory here. Do you have
any pointers to this second definition of relational theory?

Fredrik Bertilsson
http://butler.sourceforge.net