From: topmind on

Dmitry A. Kazakov wrote:
> On 6 Feb 2006 19:32:11 -0800, topmind wrote:
>
> > Dmitry A. Kazakov wrote:
>

> >> Huh, the code you presented is 1) longer, 2) far more confused.
> >
> > I don't have to define and hunt down definitions of types. You will
> > have something like:
> >
> > a = new integer(min=-223452345, max=234234234);
> > b = new integer(min=-09809809809, max=242093423423);
> > x = a.add(b);
>
> type I is new Integer; -- Compare with CREATE TABLE statement
> A, B, X : Integer;
> X := A + B;

The context was "configurable" integers. You are using the
out-of-the-box 8-byte version here, not configurable ones.

And I never was fond of the SQL "Create" command syntax anyhow. SQL is
NOT the pinnacle of relational languages by any stretch.

>
> >> As I have shown, it is not type free. It is typed in a definite way.
> >
> > I suspect we would need a clear definition of "types" to settle this.
> > Either way, it is not relying on "types" in the usual strong-typed way.
> > By the way, many scripting languages make no internal distinction
> > between:
> >
> > x = 123
> >
> > and
> >
> > x = "123"
>
> and "1 23"? Type is a set of values and operations. It is irrelevant how
> you denote values!

That's my point.

> >
> > Rounding errors for "add"?
>
> Add 1.0 to 1000000000.0 in IEEE 32-bit and print the result. That should be
> the very first thing students should do in classes. This is why different
> numeric models exist. There is a trade-off space, performance, accuracy,
> set of closed operations. It isn't rocket science...

Floating point tends to stink for biz apps, where decimal arithmetic is
preferred over converted binary. But, that is another topic. (Although
I've never encountered floating point problems during adds, IIRC.)
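
The experiment proposed in the quoted text takes only a few lines. This is a minimal sketch, assuming Python: its floats are 64-bit, so IEEE 32-bit rounding is emulated here by a round trip through the `struct` module. The helper name `to_f32` is made up for this illustration.

```python
import struct

def to_f32(x: float) -> float:
    """Round a 64-bit Python float to the nearest IEEE 32-bit value."""
    return struct.unpack("f", struct.pack("f", x))[0]

big = to_f32(1000000000.0)    # exactly representable in 32 bits
result = to_f32(big + 1.0)    # the sum is rounded back to 32 bits

print(result)                 # the added 1.0 does not survive
print(result == big)
```

Near 1000000000.0 adjacent 32-bit values are 64 apart, so an increment of 1.0 is rounded away, while an increment of 64.0 would survive.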

>
> >> Fine, show us an outline of a better language limited to solely relations
> >> and we'll see.
> >
> > Hold on, are we talking about implementation here, or the language? I
> > don't see how the language relates to compactly representing pixels
> > internally.
>
> Of course it relates. I want to access each pixel relationally, using
> SELECT. How would you implement, say, motion detection in video images if
> you cannot access pixels? You say you are thinking in tables. Please, do it
> this way!


I would have to think a while about how to implement motion detection
algorithms mostly with relational operations. I'm rusty on such
problems even using sequential logic, not having worked on those since
waaaay back in my college days.

Why are you selecting "science lab" kinds of problems anyhow? Maybe
relational does suck with science-lab kinds of problems. It is not my
domain and I don't really care. It is one domain out of many.


> > Please clarify. Relational does not limit auto-generated keys. Some say
> > it "encourages" one to use domain keys, but this has never been
> > settled, and some argue that auto-gen keys are or can be domain keys.
>
> Auto-generated keys defeat the model. It is a work-around.

Hogwash.

> Is uniqueness a
> property of a table, row, cell, value, DB, Universe?

If it is needed by the domain, then relational does not care. As long
as a table defines a unique key, relational is happy. Whether that key
is an auto-gen number or the square of the pimple count on your pet
gerbils, it doesn't care.

> Can I copy it? Is the
> copy unique again? In which scope? Can I add such keys? Sort them? These
> questions have many contradicting answers. Sometimes I need one answer,
> sometimes other. ADT solves that by having clear contracts of the objects
> in use.

Show me! OO has a horrible time with usable identity, with no built-in
referential tools whatsoever, leading to "creative" solutions.

>
> >> 2. Tables of tables.
> >

> > This is a *good* restriction of the relational model. Hard-nesting is
> > hard-wiring a particular viewpoint or (access path) into the model,
> > which goes against the relativistic philosophy of relational. Dr. Codd
> > set out to purposely avoid hard-wired access paths when he started
> > thinking about relational due to the navigational messes that others
> > were creating. However, it does not limit what can be modelled
> > externally. One can still model nested stuff using non-nested tables.
> >
> > It is not a technical "limit", but a philosophical guide-wall.
>
> Yes. It means that your model isn't complete.

"Complete" for what?

> Note also relations, as found
> in mathematics, do not impose any such limitations. I can have a set of
> sets, a set of set of sets, etc and define relations on them.

So? You can also nest a mess inside a mess inside a mess inside a mess.
However, please keep it out of our world.

>
> > Relational imposes rules to be called "relational". Otherwise, it would
> > turn into the navigational messes that motivated the creation of
> > relational to begin with.
>
> So data in RDBs cannot be properly structured. Fortunately there still
> exist hierarchical DBs!

"Properly"? As defined by what?

>
> This is not how complex software can be built. We are in XXI century, you
> know.

You are in the navigational 60's.


> No, we wish to model them as relations! Polygon isn't a relation. You can
> have a table of rows representing polygons, that's OK. Now, write the
> SELECT statement that gives me the car position, movement direction and
> distance to the next turn I have to do. Show, how this problem can be
> decomposed using relational approach.

Use the distance formula to find the nearest matches. The rest is left
as a reader exercise.

>
> > And, I don't know where you are getting your size estimates.
>
> From GPS resolution 30cm. Non-relational approach can reduce the amount of
> data needed for search, use k-trees, for example. But you have to stick to
> X=a Y=b, which is absolutely unrealistic.

Relational does not dictate implementation. Yes, it means you may have
to have custom programmed RDBMS engines to get the most compact
representation possible, but that will be the case no matter what you
use.

>
> > I don't know. I do notice that academics tend to be poorly trained in
> > RDBMS.
>
> Hmm, AFAIK, academics aren't trained, they train others! (:-))

Only in their field.

> > Further, if there are specialized engines already built to process
> > things such as Bayesian networks or neural nets, obviously it makes
> > sense to go with those specific already-built solutions. RDBMS shine
> > where you have *different uses* for the *same* info. Those things you
> > list tend to be *same* uses for the same info. See the difference?
>
> Yes. In short RDBMS is not a paradigm. End of story.

What precision math are you using to reckon that?

>

> No, will not, because you already have conceded that whatever level it
> could be, it is unsuitable outside some niche applications.

*Your* examples are the niche. You are the one standing inside a small
circle, not me. You must have been in school too long or something,
thinking science-lab textbook projects reflect the quantity and style
of real world problems.


> > I said compiler/interpreter, not the CPU. However, CPU is a similar
> > example: the machine code is just data to it.
>
> Yep, and I don't care about machine code.

This is about analogies showing relativism, not what you deem
important.

>
> >>> A developer may think of a function as
> >>> "behavior", but the interpreter treats it more like data if we look at
> >>> other processes that read what we normally call "data".
> >>
> >> That's the point: software is developed, maintained and finally scrapped by
> >> humans.
> >
> > I don't see how this relates to the relativistic viewpoint of behavior
> > and data.
>
> You need a paradigm conformant to this relativism [=data abstraction]. ADT
> is the vehicle for this. Either with OO or with pure relational (so that
> ADTs are limited to cells) is no matter. The latter is just much weaker.
> What you propose is outside.

So you claim.

>
> --
> Regards,
> Dmitry A. Kazakov

-T-

From: Dmitry A. Kazakov on
On 7 Feb 2006 13:14:42 -0800, Mikito Harakiri wrote:

> Dmitry A. Kazakov wrote:
>>> No, induction in general is much more complex concept than
>>> nearest-neighbour search. As you see the induction relationally is just
>>> a form of outer join:
>>
>> You probably mean inference. No, inference is not learning.
>
> I have meant induction:
> http://en.wikipedia.org/wiki/Induction_(philosophy)
>
>> You need some
>> additional knowledge beyond the training set to learn. This knowledge can
>> be formalized as a metric distance in the feature space.
>
> Induction can be as simple as Lagrange polynomial interpolation over N
> graph points. I fail to follow your idea that "knowledge can be
> formalized as a metric distance in the feature space",

The knowledge formalized is: "the function being interpolated is known to
be a polynomial of Nth order." You have to know this before you start. It
cannot be induced from points. All methods are based on similar
assumptions. This in the end determines their applicability in each
concrete case.
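
To make the role of that prior assumption concrete, here is a minimal sketch of Lagrange interpolation, assuming Python. The function name `lagrange` and both sample sets are invented for this illustration. When the data really come from a polynomial of low enough degree, extrapolation recovers the true value; when they do not, the method still returns a confident but wrong answer.

```python
def lagrange(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (xi, yi) points."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Samples of y = x**2: the polynomial assumption holds.
quadratic = [(0.0, 0.0), (1.0, 1.0), (2.0, 4.0)]
print(lagrange(quadratic, 3.0))      # true value is 9

# Samples of y = 2**x: the assumption is false.
exponential = [(0.0, 1.0), (1.0, 2.0), (2.0, 4.0)]
print(lagrange(exponential, 3.0))    # true value is 8
```

Nothing in the three points themselves distinguishes the two cases; only the knowledge brought in beforehand does.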

> and am very
> sceptical about naive methods of separating points in hyperspace with a
> hyperplane which are so common in AI area.

See above. There are at least two premises determining applicability of
such methods:

1. It is known that classes are separable (don't overlap)
2. It is known that classes are linearly separable in the feature space

These are very strong premises, and no, it is very uncommon in AI to hold
them for true.

> Same for distance and
> metrics based methods. They are too unsophisticated to produce anything
> impressive.

Yet, even such "unsophisticated" things cannot be implemented in SQL!

>> So it goes as
>> follows (for example): knowledge = "let features be independent random
>> variables distributed normally and classes don't intersect", build a
>> classifier minimizing the probability of error.
>
> Once again with "probability" concept not firmly established,

Come on, see Kolmogorov, Lebesgue, measure theory.

> this is just yet another ad hoc "machine learning" method.

All machine learning methods, without exception, are ad hoc. So what?

>> But if you want to do inference, I won't object. I'm dying to see a
>> relational theorem prover...
>
> No, we don't discuss inference here. Speaking of inference, RDBMS is
> already an inference engine, admittedly with quite limited
> capabilities. Deductive and constraint databases are perceived as a
> next big step, but (as it is common in programming world) promises are
> short on delivery.
>
>>> Of course, the code has to be in the relational engine --
>>> either natively, or through relational extensions.
>>
>> That is the whole point. Why
>>
>> SELECT Class FROM Training_Set WHERE Distance (X,Y) < Threshold
>>
>> works only on paper?
>
> The distance query is not the method I advocated.

Distance here is only to express things relationally. My intent was to help
topmind to understand the problem as a relational one. It was a bait,
because otherwise, he would say - no, it is computations, we in biz are
computing nothing. [For "computing" substitute anything beyond trivial
SELECT's] Sure, we could take a less relational-friendly formulation, if
you want.

Anyway, you and topmind are free to take any method or propose a new one.
Once you are ready, test it.
http://www.ics.uci.edu/~mlearn/MLRepository.html has lots of training sets.
Then we'll be able to discuss the code.

>> Why is it impossible even to write one statement
>> working for any set of features (X and Y are tuples (X1, X2, ...))
>
> You have to be more specific here, for me to follow. Is there a certain
> query expressiveness restriction that you indicated?

X is a vector in the feature space. In SQL terms, it is a set of columns. But
in SQL a "set of columns" is not a term. So I cannot express it, I have to
change the code time and again:

WHERE (X1 - Y1)**2 + (X2 - Y2)**2 + (X3 - Y3)**2 < Threshold

WHERE 0.2*abs(X.Weight - Y.Weight) + abs(X.Height - Y.Height) < Threshold

etc, because SQL has no proper tools to factor this out.
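
For comparison, this is roughly what the factored-out form looks like in a general-purpose language. It is a minimal sketch, assuming Python; `weighted_distance`, `classes_within` and the sample rows are hypothetical names and data, not taken from any real system. One function covers both WHERE clauses above, for any number of features.

```python
def weighted_distance(x, y, weights=None, power=2):
    """Sum of weights[i] * |x[i] - y[i]|**power over all features."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(x)
    return sum(w * abs(a - b) ** power for w, a, b in zip(weights, x, y))

def classes_within(training_set, x, threshold, **metric):
    """Classes of all training rows whose features lie within threshold of x."""
    return [cls for features, cls in training_set
            if weighted_distance(features, x, **metric) < threshold]

# Three unweighted features, squared differences (first WHERE clause).
training_set = [((1.0, 2.0, 3.0), "A"), ((9.0, 9.0, 9.0), "B")]
print(classes_within(training_set, (1.0, 2.0, 4.0), threshold=2.0))

# Two weighted features, absolute differences (second WHERE clause).
people = [((70.0, 180.0), "tall"), ((50.0, 150.0), "short")]
print(classes_within(people, (75.0, 181.0), threshold=3.0,
                     weights=(0.2, 1.0), power=1))
```

The feature vector is an ordinary value here, so changing the number of features or the metric changes the arguments and not the code.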

> If you refer to a quite modest success of RDBMS in the spatial/temporal
> area, then you are right. The list of spatial operators is short of
> being succinct, and the implementation is far from great. The 13(!)
> operators for interval datatype imply that the interval ADT is simply a
> wrong idea.

It does not imply that. I recommend that you read some literature about
interval computations. A good introduction is "A lucid interval" by Brian
Hayes: http://www.americanscientist.org/template/AssetDetail/assetid/28331
In particular, interval comparisons are difficult for Boolean logic. The
issue is discussed in the paper.

> select * from Intervals a, Intervals b
> where overlaps(a,b)

SELECT ... WHERE a AND b -- With tri-state logic

But for this, there must be ADTs for Boolean, Belnap_Logical, Fuzzy_Logical
etc. All things SQL does not have. [ BTW, I saw Postgres extensions for
intuitionistic logic. ]
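
A rough idea of what a tri-state comparison on intervals might look like, as a minimal sketch assuming Python. The `Interval` alias and the function `less_than` are illustrative only, and `None` stands in for the third truth value.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]   # (low, high) with low <= high

def less_than(a: Interval, b: Interval) -> Optional[bool]:
    """Three-valued a < b: True, False, or None when the intervals
    overlap and the answer depends on where the actual values lie."""
    if a[1] < b[0]:
        return True      # all of a lies below all of b
    if a[0] >= b[1]:
        return False     # no value in a is below any value in b
    return None          # undecided

print(less_than((1.0, 2.0), (3.0, 4.0)))
print(less_than((3.0, 4.0), (1.0, 2.0)))
print(less_than((1.0, 3.0), (2.0, 4.0)))
```

A two-valued Boolean result would have to force the overlapping case to either True or False, which is where the difficulty with Boolean logic comes from.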

[ Last but not least, for your information, floating-point numbers *are*
intervals. Open any book on numeric methods and read there: never ever
compare floating-point numbers for equality. Then read "A lucid interval"
again. ]

> Yet, even with all the drawbacks, querying is far superior abstraction
> to OOP. It isolates a particular implementation from the user and,
> unlike OOP, tries hard not to introduce unnecessary and ugly artifacts.

This needs to be shown. So far, I have seen nothing working. Moreover,
nothing shown so far indicates that the problems I mentioned could be
decomposed in a reasonable manner into relational tables.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
From: Dmitry A. Kazakov on
On 7 Feb 2006 18:04:08 -0800, topmind wrote:

> Dmitry A. Kazakov wrote:
>> On 6 Feb 2006 19:32:11 -0800, topmind wrote:
>>
>>> Dmitry A. Kazakov wrote:
>>
>>>> Huh, the code you presented is 1) longer, 2) far more confused.
>>>
>>> I don't have to define and hunt down definitions of types. You will
>>> have something like:
>>>
>>> a = new integer(min=-223452345, max=234234234);
>>> b = new integer(min=-09809809809, max=242093423423);
>>> x = a.add(b);
>>
>> type I is new Integer; -- Compare with CREATE TABLE statement
>> A, B, X : Integer;
>> X := A + B;
>
> The context was "configurable" integers. You are using the
> out-of-the-box 8-byte version here, not configurable ones.

Replace the first line with:

type I is range -223452345..234234234;

Now show me a relational equivalent! You need a table of types. In that
table you need columns for minimal and maximal values, set to NULL when not
applicable. You need a column describing the class of the type [numeric,
string etc] and the operations defined on it. A type will be created by an
INSERT statement etc. Cool! (:-))
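
The "table of types" described above can be sketched as follows, assuming Python with its standard `sqlite3` module. The table name, the column names and the helper `in_range` are invented for this illustration, and the column for the operations defined on a type is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE types (
        name       TEXT PRIMARY KEY,
        type_class TEXT NOT NULL,   -- 'numeric', 'string', ...
        min_value  INTEGER,         -- NULL when not applicable
        max_value  INTEGER          -- NULL when not applicable
    )
""")
# A type is created by an INSERT statement.
conn.execute("INSERT INTO types VALUES ('I', 'numeric', -223452345, 234234234)")
conn.execute("INSERT INTO types VALUES ('Name', 'string', NULL, NULL)")

def in_range(type_name, value):
    """True if value satisfies the bounds recorded for type_name."""
    row = conn.execute(
        "SELECT min_value, max_value FROM types WHERE name = ?", (type_name,)
    ).fetchone()
    if row is None:
        raise KeyError(type_name)
    low, high = row
    return (low is None or value >= low) and (high is None or value <= high)

print(in_range("I", 5))
print(in_range("I", 234234235))
```

Note that the range check lives in application code, outside the table; the table only stores the description of the type.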

>>> Rounding errors for "add"?
>>
>> Add 1.0 to 1000000000.0 in IEEE 32-bit and print the result. That should be
>> the very first thing students should do in classes. This is why different
>> numeric models exist. There is a trade-off space, performance, accuracy,
>> set of closed operations. It isn't rocket science...
>
> Floating point tends to stink for biz apps,

Rubbish. What stinks is a lack of understanding of the differences between
numeric models, and missing contracts specifying the *semantics* of numeric
operations.

> where decimal arithmetic is
> preferred over converted binary. But, that is another topic. (Although
> I've never encountered floating point problems during adds IIRC.)

Calculate the average of a column containing a couple of million numbers like 1.0
and 1000000000.0. Then ask yourself how accurate the result is.
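
A minimal sketch of that experiment, assuming Python. The 32-bit accumulator is emulated with the `struct` module, the helper names `to_f32` and `mean_f32` are made up, and the column is shortened to 200,000 values so it runs quickly. `math.fsum` serves as the accurate reference.

```python
import math
import struct

def to_f32(x: float) -> float:
    """Round a 64-bit Python float to the nearest IEEE 32-bit value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def mean_f32(values):
    """Average computed with a 32-bit accumulator, adding left to right."""
    total = 0.0
    for v in values:
        total = to_f32(total + to_f32(v))
    return to_f32(total / len(values))

column = [1.0, 1000000000.0] * 100_000

naive = mean_f32(column)                   # 32-bit running sum
exact = math.fsum(column) / len(column)    # correctly rounded 64-bit sum

print(naive)
print(exact)
print(naive - exact)
```

Every 1.0 is absorbed without trace once the running total is large, and as the total grows further the large addends start to be rounded as well.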

> Why are you selecting "science lab" kinds of problems anyhow?

1. There is no such division. When a customer comes to us, he asks for a
complete solution of his problem. That includes data acquisition, data
archiving, human-machine interface, and a bit of "science lab."

2. It is the paradigm that is in question. A paradigm should be capable of
decomposing any problem.

>> Can I copy it? Is the
>> copy unique again? In which scope? Can I add such keys? Sort them? These
>> questions have many contradicting answers. Sometimes I need one answer,
>> sometimes other. ADT solves that by having clear contracts of the objects
>> in use.
>
> Show me! OO has a horrible time with usable identity, with no built-in
> referential tools whatsoever, leading to "creative" solutions.

OO gives you an opportunity to analyse the problem. It does not guarantee
that you'll find a solution. The point I'm trying to make is that you cannot
freeze types; it must be an open-ended type system. Your position is that
openness stops with data. This is too limiting.

>>> This is a *good* restriction of the relational model. Hard-nesting is
>>> hard-wiring a particular viewpoint or (access path) into the model,
>>> which goes against the relativistic philosophy of relational. Dr. Codd
>>> set out to purposely avoid hard-wired access paths when he started
>>> thinking about relational due to the navigational messes that others
>>> were creating. However, it does not limit what can be modelled
>>> externally. One can still model nested stuff using non-nested tables.
>>>
>>> It is not a technical "limit", but a philosophical guide-wall.
>>
>> Yes. It means that your model isn't complete.
>
> "Complete" for what?

For efficient software reuse.

>> Note also relations, as found
>> in mathematics, do not impose any such limitations. I can have a set of
>> sets, a set of set of sets, etc and define relations on them.
>
> So? You can also nest a mess inside a mess inside a mess inside a mess.
> However, please keep it out of our world.

Then, please, don't appeal to the authority of mathematics. It isn't on your
side.

>>> Relational imposes rules to be called "relational". Otherwise, it would
>>> turn into the navigational messes that motivated the creation of
>>> relational to begin with.
>>
>> So data in RDBs cannot be properly structured. Fortunately there still
>> exist hierarchical DBs!
>
> "Properly"? As defined by what?

By the application domain. Relational approach forces quite artificial
constructions.

>> No, we wish to model them as relations! Polygon isn't a relation. You can
>> have a table of rows representing polygons, that's OK. Now, write the
>> SELECT statement that gives me the car position, movement direction and
>> distance to the next turn I have to do. Show, how this problem can be
>> decomposed using relational approach.
>
> Use the distance formula to find the nearest matches. The rest is left
> as a reader exercise.

No, that will extract the whole table and sort it by the distance! Many
thanks!

>>> And, I don't know where you are getting your size estimates.
>>
>> From GPS resolution 30cm. Non-relational approach can reduce the amount of
>> data needed for search, use k-trees, for example. But you have to stick to
>> X=a Y=b, which is absolutely unrealistic.
>
> Relational does not dictate implementation. Yes, it means you may have
> to have custom programmed RDBMS engines to get the most compact
> representation possible, but that will be the case no matter what you
> use.

Stop here. "Custom" in "custom programmed RDBMS engine" means
non-relationally programmed. Am I right?

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
From: topmind on

Dmitry A. Kazakov wrote:
> On 7 Feb 2006 18:04:08 -0800, topmind wrote:
>
> > Dmitry A. Kazakov wrote:
> >> On 6 Feb 2006 19:32:11 -0800, topmind wrote:

> > The context was "configurable" integers. You are using the
> > out-of-the-box 8-byte version here, not configurable ones.
>
> Replace the first line with:
>
> type I is range -223452345..234234234;

That is a language-specific feature.

>
> Now show me a relational equivalent!

"Cell types" are generally orthogonal to the relational model.
Relational only cares that the "expression engine" follow a minimum set
of rules, as already described. Thus, your question is outside of
relational.

>You need a table of types. In that
> table you need columns for minimal and maximal values, set to NULL when not
> applicable. You need a column describing the class of the type [numeric,
> string etc] and the operations defined on it. A type will be created by an
> INSERT statement etc. Cool! (:-))

Sounds like a Data Dictionary. A nice feature, but not necessary for
"relational", as described above.

>
> >>> Rounding errors for "add"?
> >>
> >> Add 1.0 to 1000000000.0 in IEEE 32-bit and print the result. That should be
> >> the very first thing students should do in classes. This is why different
> >> numeric models exist. There is a trade-off space, performance, accuracy,
> >> set of closed operations. It isn't rocket science...

That is only for pretty large numbers. Usually a dedicated decimal
library is used or they only allow COBOL for such calculations, because
it has "money-friendly" constructs already built in. (Most big-money
financial software is still COBOL, for good or bad.)

Further, most commercial RDBMS have a "money" or "decimal" type such
that one does not have to use floating.

> >
> > Floating point tends to stink for biz apps,
>
> Rubbish. What stinks is lack of understanding differences between numeric
> models and missing contracts specifying the *semantics* of numeric
> operations.

How would putting a wrapper around them fix the above problem you
mention? At best it could only report that the inputs are too large to
guarantee the right answer. If you go beyond the size of Double, then
you have gone beyond. That's that. Either you use a different number
model or you are hosed.

>
> > Why are you selecting "science lab" kinds of problems anyhow?
>
> 1. There is no such division. When a customer comes to us, he asks for a
> complete solution of his problem. That includes data acquisition, data
> archiving, human-machine interface, and a bit of "science lab."

Here is a ROUGH general breakdown of domains as I perceive the IT
world:

30% custom biz apps (90% RDBMS included/used)
15% packaged biz apps (70% RDBMS included/used)
15% embedded apps (less than 10% RDBMS)
10% factory/process control (40% RDBMS included/used)
10% scientific/math apps (25% RDBMS included/used)
10% entertainment apps (15% RDBMS included/used)
10% other (est. 30% RDBMS included/used)

>
> 2. It is a paradigm in question. A paradigm should be capable to decompose
> any problem.


I disagree with such a definition! I buy into Yin-Yang *complementary*
use of *multiple* tools. One-size-fits-all is old-style and multi-tool
usage is on the increase.


>
> >> Can I copy it? Is the
> >> copy unique again? In which scope? Can I add such keys? Sort them? These
> >> questions have many contradicting answers. Sometimes I need one answer,
> >> sometimes other. ADT solves that by having clear contracts of the objects
> >> in use.
> >
> > Show me! OO has a horrible time with usable identity, with no built-in
> > referential tools whatsoever, leading to "creative" solutions.
>
> OO gives you an opportunity to analyse the problem. It does not warranty,
> that you'll find a solution. The point I'm trying to make that you cannot
> freeze types, it must be an open-end types system. Your position is that
> openness stops with data. This is too limiting.

Show me relational being limiting to my domain.

> >> Yes. It means that your model isn't complete.
> >
> > "Complete" for what?
>
> For efficient software reuse.

Reuse? This is a new bash against relational (IIRC). Example?

>
> >> Note also relations, as found
> >> in mathematics, do not impose any such limitations. I can have a set of
> >> sets, a set of set of sets, etc and define relations on them.
> >
> > So? You can also nest a mess inside a mess inside a mess inside a mess.
> > However, please keep it out of our world.
>
> Then, please, don't argue to the authority of mathematics. It isn't on your
> side.

I have surrendered math-intensive apps multiple times due to my lack
of experience there. I am running out of white flags for that niche.

>
> >>> Relational imposes rules to be called "relational". Otherwise, it would
> >>> turn into the navigational messes that motivated the creation of
> >>> relational to begin with.
> >>
> >> So data in RDBs cannot be properly structured. Fortunately there still
> >> exist hierarchical DBs!
> >
> > "Properly"? As defined by what?
>
> By the application domain. Relational approach forces quite artificial
> constructions.

I don't see them. In fact, relational is pretty good where the domain
*is* artificial, such as dealing with intellectual ideas such as
invoices, intellectual property, marketing rules, etc. Modelling the
Cartesian coordinate (X,Y,Z) physical world is perhaps where it does
not do so well.

>
> >> No, we wish to model them as relations! Polygon isn't a relation. You can
> >> have a table of rows representing polygons, that's OK. Now, write the
> >> SELECT statement that gives me the car position, movement direction and
> >> distance to the next turn I have to do. Show, how this problem can be
> >> decomposed using relational approach.
> >
> > Use the distance formula to find the nearest matches. The rest is left
> > as a reader exercise.
>
> No that will extract the whole table and sort it by the distance! Many
> thanks!

...And Distance < myThreshold

Most SQL dialects also let one limit the number of rows returned. (Some
argue that is not part of relational, but it can be considered a common
database service. A.C.I.D. Transactions and sorting are also not part
of relational, but are a common service supplied by DBs, even
non-relational ones.)
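
A minimal sketch of such a query, assuming Python with its standard `sqlite3` module. The `waypoints` table, its contents and the helper `nearest` are invented for this illustration. It combines the distance threshold with a row limit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE waypoints (id INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO waypoints (x, y) VALUES (?, ?)",
    [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0), (1.0, 1.0)],
)

def nearest(cx, cy, threshold, limit):
    """Ids of up to `limit` waypoints closer than `threshold` to (cx, cy),
    nearest first. Squared distances avoid the need for a sqrt function."""
    query = """
        SELECT id FROM (
            SELECT id,
                   (x - :cx) * (x - :cx) + (y - :cy) * (y - :cy) AS dist2
            FROM waypoints
        )
        WHERE dist2 < :t2
        ORDER BY dist2
        LIMIT :n
    """
    params = {"cx": cx, "cy": cy, "t2": threshold * threshold, "n": limit}
    return [row[0] for row in conn.execute(query, params)]

print(nearest(0.0, 0.0, threshold=6.0, limit=2))
```

The threshold and the limit bound the size of the result, not the work done: with no index that helps here, the engine in this sketch still evaluates the distance expression for every row.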

> > [pixel example]

> > Relational does not dictate implementation. Yes, it means you may have
> > to have custom programmed RDBMS engines to get the most compact
> > representation possible, but that will be the case no matter what you
> > use.
>
> Stop here. "Custom" in "custom programmed RDBMS engine" means
> non-relationally programmed. Am I right?

Heck no! A custom made/tuned relational engine is certainly as possible
as making a non-relational one. A non-trivial commercial graphics
engine will almost certainly be custom-built for graphics regardless of
paradigm used.

>
> --
> Regards,
> Dmitry A. Kazakov

-T-

From: topmind on

Christian Brunschen wrote:
> In article <1139283131.650330.173930(a)o13g2000cwo.googlegroups.com>,
> topmind <topmind(a)technologist.com> wrote:
> >Dmitry A. Kazakov wrote:
>
> [ much deletia ]
>
> >>
> >> SQL is typed, so I see no point.
> >
> >Perhaps, but it is possible to create a nearly type-free version of
> >SQL.
> >
> >SQLite (sqlite.org) is allegedly one such tool, although I've never
> >tried it.
>
> SQLite is not in fact untyped:
>
> <quote src="http://sqlite.org/different.html">
>
> Manifest typing
>
> Most SQL database engines use static typing. A datatype is associated with
> each column in a table and only values of that particular datatype are
> allowed to be stored in that column. SQLite relaxes this restriction by
> using manifest typing. In manifest typing, the datatype is a property of
> the value itself, not of the column in which the value is stored. SQLite
> thus allows the user to store any value of any datatype into any column
> regardless of the declared type of that column. (There are some exceptions
> to this rule: An INTEGER PRIMARY KEY column may only store integers. And
> SQLite attempts to coerce values into the declared datatype of the column
> when it can.)
>
> </quote>
>
> So SQLite certainly is typed, it's just not completely statically typed.

Perhaps we are using different definitions of "type". I consider a
"type" a side/hidden flag that accompanies the actual value. (Compiled
languages often pre-associate these "flags" with memory or symbolic
slots such that it does not have to carry them at run-time.) I have not
verified that SQLite lacks a side flag, but it does not sound like it
does.

> The author also notes that this is a difference from standard SQL.

Standard SQL is "typed" because the schema tables/list carries the
"flags" per column.
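
The behaviour described in the quoted SQLite documentation can be checked directly. This is a minimal sketch, assuming Python with its standard `sqlite3` module and an ordinary (non-STRICT) table; `typeof()` is SQLite's built-in function that reports the storage class of each stored value. The table name `t` is arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany(
    "INSERT INTO t (n) VALUES (?)",
    [(123,), ("123",), ("abc",)],   # an integer, numeric text, other text
)

kinds = [row[0] for row in
         conn.execute("SELECT typeof(n) FROM t ORDER BY rowid")]
print(kinds)
```

The text '123' is coerced to an integer because of the column's declared type, while 'abc' is stored as text in the same column, so each value does carry its own type alongside it.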

>
> [ more deletia ]
>
> >>-T-
>
> Best wishes,
>
> // Christian Brunschen

-T-