From: Aaron Bertrand on
Can you show your query or give more details? It's hard to tell you how to
solve your problem when we don't understand the problem.

Also, for cases like this where I need to retrieve a subset of rows but keep
the values they were assigned over the whole set, it is relatively simple:

;WITH x AS
(
    SELECT
        foo, bar, rn = ROW_NUMBER() OVER (...)
    FROM BigTable
)
SELECT
    foo, bar, rn
FROM x
WHERE <criteria>;

Yes, the CTE still has to do a lot of heavy lifting, but you don't have to
pull each row into your .Net code and iterate one by one. I'm not sure why
that takes 12 seconds per iteration anyway; my guess is your code isn't
exactly optimized. But maybe it will be a bit faster if you employ a
technique like the one above.
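The pattern can be demonstrated end to end. This is a hypothetical sketch
using Python's sqlite3 module (SQLite also supports ROW_NUMBER()) rather
than SQL Server; the table, data, and filter criteria are all made up, and
the point is only that numbering the whole set first and filtering
afterwards preserves each row's position in the full ordering:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BigTable (foo TEXT, bar INTEGER)")
conn.executemany("INSERT INTO BigTable VALUES (?, ?)",
                 [("a", 30), ("b", 10), ("c", 20), ("d", 40)])

# Number every row over the FULL ordering inside the CTE, then filter
# outside it: the rn values still reflect each row's position in the
# whole ordered set, not in the filtered subset.
rows = conn.execute("""
    WITH x AS (
        SELECT foo, bar, ROW_NUMBER() OVER (ORDER BY bar) AS rn
        FROM BigTable
    )
    SELECT foo, bar, rn
    FROM x
    WHERE foo IN ('a', 'c')
    ORDER BY rn
""").fetchall()

print(rows)  # [('c', 20, 2), ('a', 30, 3)]
```

Note that putting the WHERE directly next to the ROW_NUMBER() (instead of
outside the CTE) would renumber the filtered rows 1, 2, ..., which is
exactly the information the poster is losing.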






On 10/13/09 6:39 PM, in article j40ad5d8h1b6oadne15ptkbu5d557j1jkp(a)4ax.com,
"Jeff(a)google.com" <Jeff(a)google.com> wrote:

> Can't. I am using the ROW_NUMBER(), RANK() and DENSE_RANK()
> functions. I lose that information if I query for a specific item.
> That's why I have been doing the iteration, but it takes a while since
> I am running many simulations.
>
>
> A difficult problem
>
> On Tue, 13 Oct 2009 12:19:22 +0100, "Paul"
> <paulriley(a)novareconsulting.com> wrote:
>
>> Insert into a temporary table with the addition of an identity column. Then
>> use this table as the basis for your query; the identity column will
>> identify record/row position.
>>
>> There are numerous examples of doing this on the web for paging large
>> amounts of data.
>>
>>
>>
>> <Jeff(a)google.com> wrote in message
>> news:5d05d5dh7mehsgn45l8af672vbeufienlf(a)4ax.com...
>>> I have a rather large database that I am data mining. I am doing
>>> numerous SQL sorts to determine which methodology works best. I would
>>> like to determine the new "record position" of a particular record in
>>> the group. Currently I am doing it programmatically by iterating
>>> through the records till I find the particular record. This can be
>>> time consuming given the size of the database.
>>>
>>> Is there a better method?
>>>
>>
>

From: Paul on
I suggest you get your ACME rocket and go back to the drawing board or at
least explain CLEARLY what you are trying to do. Preferably with a code
sample.


From: m on
If the simulations are independent, then multiple threads / processes /
servers are your best approach. This is not really an SQL question, because
the time is not being spent in the RDBMS or in data access. The optimization
you need is to your simulation algorithm instead ;)

<Jeff(a)google.com> wrote in message
news:sgebd5tp3olip31fmom6hsf1ipic1onvfe(a)4ax.com...
> It is not the database size (half a million) but the number of
> simulations (650,000). Using the current SQL commands (run in Vb.net)
> takes about 12 seconds per simulation. By my estimates it will take
> over 90 days to run the simulation so I am trying to optimize the
> code. Once the data is ORDERed by multiple criteria I need to find
> the ROW_NUMBER value for about 100 records. The only way I have been
> doing it is with an ExecuteReader and iterating through the list. This
> takes up a lot of time. If I use a WHERE clause on the query I lose
> the ROW_NUMBER values.
>
> Any thoughts?
>
>
>
>
> On Tue, 13 Oct 2009 16:57:40 -0700 (PDT), --CELKO--
> <jcelko212(a)earthlink.net> wrote:
>
>>>> I have a rather large database that I am data mining. <<
>>
>>How many petabytes? SQL Server is not in that game; you ought to be
>>using Teradata, KX, WX2, etc.
>>
>>>> I am doing numerous SQL sorts to determine which methodology works
>>>> best.<<
>>
>>Methodology is the study of methods, so this sentence is strange at
>>best. Sorting is a file system concept; SQL is based on sets, which
>>have no ordering.
>>
>>>> I would like to determine the new "record position" of a particular
>>>> record [sic] in the group. <<
>>
>>THE group? What does that mean? Since SQL uses rows and not records
>>and has no concept of a position, this is getting worse.
>>
>>>> Currently I am doing it programmatic by iterating [no such concept in
>>>> declarative languages] through the records [sic] till I find the
>>>> particular record [sic] . <
>>
>> Records and iteration are procedural language and file system
>>concepts; they are not part of SQL. We use a KEY to locate a ROW.
>>
>>There are ordinal functions (ROW_NUMBER(), RANK() and DENSE_RANK())
>>that do sort of what you might be trying to ask. They are based on
>>von Neumann's definition of ordinal numbers.
>
From: --CELKO-- on
>> Thankfully at the moment there are only a fraction of a percent of companies requiring data mining on Petabytes of data - if you are doing any real work in the real world you would know that. <<

Not true. Terabytes is common now. A few years ago, that was a huge
number. I have several companies that will hit Petabytes in the
next 5 years. Unlike you, I want to see more data in more places so
better decisions can be made.

Since my first Masters was in Math and I was a statistician for years,
I go to the right tools for serious data mining -- SAS, SPSS,
KnowledgeSeeker, etc. and my clients are using larger databases. SQL
Server is pretty much stuck at descriptive stats while my people want
predictive stats. This is not real data mining. Can you say
"Bonferroni correlation trees" and have any idea what it means and how
to use it? How many time series methods do you know and when to use
each of them?

From: Tony Rogerson on
>> But do tell us; as you are giving opinion on SQL Server's data mining
>> abilities, can you --celko-- tell us what real world experience you have
>> of that feature of the product - for instance, have you even installed it
>> to review it, or like usual are you just going on your double decade old
>> opinion on SQL Server?

Perhaps you could detail the areas of data mining with SQL Server you have
looked at.

You've not even installed it have you!

And to answer your question - it's like I said - there is a fraction of a
percent of companies out there that are dealing in Terabytes - and that is
true; it's not a guess from a four-walled classroom - it's a reality - I am
in the industry full time as a consultant - I don't just work the odd day a
couple of times a month like yourself.

You are well out of date and full of BS.

And no, I do want business to make full use of its data - it's part of my
role when I consult; but unlike you I am not going to lie about the
realities of the industry - Petabytes is NOT commonplace.

--ROGGIE--


"--CELKO--" <jcelko212(a)earthlink.net> wrote in message
news:6ce1315a-6095-4ba3-87ef-61bb4b37f841(a)l2g2000yqd.googlegroups.com...
>>> Thankfully at the moment there are only a fraction of a percent of
>>> companies requiring data mining on Petabytes of data - if you are doing
>>> any real work in the real world you would know that. <<
>
> Not true. Terabytes is common now. A few years ago, that was a huge
> number. I have several companies that will hit Petabytes in the
> next 5 years. Unlike you, I want to see more data in more places so
> better decisions can be made.
>
> Since my first Masters was in Math and I was a statistician for years,
> I go to the right tools for serious data mining -- SAS, SPSS,
> KnowledgeSeeker, etc. and my clients are using larger databases. SQL
> Server is pretty much stuck at descriptive stats while my people want
> predictive stats. This is not real data mining. Can you say
> "Bonferroni correlation trees" and have any idea what it means and how
> to use it? How many time series methods do you know and when to use
> each of them?
>