From: nflacco on
I'm tinkering around with a data collection system, and have come up
with a very hackish way to store my data- for reference, I'm
anticipating collecting at least 100 million different dataId
whatevers per year, possibly much more.

---366 data tables ( one for each day of the year ), each row being
assigned a unique DataId ( unique across all 366 tables too )
---100 data_map tables, table 0 having all DataIds ending in 00, table
99 having all DataIds ending in 99 and so on.

This is mostly because a friend of mine who works with mySQL said it
is very slow to index large tables, even if you work with mostly
integers.
However, I've read mysql can handle millions of rows no problem, so it
seems my basic design is overly complicated and will lead to tons of
slowdowns thanks to all the joins.
Another friend of mine suggested using file partioning ( though he
uses MSSQL ), so is that another option?

Any advice?
From: Tom van Stiphout on
On Thu, 3 Jul 2008 18:55:08 -0700 (PDT), nflacco
<mail.flacco(a)gmail.com> wrote:

A table per day is a REALLY BAD IDEA.
Remember that indexed lookups are very efficient. 1 record out of 4
billion can be found using 32 comparisons.
Sure, indexing that table (if there was no index before) might take a
while but doesn't have to be done more than once.

Why don't you run some performance tests on realistic hardware with a
realistic data set.

-Tom.


>I'm tinkering around with a data collection system, and have come up
>with a very hackish way to store my data- for reference, I'm
>anticipating collecting at least 100 million different dataId
>whatevers per year, possibly much more.
>
>---366 data tables ( one for each day of the year ), each row being
>assigned a unique DataId ( unique across all 366 tables too )
>---100 data_map tables, table 0 having all DataIds ending in 00, table
>99 having all DataIds ending in 99 and so on.
>
>This is mostly because a friend of mine who works with mySQL said it
>is very slow to index large tables, even if you work with mostly
>integers.
>However, I've read mysql can handle millions of rows no problem, so it
>seems my basic design is overly complicated and will lead to tons of
>slowdowns thanks to all the joins.
>Another friend of mine suggested using file partioning ( though he
>uses MSSQL ), so is that another option?
>
>Any advice?
From: Dan Guzman on
> Another friend of mine suggested using file partioning ( though he
> uses MSSQL ), so is that another option?

Partitioning is good for managing very large tables because you can rebuild
individual partition indexes without touching the entire table. This
reduces rebuild time and intermediate space requirements. Be aware that the
partitioning feature is available only in Enterprise and Developer editions.

With a good indexing strategy, response time should ideally be proportional
to the amount of data retrieved (barring cached data) regardless of whether
or not partitioning is used. Partitioning by date can facilitate certain
processes, like incremental data loads and purge/archival as well as certain
types of queries. However, with or without partitioning, indexing is the
key from a a performance perspective.

--
Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

"nflacco" <mail.flacco(a)gmail.com> wrote in message
news:9ee68f4f-04e9-4039-8f8f-b09af9c4a8b9(a)d19g2000prm.googlegroups.com...
> I'm tinkering around with a data collection system, and have come up
> with a very hackish way to store my data- for reference, I'm
> anticipating collecting at least 100 million different dataId
> whatevers per year, possibly much more.
>
> ---366 data tables ( one for each day of the year ), each row being
> assigned a unique DataId ( unique across all 366 tables too )
> ---100 data_map tables, table 0 having all DataIds ending in 00, table
> 99 having all DataIds ending in 99 and so on.
>
> This is mostly because a friend of mine who works with mySQL said it
> is very slow to index large tables, even if you work with mostly
> integers.
> However, I've read mysql can handle millions of rows no problem, so it
> seems my basic design is overly complicated and will lead to tons of
> slowdowns thanks to all the joins.
> Another friend of mine suggested using file partioning ( though he
> uses MSSQL ), so is that another option?
>
> Any advice?

From: nflacco on
On Jul 3, 8:14 pm, "Dan Guzman" <guzma...(a)nospam-online.sbcglobal.net>
wrote:
> > Another friend of mine suggested using file partioning ( though he
> > uses MSSQL ), so is that another option?
>
> Partitioning is good for managing very large tables because you can rebuild
> individual partition indexes without touching the entire table. This
> reduces rebuild time and intermediate space requirements. Be aware that the
> partitioning feature is available only in Enterprise and Developer editions.
>
> With a good indexing strategy, response time should ideally be proportional
> to the amount of data retrieved (barring cached data) regardless of whether
> or not partitioning is used. Partitioning by date can facilitate certain
> processes, like incremental data loads and purge/archival as well as certain
> types of queries. However, with or without partitioning, indexing is the
> key from a a performance perspective.
>
> --
> Hope this helps.
>
> Dan Guzman
> SQL Server MVPhttp://weblogs.sqlteam.com/dang/
>
> "nflacco" <mail.fla...(a)gmail.com> wrote in message
>
> news:9ee68f4f-04e9-4039-8f8f-b09af9c4a8b9(a)d19g2000prm.googlegroups.com...
>
> > I'm tinkering around with a data collection system, and have come up
> > with a very hackish way to store my data- for reference, I'm
> > anticipating collecting at least 100 million different dataId
> > whatevers per year, possibly much more.
>
> > ---366 data tables ( one for each day of the year ), each row being
> > assigned a unique DataId ( unique across all 366 tables too )
> > ---100 data_map tables, table 0 having all DataIds ending in 00, table
> > 99 having all DataIds ending in 99 and so on.
>
> > This is mostly because a friend of mine who works with mySQL said it
> > is very slow to index large tables, even if you work with mostly
> > integers.
> > However, I've read mysql can handle millions of rows no problem, so it
> > seems my basic design is overly complicated and will lead to tons of
> > slowdowns thanks to all the joins.
> > Another friend of mine suggested using file partioning ( though he
> > uses MSSQL ), so is that another option?
>
> > Any advice?

The re-indexing is what worries me. I'll be constantly adding new data
to main table ( formerly the 366 day tables ) if we follow the not use
too many tables scheme, as well as the processed-data tables.
(
From: Erland Sommarskog on
nflacco (mail.flacco(a)gmail.com) writes:
> I'm tinkering around with a data collection system, and have come up
> with a very hackish way to store my data- for reference, I'm
> anticipating collecting at least 100 million different dataId
> whatevers per year, possibly much more.
>
> ---366 data tables ( one for each day of the year ), each row being
> assigned a unique DataId ( unique across all 366 tables too )
> ---100 data_map tables, table 0 having all DataIds ending in 00, table
> 99 having all DataIds ending in 99 and so on.
>
> This is mostly because a friend of mine who works with mySQL said it
> is very slow to index large tables, even if you work with mostly
> integers.
> However, I've read mysql can handle millions of rows no problem, so it
> seems my basic design is overly complicated and will lead to tons of
> slowdowns thanks to all the joins.
> Another friend of mine suggested using file partioning ( though he
> uses MSSQL ), so is that another option?

It is not clear from your post which product you are using, and what
solution you should use may depend on which DBMS you are really using.
But since this is an SQL Server newsgroup, you will get answers for
SQL Server here.

SQL Server offers two ways for partitioning: partitioned views and
partitioned tables. The latter is preferrable, but it is only available
in Enterprise Edition.

With partitioned tables you could have one partition per day, although
I think one per month is better. With patitioned views, you cannot
have one partition per day, since you cannot have more than 256 tables
in a query.

You seem to be concerned with indexing, and it cannot be denied that
there might be a problem. Then again, if you build the index once,
and new data is added in such a way that there is little fragmentation,
it's not an issue at all. We would need to know more how the table looks
like, and how it's loaded to give more specific advice.




--
Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se

Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx