From: jbdhl on
Consider a table and a query referring to only a subset of the columns
in that table. How early in the query evaluation is the projection
carried out?

Are the columns to be selected filtered out as early as in the very
access method that reads the table rows from the buffer, or are the
projection handled later, after the whole row has been fetched by the
access method?

Does it depend on the complexity of the query, how far down the three
that the projection is handled out?

Thanks!
From: vic on
On Mar 2, 6:09 am, jbdhl <jbirksd...(a)gmail.com> wrote:
> Consider a table and a query referring to only a subset of the columns
> in that table. How early in the query evaluation is the projection
> carried out?
>
> Are the columns to be selected filtered out as early as in the very
> access method that reads the table rows from the buffer, or are the
> projection handled later, after the whole row has been fetched by the
> access method?
>
> Does it depend on the complexity of the query, how far down the three
> that the projection is handled out?
>
> Thanks!

That's an interesting topic: examining the query plan might help
answer this to some degree. Each query can be handled differently by
the optimizer because, as Plamen pointed out, the answer depends on
many factors.

Also consider that a one-table select statement that retrieves only
one row, via an index, will still result in one entire datapage of I/O
plus whatever index I/O was needed.

That is, every column of every row on the datapage will be I/O'ed. So
a partial answer to your question is that, at the point that the query
engine access the datapages, extraneous columns have not yet been
filtered out.

Exceptions to this occur, as you probably already know, whenever a
covering index is used.

FYI, there are "column-based" or "vertical" database products on the
market that do not operate this way, so that when data is accessed
only the queried columns are being I/O'ed. It's as if every data
access is actually done via a covering index. I believe that these
products are used in the data warehouse/BI context rather than the
OLTP context, and I assume that this is because there are performance
trade-offs involved.

- victor
From: jbdhl on
> But in practice the query optimizer may move steps up ro down the execution plan to find efficient way to retrieve data.
> One example is using covering index for the query where no unnecessary data access will be incurred.

Will the projection ever happen directly in the access method that
fetch rows from the buffer? Or will it always be done at some point
after the initial fetch?
From: Erland Sommarskog on
jbdhl (jbirksdahl(a)gmail.com) writes:
> Will the projection ever happen directly in the access method that
> fetch rows from the buffer? Or will it always be done at some point
> after the initial fetch?

I'm not sure that your question even does make sense. SQL Server reads
rows from pages, either from disk or from the buffer cache. When a plan
operator reads a row, it will read the columns that the parent operator
asks for. These columns can be in the SELECT list, or be needed for a
join elsewhere in the plan.

I would suggest that you study some query plans in SQL Server Management
Studio. This may not answer all your questions, but rather it may raise
new questions, but hopefully questions that are more on the mark.

--
Erland Sommarskog, SQL Server MVP, esquel(a)sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

From: Gert-Jan Strik on
jbdhl wrote:
>
> Consider a table and a query referring to only a subset of the columns
> in that table. How early in the query evaluation is the projection
> carried out?
>
> Are the columns to be selected filtered out as early as in the very
> access method that reads the table rows from the buffer, or are the
> projection handled later, after the whole row has been fetched by the
> access method?
>
> Does it depend on the complexity of the query, how far down the three
> that the projection is handled out?
>
> Thanks!

In general, the optimizer is written to parse, compile and execute a
query as fast as possible. As a rule it will cut out every unnecessary
part that is possible, as long as there is a gain in the end, and it
will try to do so as early as possible.

Typically, the operator that locates the rows will also read the table
data. The exceptions are operators like index joining or index
intersection.

Of course, the storage engine will only fetch the columns that are
needed to process the query. However, since the relative cost is low
(mainly memory space for intermediate results), other factors are
typically much more important, such as access path and method, because
they usually have a much bigger impact on performance. As you may know,
physical I/O is orders of magnitude slower than any reading from the
buffer cache, so avoiding I/O has the highest priority in the
optimization strategy. Important cost factors (not necessarily in this
order) are locating and locking a page, the I/O to fetch a (range of)
pages, the cost associated with sorting, hashing and hash joins, etc.
Compared to this any performance difference between selecting all colums
or just a subset of columns from a buffered page is insignificant (if it
is even possible to measure any effect).

--
Gert-Jan