http://www.flickr.com/photos/wouter28mm/3151414412/ |
During a busy day sometimes I don’t have a chance to check
Twitter. I’ve written about how Twitter is a wonderful place to learn, Top
5 Reasons You Should Be On Twitter, and I happened to glance down at just
the right moment.
Grant Fritchey (@GFritchey | Blog) asked a question on #SQLHelp. And I thought I knew the answer, YAY, but it turned out I was wrong. Grant had asked the question if you needed a primary key in order to Partition a table. Originally I had replied to Grant that I thought you had to have a Clustered Index in order to partition a table.
Some of the top minds in the SQL field quickly set in to give Grant the answer that he needed. And like any time when you get something wrong this was a wonderful opportunity to learn. The exchange went like this (You have to read this from the bottom up):
Grant Fritchey (@GFritchey | Blog) asked a question on #SQLHelp. And I thought I knew the answer, YAY, but it turned out I was wrong. Grant had asked the question if you needed a primary key in order to Partition a table. Originally I had replied to Grant that I thought you had to have a Clustered Index in order to partition a table.
Some of the top minds in the SQL field quickly set in to give Grant the answer that he needed. And like any time when you get something wrong this was a wonderful opportunity to learn. The exchange went like this (You have to read this from the bottom up):
WOW! That is a lot of information over a very short space. Whenever I have used partitioning I’ve always used it on a
Primary Key Clustered Index.
It’s a technology that I’ve used that I’m familiar with, but not one that I’ve played with just to see what could happen. Amit Banerjee(@banerjeeamit | Blog), Aaron Nelson (@SQLVariant|Blog), Paul White (@SQL_Kiwi|Blog), and Dave Levy (@Dave_Levy|Blog) are some pretty ridiculously smart guys. If they said it could be done I believed them. However, that's not enough. You shouldn't just take it on faith alone. You should go out there and do it yourself.
It’s a technology that I’ve used that I’m familiar with, but not one that I’ve played with just to see what could happen. Amit Banerjee(@banerjeeamit | Blog), Aaron Nelson (@SQLVariant|Blog), Paul White (@SQL_Kiwi|Blog), and Dave Levy (@Dave_Levy|Blog) are some pretty ridiculously smart guys. If they said it could be done I believed them. However, that's not enough. You shouldn't just take it on faith alone. You should go out there and do it yourself.
So with that Dear Reader I figured I’d work up some code we
could both play with that demonstrates what the experts where talking about and
lets us prove it out at the same time.
CREATE OUR
DEMO_PARTITION DATABASE
Let’s Create our Database
/*
Create our
Database
That we will use
for the
Demo
*/
USE master;
Go
IF EXISTS(SELECT name FROM sys.databases WHERE Name=N'demo_Partition')
BEGIN
DROP Database demo_Partition
END
CREATE DATABASE demo_Partition
GO
Then we will add some Logical File Groups
/*
Add Filegroups
*/
ALTER DATABASE demo_Partition
ADD FILEGROUP
FG2
GO
ALTER DATABASE demo_Partition
ADD FILEGROUP
FG3
GO
ALTER DATABASE demo_Partition
ADD FILEGROUP
FG4
GO
ALTER DATABASE demo_Partition
ADD FILEGROUP
FG5
GO
Then we will associate those Logical File Groups with some
Physical Data Files
/*
Add Files and
associate to
filegroups
*/
ALTER DATABASE demo_Partition
ADD FILE
(
NAME=data_FG2,
FILENAME='<Insert directory Path>\demo_Partition_FG2.ndf'
) TO FILEGROUP FG2;
GO
ALTER DATABASE demo_Partition
ADD FILE
(
NAME=data_FG3,
FILENAME='<Insert directory Path>\demo_Partition_FG3.ndf'
) TO FILEGROUP FG3;
GO
ALTER DATABASE demo_Partition
ADD FILE
(
NAME=data_FG4,
FILENAME='<Insert directory Path>\demo_Partition_FG4.ndf'
) TO FILEGROUP FG4;
GO
ALTER DATABASE demo_Partition
ADD FILE
(
NAME=data_FG5,
FILENAME='<Insert directory Path>\demo_Partition_FG5.ndf'
) TO FILEGROUP FG5;
GO
USE
demo_Partition
GO
CREATE OUR
PARTITION FUNCTION & SCHEMA
Now that we’ve got our Database we need to create our
Partition Function and our Partition Schema.
Our partition Function is a range of values going right or left for a
particular data type. In this example we’ll
be using an integer data type and we will be doing a left range for values 2000, 4000, and 6000.
This means that anything <=2000 will go in the Filegroup that we associate with this range of the function, <=4000, and >=6000.
So we should end up with 4 partitions ranging from 1 to 2000, 2001 to 4000, 4001 to 6000, and 6000 & up.
This means that anything <=2000 will go in the Filegroup that we associate with this range of the function, <=4000, and >=6000.
So we should end up with 4 partitions ranging from 1 to 2000, 2001 to 4000, 4001 to 6000, and 6000 & up.
/*
Create Partition
Function
*/
CREATE PARTITION FUNCTION
compDemoPartionFunc(INT)
AS RANGE LEFT FOR VALUES(2000, 4000, 6000)
GO
Now we will create our Partition Schema which we will use to
associate our Partition Function to our Logical File Groups
/*
Create Partition
Scheme
*/
CREATE PARTITION SCHEME compDemoPartitionPS
AS PARTITION compDemoPartionFunc
TO(fg2, fg3,fg4, fg5);
PARTITION A PRIMARY KEY CLUSTERED INDEX
http://www.flickr.com/photos/21684795@N05/4406836491/ |
Now that we’ve done all this, let’s start with the trusted standby
of partitioning a clustered Primary Key. First let's create our table,
including our primary key and clustered index.
We’ll drive to the basket and slam dunk this.
IF EXISTS(SELECT NAME FROM sys.tables WHERE name=N'myTable1')
BEGIN
DROP TABLE
dbo.myTable1
END
CREATE TABLE myTable1(
myID INT IDENTITY(1,1),
productName char(800) DEFAULT 'some product',
productSKU varCHAR(500) DEFAULT 'Product SKU',
productDescription varCHAR(max) DEFAULT 'Here is a Generic Production Description',
Comments TEXT DEFAULT 'here are some genric
comments',
CONSTRAINT PK_myTable1_myID
PRIMARY KEY
CLUSTERED(myID)
) ON compDemoPartitionPS(myID);
Now let's populate our data with 10,000 rows.
/*
Let's populate
our Primary Key
Clustered Index
with some data
*/
DECLARE @i INT
SET @i=0
BEGIN TRAN
WHILE (@i<10000)
BEGIN
INSERT INTO myTable1 DEFAULT
VALUES;
SET @i = @i +1
END
COMMIT TRAN
Now let's take a look at see our breakdown. We don’t want to just scratch the surface here
we want to dig deep and make sure that all of our rows are on the Filegroups
that we selected for them. To do this we’ll
be looking at sys.destination_data_spaces
and joining
sys.partition_schemes as this table holds the pointers for our partition
schema.
select
so.name AS [Table Name]
,si.name
AS [Index Name]
,ds.name
AS [Filegroup Name]
,dds.data_space_id AS [partition
number]
,sp.rows
,dds.partition_scheme_id
,ps.name
as [partition scheme]
from
sys.data_spaces ds
LEFT join
sys.destination_data_spaces dds
on (ds.data_space_id = dds.data_space_id)
left join
sys.partition_schemes ps
on (ps.data_space_id = dds.partition_scheme_id)
LEFT JOIN
sys.partitions sp
ON dds.destination_id=sp.partition_number
LEFT JOIN
sys.objects so
ON sp.object_id=so.object_id
LEFT JOIN
sys.indexes si
ON so.object_id=si.object_id AND sp.index_id=si.index_id
WHERE
so.type='U'
order by
ds.name
,ps.name ASC
As you can tell from above our partitioned table went on the
Logical Filegroups we had created earlier.
Because we associated them with physical Filegroups we can be confident
that our data is going to the correct location.
PARTITION A CLUSTERED INDEX (NO PRIMARY KEY)
http://www.flickr.com/photos/39083668@N06/4347173049/ |
So now let’s drive the lane. We'll do the exact same exercise, but this time with a
Clustered Index with no primary key.
Start by creating our table.
/*
Let's create a
Heap and then
we will create a
Clustered Index
With no Primary
Key Constraint
*/
IF EXISTS(SELECT NAME FROM sys.tables WHERE name=N'myCLX')
BEGIN
DROP TABLE
dbo.myCLX
END
CREATE TABLE myCLX(
myID INT IDENTITY(1,1),
productName char(800) DEFAULT 'some product',
productSKU varCHAR(500) DEFAULT 'Product SKU',
productDescription varCHAR(max) DEFAULT 'Here is a Generic Production Description',
Comments TEXT DEFAULT 'here are some genric
comments'
) ;
Now let’s create our index and associate it with our partition schema.
CREATE CLUSTERED INDEX
clx_Index1 ON dbo.myCLX(myID)
ON
compDemoPartitionPS(myID);
go
Now let’s populate our table and look at the results.
/*
let's insert
some data
into our
Clustered Index
*/
DECLARE @i INT
SET @i=0
BEGIN TRAN
WHILE (@i<10000)
BEGIN
INSERT INTO dbo.myCLX DEFAULT VALUES;
SET @i = @i +1
END
COMMIT TRAN
We will use the same query and set of DMV’s that we used
before.
We see that we didn’t need a
primary key to achieve the exact same results as above.
PARTITION A HEAP (NO CLUSTERED INDEX)
http://www.flickr.com/photos/lscan/2225285352/ |
Now we’ve hit 2 for 2, let’s see if we can go for three. Same exercise this time we are using a Heap. Remember a Heap is simply a table with no
Clustered Index on it. We’ll start by creating our Heap.
IF EXISTS(SELECT NAME FROM sys.tables WHERE name=N'myHeapNCI')
BEGIN
DROP TABLE
dbo.myHeapNCI
END
CREATE TABLE dbo.myHeapNCI(
myID INT IDENTITY(1,1),
productName char(800) DEFAULT 'some product',
productSKU varCHAR(500) DEFAULT 'Product SKU',
productDescription varCHAR(max) DEFAULT 'Here is a Generic Production Description',
Comments TEXT DEFAULT 'here are some genric
comments'
) ON compDemoPartitionPS(myID);
Then we will populate it with some data.
/*
let's insert
some data
into our Heap
*/
DECLARE @i INT
SET @i=0
BEGIN TRAN
WHILE (@i<10000)
BEGIN
INSERT INTO dbo.myHeapNCI DEFAULT VALUES;
SET @i = @i +1
END
COMMIT TRAN
Now let’s run the same query as before to see how our data is
spread out.
Now this is interesting.
Look at the null in the column for the Index name. It makes sense we do not have an index on a
Heap. However our Heap is still partitioned. The home team wins! And the crowd goes wild!
WRAP UP
http://www.flickr.com/photos/joao/2198486114/ |
So once again Twitter had found
a way to get information from the best and the brightest in the SQL
Community. If you would have asked me before all this I would have told
you, that you had to have a Clustered Index on a table in order to partition
it. I’m not sure where I even got that bit of information, but when I
picked it up along the way it was wrong.
I’m glad during a very busy day that I looked at Twitter just in
time to learn something new.
But with new knowledge comes new questions, why would you want to
create a partitioned heap, why would you create a partitioned clustered index
without a primary key?
The heap question I could take a stab at, but I wouldn’t want to
do it without proofing out a couple theories.
So there we go Dear Reader more questions that will hopefully lead to
more learning.
Thanks Again,
Brad
Great post!!! I am quesitoning about partition on heap, here is answer. Thank you for sharing, just wonder how's insert/update impact on partitioned heap vs paritioned table on clustered index
ReplyDeleteHi Kevin, I don't think you would want to use a partitioned heap on a table that you would have insert and update activity on.
ReplyDeleteIn an insert/update scenario it would be negative. However I could see partitioning a Heap on a Data Warehouse or some other database if it was large enough and it was read only.
This way you could load data only to one partition at a time instead of locking the entire table from read operations while a bulk load occurred, like on a data warehouse.
I had originally written this as a way to test and learn what Paul, Grant, Aaron, Amit, and Dave where all talking about. I still have yet to actually partition a heap in production. As I get exposed to more data warehousing and BI client operations I expect that will eventually change.