SQLBalls: Partitioning a Heap

http://www.flickr.com/photos/wouter28mm/3151414412/

During a busy day sometimes I don’t have a chance to check Twitter. I’ve written about how Twitter is a wonderful place to learn, Top 5 Reasons You Should Be On Twitter, and I happened to glance down at just the right moment.

Grant Fritchey (@GFritchey | Blog) asked a question on #SQLHelp. And I thought I knew the answer, YAY, but it turned out I was wrong. Grant had asked the question if you needed a primary key in order to Partition a table. Originally I had replied to Grant that I thought you had to have a Clustered Index in order to partition a table.

Some of the top minds in the SQL field quickly set in to give Grant the answer that he needed. And like any time when you get something wrong this was a wonderful opportunity to learn. The exchange went like this (You have to read this from the bottom up):

WOW! That is a lot of information over a very short space. Whenever I have used partitioning I’ve always used it on a Primary Key Clustered Index.

It’s a technology that I’ve used that I’m familiar with, but not one that I’ve played with just to see what could happen. Amit Banerjee(@banerjeeamit | Blog), Aaron Nelson (@SQLVariant|Blog), Paul White (@SQL_Kiwi|Blog), and Dave Levy (@Dave_Levy|Blog) are some pretty ridiculously smart guys. If they said it could be done I believed them. However, that's not enough. You shouldn't just take it on faith alone. You should go out there and do it yourself.

So with that Dear Reader I figured I’d work up some code we could both play with that demonstrates what the experts where talking about and lets us prove it out at the same time.

CREATE OUR DEMO_PARTITION DATABASE

Let’s Create our Database

Create our Database

That we will use for the

Demo

USE master;

IF EXISTS(SELECT name FROM sys.databases WHERE Name=N'demo_Partition')

BEGIN

DROP Database demo_Partition

END

CREATE DATABASE demo_Partition

Then we will add some Logical File Groups

Add Filegroups

ALTER DATABASE demo_Partition

ADD FILEGROUP FG2

ALTER DATABASE demo_Partition

ADD FILEGROUP FG3

ALTER DATABASE demo_Partition

ADD FILEGROUP FG4

ALTER DATABASE demo_Partition

ADD FILEGROUP FG5

Then we will associate those Logical File Groups with some Physical Data Files

Add Files and

associate to filegroups

ALTER DATABASE demo_Partition

ADD FILE

(

NAME=data_FG2,

FILENAME='<Insert directory Path>\demo_Partition_FG2.ndf'

) TO FILEGROUP FG2;

ALTER DATABASE demo_Partition

ADD FILE

(

NAME=data_FG3,

FILENAME='<Insert directory Path>\demo_Partition_FG3.ndf'

) TO FILEGROUP FG3;

ALTER DATABASE demo_Partition

ADD FILE

(

NAME=data_FG4,

FILENAME='<Insert directory Path>\demo_Partition_FG4.ndf'

) TO FILEGROUP FG4;

ALTER DATABASE demo_Partition

ADD FILE

(

NAME=data_FG5,

FILENAME='<Insert directory Path>\demo_Partition_FG5.ndf'

) TO FILEGROUP FG5;

USE demo_Partition

CREATE OUR PARTITION FUNCTION & SCHEMA

Now that we’ve got our Database we need to create our Partition Function and our Partition Schema. Our partition Function is a range of values going right or left for a particular data type. In this example we’ll be using an integer data type and we will be doing a left range for values 2000, 4000, and 6000.

This means that anything <=2000 will go in the Filegroup that we associate with this range of the function, <=4000, and >=6000.

So we should end up with 4 partitions ranging from 1 to 2000, 2001 to 4000, 4001 to 6000, and 6000 & up.

Create Partition Function

CREATE PARTITION FUNCTION compDemoPartionFunc(INT)

AS RANGE LEFT FOR VALUES(2000, 4000, 6000)

Now we will create our Partition Schema which we will use to associate our Partition Function to our Logical File Groups

Create Partition Scheme

CREATE PARTITION SCHEME compDemoPartitionPS

AS PARTITION compDemoPartionFunc

TO(fg2, fg3,fg4, fg5);

PARTITION A PRIMARY KEY CLUSTERED INDEX

http://www.flickr.com/photos/21684795@N05/4406836491/

Now that we’ve done all this, let’s start with the trusted standby of partitioning a clustered Primary Key. First let's create our table, including our primary key and clustered index. We’ll drive to the basket and slam dunk this.

IF EXISTS(SELECT NAME FROM sys.tables WHERE name=N'myTable1')

BEGIN

DROP TABLE dbo.myTable1

END

CREATE TABLE myTable1(

myID INT IDENTITY(1,1),

productName char(800) DEFAULT 'some product',

productSKU varCHAR(500) DEFAULT 'Product SKU',

productDescription varCHAR(max) DEFAULT 'Here is a Generic Production Description',

Comments TEXT DEFAULT 'here are some genric comments',

CONSTRAINT PK_myTable1_myID

PRIMARY KEY CLUSTERED(myID)

) ON compDemoPartitionPS(myID);

Now let's populate our data with 10,000 rows.

Let's populate our Primary Key

Clustered Index with some data

DECLARE @i INT

SET @i=0

BEGIN TRAN

WHILE (@i<10000)

BEGIN

INSERT INTO myTable1 DEFAULT VALUES;

SET @i = @i +1

END

COMMIT TRAN

Now let's take a look at see our breakdown. We don’t want to just scratch the surface here we want to dig deep and make sure that all of our rows are on the Filegroups that we selected for them. To do this we’ll be looking at sys.destination_data_spaces and joining sys.partition_schemes as this table holds the pointers for our partition schema.

select

so.name AS [Table Name]

,si.name AS [Index Name]

,ds.name AS [Filegroup Name]

,dds.data_space_id AS [partition number]

,sp.rows

,dds.partition_scheme_id

,ps.name as [partition scheme]

from

sys.data_spaces ds

LEFT join sys.destination_data_spaces dds

on (ds.data_space_id = dds.data_space_id)

left join sys.partition_schemes ps

on (ps.data_space_id = dds.partition_scheme_id)

LEFT JOIN sys.partitions sp

ON dds.destination_id=sp.partition_number

LEFT JOIN sys.objects so

ON sp.object_id=so.object_id

LEFT JOIN sys.indexes si

ON so.object_id=si.object_id AND sp.index_id=si.index_id

WHERE

so.type='U'

order by

ds.name

,ps.name ASC

As you can tell from above our partitioned table went on the Logical Filegroups we had created earlier. Because we associated them with physical Filegroups we can be confident that our data is going to the correct location.

PARTITION A CLUSTERED INDEX (NO PRIMARY KEY)

http://www.flickr.com/photos/39083668@N06/4347173049/

So now let’s drive the lane. We'll do the exact same exercise, but this time with a Clustered Index with no primary key.

Start by creating our table.

Let's create a Heap and then

we will create a Clustered Index

With no Primary Key Constraint

IF EXISTS(SELECT NAME FROM sys.tables WHERE name=N'myCLX')

BEGIN

DROP TABLE dbo.myCLX

END

CREATE TABLE myCLX(

myID INT IDENTITY(1,1),

productName char(800) DEFAULT 'some product',

productSKU varCHAR(500) DEFAULT 'Product SKU',

productDescription varCHAR(max) DEFAULT 'Here is a Generic Production Description',

Comments TEXT DEFAULT 'here are some genric comments'

) ;

Now let’s create our index and associate it with our partition schema.

CREATE CLUSTERED INDEX clx_Index1 ON dbo.myCLX(myID)

ON compDemoPartitionPS(myID);

Now let’s populate our table and look at the results.

let's insert some data

into our Clustered Index

DECLARE @i INT

SET @i=0

BEGIN TRAN

WHILE (@i<10000)

BEGIN

INSERT INTO dbo.myCLX DEFAULT VALUES;

SET @i = @i +1

END

COMMIT TRAN

We will use the same query and set of DMV’s that we used before.

We see that we didn’t need a primary key to achieve the exact same results as above.

PARTITION A HEAP (NO CLUSTERED INDEX)

http://www.flickr.com/photos/lscan/2225285352/

Now we’ve hit 2 for 2, let’s see if we can go for three. Same exercise this time we are using a Heap. Remember a Heap is simply a table with no Clustered Index on it. We’ll start by creating our Heap.

IF EXISTS(SELECT NAME FROM sys.tables WHERE name=N'myHeapNCI')

BEGIN

DROP TABLE dbo.myHeapNCI

END

CREATE TABLE dbo.myHeapNCI(

myID INT IDENTITY(1,1),

productName char(800) DEFAULT 'some product',

productSKU varCHAR(500) DEFAULT 'Product SKU',

productDescription varCHAR(max) DEFAULT 'Here is a Generic Production Description',

Comments TEXT DEFAULT 'here are some genric comments'

) ON compDemoPartitionPS(myID);

Then we will populate it with some data.

let's insert some data

into our Heap

DECLARE @i INT

SET @i=0

BEGIN TRAN

WHILE (@i<10000)

BEGIN

INSERT INTO dbo.myHeapNCI DEFAULT VALUES;

SET @i = @i +1

END

COMMIT TRAN

Now let’s run the same query as before to see how our data is spread out.

Now this is interesting. Look at the null in the column for the Index name. It makes sense we do not have an index on a Heap. However our Heap is still partitioned. The home team wins! And the crowd goes wild!

WRAP UP

http://www.flickr.com/photos/joao/2198486114/

So once again Twitter had found a way to get information from the best and the brightest in the SQL Community. If you would have asked me before all this I would have told you, that you had to have a Clustered Index on a table in order to partition it. I’m not sure where I even got that bit of information, but when I picked it up along the way it was wrong.

I’m glad during a very busy day that I looked at Twitter just in time to learn something new.

But with new knowledge comes new questions, why would you want to create a partitioned heap, why would you create a partitioned clustered index without a primary key?

The heap question I could take a stab at, but I wouldn’t want to do it without proofing out a couple theories. So there we go Dear Reader more questions that will hopefully lead to more learning.

Thanks Again,

Brad

2 comments:

Kevin MaoJune 28, 2012 at 4:17 PM
Great post!!! I am quesitoning about partition on heap, here is answer. Thank you for sharing, just wonder how's insert/update impact on partitioned heap vs paritioned table on clustered index
Bradley BallJuly 6, 2012 at 10:21 AM
Hi Kevin, I don't think you would want to use a partitioned heap on a table that you would have insert and update activity on.

In an insert/update scenario it would be negative. However I could see partitioning a Heap on a Data Warehouse or some other database if it was large enough and it was read only.

This way you could load data only to one partition at a time instead of locking the entire table from read operations while a bulk load occurred, like on a data warehouse.

I had originally written this as a way to test and learn what Paul, Grant, Aaron, Amit, and Dave where all talking about. I still have yet to actually partition a heap in production. As I get exposed to more data warehousing and BI client operations I expect that will eventually change.

Pages

Friday, September 2, 2011

Partitioning a Heap

2 comments: