Wednesday, June 8, 2011

HEAPS & COMPRESSION

Compression did not work out so well for Bessy

Heaps are really interesting animals when it comes to Compression.  They start out having a lot in common with Clustered Indexes, but they stop just after creation.

  You can apply compression to a Heap or Clustered Index upon creation, or you can apply compression to a Heap or Clustered Index using a Rebuild Operation and specifying the type of compression you would like to apply.

After that the behaviors diverge.  When a Heap is rebuilt and Page Compression is Applied, even if you only have enough data to occupy one page, two will be provisioned. 

You will always get One new page that does not have any data on it.  In the case of our table with only Two pages, Page One will contain your data in its Page Compressed format, as long as an internal algorithm can determine that you would save enough space to fully apply Page compression, and subsequent inserts will be in a row Compressed format.  Page Two will only be row compressed until it is rebuilt.  But don’t take my word for it let’s prove this out.

First I’m going to create my database and my table that we’ll be using for this demo.

USE master
GO
IF EXISTS(SELECT * FROM sys.databases WHERE name='demoCompressionINTERNALS')
BEGIN
    DROP DATABASE demoCompressionINTERNALS
END
CREATE DATABASE demoCompressionINTERNALS
GO
USE demoCompressionINTERNALS
GO
/*
Create a Heap Table
*/
IF EXISTS(SELECT * FROM sys.tables WHERE name='heap1')
BEGIN
    DROP TABLE dbo.heap1
END
CREATE TABLE heap1(
    myID int IDENTITY(1,1)
    ,myChar1 CHAR(500) default 'some text'
    ,myChar2 Char(1000) default 'more text for a larger field'
    );
GO

Our table has 1504 bytes per record, so we’ll insert 5 records to fill up our page.  Let’s run our script to populate the page with exactly 5 records.

DECLARE @i INT
SET @i=0

WHILE (@i<5)
    BEGIN
         INSERT INTO dbo.heap1(myChar1, myChar2)
         VALUES(
                 ('some product' + CAST((@i +1) AS VARCHAR(1)))
                 ,('Here is a Generic Product Description' + CAST((@i+2) AS VARCHAR(1)))
                 )
         SET @i = @i +1
    END

We’ll set trace flag 3604 so we can get the output of DBCC Page to SSMS, we’ll run DBCC IND to get  our pages, and finally let’s run DBCC Page on our data page and see the internal contents. 

DBCC TRACEON(3604)
GO
DBCC IND(demoCompressionINTERNALS, 'heap1',1)

We’ll Go ahead and run that.




You can see that I get back 2 pages 146 & 145.  Looking at the picture I’ve highlighted the PageType field.  Page Type 1 is a Data Page, and that is the type we are interested in.  Page 10 is an IAM page.  We want to look at our Data Page, so we’ll take page 145 and look at it.  For the sake of space I’m going to hop down to the first record on the page after the hex dump.  We’ll look at Slot 0.

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4

myID = 1                            

Slot 0 Column 2 Offset 0x8 Length 500 Length (physical) 500

myChar1 = some product1                                           

Slot 0 Column 3 Offset 0x1fc Length 1000 Length (physical) 1000

myChar2 = Here is a Generic Product Description2 

We are looking at our un-compressed Record.  Page Compression has three parts to it that are applied in the following order, Row Compression, Column Prefix Compression, and Page Dictionary Compression.

So first let’s apply Row Compression and see our space savings.

ALTER TABLE dbo.heap1
REBUILD WITH (DATA_COMPRESSION=ROW)
GO

We need to re-run DBCC IND & DBCC Page as our Rebuild operation will cause our page numbers to change.

DBCC IND(demoCOMPRESSIONInternals, 'heap1', 1)
GO



As you can see our page numbers changed, and we now have our second page has been allocated.  If we look at page 160 our data is on that page, and 147 is blank.  So let’s look at Slot 0

DBCC PAGE('demoCOMPRESSIONInternals', 1, 160, 3)
GO
Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 1

myID = 1                            

Slot 0 Column 2 Offset 0xc Length 500 Length (physical) 13

myChar1 = some product1                                                                                         
           
Slot 0 Column 3 Offset 0x19 Length 1000 Length (physical) 38

myChar2 = Here is a Generic Product Description2



Row Compression did exactly what it was supposed to, it took the fixed length strings and stored only the space required.  So if we max out our columns we see that our sizes mirror what it was before row compression.

DECLARE @i INT
SET @i=0

WHILE (@i<5)
BEGIN
    UPDATE dbo.heap1
    SET myChar1 = (REPLICATE('a', 499) + CAST(@i AS VARCHAR(1)))
    WHERE myID=@i

    UPDATE dbo.heap1
    SET myChar2 = (REPLICATE('b', 999) + CAST(@i AS VARCHAR(1)))
    WHERE myID=@i

    SET @i=@i+1
END
DBCC PAGE('demoCOMPRESSIONInternals', 1, 160, 3)
GO

We see that our data has grown, and that the physical size now matches the logical size again, everything except our ID column.

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 1

myID = 1                            

Slot 0 Column 2 Offset 0xc Length 500 Length (physical) 500

myChar1 = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1               

Slot 0 Column 3 Offset 0x200 Length 1000 Length (physical) 1000

myChar2 = bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb1

Let’s add some more data so we have more than 1 page worth of data.  We will need this data in order to get Page Compression to apply.

DECLARE @i INT, @myID INT
SET @i=0

WHILE (@i<10)
BEGIN
         SET @myID=(SELECT (MAX(myid)+1) FROM dbo.heap1)


         INSERT INTO dbo.heap1(myChar1, myChar2)
         VALUES(
                  (REPLICATE('a', (250- LEN(@myID))) + CAST(@myID AS VARCHAR(3)))
                  ,(REPLICATE('b', (500- LEN(@myID))) + CAST(@myID AS VARCHAR(3)))
             )

    SET @i=@i+1
END

 If we take a look at our two pages, 160 & 140 we see that they are both populated with rows.

DBCC IND(demoCOMPRESSIONInternals, 'heap1', 1)
GO
DBCC PAGE('demoCOMPRESSIONInternals', 1, 160, 3)
GO

DBCC PAGE('demoCOMPRESSIONInternals', 1, 147, 3)
GO

So now we will apply Page Compression.  We will need to run DBCC IND & DBCC Page again, as this will rebuild our pages.

ALTER TABLE dbo.heap1
REBUILD WITH (DATA_COMPRESSION=Page)
GO
DBCC IND(demoCOMPRESSIONInternals, 'heap1', 1)
GO 
DBCC PAGE('demoCOMPRESSIONInternals', 1, 200, 3)
GO


As you can see we have our pages again.  And after running DBCC Page we’ll see that we have a CI, or Compression Information Header Section.  This is where Page Compression will store the CI that it uses for Column Prefix and Page Dictionary Compression.

 Just because the ID of 6 appears in our data doesn’t mean we are going to Slot 5 (remember zero based array’s).  This is a heap and our data is not sorted off of the ID column like it is with a Clustered Index.  Slot 0 contains our record with an ID of 6 and that corresponds with our CI information.  This will be our most compressed record.  I’ll go down further and grab Slot 6 for us to use as a comparison.

* Note the difference in the record lengths in the Compression Data Array, CD array entry. I’ll highlight those in yellow.

CD array entry = Column 1 (cluster 0, CD array offset 0): 0x02 (ONE_BYTE_SHORT)
CD array entry = Column 2 (cluster 0, CD array offset 0): 0x01 (EMPTY)   
CD array entry = Column 3 (cluster 0, CD array offset 1): 0x01 (EMPTY)   
Record Memory Dump

000000001443A35E:   01031211 86000000 00†††††††††††††††††....†....       

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 1

myID = 6                             

Slot 0 Column 2 Offset 0x0 Length 500 Length (physical) 0

myChar1 = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa6                                                                              
                                                                                                                 
                                                                         

Slot 0 Column 3 Offset 0x0 Length 1000 Length (physical) 0

myChar2 = bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb6 

Slot 6 Offset 0x3a2 Length 13

Record Type = (COMPRESSED) PRIMARY_RECORD                                 Record size = 13

CD Array

CD array entry = Column 1 (cluster 0, CD array offset 0): 0x02 (ONE_BYTE_SHORT)
CD array entry = Column 2 (cluster 0, CD array offset 0): 0x05 (FOUR_BYTE_SHORT)
CD array entry = Column 3 (cluster 0, CD array offset 1): 0x05 (FOUR_BYTE_SHORT)

Record Memory Dump

000000001443A3A2:   01035215 8c80f831 3281f231 32††††††††..R.Œ.ø12.ò12   

Slot 6 Column 1 Offset 0x4 Length 4 Length (physical) 1

myID = 12                            

Slot 6 Column 2 Offset 0x5 Length 500 Length (physical) 4

myChar1 = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa12                                                                              
                                                                                                                 
                                                                         

Slot 6 Column 3 Offset 0x9 Length 1000 Length (physical) 4

myChar2 = bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb12

So when we look at our other page we will see that there are no records on it.  Go do a DBCC Page and take a look just to confirm.

DBCC PAGE('demoCOMPRESSIONInternals', 1, 155, 3)
GO

So let’s insert a couple more records, we should have plenty of room to fin on this page right?

DECLARE @i INT, @myID INT
SET @i=0

WHILE (@i<10)
BEGIN
         SET @myID=(SELECT (MAX(myid)+1) FROM dbo.heap1)


         INSERT INTO dbo.heap1(myChar1, myChar2)
         VALUES(
                  (REPLICATE('a', (250- LEN(@myID))) + CAST(@myID AS VARCHAR(3)))
                  ,(REPLICATE('b', (500- LEN(@myID))) + CAST(@myID AS VARCHAR(3)))
             )

    SET @i=@i+1
END

Now take another look at Page 155, you’ll see that these rows take up around 762 bytes each, and that and are not in a Page Compressed format, this looks more like a Row Compressed format and it is.

Slot 0 Offset 0x60 Length 762

Record Type = (COMPRESSED) PRIMARY_RECORD                                 Record attributes =  LONG DATA REGION
Record size = 762                   
CD Array

CD array entry = Column 1 (cluster 0, CD array offset 0): 0x02 (ONE_BYTE_SHORT)
CD array entry = Column 2 (cluster 0, CD array offset 0): 0x0a (LONG)    
CD array entry = Column 3 (cluster 0, CD array offset 1): 0x0a (LONG)    

“But Balls,” you say, “I thought we enabled Page Compression?”  Ahh Dear Reader, we did.  However, Heaps do not automatically apply Page Compression when data is added to a page.   There are 3 exceptions to this
  1.  When data is inserted into a heap using Bulk Insert
  2. When data is inserted into and a full table lock is placed on the table.
  3. When a heap is Rebuilt specifying the with (Data_Compression) option, as we did above.


So when you use Page Compression together with Heaps keep their behavior in mind so you don’t encounter any surprises.  Good luck and Happy Compressing!

Thanks,

Brad

No comments:

Post a Comment