Thursday, June 16, 2011

SQL Rally Q & A Row Compression with Variable Legnth Unicode


This is another follow up on the Q & A’s that came about from SQL Rally, this question was asked by Jim Murphy (@SQLMurph | Blog).   This wasn’t so much of a question as it was a discussion during the presentation. 






I have Slide, see below, that lists the data types that can use Row Compression.  



Row Compression specifically takes all the empty space out of Fixed Length Data Types and stores them as efficiently as possible.  So in essence we take a fixed length field and treat it as if it has a variable size.

Jim very astutely pointed out that I had nvarchar listed as a Data Type that compresses even though it was not a Fixed Length Data Type and Jim wanted to know how Compression worked on an nvarchar Data Type.  To be perfectly honest I botched the answer.  But it led to a wonderful learning opportunity for me.

WHAT IN THE UNIVERSE IS UNI-CODE





The short answer is it is a Universal Character Set that allows for many non-English Characters and is governed by the International Standards ISO/IEC committee.  The Data Types in SQL Server that use Uni-Code are NCHAR & NVARCHAR and they use the UCS-2 character set.  UCS-2 allows exactly two bytes to represent each character.


So what in the wide wide world of sports does this have to do with Compression.  This means that when I declare an NCHAR with a 250 length that it is actually a 500 byte value.  When I type ‘Bradley Ball’ into an NVARCHAR data type instead of taking up 11 bytes of space it takes up 22.  So this encoding can utilize a lot of space that it will need when moving the data to the user, but not necessarily in Storage.


SQL 2008 R2 introduced Unicode Compression to SQL Server.  So to get back to the original question:

“Row Compression works by taking fixed length strings and Compressing out the unused white space from the record, NVARCH is a variable length string how does Row Compression work with a Variable Length Data Type?”


The short answer it strips out the unused extra encoding if there is any to strip out.  The long answer, TO THE DEMO MOBILE!

DUHNA DUHNA DUHNA DUHNA DEMO

Coolness is not Compressed for Batman

So Let’s start off by making our database.

USE master;
GO
IF EXISTS(SELECT name FROM sys.databases WHERE Name=N'demoCOMPRESSIONInternals')
     BEGIN
          DROP Database demoCOMPRESSIONInternals
     END
    
CREATE DATABASE demoCOMPRESSIONInternals
GO

USE demoCOMPRESSIONInternals
GO

Then we’ll create our table and populate it with a couple rows.

IF EXISTS(SELECT NAME FROM sys.tables WHERE name=N'myTable1')
BEGIN
     DROP TABLE dbo.myTable1
END

CREATE TABLE myTable1(
     myID INT IDENTITY(1,1)
     ,productName NCHAR(500) DEFAULT 'some product'
     ,productDescription NVARCHAR(1000) DEFAULT 'Product Description'
     ,PRIMARY KEY CLUSTERED(myID)     
) ; 

DECLARE @i INT
SET @i=0

WHILE (@i<5)
     BEGIN
          INSERT INTO dbo.myTable1(productName, productDescription)
          VALUES(
                   ('some product' + CAST((@i +1) AS VARCHAR(5)))
                   ,('Here is a Generic Product Description' + CAST((@i+2) AS VARCHAR(5)))
                   )
             

          SET @i = @i +1

     END

We’ll set on Trace flag 3604, and we’ll do a DBCC IND to get our page number, *Page Numbers will vary when you do this on your own.


DBCC IND(demoCOMPRESSIONInternals, 'mytable1', 1)
go

DBCC TRACEON(3604)
Go

And here are our pages.


Now a DBCC Page on page 153.

DBCC PAGE('demoCOMPRESSIONInternals', 1, 153, 3)
GO

Now let’s grab an edited look at our records, we mainly want to see the Length to each record in its pre-compressed state.

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 4

myID = 1                            

Slot 0 Column 2 Offset 0x8 Length 1000 Length (physical) 1000

productName = some product1                                                                                     
                                                                                                                 
                                                                                                                
                                                                                                                 
                                                                         

Slot 0 Column 3 Offset 0x3f7 Length 76 Length (physical) 76

productDescription = Here is a Generic Product Description2               

Slot 0 Offset 0x0 Length 0 Length (physical) 0

So we can see that even though Slot 0 Column 3 only contains 38 characters it’s length is 76.  You can see the doubling of the fixed length field as well looking at Slot 0 Column 2, even though it is an NCHAR(500) it has a length of 1000 bytes.  Now let’s apply Row Compression.


ALTER TABLE dbo.myTable1
REBUILD WITH (DATA_COMPRESSION=ROW)
go


We need to do another DBCC IND to get our newly rebuilt pages.

DBCC IND(demoCOMPRESSIONInternals, 'mytable1', 1)
go

Here are the new pages.



Now let’s do a DBCC Page on page 155 and let’s take a look at our data page and get the lengths.

Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 1

myID = 1                             

Slot 0 Column 2 Offset 0xc Length 1000 Length (physical) 13

productName = some product1                                                                                     
                                                                                                                
                                                                                                                
                                                                                                                 
                                                                         

Slot 0 Column 3 Offset 0x19 Length 76 Length (physical) 39

productDescription = Here is a Generic Product Description2              

Slot 0 Offset 0x0 Length 0 Length (physical) 0

We see that the length is reduced greatly.   Slot 0 column 1 going from 4 bytes down to 1.  We see that Slot 2 went from 1000 Characters down to 13.

But the answer to the question lies in column 3 which went from 76 bytes down to 39.  When your nvarchar strings get to be even longer that get’s to be more space savings.  We did a nvarchar 1000.  If we maxed out that value it would be 2000 characters.

So let’s do that, we’ll update our values to max out our strings.

DECLARE @i INT
SET @i=0

WHILE (@i<5)
BEGIN
     UPDATE dbo.myTable1
     SET productname = (REPLICATE('a', 499) + CAST(@i AS VARCHAR(1)))
     WHERE myID=@i

     UPDATE dbo.myTable1
     SET productDescription = (REPLICATE('b', 999) + CAST(@i AS VARCHAR(1)))
     WHERE myID=@i

     SET @i=@i+1
END

And now we’ll take a look at page 155 again.
DBCC PAGE('demoCOMPRESSIONInternals', 1, 155, 3)
go
And what are the lengths of our slots?
Slot 0 Column 1 Offset 0x4 Length 4 Length (physical) 1

myID = 1                            

Slot 0 Column 2 Offset 0xc Length 1000 Length (physical) 501

productName = aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1           

Slot 0 Column 3 Offset 0x201 Length 2000 Length (physical) 1001

productDescription = bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
bbb1                                

Slot 0 Offset 0x0 Length 0 Length (physical) 0

WOW, so even when we max out our fields, Row Compression still provides us with savings.  Slot 0 Column 2 is 1000 bytes in length, but it’s physical storage is only 501 bytes, and Slot 0 Column 3 our variable length field is physically 2000 bytes but is stored in 1001 bytes.

I’m sure Dear Reader you have noticed that instead of an even 1000 it is 1001, and I would love to tell you that I know why, but I don’t.  I suspect that the 1 byte is some sort of internal byte that contains the information needed for a Unicode Character to be translated by the access methods after compression.  But your guess would be as good as mine.

Thanks,

Brad


Wednesday, June 15, 2011

Should Denali work on Windows XP


Microsoft release a list of Operating Systems (OS) and SQL Version Upgrade Paths that will be compatible with SQL Denali, SQL 11, aka SQL Next or whatever it will be called,  the next version of SQL.

1) The current support matrix for OSes is as follows:
·    Windows Vista SP2 or later
·    Windows Server 2008 SP2 or later
·    Windows Server 2008 R2 SP1 or later
·    Windows 7 SP1 or later
2) Denali will support upgrading from these SQL Server versions:
·    SQL Server 2005 SP4 or later
·    SQL Server 2008 SP2 or later
·    SQL Server 2008 R2 SP1 or later
3) The installer is going to block installation on unsupported OSes and it will block unsupported upgrade paths.


“So Balls”, You say, “What’s your opinion?”
Well Dear Reader I’ve got a couple thoughts I’ll give those and then I’ll give you the post I put in place.
IT’S ALL ABOUT THE CUSTOMERS

No matter what industry you are in you serve a customer.  Every year Microsoft has a lot of people whose sole job is to find out how many Microsoft Licenses you company has and how many it needs to have.  They work out Licensing deals and companies pay money.  When this occurs Microsoft get’s an inventory of sorts that let’s them know what people are using. 

I don’t know how granular that gets, I have no access to that information, I can only speak for myself.  Every job I’ve had since graduating college or while I was still in college has been using computers that ran Windows XP.  It has been around a LONG time.

So I would hope that the numbers Microsoft get’s would tell them what Percentage of customers still running XP.  You would then assume a natural upgrade trend, whatever the average is they see in their analysis.

I would then hope that they look at the number of hours it would take to make Denali work on XP or any other OS, based off the amount of work, the talent and cost of the team (and there is some serious Talent working for them) vs. the Revenues they project to get from licensing. 

Then I would look at the number of shops that run a lot of SQL Server and what their XP breakdown is.  If it doesn’t make Dollars and Sense they shouldn’t do it.

MAYBE THEY SHOULDN’T
“But Balls,” You say, “Earlier you told us you would like to see it support XP?   WTH?”

Ahh Dear Reader, I would like it.  It would make it easier for me to demo at work, and sell to management.  Which would work out to some of those Licensees that I talked about up top.  I’ve Windows 7 at home, and I’ll install it and learn the heck out of it there. 

But what is right for me might not be right for Microsoft.  Or it could be, it would all depend on where their customers are.

Even here at work, there is an upgrade strategy that will eventually be put into place, but that may be months or a year or more from now, those decisions are currently above my pay grade.

So finally here is what I wrote.  In the end it is up to Microsoft.  And as they are with the Clouds so am I with them, when it comes to Microsoft “I’m All IN!”.
Hi Dan,
   I'm going to disagree with the crowd, I think support for XP should be put in place.  I'm on Windows 7 at home, but work is still on XP slowly moving to 7. 

For some of us IT is a fight, it is a fight to get our companies to adopt the latest and greatest, and sometimes the big ships turn slowly.  This would certainly hurt my ability to sell the technology internally and I would assume I'm not alone in this.

As a SQL DBA I feel by default I'm a Microsoft Advocate.  In IT there are 50 ways you can do anything, I choose to do those 50 things using Microsoft and predominately SQL Server.  I Advocate moving to new technology and using the latest and greatest.  But it was a fight to use Powershell 2.0, Mirroring, 2008 R2, and Policy Based Management. 

I won't stop Advocating but it will make my fight more of an uphill battle if I cannot install on XP.

Thanks,

Brad

SQL Rally Q&A: Where is the SQL Engine does Compression Occur


This is another follow up on the Q & A’s that came about from SQL Rally, this question was asked by Jim Murphy (@SQLMurph | Blog).   Jim asked “Where in the SQL Engine does Compression Occur.”  Well Dear Reader the simple answer is in the Access Methods, but let’s dive a little deeper.

THIS IS YOUR BRAIN ON SQL



If you think of the SQL Server Engine as a brain, which has 2 hemispheres right and left, it has 2 parts the Relational Engine (right) and the Storage Engine (left). 

The Relation Engine receives a Query from the end user and it Parses it for syntax, Checks for a Query Plan, Retrieves it or Creates and Caches an execution plan, and finally it begins Executing the instructions to retrieve the Data.

This is where the Storage Engine comes into play.  The Relational Engine says here is what I want, and makes a Grocery List.  The Storage Engine takes that list and goes to the store to get all of the groceries. 

The Storage Engine receives the instructions of the Data that is required by the Relation Engine via the Access Methods.  The Access Methods will ask the Buffer Manager if this information is already in the Buffer Pool as part of the Data Cache.  If the Buffer Pool does not have the data in the Buffer Pool (ACTIVE MEMORY) then it is read off of Disk into the Buffer Pool.

As I mentioned earlier the Part of the Storage Engine that handles Compression is the Access Methods.


SO AMERICA WHY DOES THIS MATTER?



“So Balls,” You Say, “Why is this so important?”

Well Dear Reader it is because when the data is in the Buffer Pool (Active Memory) it is still in a compressed state.  So you are using the memory on your server more efficiently.  Not only is compression saving you space on disk, I/Os being read off of Disk, but the space occupied by the cached data pages is used more efficiently. 

A good example of why this is important is Transparent Data Encryption, TDE.  TDE is manage by the Buffer Pool.  That means that the data is unencrypted when it is read off of the disk and into the Buffer Pool Data Cache.  Because Page & Row Compression are managed by the Access Methods the Data is still compressed in Memory.

Another good example is when using backwards compatibility.  I've blogged about being able to use TDE on a SQL 2008 R2 Database using compatibility level 80 as well as a SQL 2008 R2 Database using Page & Row Compression using compatibility level 80.  The reason is that backwards compatibility changes the rules used by the Optimizer in the Relational Engine.

 The Storage Engine is not affected by compatibility level 80.  So knowing how SQL operates under the covers will show you how you should expect it to act, and help you know how to use the product to it's fullest extent.

Want to learn a little more about SQL Server Internals?


A great book on that is Professional SQL 2008 Internals and Troubleshooting, it is a wonderful read and there are more SQL top names on this book than you can shake a stick at.  I recommend having this book in your library.  It will help you on your way to discover more about how SQL works Internally, I know it helped me!

Thanks,

Brad