Tuesday, October 11, 2011

Lesson 2: Internal Structures, Vardecimal, & Row Compression


SQL University Compression Week: Lesson 1 About Compression

Welcome to Lesson 2 on Compression and welcome back Dear Reader.   I want to thank the Chancellor of SQL University Jorge Segarra (@SQLChicken | Blog)  for the opportunity to be your  Professor of Compression.  Yesterday we discussed What Kind of Compression SQL Server uses and Why you would want to use Compression.  Today let’s get down and dirty with the first level of Compression.  But first let’s do a history lesion.

Compression was first introduced in SQL 2005 SP2, as Vardecimal Compression.  In SQL 2005 you would have to enable the Database for Compression and then enable the table that you wanted to Compress.  The compression would only work on Numeric or Decimal data types. 

If you enabled Vardecimal Compression it would store Decimal and Numeric data as a new data type, Vardecimal.  But to understand what that data type does we need to talk about the Fixed Length portion of a Data Record.

LET’S START AT THE BEGINNING

When we talk about Compression we have to look at our data from the inside out.  So I like to make a reverse pyramid.  We understand when we create a table there is a hierarchy to it.  Tables have Columns and Rows, Columns and Rows have fields.  When you break it down to the smallest unit storage, you would think it is a field, but it’s a little bit different.


Instead of Tables, Columns, Rows, and Fields we have IAM Chains and Allocation Units, Allocation Bitmaps, Extents, Pages, and Records.  What we care about with Row Compression are the Records.   If you look up above you’ll see what I like to call my Mall map of SQL Internal Storage.  And you are here in the Records section today.  So let’s take a closer look at what that records looks like.



This is from a slide I have in my Compression talks.  If you look above you’ll find the structure of a regular record as it is stored on a page.   This is courtesy of Paul Randal (@PaulRandal | Blog) go and look at his Data Structures MCM video it will give you a deeper understanding of how SQL Storage works internally.    A Record essentially equals a row of data in a table (the deeper you dive the trickier this get’s but stick with me Row=Record). The part that we care about the most is the Fixed Length Columns.   If you declare a Decimal or Numeric data type and insert data into it, that would be stored on the Fixed Length Column of the Record.  If you declare a decimal(38,38) and put a number, 0.12345678900123456789012345678902134567, it takes up the exact same amount of space as if you stored 0.1234.   That is what a Fixed Length column means, regardless of what you use you take up all the space.

When Vardecimal Compression occurs it changes where the record is stored, it goes to the variable length storage instead of fixed.   So it is altering how the storage of a particular record data type occurs.  So now instead of 0.1234 taking up 38 bytes of data it only takes up 4 bytes of data, in other words we need only what we use.

SQL 2008 takes it a step further not only do you change how it is stored, but you change the physical storage of Data at a Record level. With Row Compression we added a lot more data types Smallint, int, bigint, bit, smallmoney, money, float, real, datetime, datetime2, datetimeoffset, char, binary, timestamp/rowversion.  SQL 2008 R2 added Unicode Compression for nchar and nvarchar. So if it is fixed length and we Row Compress it we remove all the unused space, and here is how we store it.


Paul Randal is a busy man, he has another MCM video on New Database Structures in SQL Server 2008 (go watch it).  Compression wasn’t the only new data structure we got with 2008 go have a watch and you’ll learn all about it.

So now that we know all about how Row Compression changes our storage, let’s play around with it.

DEMO

First we need to create our database that we will be using.  I’m doing this in SQL 2008 R2, so to make sure your results match up I would suggest 2008 R2 as well, however if you have 2008 for this example we will be fine.

USE master;
Go
IF EXISTS(SELECT name FROM sys.databases WHERE Name=N'demoCOMPRESSION')
     BEGIN
          DROP Database demoCOMPRESSION
     END
    
CREATE DATABASE demoCOMPRESSION
GO

USE demoCompression
GO

So we’ve got our database our next step is to create a table and populate it with some records.
CREATE TABLE dbo.myTable1(
          myID int IDENTITY(1,1)
          ,myChar CHAR(8000)
          ,CONSTRAINT PK_myTable1_myID
          PRIMARY KEY CLUSTERED(myID)
          );
GO

/*
Insert Some Records
*/
DECLARE @i INT
SET @i=0

BEGIN TRAN
     WHILE (@i<10000)
          BEGIN
              INSERT INTO dbo.myTable1(mychar)
              VALUES('myData');
              SET @i = @i +1

          END
COMMIT TRAN
Now that we’ve got a Table and Data we have something to play with we need a baeline.  Remember KNOW THY DATA!  To know our data a little more I would suggest turning on STATISTICS IO & TIME then running a select.  Now this is very figurative, if you have a more targeted work load or set of stored procedures you would want to test those by getting the statistics and query plans so you can see IF compression changes anything for the better or worse.

/*
Do a select to look
at the CPU time and I/Os
Before Compression
*/
SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT * FROM dbo.myTable1

  At this point set those baselines to the side.   I would use sp_estimate_data_compression_savings on the table specifying ROW Compression.  This will take a 5% sample of the table move it into your tempdb and apply the Compression setting that you specify, in this case row.  It will return the results and then give you an estimate based on that sample that tells you what it believes your compression rate will be.

sp_estimate_data_compression_savings 'dbo', 'myTable1', NULL, NULL, ROW ;
GO

When I apply compression I like to use sp_spaceused to see the space the table takes up before and after.

/*
How Big is our Clustered Index
*/
sp_spaceused 'dbo.myTable1';
go


/*
Rebuild Our Table with Compression
*/
ALTER TABLE dbo.myTable1
REBUILD WITH (DATA_COMPRESSION = ROW);
go

 /*
How Big is our Clustered Index
*/
sp_spaceused 'dbo.myTable1';
GO

As you can see we reduced the table from around 80 MB to 288 KB.  Now let’s run our query to get our comparison baseline and see how query performance was effected.
/*
Do a select to look
at the CPU time and I/Os
After Compression
*/
SET STATISTICS IO ON
SET STATISTICS TIME ON

SELECT * FROM dbo.myTable1

Home work, go watch Paul Randal’s MCM videos .  Take the Row Compression Demo and add some more columns and Fixed Length Data Types to it, and see what you can Compress.  Happy Compressing!

Thanks,

Brad                                                                     

Monday, October 10, 2011

SQL University Compression Week: Lesson 1 About Compression

Hello Dear Reader!  Welcome to Lesson 1 on Compression.  I want to thank the Chancellor of SQL University Jorge Segarra (@SQLChicken | Blog)  for the opportunity to be your  Professor of Compression.  Before we talk about how to turn it on, where we should apply it, or even the types of compression that there are available to us there are two burning questions that we need to answer.  What is Compression and Why should I use it?




A PICTURE IS WORTH 1000 WORDS, BUT I WOULDN’T’ TRUST IT WITH YOUR DATA
Let’s get a 40,000 foot overview of Compression.  When the good folks at Microsoft were looking at adding Compression into SQL Server the first thing they had to answer was, What kind of Compression will we use?  If you are new to the Compression game then you may not know that there are a lot of different ways to Compress Data.  Sure they all do the same thing, reduce the overall size and foot print of the Data you are Compressing, but we use different types of Compression differently.

On one end of the Scale you have Compression that reduces the overall size of the file but it does so buy stripping out what it deems as unneeded or necessary data.  The best example I can think for this is photography.   A RAW photograph file is huge.   But it contains every last bit of detail that your camera can capture.   When you look at some of the breathtaking pictures that line the magazines of scenic landscapes, athletic accomplishment, or new events, those photographs were taken in a format to capture all of the data.  When you browse USA Today on your mobile phone, or on your computer you receive this photograph in JPEG, GIF, or PNG format not in RAW.  The reason it’s smaller.  

“But Balls,” you say, “How did it get smaller?” 

The answer Dear Reader is Compression.  JPEG, GIF, and PNG are Compressed file formats for pictures.  Why do we Compress pictures?  Because the human eye can only detect so much detail.  Some people can detect more than others, but eventually we hit a threshold were we cannot detect anything more.  So the extra Data becomes unnecessary overhead at that point.  So we strip it out.  

Jim Grey went through the trouble to come up with ACID, We Need to Talk Your Database is on ACID, and Microsoft had adopted it into the core programming of SQL we can assume that they would not want to strip out data.  As Database professionals we would hope that we are preserving our data, and we are.

So we know the kind of Compression that we will use must preserve our Data,  the next question is how hard do we want to work for it?   We could use something like Winzip, and Compress our files down to the lowest possible value, preserving our data and getting it really really small.  However that only works if you aren’t planning on doing that every time you want to read it.  Have you ever zipped a 2 GB file?  It takes a little while.  Have you ever unzipped a 2 GB file, takes a little while too.  Imagine if every time you wanted to read your multi GB Database you had to sit through the same scenario.

We are still operating a Database, while reducing our Data footprint is laudable we wouldn’t want to do it at the expense of performance.  As a matter of fact if at all possible we would want performance to increase not decrease.

I know we are picky and asking a lot, Shrink the data, but don’t loose any, make it smaller but easy to access, and if possible make it work faster (in some not in all cases).   Microsoft listened and the particular flavor of Compression we got was Page & Row Compression. 

The first stab at Compression actually came up in SQL 2005 SP2 and it was called Vardecimal Compression.  It only worked on the Numeric and Decimal Data Types, but it used concepts that we will be reviewing when we discuss Row Compression.   In SQL 2008 we got Page and Row Compression and Vardecimal Compression was immediately deprecated.  You can still use Vardecimal Compression in 2008, but if you really want to use Compression you will get a lot more bang for your buck with Row Compression, more on that tomorrow.

WHY IN THE WIDE WIDE WORLD OF SPORTS SHOULD I USE COMPRESSION

Now that we understand the basic idea of what Compression is, the most important question you will ask is Why should I use it!  Disks are cheap.  I hear it all the time.  You’ve probably seen the ad’s for 1 TB hard drives for sale on Black Friday for $30.  You are correct Disks are cheap and they are getting cheaper all the time, however, I bet you don’t go and buy the Hard drives for your server on Black Friday.

As a matter of fact if you have SAN, Shared Attached Network storage?, then you need to buy different disks altogether.  You probably need to have a port in the back of them that allows for a HBA, Host Bus Adaptor, that takes a fiber cable.  The reason?  If you are using a SAN device you want the quickest possible connection for your database server.   And those HBA ports raise the price of those hard drives.  And if you are using Solid State hard drives on a SAN that price goes up even more.  Now we are talking from 10’s of dollars to 100’s of dollars.  And if you have a SAN your not going to have just one hard drive in there.  You will have multiple drives striped in some sort of RAID Array.  How cheap does it sound now?

But wait there’s more.  If you have a lot of SAN’s that can get pretty complicated.  Your probably going to need a SAN administrator.  And you may need more than one if you are a big enough shop.  So now we’re not just talking hardware but people as well.


All this for a database?  One lowly single database?  I would say most of us don’t have one Database to worry about.   You probably have a Prod (Production) environment, maybe a UAT/Staging, a Test, a Dev (Development) or even a DR/COOP.   You may have a Data warehouse as well.   You could have multiple copies of your database for multiple projects.

And you probably don’t have just one Application you are supporting.  Most places have 10’s, 100’s, and in some cases even 1,000’s of Database across just as many instances that they maintain.  When you start thinking about the scope that your data touches, you start to realize just how much you have out there.

As Database professionals we try to normalize our data to prevent data duplication, but by nature of the change control process we will always have it.  On top of that there are 3 performance bottle necks we normally encounter; CPU, Memory, and IO (Input/Output). 
The option to re-write code is always there, but how often have you been in a situation where management decides to throw heavier hardware at a problem because it is cheaper and more cost effective than a re-write?  If you haven’t been there you will eventually.  You can toss more CPU on a server to get more processing power.  More RAM to get more memory and to speed up operations because we can hold more in memory.  But with Disk, if I throw more disk at a server I just get more storage room, it doesn’t help me with performance.

Page and Row Compression can do just that.  The cost to the server is CPU cycles which most database servers have to spare.  *Just a side note if your server is averaging around 60% CPU utilization or higher I would not recommend using Compression without thorough testing to ensure it would not reduce existing performance.  We reduce the size of the structure not only on disk but in memory.  Pages of a databases are compressed on DISK and in Memory.  This is such an important concept I’m going to say it one more time, on DISK and in MEMORY!  So not only do you save room, reduce the size of a table or index.  Make it quicker to read off of disk because of a reduced size, but you take up less Page Cache because your Data Pages are compressed in memory.

 So  you can reduce your data foot print across multiple environments, possibly speed up performance by reducing the number of reads to process a query, and possibly use your memory cache more efficiently.   I don’t think the question is why should I use Compression, but why am I not using compression.

WARNING: DANGER WILL ROBINSON


I’m not going to sell you a bill of goods, that is just snake oil.  There are some serious things that you need to keep in mind when it comes to Compression.
  1. Know Thy DATA!
  2. Take Baselines
  3. Know Thy DATA!

I like to think of Data Compression like Indexes.  You wouldn’t just put an index on a table and never check to see if it is being used.  You would check to see that the query plans you have are still running efficiently, and you would monitor your performance.

You need to do the same thing with Compression.  Know how your queries perform before hand, know how they perform after Compression.  If your CPU shoot’s up you will need to trouble shoot it, we’ll cover this later in the week.  If your data types are not right, don’t compress them in the first place, later in the week as well.

HOME WORK

Your Assignment should you choose to accept it is to go read some of the Blogs by the SQL Server Storage Engine Team on the Topic of Compression, Sunil Agwilari is the Project Manager for the Engine Team and he has written a lot on this topic.

Data Compression: Why Do we need it?

Data compression techniques and trade offs

Thanks,

Brad



Sunday, October 9, 2011

PASS Summit Day 1: Travel

I woke up this morning around 2:45 AM.  Having packed the day before I just had to get ready to take go and I was out the door.  The drive in was a rainy one, but the nice thing about an early morning drive is that the roads were pretty empty so no traffic jams.  So a nice smooth ride into the city, and other than getting a little SQL Karaoke practice in completely uneventful. 

So now  I'm in Tampa and about to board, so this will expand as the day goes along.  But here is an early morning picture of beautiful Tampa from the airport!  As you can see it is dark and just starting to come to life.  The early morning work force is out, people heading home from vacations, and those of us bleary eyed travelers heading for many other destinations.   So I'm going dark for the flight, I've got some compression topics to write on for SQL U next week.



A quick picture from the clouds (34,000 feet) and I'm all back to being quite again. 



DALLAS FT. WORTH

I'll be less than half way through with the travel portion of the day by now.  The big flight will be the almost 6 hours from Dallas to Seattle.  I'm looking forward to landing already, I'm not sure how much adrenalin will continue to keep me going but I'm pretty sure I'll need a nap at some point today.



I love walking through airports, there is always something that is there to impress and there is always something cliché as well.   I found both.


A quick coffee and bagel and I was off to the Concourse.  I met up with some folks that are heading to the Summit as well.  One is from the Tampa area and the other is Ed from Arkansas.  We hit it off talking about the Pre-Con’s we would be attending and I found a friend to share a cab to the Westin. 


 Before you know it we were boarding again.  The flight is pretty packed this time and I’m in the Isle so I would have thought no pictures from the plane window.  However there was a very nice lady from Alaska sitting against the window and when I asked she was polite enough to take a picture as we were passing over the Rockies. 



SEATTLE

 
Finally on the ground.  Ed and I shared a cab ride over to the Westin, as they did not have a shuttle at the airport.  The Summit is a wonderful opportunity to meet new people, as we come together to learn about SQL Server with one another, and it was fantastic to meet Ed.   We agreed to meet up at the lobby and find some food, I was going on a bagel and some yogurt for the day, Ed was hard core and hadn’t had a thing.



While in the lobby I Tweeted the plan that we had, and quickly heard back from the SQL Dynamic Duo of Karla (@KarlaKay22 | Blog ) and Rodney Landrum (@SQLBeat | Blog).  They were at the Tap House, which had some of the best jumbilia I’ve ever had.


From here I had a great time with new, old, and current friends.  It is amazing how far you can travel in a day, to remember the things that you’ve already known.  Several games of pool later, I still can’t play worth a damn.  I have the occasional flashes of brilliance that peter out after a max of three good shots.  But playing lousy pool is a lot of fun when you’ve got friends to play it with.

From there we did a walk about the city and looked at several different destinations.  We headed over to the Convention Center and Registered for the reason we are all here.   It was amazing to see all of the people that came to register early.  A lot of them were people I had met before, and a couple whom I had been wanting to meet.  I got to met Brent Ozar (@BrentO | Blog) who is taller and younger than I imagined.  And I mean taller folks.  I’m not a short guy, but I’m right on the 6 foot mark, so when people are taller than me I notice.   And did I mention he looked younger than I expected J.

With that Rodney, Karla, and myself set off for a hole in the wall noodle shop for a quick fix of our favorite delicacies, I can never resist a good Pad-Thai.  And with that I’m heading back to finish up some work on Compression.  Time to get a good nights rest.  Tomorrow starts the Pre-con’s.

Thanks,

Brad