We'd like to thank everyone for joining us in person at our Tulsa Training Facility as well as online for this wonderful webinar -- SQL Server Performance Optimization & Tuning with our own Jim Hudson. Please tell us more about yourself and enjoy this full webinar:
In case you missed a particular detail, we've supplied a full transcript of the webinar below. Don't forget to follow Jim on Twitter for the occasional tech tip as well as OakTree Staffing & Training on Twitter, Facebook, or LinkedIn to learn more about Microsoft Certified training or staffing solutions for your company.
Cassie: Hi, guys. You kind of met me when you walked in. My name is Cassie. I'm the HR Administrator/whatever OakTree wants me to do here. A little bit of everything. I've been here for 14 years. I just want to welcome you guys. Thank you so much for coming. We're excited to have you. This is one of the biggest turnouts we've ever had. This is really exciting. But, we just want to remind you guys that we are a training and a staffing company. If you guys ever need training, obviously, we're the place to go to. Jim is obviously teaching SQL classes right now. We also do staffing, so if your company is looking for staffing opportunities or you're also looking for new positions, let us know. We also have Facebook, Twitter, LinkedIn, and make sure you're following us there so you know when one of these things comes out. We try to send an email blast. There's always new information coming out on our Facebook and on our website. So, I will hand it over to Jim. Most of you probably already know him because he's Jim extraordinaire. But he's going to be talking to you guys about SQL today. Thank you, guys.
Jim: Thank you, Cassie. Alright folks, what we're going to talk about today, I'm sure you're all aware, SQL Performance Tuning & Optimization. This has been a real area of interest for me for about 20 years. In fact, I was talking with one you a minute ago who was talking about the very first SQL Server class I sat in in 1996 and he and I were in that class. And what I found as I began to learn SQL Server was that there was just a huge amount of moving parts and it really just became personally interested in what is the architecture look like. I also began to understand the architecture.
The next question was, "How do I make this work?" Beyond that, it's "How do I make it work fast?" So, here's the next hour in a nutshell:
(It looks like people are logged on and I'm assuming the phone is working.)
"Don't touch rows you don't need."
Don't touch rows you don't need. Virtually everything I'm going to show you for the next hour is one variation on that theme. In fact, rather than hearing me say "fast", a lot of what you're going to hear me say is "cheaper." There are a number of authors out there that I encourage you to keep track of if you're interested in this subject.
Recommended SQL Authors
Pinal Dave -- if you're ever had to Google something to make it work on a SQL Server, Pinal Dave is just phenomenal.
I've had the pleasure of hearing Itzik Ben-Gan lecture a couple of times. I always walk out of there with a headache -- one of the few people on the planet that can actually give me a headache. Itzik Ben-Gan is an Israeli guy. He's got a bunch of Microsoft Press SQL books. In fact, this is his code that I've blatantly stolen that I'm going to show you here in just a moment. If you want to know architecture, particularly querying -- for what it's worth, Pinal Dave seems to come at things from a really -- sometimes it's Admin, sometimes it's querying, sometimes it's architecture. He seems to be all over the place, though he's a phenomenon. Itzik really seems to focus on querying and architecture. I'm sure he does other things, but if you ever see a book by him or a post or something, I absolutely will stop and just take the time to read that.
Probably the Grand Dame of this business is a lady named Kalen Delaney. If you've kept track of any of the older database products, Kalen worked for Sybase back in the '80s, became an MVP for SQL Server in the mid '90s and again, if you see a book or a post by her, she is just absolutely amazing.
One of the people I began to notice recently is a guy Brent Ozar. I think he has a consulting company that does a lot of this, but these are the four authors that if I'm going through my Feedly feed and I'm looking for an answer and I see their name hanging off of something, I'm absolutely going to stop and see what they have to say. Pinal Dave, Itzik Ben-Gan, Kalen Delaney, and Brent Ozar.
So, I said a minute ago that the focus what we're going to look at is "Don't touch rows you don't need." I said I'm going to spend less time talking about "fast" and a lot more time talking about "cheap." When I say "cheap", I mean cheap generally in terms of IO. I think was Itzik that I first read that said that kind of way of looking at things. You want your queries to be as cheap as possible -- as inexpensive as possible in terms of memory IO, network IO, disk IO, CPU IO. So many of the tools that we're going to look at, so many of the ideas, are really just going to revolve around, "How do I do that? How do I make that work?" We're going to look at a couple of tuning methodologies. We're going to look at top-down and query-based. It's not really one versus the other. When you're in a situation where you're the performance tuner, then you have to ask yourself, "Which way am I going to come at this?" The top-down methodology is what we're going to look at here in just a moment. The idea with this is walking into the room and kind of setting aside whatever assumptions you have. I'm really going to spend almost all of the time this hour looking at SQL architecture, talking about tools, talking about methodologies -- just different ways to look at the problem. I'm really not going to spend that much time on the server.
First off, I'm really not much of a (when it comes to physical servers) a server guy. There absolutely are white papers out there and I'd encourage you to read them to take a look at them. But we kind of skirt that just a little bit when we start with Itzik's idea. When I first started really trying to learn performance tuning, I mentioned earlier, one of the first things I did was that I learned as much architecture that I could. I read everything I could get my hands on at the time by Kalen Delaney because, at the time, she was pretty much the only person I could find -- again, this was the late '90s -- who was really talking about this. She really came at it from a query perspective. In about 2006 or 2007, Itzik Ben-Gan came out with one of his books and it had a chapter that really changed how I looked at things. That's not to say that Kalen was necessarily wrong, she was just coming at it from a different perspective.
Dynamic Management View (DMV)
So, what we are using here is something called the DMV. Now, I'm going to assume that some of you are software developers, some of you are DBAs, some of you -- like me, you know, people ask me what I do and I just giggle. I mean, what day is it? I get yanked this direction and I get yanked that direction. You may already be really familiar with DMVs and I'm going to come back and talk about these here in a bit, but a Dynamic Management View (DMV) -- and that's what this is right here, is a -- I'm going to call it a "table" probably multiple times, even though it's really not. It really is a view. One of the things that you have when you're working with a SQL Server is you have an embarrassment of riches when it comes to performance tuning information, but any kind of information. If you find a 20-year-old script -- in fact, I ran across this the other day. I trying to figure out a problem and I ran across a script. I looked at the time and date stamp. I wrote that thing 20 years ago. It still works. I don't have that much 20-year-old code in anything else I do. But these DMVs really are actually wrapping a lot of the system tables and the background that are either difficult or almost impossible to get to and we're going to come back and talk about these in a little bit of detail. But this one right now just goes out and looks at a DMV called "dm_os_wait_stats." If you get this -- one of Itzik's books and you read through these chapters, he talks about a top-down waits-based view of things. What this is going to do is, rather than my walking in and grabbing a particular query that everyone says is a problem and trying to fix it, I'm really going to the server and I'm going to say, "What are you spending most of your time waiting on?" Now, I don't have the expertise -- there are 649 rows coming back from that and I have the expertise to explain about 5 of them. But I imagine that most of us can look at the async IO completion and go, "Oh, I bet that's the disk" and can look at things like async network IO and figure out what that is. One of the things that really changed the way I look at this was using queries, going to a server, and then asking myself, "Rather than assuming what the problem is, how about I let the server tell me what the problem is?"
One particular time, I was doing -- we had, believe it or not, we had a contract with a company on an island off the coast of South America for several years and I would fly down there and spend a week or so working with them. It was about the time I really started learning this way of approaching performance tuning. They had hundreds of servers. It was just a classic off-shore company. They had hundreds of database servers and we started going in and looking at this and just found a lot of things that really surprised us. I still remember, I think it was the CEO and some of the other folks walk in, we ran this code, we looked at some issues, and they all started looking at each other going, "Why is that server busy? Why is that database busy? That shouldn't be doing anything, but it makes up a huge amount of our IO." And it really gave us a different way to track this down. So, the idea here is to use some of these DMVs and just ask myself, "What specifically am I looking for?" The next thing to do is to say, "Alright, so if I've got a busy server here (which, by the way, isn't -- this is a virtual machine, actually, I think running on that machine right there -- it's a nothing VM. I think it's Windows 2012, SQL 2012. It doesn't have any particular load on it at the moment) what we want to do, is we want to go say, "Alright, how much IO are we doing at the database level?" And these kinds of DMVs, if you'll just go Google some of these DMVs, you're going to get lots of good scripts from Pinal Dave, from Itzik -- that sort of thing back. But what this would let me do is it would allow me to go in and look at each one of these databases both at the log and the database level and go, "Where am I spending my time? Where am I spending my IO?" Again, the very fact that that this one shows so much of the percentage has to do with the fact that absolutely nothing else is going on on this server. This server has probably been turned off for 6 months months because I was teaching other classes and I just found it and went, "Hey, that one has what I need." I popped it up, threw some databases on there that I need for this lecture, and moved on. But this kind of query is going to give us the opportunity to basically say, "From an IO perspective, what are my busy databases?"
From that point, if once we identify the database, the next thing we do is to really drill down to the query level. This may come back -- yeah, it's coming back with good information even though most of it is scheduled job queries, but here's what I love about these two queries: it's essentially the same query. We're basically going out and looking at a DMV called "DM exec query stats." Essentially, it should be obvious -- order by average CPU time vs. order by total physical reads. So, what these kinds of queries allow me to do are to say, "What are the 5 most expensive queries from either a CPU time perspective, a physical reads perspective?" And we're going to talk about architecture here in just a minute and we'll talk about physical vs. logical reads. It's this sort of thing -- I remember early on, some of my first performance tuning engagements and walking in with a load of assumptions because of all of the books that I read and sitting down and going, "Now, wait a minute - this doesn't conform to my assumptions" and then I was beginning to realize, "Oh, wait -- I assumed this was true and it really isn't." This sort of thing, when I run that query, that's the most expensive query since this server was started on that particular server from a physical IO perspective.
Generally, when I get called to do PTO (Performance Tuning & Optimization), it's, "Hey, Jim, we have this query or this set of queries that take longer than we want them to." So, I walk in, I sit down and I look at the queries, I fire up some of the tools we're going to look at here in just a minute, and we can try to performance tune those queries. But with this, I'm assuming I have the luxury of going, "Man, I wish this server was faster. I wish the amount of IO would go down -- that I was able to put more on the server." These kinds of queries give me the ability to say, "Where am I spending my time and money? Where am I spending my CPU time? Where am I spending my physical reads?" Obviously, there are a lot of other columns we could pull out of this so we could sort this a number of different ways, but a lot of the idea is just to help you say, "Alright, there are tools out there that will help me pull this information."
SQL Performance Tuning Methodologies
So, in this section, we've just kind of looked at top-down vs query-based -- what are some different tuning methodologies? The first thing we really need to talk about before we talk about tools is that we need to talk about database pages vs. database extents. Starting with SQL -- gosh, was in 7 or 2000? I don't guess it really matters -- Microsoft standardized on a 8,192 byte page, an 8KB page, the lowest common denominator of database storage in a modern SQL Server environment, is an 8,192 byte page and that is just horrifically important because virtually everything we're going to look at here is going to give us, when we ask a question, it's going to return the answer in terms of pages -- this many pages, that many pages, (etc.). Those pages --of which 8,060 bytes maximum is data and 132 bytes is metadata -- are organized into 64KB extents. So, 8 pages makes up one extent. Generally -- not always, actually almost always -- what we're looking for, even though we're measuring so many things in terms of pages, what we're really trying to do is we're trying to move the data around in terms of extents. If you've ever heard of something called "scatter gather IO", the authors that I reeled off a moment ago are my favorite people so SQL. One of my favorite Windows architecture guys is a guy called Mark Russinovich. He and Bryce Cogswell called "Sys Internals" and they've got a huge number of whitepapers out there that kind of explain how windows is performance tuned. One of the common denominators of how Windows manages IO from memory to disk is that it -- if you've ever noticed that when you're setting up servers, you take the data (ike SQL Servers) you take the data allocations and you format them with 64KB -- oh, what's that called, block? Anyway, you set them to 64KB. There's a reason for that because the cache-ing mechanisms built into the operating systems and the SQL Server are optimized for 64KB buckets. Well, it's no coincidence that -- generally, when we're actually moving data around on SQL server environment, particularly with things like backups -- even though the page is the lowest common denominator, the bucket that we're using is called an extent and it's 64 kilobytes. So, most of the time today, we're going to talk about pages. There will be a couple of times where extent get really important, but again, the thing to remember is that lowest common denominator -- 8 KB page basically made up of a header, data in an offset table. That header has all kinds of pointers to other pages and just metadata about the page, but those 8 KB pages are grouped together in buckets of 8 called an "extent." Particularly, when you're doing something like backups, you're moving an extent at a time. Here's what I'm going to do:
SQL Performance Tuning Tools
For the next few minutes, I'm going to walk through and I'm just going to demo tools. I'm not so much going to use them as I'm just going to show you some of the tools and then here in a bit, we'll come back and we'll look at, "How do I use those tools to get actual work done?" But, I want to walk through and kind of talk about tools. The idea that I keep in mind (and hopefully, you'll be sick of this by the time I finish this lecture), the way you make this fast is that you don't touch rows you don't need. Well, how do I tell how many rows I'm touching?
Well, if I had to be stuck on a desert island with one performance tuning tool, it would be statistics IO. Why? Because it's simple, it's unambiguous, and it doesn't lie to me. Run it a couple times -- basically turn the stats IO connection option on and then at that point, when I start doing any kind of query, over here in the messages section, I'm going to get some additional information. Normally, we get row count. "Hey, 2,102 rows." Well, at this point, I don't care. What's I'm interested in is how expensive was this query? Now, here in a minute, we're going to turn on execution plan. Execution plan is awesome. But if I simply want to know "How much did this query cost me?", what I'm going to do is turn on stats IO -- set statistics.io on -- there's probably a way to do it through the viewing also. So, what does that give me? Well, it gives me a lot of things, but really it's about three things that I'm most interested in, ok? First off is the scan count. The scan count simply tells me how many times we had to wind through that table. In the instance of certain kinds of joins, I may have to do it more than once. I've seen it happen. But the two biggest things I'm going to look at are logical reads and physical reads. Alright, so if I looked at this and it says, "Hey, 39 logical reads and zero physical reads." That tells me that in order to return that query -- the output of that query -- it had to look at or it had to scan 39 8 kilobyte pages in memory. When we talked about pages and extents, it's a really important thing/concept to keep in mind is that the way that data is stored on the disk and the way that it is stored in memory is absolutely identical. That 8 kilobyte page with it's header, data, and off set table is stored exactly the same way on the disk as it is in memory. When I go and say, "Hey, I need these rows" select star from whatever, really, the only two options I have is either a logical or physical read. A logical read says, "Hey, great news -- in the buffer cache (in cache, ram, whatever you want to call it) we found all of the rows or all of the pages for this." We did not have to go to the disk for any of this. Now, here's one of the reasons -- well, here are several of the reasons I love this tool. I mentioned the fact that it's straight forward and it's unambiguous. I can look at this and immediately (between logical and physical reads), I can say, "Here's how many pages that query cost me." If I'm in a performance tuning engagement, one of the first things I'm going to do is take whatever query we're interested in, I'm going to turn on stats io, and I'm going to run that and that's going to be my benchmark. At that point, my biggest goal for the next however much time I have is to make that number smaller over time -- and we'll look, as we get towards the end of the lecture, we'll look at some examples of doing that. The other things I look at: if I look at this real quick and I see a non-zero physical read value, the first thing that starts going through my read is, "Ok, do I have memory pressures?" It's like any cache -- the SQL Server has this big amount of memory that it sets aside.
Those of you who manage SQL Server know that it's a memory pig. I mean, the book will tell you that it takes what memory it needs and if it realizes that the system needs it, it gives it back. Well, I certainly agree with the first part. I've absolutely seen it take the memory it needs. I'm not ever 100% sure I've ever caught it giving any back. Bottom line is that it sets aside as much memory as it can get it's hands on and then as you query rows, then it just leave them in memory. When we get to the point where it's going, "Alright, well I'm starting to run out of buffer cache" basically, it throws a popularity contest. Which of these pages -- not rows, I used the word "rows" a minute ago -- which of these pages is popular? And essentially, the least popular pages -- there's a thread in the background that says, (I don't remember what the percentage is but), "I need to keep this corner of the buffer cache empty and available" so if I need to go grab a page and pull it into the buffer cache, I have a place to put it. Essentially, every so often, it throws this popularity contest if it's starting to get full and it finds unpopular pages and it caches them out. It writes them off the disk, whatever. So, this tool, if I see a non-zero (particularly if I run it twice) physical reads, then the first question I'm ask is, "Can we fix this just by either giving it more memory -- whether that means allocating it more memory or physically giving it more memory?" I realize that a lot of you have databases that just flat won't fit into memory. I get that, but if I walk into a situation where I know that we can allocate more memory before we spend a lot of expensive consulting time, the first thing I'm going to do is say, "Alright, so what are chances? Can we just allocate more memory -- fire this thing up" and --- frankly, cache will cover a lot of sin. It really will. Alright, so stats io -- my favorite tool. The biggest thing for me is these two values right here -- trying to make them smaller.
There's another tool called "stats time" -- I'll admit, I don't love (it). I probably should. If I do a "set stats time" on, and then run a select statement. It's pretty much any query. It's going to tell me, "Hey, here's how much time we've spent parsing and compiling it. Here's how much time we spent executing it." Once you get past the low-hanging fruit section of PTO ("performance tuning & optimization"), one of the things you start spending time thinking about is, "How often am I recompiling my queries?" If we have a couple of days to spend on this subject, then we absolutely would spend some time here, but that is one of the things that we could consider. The other part seems reasonably obvious. This is how much, how long it took. This is how much CPU time it took, etc. Honestly, when I'm working, these are useful and I may snapshot these at the beginning and then run them once I'm finished just to see if I actually did any good, but for me, the goal line is right there. Run the query, copy and paste that into notepad, go try some things, compare the beginning value with the end value and see if I've done anything useful. It is really a performance tuning engagement for me. We have other tools. We've got stats io and stats time.
We've got profiler. Profiler, for me, is more a trouble-shooting tool. Profiler is an application application layer. If I fire up profiler and I say, "New Trace" and I connect to the SQL Server, what it's going to do is is it's simply going to allow me to see whatever queries get executed -- and that means, really, from anything. The thing I love about profiler is that there's not an application on the planet that can hide from it. Again, I use this more as a trouble-shooting tool. Yesterday, I'm in involved in a reporting services engagement right now and I had query bomb. I looked it and went, "There's nothing wrong with that query." But I realized that there was one thing different about it. I thought, "Alright." So, I fired up profiler -- turned it on just like I did here, ran the report, alt tabbed over and went, "Oh, crud -- I forgot about that." Because right there, it had taken all of the voodoo that I had done with the expression language in RS and it was showing me exactly the query that was getting run against the engine. If you have the query right in front of you in 14 Times New Roman, you probably don't need profiler. But the situation that I was in, when I was going, "Ok, I'm doing a little bit of voodoo before the query hits the engine", it saved me hours. I would look at it and go, "Oh, my bad" and I was able to go in and fix it. Particularly, if you have a canned application -- and this happens to me so many times -- canned application, ok this piece is slow well, let's turn on the profiler, lets run the canned application. Let's go capture some of the queries. Let's copy and paste them out. You're generally look at batch started and batch completed section. And again, folks, if we had time, we would go spend some time looking at this tool and how to manipulate it. Just know it's there. Just know that it's available for you to use. Alright, DMVs -- I'm looking at the fact that I've already used almost half of my allocated time and I'm not very happy about that. Ok, DMVs -- I'm probably going to start talking faster. I'm really sorry.
DMVs showed up in SQL 2005 and have been around ever since. There are probably hundreds of them. This guy hasn't been running that long so (let me take this clause out -- probably should have just gone and run some things). Ok, so all I see is my query. Dynamic Management Views are a wrapper around system tables that are almost impossible for a human being to read. They take the data. They run it through this filter, and they make it make it possible for ordinary humans to read. There are hundreds of them. They are very complex and if you write them scratch, you've lost your mind. Let somebody else do it. Generally, anymore, Pinal Dave writes most of my DMVs, ok? I go "oh wait," I Google it, Pinal already did it, copy+paste, I do read through them to make sure there's no insert updates, deletes, or drops in there that I'm not happy about -- at least do that. 99% of the time, it's a test system anyways. Run it and look at the output. Dynamic Management Views are there to tell you virtually everything. I mean, we saw a moment ago, "Hey, what are my five most expensive queries by CPU time? By reads?" In fact, I planned to kind of walk through this, but looking at the time, I think I'm going to move forward. There are going to be several times when we're going to look at using DMVs in the example and we'll just talk about it there. If you've not spent any time with these, you could lose yourself for weeks just looking through all of the options with these.
"dbcc showcontig" - dbcc showcontig is an old tool that I still like better than the new tool and that may be because I'm an old guy, but it doesn't change the fact that "dbcc showcontig" will let me look at -- I said that most of the time in this that we would talk about pages. This is one of the times we're going to talk about extents. Here's the deal: We'll come back and we'll look at an example of this here in a minute. One of the things that I want in a well-organized SQL Server system is that I want my data all together. If I've got a table, then all of the pages for this table are together in a series of extents. If I'm bopping from A to B to C to D to E and I'm traversing that table, I want it to stay inside of this extent. One of the things that we're going to look at here in a bit is something called "fragmentation." Fragmentation is the idea that we inserted a row and now it won't fit in here, so we have to do a split and we have to basically say, "Ok, wait -- around over here, I'm going to grab this and come back" and it just slows things down immeasurably. One of the better tools, in my opinion, for keeping track of that is dbcc showcontig." Now, here in just a moment -- also, when we come back and actually look at this or actually show an example of using this, we're actually going to look at kind of the old way and the new way -- and you are perfectly free to decide which one you want to use. For me, "dbcc showcontig" does a great job. Here are the two pieces, actually, three pieces that I look at: The first thing is scan density. The short version of the story is that that number needs to be as close to 100 as possible. Scan density needs to be as close to 100 as possible. The lower the value is, the more fragmentation you have. Now, in a production system, it's never going to be 100%. As long as you're in the 90s, then we're good to go. But that scan density basically says -- in fact, it really is just a function of the previous two, and that is "How many extents do I have?" Well, I have 34 extents. To traverse those 34 extents, I had to do 33 extent switches. If this was 5 extents and I read from here to here -- that's 1, 2, 3, 4 extent switches. The idea here is that I want to keep those values as low as possible. The other thing I look at is page density and we'll come back to that here in just a moment. There's no magic number here. That number just needs not to be surprise. Here in a minute, we're going to talk about creating indexes and we're going to talk about fill factor and we're going to talk about the idea of when I create an index, how much of that page is data and how much of it is free space. Well, that value you right there tells me that, "Hey, I'm at about 95% data and about 5% free space for this particular table." We'll look at how to manipulate that, but we're going to use this tool -- if I use stats io to tell how cheap a query is and how expensive, I used "dbcc showcontig" to tell me how well organized is my data. If the answer is, "not very" then what I need to do is update my indexes. I also need to think about maybe putting some space in there. We'll come back to that here in a second.
Alright, so finally, the last but not even close to being least -- right click, "show execution -- include actual execution plan." So, if I go up here -- and I'm just going to run a query (do I have a query...come on, Jim) we'll come back to this in a bit. In fact, give me one split second. There's something else I wanted to show you with this one. This is a tiny join. Right click > include actual execution plan -- and again, if we had an unlimited amount of time, we absolutely would spend a lot more time talking about how to read an execution plan. Here's the low hanging fruit of this: nobody's real queries are as simple as "select start from table where column equals value." I wish it were true, but it's generally not.So, in real-world situations, when the question is, "Hey, Jim -- when can we make this faster?" I go run stats io, ok, what's my benchmark? The second thing I'm probably going to do is probably this. The reason is that he things that I'm looking for -- that low-hanging fruit that I'm looking for - is, "where am I spending my time and money?" Because if you'll notice, each one of these bits and pieces of what actually happened -- this is the logical query plan -- when you submit a query to the SQL Server, it goes through this long series of steps, but one of the steps is that the query gets handed off to a engine called the "query optimizer."
The Query Optimizer
The Query Optimizer's job is to look at stats, to look at data, to look at metadata, to look at the query's search arguments, available indexes, a bunch of different stuff, and come up with an execution plan. How are we going to do this? Are we going to use this index? Are we going to use that index? Are we going to do scans? Are we going to do seeks? Which physical joining type? I imagine that most of you are familiar with inner joins, outer joins, cross joins, etc. But those are a logical join type. There are a physical join type -- nested loop joins, merge joins, and hash joins -- which is physically "how does it join this together?" Well, some of those are very expensive. Some of those, less so. This is shows me exactly what -- in fact, it says, "Hey, we did two nested loop inner joins." Ok, I can live with that, but the first thing I'm looking for is "where did I spent my time?" Because I've absolutely looked at some of these where we're spending -- the query is so complex --- and we're spending 5% here and 10% here, but I look at it and go, "I might be able to make that faster, but probably not." But if I walk in and I look at the query and it's 1%, 2%, and then I've got this one piece that's 50%, which happens a lot, then I'm going to look at that piece and go, "I wonder if we could speed that piece up?" If this is a join of 5 or 10 tables and, boy, we're spending half of our time or our IO or cost, for what it's worth, on that one piece, then maybe it's worth saying, "Boy, is there an index on there that we're using? Are we doing a scan? Are we doing a seek? What could we do to make this faster?" So, these are kind of the tools that we're going to look at, so let's go talk about indexes for just a moment. We're going to talk about heaps,clustered indexes, non-clustered indexes, and column store indexes.
A Clustered Index
Every table in a SQL Server environment exists in one of two forms: either a heap or a clustered index. A heap is simply a table structure that has the rows generally in whatever order the rows were added to the table. It's not quite that simple -- again, if we had more time, we'd go through how that could not be true -- but the vast majority of the time, a heap is just a table, we haven't put a clustered index on it, we haven't tried to sort the data any particular way, and the data is just in there in whatever order is added. The other way the table can exist is called a "clustered index." If you want a mental picture of a clustered index, think of a phone book. A phone book is composite clustered index on last name, first name. I'm going to ignore the middle name piece. A clustered index -- a conceptual example of a clustered index is a paper phone directory. It has a composite clustered index on last name, first name. Aaron Aardvark and Zebulon Zebra is at the very end. That's an example of a clustered index.
So, here with the HEAP, you'll notice, it's incredibly simple. I have three columns, first name, age, and last name. It's in no particular order. It's not in order by any three of these. If I create a clustered index on that same data, then several things happen. So, in this example, even though this is a horrible idea, but it's good for my demo, in this example, I've created a clustered index on first name, ok? So, Abraham is at the beginning and Sylvia is at the end. So, when we create a clustered index -- when we go in and create a clustered index -- here's what happens: first off, it physically reorganizes and sorts the data based on the clustering key. In this example, the clustering key is first name. It then creates -- and it always does that -- it then creates an additional construct called a "root page" and that root page is a little like a table of contents. It may or may not one or more intermediate pages. So, the idea is if the table is small enough and we create a clustered index, then we may only get a root page and then the leaf or data pages. But in most production situations, I may actually have multiple levels of intermediate pages. Because at the end of the day, from either the root or the intermediate pages, I have to have pointers that point to each one of these 8 kilobyte pages and if I've not a 10,000,000 row table with individual values, then I probably have multiple levels of intermediate pages. But, essentially, the way a clustered index gets used is if I'm looking for heather, then the query optimizer is going to look at this and go, "Oh, where first name equals heather." I have a clustered index on Heather and it's going to be able to do a clustered index seek -- which simply means that's it's going to go to the root pages, it's going to go, "Abraham -- Heather is greater than Abraham, but less than Johnson. Alright, greater than Abraham and Charlene and Dirk, therefore it must -- going to go to this page", it scans this page, and I'm good to go. And at that point, if I've done a set stats IO on this query, it's going to come back with three pages - one page, one intermediate page, and one leaf level page. This sort of thing is why I said that an understanding of architecture is so important and having good tools and knowing how to use them is so important to performance tuning -- otherwise, I'm just changing things, pressing F5, counting on my fingers to see if it's faster or not. With these tools, I can go, "Hey, this used to cost me 100 logical reads, now it's 25 logical reads. We win." So, anyway -- clustered index.
A Non-Clustered Index
Now, here's the thing: obviously, since the clustered index physically sorts the data, I can only have one clustered index per table. I also have another option called a "non-clustered index." If a clustered index is a little bit like the phone book, a non-clustered index is a little bit like the index at the back of a good technical book. If I go buy a good technical book, it's going to have an index at the very end that allows me to go look up certain concepts and it's going to say, "Oh, SQL connection object? That's on pages 27, 105, and 200." It doesn't have any information on SQL connection objects there, it just has the term "SQL connection" and that tells me what pages to find information. That's what a non-cluster index does. I can have hundreds -- I honestly don't remember what the number is now, it used to be 250, I know it's gone up -- but I can have lots. More than I need of non-clustered indexes because it doesn't physically change the data in either the heap or the clustered index. It simply creates, if you will, kind of that index like at the back of a book that says, "Hey, if you're interested in this data, go here." So, in this example, if I were looking for anyone who is 33 years old (actually, to make this simple, I'm going to go with 44...no, 46), then I would say, "Alright, so, where age equals 46." Well, 46 is greater than 22, greater than 44, greater than 44 but less than 67, must be on this page (I accidentally grabbed the wrong thing), so it's going to know to scan this page. It's going to find Cheryl -- but here's the thing: I want everything. I want the entire row. So, then what it's going to do is it's going to take Cheryl, which is the clustering key, and it's going to go back and it's going to use the clustered index then to find this particular row. 99% of the time, the data in your SQL Server exists in the form of this (this thing turns off every 4 hours, now you guys know what time I got to work) exists in the form of either a heap (which means that is doesn't have a clustered index) or it has a clustered index, then it may or may not have, regardless of whether or not it's a heap or a clustered index, it may have some non-clustered indexes. Now, starting with SQL 2012 (and I don't have an example of this -- I'm just going to talk about it really briefly -- holy cow, time goes by really fast when I'm talking).
A Column Store Index
There's something called a column store index -- usually, that's more of a data warehouse thing, but a column store index pretty much blows away a lot of the architecture rule that we've talked about. It basically compresses it, it stores it on a column based environment versus a row based environment. It eliminates duplicates. It's incredibly fast for reads. In 2012, column stored indexes were read-only. Starting in 2014, then they're starting to be some availability to give updates, but at the end of the day, if you come across the idea of a column stored index, just know that, generally, it's the whole table crammed into this compressed thing. They can be really fast, but they're really designed for really read-heavy tables versus tables that are updated a lot. Here's a DMV that's really useful -- just as soon as a I find my cursor. There it is. One of the mistakes that I've consistently made when I'm doing PTO is assuming that I know what's going on. One of the things that you can do is you can use the sys.dm_db_index_usage_stats DMV to go get a list of your most popular indexes. If you change the sort order on this, then you could get a list of your least popular indexes and if I went out and ran this and I changed the sort order and I saw a bunch of indexes there that hadn't been touched in a year, my next question would be -- indexes aren't free. They take up space. Locking in IO when we go to do modifications. This is just an incredible tool for going out and saying, "Am I using this index or not?" Alright, so we've looked at indexes and now, I think it's time to go talk about when we can use some of this.
So, if I go look at this particular environment (it's incredibly simple, but I kind of need that for a fast demo), so I've got a table out. It's got two columns and random values. Right now, it does not have any indexes. If I go run this query, it tells me that it has 116,000 rows. If you really want to understand architecture, I encourage you to go spend some time with the sysindexes table. "sysindexes" has a huge amount of information about tables and indexes and that sort of thing. One of those tables, the first time you look at it, you'll go, "Wow, there's almost no human-readable data" but once you start joining it with things like sysobjects and running it through some filters and that sort of thing, you can pull back an enormous amount of really good information. If you've ever done a select count star to find out how many rows are in a table, which is an unbelievably expensive operation, this does exactly the same thing. Basically, it's probably 2 rows out of 2 tables and it's not touching the actual data at all. The reason why we're saying indid=0 is that I know that this table does not a clustered index yet. A heap will have an index ID of "0". A clustered index will have an index ID of "1." A non-clustered index will have an index ID of "2." Even if there are a hundred -- I'm sorry -- 2 or greater -- will have an index ID of "2" or greater. So, if you see "0" or "1", you're actually talking about the data. Alright, so why -- if I've got this table -- and 116,000 rows is not huge, but it's big enough to be interesting.
So, let's do this: let's turn stats IO on and then let's go do a right click, "include actual execution plan", and then let's do a select where column = 500. So, out of that, 116,000 rows, I've got 13 rows with a pkcol value of 500. So, the first thing I'm going to do is I'm going to remember the fact that, "Hey, with 0 indexes, returning any kind of value from this costs me 257 pages in memory." I'm encourages by the fact that there are zero physical reads, but still, I had to go look at 257 8 kilobyte pages in order to return 13 rows. That's expensive. It's not awesome. So, what I want to do or one way to deal with this then is to go create a clustered index. So, if I jump up here and I create a clustered index "create clustered index on table -- column name" -- you can see here we did multiple scans. It's just kind of interesting what it did here. So, a minute ago, it took me 257 logical reads to do this and now it's taking 3. Here's what that tells me: that tells me that the depth on this 1 root page, 1 intermediate level (there's always 1 root level and 1 page level), it also tells me that there's also 1 intermediate level, because it did what I told you a minute ago. I went to the root page, said, "Oh, it's in this range." I went to whichever page is appropriate in the intermediate level and said, "Ok, great -- bing, bing, bing. I bet it's on this." I jumped to the page level -- the leaf level (sorry, "leaf level" is the term we usually use), I got my rows and was out. I don't know that's what happened, but I'm pretty sure that's what happened in this kind of situation. If I go and look at the execution plan, I see magic, ok? Simply using an index is not necessarily good news. What I'm looking for are index seeks. That whole root intermediate "there it is on the root level" -- that's an indexed seek. If I went back and ran this again with no "where" clause -- which, basically says "No, sarg - no search argument" and I go look at my execution plan -- well, it would be really easy to go, "Hey, I used my clustered index. Life is good. Man, this is so much faster." Except that it's not because now it's actually more pages. Why? Because I've got the overhead of the index.
One of the reasons we use stats IO and the execution plan together is because I don't want scans. Folks, if it's a clustered index, it's always going to use it and it can't use a search argument and it has to read all of the rows (which is what it had to do a minute ago), then it's always going to be a clustered index scan because the data exists in the form of a clustered index. But if I go back and I've run this where it can actually use the index (again, I'm back to my three reads and I've got a clustered index seek, which means we're using the power of the index), the organization of the index to make my query just remarkably cheaper. Now -- that's the good news. Here's the bad news: the same query, slightly different search argument, I go over here and look at messages and I'm back to hundreds of reads. Why? Because my clustered index on this column is not helping me if I'm searching for this one. And if I go look, it's going to be a clustered index scan. Again, we did a table scan. A clustered index scan and a table scan is, for all intents and purposes, are the same thing. It is the SQL server equivalent of reading the entire phone book every time you need somebody's phone number. That's literally what it's doing -- it's just doing it fast. So, obviously, one way to deal with this would be to go up and create a non-clustered index. So, now if I go back and run these query then I look at logical reads. We did a non-clustered index seek, life is good. You've seen the tools. You've had some exposure to indexes. At this point, performance tuning becomes a balancing act between -- gosh, my queries have 10 tables and 500 columns and it's not this simple. But you can use these same exact tools to go in and say, "Is it expensive or not?" If you give me query and I look at it and I've got index seeks, and good physical joins, I've absolutely outside of a classroom looked at that gone, If you want to sit here while I bang on this, I'm happy to, but the chances of me making it any faster are really small because I just don't see anything obvious to fix. You have these tools you can start with -- this is how much it costs. Let's run it -- oh wait -- that's not optimal. I wonder if we can fix that. Then, once you change it, you can use those exact same tools (sorry, not sure why my voice is slipping out here), you can use those exact same tools to say, "Did we accomplish anything useful?" Simply because it feels faster doesn't necessarily mean it is faster. If the number of rows gets smaller, you've done useful work.
A Covering Index
Alright, so, we've look at cluster indexes and non-clustered indexes. I also want to talk about something called a "covering index." Folks, here's what I'm going to do: I realize that it's 12:25pm. I'm going to continue probably for another few minutes and then give you some time for questions. I realize that some of you may not be able to stay and, folks, if you have to go, it's not going to hurt my feelings. I hope you can stay and get the rest of this, but do what you got to do. So, let's look at this idea real quickly because this could be really useful: There is the concept of a covering index and a covered query. One of the things that we pointed out was that when we make a non-clustered index, the only thing that it stored is the non-clustered index key and the clustering key. It is possible, however, that we could have an index -- a non-clustered index -- that has some additional non-sorted columns. So, let's say that we have a table out there with 100 columns, but we realize that, frequently, we do a query against that table and we only pull back 3 or 4 columns and none of them are the clustering key. Generally, what people will tell you -- and this is generally true -- is that the fastest way to return information in a query is to do a clustered index seek. And you know, quite a lot of times, that really is true. But thing about it -- clustered index is the data and even if I'm doing a clustered index and I'm only looking at a few pages, I'm actually absolutely returning -- or at least having to scan -- all of the row. If that row has 100 columns and I only need 4 out of it, I'm returning 96 columns -- or at least scanning 96 columns -- I don't need. It's possible to create a non-clustered index called a "covering index". Let's say you have 2 key columns, but then you just arbitrarily add 2 more columns and you say, "Hey, go ahead and store an extra copy of this data in this particular non-clustered index" and what will happen then is, if the query optimizer looks at that query and says, "Oh, wait, I've got a non-clustered index that covers this query" and actually, we'll use that and it will never actually touch the table. Here's the next problem: I tested this this morning and I don't the condition I left it in. Give me a just a second here.
Ok, so I have the clustered index -- you know what, I'm not going to take the time. I'm just going to show you kind of what it looks like. A minute ago, it used the clustered index. Ok. Notice that now we've created this covering index using "include" which basically says, "Yes, go create the non-clustered index on this table. Here's the non-clustered index key, but I want you to include this column." Then, if I turn around and I do a query that includes the clustering key, the non-clustered key, and the additional included column, then when I run that query, it never actually even goes to the data itself. It goes to the non-clustered index and, honestly, that's probably the fastest way you can ever satisfy a query. You're not only filtering the rows, you're actually filtering the columns. Obviously, the downside here is a) I'm storing my data twice, b) I have to know that this query gets executed a lot and it's narrow enough that this makes sense. Alright -- I think the last thing I want to do is do a demo with dbcc showcontig and then I'll go really briefly through the best practices. So, everything we've done so far really has been more about making queries faster. So, what we want to look at is what happens when we do inserts. So, I'm going to go create a clustered index on this table -- oh, I guess I already have one, oh yeah, because I did it a minute ago. So, I got my clustered index. It's using my clustered index. Life is good.
I want to go see how organized it is, so I run a dbcc showcontig -- oh, and now, because it has a clustered index, it doesn't an entry of 0, it has an entry of 1. We do a dbcc showcontig and I've got great news: I've got 44 extents, 43 extent switches, the extent switches should always be as close to extents as possible. Averages pages per extent is 8. Wow, I actually got 100% scan density. That almost never happens. The numbers just happen to work out. Usually, it's 99 or 98-point-whatever. My page density is about a 99% full. This is great news. So, when I went in and said, "Hey, give me everything that meets this criteria where whatever equals 500", it was able to go "bang, bang, this row, read read read read read -- even went over multiple pages -- read read read read read -- I'm out. I know I'm finished and we're done. Incredibly efficient. Here's how to screw it up: so we go back up here and we do some random inserts. 2, 3, 4, 5, stop. It seems like we were at 116,000 rows at one point. (Oh, again -- change it to 1.)
Let me go add some more. Alright, so I've got some rows in there. Row count is now still not up much, but let's go see if it's changed anything at all. Alright - dbcc showcontig -- alright. So, even only adding a few rows, we've gone from 70-something extent switches to 427. Here's what this is conceptually similar to: you go to the grocery store with a list of 50 things you need to buy and you buy them in alphabetical order, so you're constantly running from one end of the store to the other to get what you need versus thinking, "Ok, I need this and it's over here, duh duh duh..." and just making one big sweep through the store. Because what's happened is that when we did those inserts, that data page is full and I come in and it's one of the reasons that the inserts are randomized. I say, "Hey, I need to insert this row." Well, that row has to go right here to satisfy that clustered index. You just don't have an option. The system looks at and goes, "Man, there's no space here." So, it literally does what's called a "page split" -- it takes these rows, goes and finds another page and another extent, copies this over here, moves those rows, puts a pointer between the two, so as you're scanning through here in a minute, you're going to go, "Row, row, row, (brakes sound) over here, grab these, back over" and it's just unbelievably expensive. It's a huge amount of IO. The way to fix that (and I don't have an update, I should have an update command here and I don't, so let me just drop this real quick...) Ok, dropped and recreated -- go to a dbcc showcontig, and we're back in business. Here's the thing, guys -- your tables get inserted, updated, and deleted on a regular basis. How do you avoid this. Well, one way to avoid this is to go back up here and the next time I create this index, say, "with fill factor equals..." and give it a value. So, now, when it creates an index, it's going to create it 95% data -- 5% free space. The idea is that if I get 1 or 2 or 3 or however many rows fitting on that, then it's just going to insert it into the free space and life is good. Then, in a real world environment, it becomes this tug-of-war between "ok, I want to leave some free space" but over time, that's going to get filled up, so I have to go rebuild my indexes. And one of the real considerations here is there ain't no such thing as a free lunch. Well, if I've got an 8 kilobyte page in memory, but I've got this much free space, that means I just gave away, what, 20% of my buffer cache in this particular example? No, 25% of my buffer cache in this particular example. Because I'm storing that free space. That's one of the reasons I made such a big deal about how it's stored on the disk is how it's stored in memory. That page is being stored now (and this one, it was 95, but if I set it to 80, I'm giving away 20% of my space), then for that particular table, I'm giving away 20% of my buffer cache.
SQL Server Performance Tuning & Optimization Best Practices
Alright, best practices: don't touch rows you don't need. Prove to yourself what's happening. Go in and use stats IO, use the execution plan. Here's how many rows, here's how many pages, here's what it's doing. It's doing a scan. It's doing a hash join versus a merge join or a nested loop join. Don't forget indexes on your foreign keys. I see this so often. We create a primary key -- which, of course, is one or more columns that uniquely identifies a row and a table. It automatically creates an index, but when we create foreign keys that point to those primary keys, it does not create and index. And when you go to join those tables, it's going to do something called a "hash match join" which is unbelievably inefficient. If you've got primary key/foreign key relationships, make sure you've got good indexes on your foreign keys. Make sure your indexes are being used -- that's one of the things that show execution plan is for. Make sure it's actually using your indexes. Ok, so if it's not using your indexes, we don't have time to demonstrate this, but it would be really tempting to go in and use something that called an "index optimizer" again, where, basically, you add a little piece of code that goes in and says, "I'm smarter than you and I want you to use this index." Every time I do that, it makes my queries more expensive. I heard a guy at Microsoft once say that the query optimizer is smarter than you. So far, he's absolutely been right. So, if it's not using my index, that generally means there's something I don't understand happening. I need to go think that through.
I didn't show this earlier -- I saw it and meant to point it out -- don't sort if you don't have to. I realize that's easy to say in a classroom and hard to do outside, but folks, if you go back and look at a show execution plan, you'll frequently see that the last thing that happens is a sort and it is frequently a significant portion of your cost. By default, it's going to come back in whatever order the clustering key. Obviously, in most production situations where you need that sort, just be aware that it's frequently is very expensive. Questions or comments? John?
John: You have a couple online questions.
Jim: Oh, ok.
John: One of them mentioned the authors you spoke about earlier.
Jim: Ok, unfortunately, this thing doesn't like when I try to scroll. Oh, mention the SQL authors. In fact, I just put it up here. There are the authors: (Pinal Dave, Itzik Ben-Gan, Kalen Delaney, Brent Ozar).
"Is there something we need to do in order to enable the execution plan? When I click on it, run a query, it does not work." Yeah, two things: 1) right click > include actual execution plan. Once you turn that on and run it (I think now, I just turned it off, actually because it was on) -- include actual execution plan the next time you run anything. Wow, have I just killed it? Ok, normally, include actual execution plan will just do the job. Yeah, for what it's worth, if you don't have enough permissions, some of this stuff may not work for you. I'm sorry to say that. I realize that not all of you are DBAs. One of the permissions frequently you'll need to have is called "show server state." Questions or comments? Folks, that's what I got. I appreciate you coming. I wish my classes were like this every week.
Cassie: We made cookies, so make sure you grab a couple cookies on your way out.