One day the DBA team was given a toy frog as some part of swag from a vendor. We did what any group of grown men would do. We put a dunce hat on it.
We decided that whoever screwed up next would have it sitting on their
cubical wall, and we would pass it around as the next offender appeared. A fun little way to pass the time and rib one another.
No sooner had I participated in developing this badge of
shame, than I earned it. The title says it
all. How to take down prod in 30
seconds, but I should clarify. Not some,
not half, but allllllll you’re clustered servers in just 30 seconds.
I have to give a special Thank you to my buddy Dan Taylor (
@DBABulldog |
Blog), you see I remembered I had
the frog. I had forgotten what I had
done to earn it. It was sitting on the
edge of my mind, but no matter how hard I tried I could not remember it. It was sitting in a fog just out of
reach. An itch that I couldn’t scratch. A few words out of his mouth and it all came
flooding back. As a good friend we've swapped many stories over the years, without his memory (which is better than mine) I would have had to go with a less interesting tale of woe.
“So Balls”, you
say, “How did you screw up?”
Well Dear Reader I had an unfortunate convergence of
unexpected anomalies that peaked in a spectacular crescendo of a mistyped
password. Yes a mistyped password. My second chance would be typing in
correctly. The next best thing is
explaining it so you hopefully never have to feel the same pain.
I SOLEMNLY SWEAR I AM UP
TO NO GOOD
I had a new production SQL 2008 R2 Instance to install. Things were going pretty smooth. I got up to the screen where you punch in the
password for the service account, and that’s when it all went wrong.
I mistyped the password. GASP, SHOCK, AWE, OTHER SUCH EXPRESSIONS!!!!!
Normally I would agree no big deal, but the next time I
punched in the password I didn't get a password error, I got an error informing
me that the account was locked. Enter
the series of unfortunate events.
Imagine you live in a world where all of the Prod servers
are using the same service account.
Imagine that you've suggested this be changed but it ended up on the “That’s
a good idea we’ll tackle that another day” pile. Imagine that you are not using Microsoft
Clustering for your Clustered servers, and that the inventive Server Engineers
rolled their own “health check”. Imagine
that your current password policy locks out when you mistype the password
somewhere between 3-8 times.
“But Balls”, you
say, “You only typed your password
once? Not 3-8 times!”
Exactly. There’s a
bug in the installer for SQL Server 2008 R2.
When you click the next button after filling out the service account information, you authenticate
at least twice for every account you type in.
Not so in SQL 2005 or SQL 2008 (not R2).
But in SQL 2008 R2 one mistyped password counts a whole lot more. Depending on the services being installed,
enough to lock out an account.
Then you are left to watch the manual health checks fail because the
account is locked out, attempt a cluster failover, only to be locked out on the
other side because the SQL Service account was locked out.
You catch your error quickly. Run to the Team Lead, report what has
happened, hoping this can get fixed before the inevitable outages begin. Then you race back to your desk. You have an uncomfortable phone call to place to the help desk.
Imagine that while this unfolds you are waiting on hold for
the help desk to open a ticket (you have to follow protocol), that will get
assigned to an engineer, who will pass it on to AD Services. Queue the uncomfortable elevator music.
Co-workers scrambling in the back ground, like the bull pin of a busy newspaper. Someone is keeping an active wipe board of what servers are now down, every minute someone
in your cube starting to say “Have you….” Only to be cut off by your response “Still
on Hold”. Queue the music.
Other co-workers are fielding calls from App Teams reporting that their applications are offline. Other co-workers trying to reach managers that can bypass a well-orchestrated
bureaucratic separation of duties that results in elevator music while you are
still on hold. Did I mention being on hold? While on hold forty-five minutes can feel like weeks.
The saving grace (for my job), the bug I found was easily to
duplicate. It was easy to see that this
behavior was not in previous versions.
As an added bonus those service accounts started becoming unique real quick.
DEMO: THE BUG I
LEARNED ALL ABOUT
We’ll skip ahead a bit. Say you are installing SQL Server 2008
R2. We’ve gotten up to the Server
Configuration where we are punching in our passwords. First let’s open up our Event Viewer, click
on our Security Tab and clear it out.
*If this were anything other than my personal VM I would
backup the log so we could restore it, do not clear out a security log on a
prod server without proper guidance.
Now the only event in our log is the event denoting that our
log has been cleared. Back to SQL
Server.
We will click on the Use the same account for all SQL Server
services button and type in our .\s-sqlsrv service account. Definitely not following best practices
here. SQL Engine, SQL Agent, and SSIS
all getting the same service account.
Let’s Type the password in wrong and see what happens? Click OK.
Click Next.
SQL reacted just like we thought. Theoretically we should have 1 bad login
check right? The same user name was in
use, we don’t need to validate it 3 more times. One should do. Perhaps at most we’ve got three validation checks right?
Let’s head over to our trusty error log and see.
We’ve gone from 1 to 13 errors in the click of a
button. How many failed logins do we
have? Not 1, 2, 3, 4, 5, 6, 7, but 8 failed
logins from one attempt. You’ll get this
if you use the button or if you do not use the button.
You may be asking did this get fixed in SQL 2012?
One look at the installer and you can see the button is
gone. Let’s punch in the same service
account name and an incorrect password.
And now on to our error log.
Wow! Six entries, now
we are looking at 3 entries per account.
Nope didn’t get any better.
WRAP IT UP
Long story short, make sure those passwords are
correct. Personally I like to use a
utility like
KeePass to generate, store, and copy my passwords from. Anything that keeps me from typing. Or as the case may be mistyping
J.
As always Dear Reader, Thanks for stopping by!
Thanks,
Brad