Tuesday, March 31, 2009

Searching for that code that formats a number in all my code files

If you write any type of code you must have come across this scenario:

I've written code before that formats a number nicely in Powershell, but which text file or script or function did I have that code in?

Well, today I came across a handy piece of Powershell code that did a whole lot of things, one of which was formatting a number. I'd looked for examples of this before and hadn't quite found what I was looking for, and this piece of code did exactly what I wanted. But nowhere in the Powershell code did it say "format number". I could add a comment, and that's probably not a bad idea, but normally code comments are there to make the code and logic more readable, not to act as tags for future searches.
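The snippet itself isn't the point of this post, but for context, a minimal example of the kind of number formatting I mean uses Powershell's -f format operator (values here are made up):

# The -f operator hands the number to .NET formatting; N2 = thousands separators, two decimals
$n = 1234567.891
"{0:N2}" -f $n      # 1,234,567.89
"{0:N0}" -f $n      # 1,234,568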

There is a very simple way around this problem. For the .txt or .ps1 (or even .sql or .dtsx) file, right-click and go to the properties of the file, then click on the Summary tab. There is a Keywords text box where you can type keywords or tags that will help you find the code snippet or example you think you'll need in future. In my case I typed in format number. I then searched (F3) the parent folder for all files using the second option in search, A word or phrase in the file. I searched for format number and bingo, 1 result from 747 files! And it was the correct file. :-) To add multiple keywords or phrases, separate them with semicolons (;).
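As an aside, Powershell itself can also do the content searching; a quick sketch, assuming the snippets all live under one parent folder (the path below is made up):

# Find every script or text file under the folder that contains the phrase "format number"
gci 'C:\Scripts' -Recurse -Include *.ps1,*.txt,*.sql |
    Select-String -Pattern 'format number' |
    Format-Table Path, LineNumber -AutoSize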

Thursday, March 26, 2009

SQL Metadata: What to store and where to store it

Being at the consulting and contracting end of the IT industry, I've moved around a bit in my career, and the same challenge comes up when starting at every new client: What are all these databases used for? Who owns them? How are they accessed? There are a number of questions that simply can't be answered from an ERD, from querying a DMV, or from looking at the underlying objects.

This information is database metadata, and it is almost never stored in the database; in fact, it is rarely stored anywhere at all. That brings me to my first point: database metadata should be stored in the database. This ensures that the metadata does not get orphaned, lost or decoupled from what it is describing. It gets backed up. It is easily viewable. It is secure. OK, so we have decided on a place to store this metadata, but what exactly are we going to store?

I've found that the best source of answers to the "what metadata are we going to store" question is the resident DBA / sysadmin staff. They know which questions are relevant to their databases and cannot be answered from the database itself. That highlights my next point: don't keep database metadata that can be derived from existing metadata in system tables or DMVs, like "database creation date" (that is stored in sys.databases) or the biggest tables and their record counts (not static, and viewable in many different ways).

So what are valid metadata attributes? There are a number of them: database owner, access methods, growth requirements, backup requirements, and so on. I've come up with a list of 14 attributes at my current client and I think that's a good amount: not so many that they are tedious to maintain, but still useful.

OK, so we've talked about the reason behind keeping metadata, and about what we are going to keep and where. Now we'll talk about the how. You may be familiar with the three system stored procedures that add, update and drop metadata. These are:

sp_addextendedproperty
sp_dropextendedproperty
sp_updateextendedproperty

These are documented in BOL. As the names suggest, they are used to add, update and delete extended property information, or in our case, metadata. Once we've added our metadata we can view it using the fn_listextendedproperty function.

I add the DB_NAME() metadata function so that the results include the database context:

SELECT DB_NAME() AS DbName, name as PropertyName, value as PropertyValue
FROM ::fn_listextendedproperty(NULL, NULL, NULL, NULL, NULL, NULL, NULL)




Great, so now we have a mechanism to store and view our metadata. But I think we need to take a step back. Let's suppose we decide that "Database Owner" is a good database metadata attribute. We go forth and run sp_addextendedproperty 'Database Owner', 'Joe Black'. On our next database we fervently run sp_addextendedproperty 'Db Owner', 'Joe Smith'. Do you spot the problem? If I want an enterprise view of all databases on all servers and their respective owners, I'd be in a pickle, because I'm adding the same attribute on each database but spelling its name differently. So we need to enforce domain integrity, simply defined as: a domain defines the possible values of an attribute. I want to make sure that in all my databases on all my servers the owner metadata attribute is always stored under exactly the same name.

What I do is create a wrapper (DJ Sproc) around sp_addextendedproperty. Code to follow:

CREATE PROC usp_AddMetadata
    @MetadataName VARCHAR(100), @MetadataValue VARCHAR(500)
AS
BEGIN
    IF @MetadataName NOT IN ('AppDesc','AppType','BackupReq','Contacts','GrowthReq','Lifespan',
        'Owner','Type','InitSize','DevDBName','ProdDevQA','SSISPack','BusOwner','ConnectionType')
    BEGIN
        RAISERROR('Metadataname is not valid. Must be in the following list: AppDesc, AppType, BackupReq, Contacts, GrowthReq, Lifespan, Owner, Type, InitSize, DevDBName, ProdDevQA, SSISPack, BusOwner, ConnectionType', 16, 1)
        RETURN
    END
    ELSE
    BEGIN
        EXEC sp_addextendedproperty @name = @MetadataName, @value = @MetadataValue
    END
END




What the above code does is essentially enforce domain integrity for the metadata attribute names, which were defined in the first step above, "what to store". Now I need to deploy this wrapper sproc to all databases on all my servers so that I am enforcing domain integrity on all database metadata. The reason I need to do this, rather than just create the wrapper sproc in master or model, is that when adding metadata, the current database context is used. There is no way of specifying that metadata x belongs to database y; you have to physically change the database context to the relevant user database, and as we know this is tricky. I've checked, and there doesn't seem to be a way of issuing a "USE DB1" statement and then executing another statement that will use DB1 as its context.
So I deploy my wrapper sproc to each database. Great. I have 50 SQL servers with an average of 40 user databases on each!! No problem: enter sp_msforeachdb, xp_cmdshell and osql, my arsenal of little TSQL nukes. String them together as follows and we have the powerrrrrr:

sp_msforeachdb 'xp_cmdshell ''osql -SSQL1 -E -d? -i"f:\work\sql\2005\usp_AddMetadata.sql"'''



What this line of code does is run the undocumented but very powerful sp_msforeachdb system stored procedure: for each database on a server, do x, y and z. Inside it I use an osql command, a command-prompt utility that can run .sql files from a given location, so I save my wrapper sproc on the F: drive at the location above (ideally this would be a file share on the network somewhere). The net effect is that the line above creates the usp_AddMetadata wrapper sproc in each database on server SQL1. Now if I have 40 SQL servers, I take this line of code, paste it into Excel (or I could do it in TSQL) with my server list, do some string concatenation and generate 40 lines of code, one for each server. Boom!!
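If Excel isn't handy, that string concatenation is also only a couple of lines of Powershell (server names below are made up):

# Generate one sp_msforeachdb/osql line per server, ready to run against each instance
$servers = 'SQL1','SQL2','SQL3'
foreach ($srv in $servers) {
    "sp_msforeachdb 'xp_cmdshell ''osql -S$srv -E -d? -i`"f:\work\sql\2005\usp_AddMetadata.sql`"'''"
}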

Clearly the above approach dates from before I learned the power of Powershell. It is still valid, but a bit clunky; I'll get round to re-writing it in Powershell when I get a few spare CPU cycles.
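In the meantime, a rough sketch of what such a rewrite might look like, using SMO directly rather than osql and xp_cmdshell, with the script path from above and made-up server names:

# Deploy the wrapper sproc to every user database on every server via SMO
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.Smo') | Out-Null
$script  = [System.IO.File]::ReadAllText('f:\work\sql\2005\usp_AddMetadata.sql')
$servers = 'SQL1','SQL2','SQL3'
foreach ($srv in $servers) {
    $smo = New-Object Microsoft.SqlServer.Management.Smo.Server $srv
    foreach ($db in ($smo.Databases | where {$_.IsSystemObject -eq $false})) {
        # assumes the .sql file is a single batch (no GO separators)
        $db.ExecuteNonQuery($script)    # runs in the context of each user database
    }
}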

As an aside, because I've added the usp_AddMetadata sproc to the model system database, any new databases will get the wrapper sproc automatically.

In the same way that I've created a wrapper for sp_addextendedproperty, I also create a wrapper for sp_updateextendedproperty, with the same logic and constraints.

The above is all very cool and in some ways geeky. The real power, from a management or CIO point of view for example, comes when I create a view or SSRS report that shows all databases on all servers and who each database owner is. Or all databases that are past their sell-by date. Or all databases that have grown to more than double their initial size. Now that is seriously powerful: I have an enterprise 10,000-foot view of my SQL environment.
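The collection side of such a report is just another SMO loop. A minimal sketch (server names and the output path are placeholders, and it writes naive CSV) that pulls every user database's extended properties into one file for a view or SSRS report to sit on:

# Gather extended-property metadata from all user databases on all servers into one CSV
[System.Reflection.Assembly]::LoadWithPartialName('Microsoft.SqlServer.Smo') | Out-Null
$servers = 'SQL1','SQL2','SQL3'
$out = 'c:\DatabaseMetadata.csv'
'Server,Database,PropertyName,PropertyValue' | Set-Content $out
foreach ($srv in $servers) {
    $smo = New-Object Microsoft.SqlServer.Management.Smo.Server $srv
    foreach ($db in ($smo.Databases | where {$_.IsSystemObject -eq $false})) {
        foreach ($ep in $db.ExtendedProperties) {
            # naive CSV: assumes no commas in the metadata values
            "$srv,$($db.Name),$($ep.Name),$($ep.Value)" | ac $out
        }
    }
}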

Besides the CIO view, from an audit, capacity planning or provisioning point of view, this metadata makes DBAs' and sysadmins' jobs a whole lot easier.




Wednesday, March 25, 2009

Querying the MSCS Cluster Log using Log Parser and Powershell

When encountering problems with a Windows high-availability cluster, the first place one would look is Cluster Administrator, and then the Windows event logs. Seldom, though, have I seen the cluster log being looked at. This log (in Windows 2003 MSCS) is an anomaly in the Microsoft world. For one, its size is configured in a system environment variable. Another odd behaviour is that when the log is full, the first half is truncated and then reused. Why not just write everything to the event logs and have a separate log for the Cluster service? And lastly, the format of this log defies even Log Parser, the king of making sense of large text-file logs.

Enter Log Parser and Powershell.

I've written a Powershell script to:




  1. Prompt the user for the Cluster Server name with an input box
  2. Get the contents of the Windows cluster log in the default location
  3. Scrub the log
  4. Export the cleaned log to a log file
  5. Create a LogParser Object
  6. Use this LogParser Object to query the cleaned log file and only return Warnings and Errors
  7. Output these Warnings and Errors to a .csv file



Here is the Code:


$a = new-object -comobject MSScriptControl.ScriptControl
$a.language = "vbscript"
$a.addcode("function getInput() getInput = inputbox(`"Enter Cluster Server Name`",`"Cluster Log Viewer`") end function" )
$s = $a.eval("getInput")
$gcs = '\\'+$s+'\c$\windows\cluster\cluster.log'
$var = gc $gcs
$var=$var -replace 'Code0000', "Code`r`n0000"
$var=$var -replace 'ERR ', 'ERR '
$var=$var -replace 'Volume Manager', '[VMg]'
$var=$var -replace 'ERR SQL Server Agent', 'ERR [SQA]'
$var=$var -replace 'ERR IP Address', 'ERR [IPA]'
$var=$var -replace 'WARN Network Name', 'WARN [NNm]'
$var=$var -replace 'ERR Network Name', 'ERR [NNm]'
$f = 'c:\'+$s+'Clean.log'
$var | ac $f
$cl = 'c:\'+$s+'Clean.csv'
$myQuery = new-object -com MSUtil.LogQuery
$objInputFormat = New-Object -com MSUtil.LogQuery.TextLineInputFormat
$objOutputFormat = New-Object -com MSUtil.LogQuery.CSVOutputFormat
$strQuery = "select EXTRACT_TOKEN(Text, 0, ' ') as StringDate, EXTRACT_TOKEN(Text,1, ' ') AS Sev, EXTRACT_TOKEN(Text,2, ' ') AS Source, EXTRACT_TOKEN(Text,3, ' ') AS D1, EXTRACT_TOKEN(Text,4, ' ') AS D2, EXTRACT_TOKEN(Text,5, ' ') AS D3, EXTRACT_TOKEN(Text,6, ' ') AS D4, EXTRACT_TOKEN(Text,7, ' ') AS D5, EXTRACT_TOKEN(Text,8, ' ') AS D6, EXTRACT_TOKEN(Text,9, ' ') AS D7, EXTRACT_TOKEN(Text,10, ' ') AS D8, EXTRACT_TOKEN(Text,11, ' ') AS D9, EXTRACT_TOKEN(Text,12, ' ') AS D10, EXTRACT_TOKEN(Text,13, ' ') AS D11, EXTRACT_TOKEN(Text,14, ' ') AS D12, EXTRACT_TOKEN(Text,15, ' ') AS D13, EXTRACT_TOKEN(Text,16, ' ') AS D14, EXTRACT_TOKEN(Text,17, ' ') AS D15, EXTRACT_TOKEN(Text,18, ' ') AS D16, EXTRACT_TOKEN(Text,19, ' ') AS D17, EXTRACT_TOKEN(Text,20, ' ') AS D18, EXTRACT_TOKEN(Text,21, ' ') AS D19, EXTRACT_TOKEN(Text,22, ' ') AS D20, EXTRACT_TOKEN(Text,23, ' ') AS D21, EXTRACT_TOKEN(Text,24, ' ') AS D22, EXTRACT_TOKEN(Text,25, ' ') AS D23, EXTRACT_TOKEN(Text,26, ' ') AS D24 INTO "+$cl+" from "+$f+" WHERE EXTRACT_TOKEN(Text,1, ' ') NOT LIKE '%INFO%'"
$myQuery.ExecuteBatch($strQuery, $objInputFormat, $objOutputFormat)

Even this process doesn't render a clean report, so I then use SQL to bulk insert the .csv file into a SQL table and manipulate the data further. I guess this could have been done in Powershell; it's just that for text manipulation I'm more familiar with T-SQL.
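For what it's worth, a rough Powershell equivalent of the bulk insert and string concatenation below would start with Import-Csv (using the same file path as the bulk insert):

# Stitch the D1..D24 fragments back into one description per row and drop INFO lines earlier scrubbing removed
$log = Import-Csv 'D:\Powershell\SQL3clean.csv'
foreach ($row in $log) {
    $desc = ''
    foreach ($i in 1..24) { $desc = $desc + $row.("D$i") + ' ' }
    '{0} {1} {2} {3}' -f $row.StringDate, $row.Sev, $row.Source, $desc.Trim()
}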



CREATE TABLE [dbo].[ClusterLog](
[StringDate] [nvarchar](50) NULL,
[Sev] [nvarchar](50) NULL,
[Source] [nvarchar](50) NULL,
[D1] [nvarchar](50) NULL,
[D2] [nvarchar](50) NULL,
[D3] [nvarchar](50) NULL,
[D4] [nvarchar](50) NULL,
[D5] [nvarchar](50) NULL,
[D6] [nvarchar](50) NULL,
[D7] [nvarchar](50) NULL,
[D8] [nvarchar](50) NULL,
[D9] [nvarchar](50) NULL,
[D10] [nvarchar](50) NULL,
[D11] [nvarchar](50) NULL,
[D12] [nvarchar](50) NULL,
[D13] [nvarchar](50) NULL,
[D14] [nvarchar](50) NULL,
[D15] [nvarchar](50) NULL,
[D16] [nvarchar](50) NULL,
[D17] [nvarchar](50) NULL,
[D18] [nvarchar](50) NULL,
[D19] [nvarchar](50) NULL,
[D20] [nvarchar](50) NULL,
[D21] [nvarchar](50) NULL,
[D22] [nvarchar](50) NULL,
[D23] [nvarchar](50) NULL,
[D24] [nvarchar](50) NULL
) ON [PRIMARY]



TRUNCATE TABLE ClusterLog
bulk insert ClusterLog FROM 'D:\Powershell\SQL3clean.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2)

UPDATE ClusterLog
SET Source = REPLACE(REPLACE(Source, CHAR(91), ''), ']', '')

SET CONCAT_NULL_YIELDS_NULL OFF
select 'SQLCL3' AS Cluster
,DATEADD(hh, 2, REPLACE(SUBSTRING(StringDate, PATINDEX('%::%',StringDate)+2, 19), '-', ' ')) AS TimeLogged
,"Severity" = CASE
WHEN Sev LIKE 'ERR' THEN 'Error'
WHEN Sev LIKE 'WARN' Then 'Warning'
END
,"Source" = CASE
WHEN Source LIKE 'API' THEN 'API support'
WHEN Source LIKE 'ClMsg' THEN 'Cluster messaging'
WHEN Source LIKE 'ClNet' THEN 'Cluster network engine'
WHEN Source LIKE 'CP' THEN 'Checkpoint Manager'
WHEN Source LIKE 'CS' THEN 'Cluster service'
WHEN Source LIKE 'DM' THEN 'Database Manager'
WHEN Source LIKE 'EP' THEN 'Event Processor'
WHEN Source LIKE 'FM' THEN 'Failover Manager'
WHEN Source LIKE 'GUM' THEN 'Global Update Manager'
WHEN Source LIKE 'INIT' THEN 'Initial state'
WHEN Source LIKE 'JOIN' THEN 'The node state that follows INIT when the node attempts to join a cluster'
WHEN Source LIKE 'LM' THEN 'Log Manager. Maintains the quorum log.'
WHEN Source LIKE 'MM' THEN 'Membership Manager, also known and written to the cluster log as Regroup (RGP)'
WHEN Source LIKE 'NM' THEN 'Node Manager. Keeps track of the state of other nodes in the cluster'
WHEN Source LIKE 'OM' THEN 'Object Manager. Maintains an in-memory database of entities, or objects (nodes, networks, groups)'
WHEN Source LIKE 'RGP' THEN 'Regroup, Tracks which nodes are members of the cluster'
WHEN Source LIKE 'RM' THEN 'Resource Monitor'
WHEN Source LIKE 'SQA' THEN 'SQL Server Agent'
WHEN Source LIKE 'NNm' THEN 'Network Name'
WHEN Source LIKE 'IPA' THEN 'IP Address'
WHEN Source LIKE 'VMg' THEN 'Volume Manager'
ELSE Source
END
,D1 + ' ' + D2 + ' ' + D3 + ' '+ D4+ ' '+ D5 + ' '+ D6 + ' '+ D7 + ' '+ D8 + ' '+ D9 + ' '+ D10 + ' '+ D11 + ' '+ D12 + ' '+ D13 + ' '+ D14 + ' '+ D15 + ' '+ D16 + ' '+ D17 + ' '+ D18 + ' '+ D19 + ' '+ D20 + ' '+ D21 + ' '+ D22 + ' '+ D23 + ' '+ D24 AS ErrorDescription
from ClusterLog
WHERE DATEADD(hh, 2, REPLACE(SUBSTRING(StringDate, PATINDEX('%::%',StringDate)+2, 19), '-', ' ')) > getdate()-2
ORDER BY TimeLogged ASC

The query above produces a view of the cluster log that is finally useful. And with the data sitting in a SQL table, we could automate the entire process and display cluster log errors and warnings for the last x days in a Reporting Services report.

Tuesday, March 24, 2009

Defragmenting a partition with a large heavily fragmented database file

I'm sure you've come across this before: you have a 100GB partition with a 60GB database mdf sprawled across it in 8,600 fragments. Nice! IO performance will be severely impacted, especially if this partition is (more often than not) on a RAID 5 array. On top of that, there is only 15% free space on the partition. Now your task is to defragment both the partition and the mdf.

Well, the easiest way to accomplish this, if you have another partition with enough space to temporarily house the 60GB mdf, is to move the mdf, defrag the data partition, and move the mdf back, hopefully into one contiguous block of space. This is a classic DBA scenario involving some T-SQL or SSMS work, some OS file-system work, some OS defrag work, more file-system work, and finally some more T-SQL. All good and well, but now you want to automate it so that you don't forget about your quest halfway through the process.

Enter Powershell. Sure, you could do some of the file-system work using xp_cmdshell, but that little gem should be locked down, no? And what about the defrag? Re-enter Powershell.

In a few lines of code this process can be accomplished from start to finish. In the code below I've created a function that performs the tasks outlined above, that is:



  1. Detach the database
  2. Move the mdf and ldf files
  3. Defragment the fragmented partition
  4. Move the mdf and ldf files back
  5. Attach the database

Code:



Function DetDefAt {
Param ([string]$svr, [string]$dbname, [string]$mdf, [string]$ldf, [string]$mdft, [string]$ldft, [string]$Drive2Def)

#detach db (get-sqlserver returns an SMO Server object, e.g. from a helper library such as SQLPSX)
$s = get-sqlserver $svr
#get-sqldatabase $svr $dbname
$s.KillAllProcesses($dbname)
$s.DetachDatabase($dbname, $true)

#move the data and log files to their temporary location
move-item $mdf $mdft
move-item $ldf $ldft

#defrag the now-emptier partition via WMI
$v = gwmi win32_volume
$v1 = $v | where {$_.name -eq $Drive2Def}
#$v1.defraganalysis().defraganalysis
$v1.defrag($true)

#move the files back
move-item $mdft $mdf
move-item $ldft $ldf

#attach the database again
$f = New-Object -Type System.Collections.Specialized.StringCollection
$f.Add($mdf)
$f.Add($ldf)
$s.AttachDatabase($dbname, $f)
}




Cool. The function takes seven input parameters: the server name the task will be run against, the database name, the mdf and ldf locations, the temporary locations the mdf and ldf will be moved to, and the partition to defrag.
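A call to the function (with a made-up database and file locations) would then look like this; note that the last parameter must match the Win32_Volume name, e.g. 'D:\':

# Detach MyDB, move its files to E:, defrag D:, move them back and re-attach
DetDefAt 'SQL1' 'MyDB' 'd:\data\MyDB.mdf' 'd:\data\MyDB_log.ldf' 'e:\temp\MyDB.mdf' 'e:\temp\MyDB_log.ldf' 'D:\'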

Rebuild all indexes on all tables in a database using Powershell

For one or more reasons you may want to do database maintenance like index defrags or rebuilds using Powershell. The good news is that this is quite straightforward. If you instantiate the Microsoft.SqlServer.Management.Smo.Database class, you have access to a Tables collection, and each table in that collection has a RebuildIndexes method which takes a fill factor as an input parameter. So we have the tools to do what we need; now it's a case of writing the logic to rebuild the indexes on each table. With Powershell that's a breeze, since we can run a foreach loop. So, assuming that $db is our database object, with $db.Tables being the tables collection, we could write:

foreach ($t in $db.tables) {$t.RebuildIndexes(90)}

Another Powershell one-liner! This is the simplest form of rebuilding all indexes in a database using Powershell. One could extend this to rebuild all tables in all user databases on a server, or on a list of servers, or rebuild indexes only on specific tables.
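A hedged sketch of the "all user databases on a server" extension, assuming $s is an SMO Server object for the instance:

# Rebuild all indexes on all tables in every user database, fill factor 90
foreach ($db in ($s.Databases | where {$_.IsSystemObject -eq $false})) {
    foreach ($t in $db.Tables) { $t.RebuildIndexes(90) }
}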