Thursday, March 26, 2009

SQL Metadata: What to store and where to store it

Being at the Consulting and Contracting end of the IT industry, I've moved around a bit in my career. The same challenge is faced when starting at a new client: What are all these databases used for? Who owns them? How are they accessed? There are a number of questions that simply can't be determined from an ERD or querying a dmv or looking at the underlying objects.

This information is database metadata and it is almost never stored in the database. Or nearly never stored, anywhere. That brings me to my first point: Database metadata should be stored in the database. This ensures that this metadata does not get orphaned or lost or decoupled from what it is describing. It gets backed up. It is easily viewable. It is secure. Ok, so we have decided on a place to store this metadata, but what exactly are we going to store?

I've found that the best answer to the "what metadata are we going to store" question are the resident DBA / sys admin staff. They know the questions that are relevant to their databases that cannot be found in the database. That highlights my next point, don't keep database metadata about what can be derived from existing metadata in system tables or dmv's. Like "Database Creation date". That is stored in sys.databases. Or the biggest tables and their record counts, also not static and viewable in many different ways.

So what are vaild metadata attributes? There are a number of them, like Database Owner, Access methods, growth requirements, backup requirements, etc. I've come up with a list of 14 attributes at my current client and I think thats a good amount. Not so much that its tedious maintaining, but at the same time still useful.

Ok, so we've talked about the reason behind keeping metadata and what we are going to keep and where. Now we'll talk about the how bit. You may be familiar with three system stored procedures that add, update and drop metadata. These are:

sp_addextendedproperty
sp_dropextendedproperty
sp_updateextendedproperty

These are documented in BOL. As each name suggests, these are used to Add, Update and Delete extended property information, or in our case, metadata. Once we've added our metadata we can view it using a function called fn_listextendedproperty.

I add the DB_NAME() metadata function to give the database metadata context:






SELECT DB_NAME() AS DbName, name as PropertyName, value as PropertyValue
FROM ::fn_listextendedproperty(NULL, NULL, NULL, NULL, NULL, NULL, NULL)




Great, so now we have a mechanism to store and view our metadata. But I think we need to take a step back. Lets suppose that we decide that "Database Owner" is a good database metadata attribute. Now we go forth and run sp_addextendedproperty 'Database Owner', 'Joe Black'. On our next database we fervently run sp_addextendedproperty 'Db Owner', 'Joe Smith'. Do you spot the problem? If I want an enterprise view of all databases on all servers and their respective owners, I would be in a pickle. Thats because I'm adding the same attribute - Database Owner, but its spelled differently on each database. So we need to enforce Domain integrity. Simply defined this is: A domain defines the possible values of an attribute. I want to make sure that in all my databases on all my servers the Business owner metadata attribute is stored as "Business Owner".

What I do is create a wrapper (DJ Sproc ) to wrap around the sp_addextendedproperty. Code to follow:





CREATE PROC usp_AddMetadata
@MetadataName VARCHAR(100), @MetadataValue VARCHAR(500)

AS
BEGIN

IF @MetadataName NOT IN ('AppDesc','AppType','BackupReq','Contacts','GrowthReq','Lifespan',

'Owner','Type','InitSize','DevDBName','ProdDevQA','SSISPack','BusOwner','ConnectionType')

BEGIN

RAISERROR('Metadataname is not valid. Must be in the following list: AppDesc, AppType, BackupReq, Contacts, rowthReq, Lifespan, Owner, Type, InitSize, DevDBName, ProdDevQA, SSISPack, BusOwner, ConnectionType', 16, 1)

RETURN

END

ELSE

BEGIN

EXEC sp_addextendedproperty @name = @MetadataName, @value = @MetadataValue
END
END




What the above code does is essentially enforce Domain Integrity for the metadata attribute names. These are defined in the first step above of "what to store". Now what I need to do is deploy this wrapper sproc to all databases on all my servers so that I am enforcing domain integrity on all database metadata. The reason I need to do this and not just create a wrapper sproc in master or model is that when adding metadata, the current database context is used. There is no way of specifying that metadata x belongs to db y. You have to physically change the database context to the relevant user database. And as we know this is tricky. I've checked and there doesn't seem to be a way of issuing the "USE DB1" statement and then execute another statement that will use DB1 as its context.
So I deploy my wrapper sproc to each database. Great. I have 50 SQL servers with an average of 40 user databases on each!! No problem, enter sp_msforeachdb, xp_cmdshell and osql. My arson of little TSQL nukes. . String them together as follows and we have the powerrrrrr:





sp_msforeachdb 'xp_cmdshell ''osql -SSQL1 -E -d? -i"f:\work\sql\2005\usp_AddMetadata.sql"'''



What this line of code does is run the undocumented but very powerful sp_msforeachdb system stored procedure. Basically for each database on a server, do x y and z. This I use in my osql command. This comand prompt command can run .sql files from a said location. So I save my wrapper sproc in f drive in the above location. This should ideally be a file share on the network somewhere. So the above line of code runs the usp_AddMetadata wrapper sproc on each user database on Server SQL1. Now if I have 40 SQL servers I take this line of code, paste it into Excel (or I could do it in TSQL) with my server list, do some string concatenation and generate 40 lines of code, one for each server. Boom!!

Clearly the above approach was prior to me learning the power of Powershell. The above approach is valid, but a bit clunky. I'll get to re-writing it in Powershell when I get a few spare CPU cycles.

As a side, because I've added the usp_AddMetadata sproc to the model system database, any new databases will have the wrapper sproc added to it automatically.

In the same way that I've created a wrapper for sp_addextendedproperty, I also create a wrapper for sp_updateextendedproperty, with the same logic and constraints.

The above is all very cool and in some ways geeky. The real power, from a management CIO point of view for example, is when I create a view or SSRS report that shows a list of all databases on all servers and who the database owner is. Or all databases that are past their sell by date. Or all databases that have grown to beyond double their initial size. Now that is seriously powerful. I have an enterprise 10000 foot (feet) view of my SQL environment.

Besides the CIO view, from an audit or capacity planning or provisioning point of view, this metadata makes DBA's and sys admins jobs a whole lot easier.




No comments:

Post a Comment