Saturday, 16 April 2011

Database Data Type Usage and Optimizing Data Fields

While designing a Data warehouse or a database an Architect should Data Field spend space like
money, closed handed only spend as much is required.
Following is some guidelines I use to follow –
For String Type Data Types 
1.      Don’t use Unicode data type unless its really required.
2.      If String length is fixed like if code length is fixed and always going to be that way than use Char instead of Varchar.
3.      User Varchar when its variable length data is there like name. but always cap it with some fixed number like for namecase let say varchar(50) should be sufficient.
4.      Avoid use of “MAX”, define a length.

For Integer type Data Type -
Following Ranges for different Integer data types are available in SQL SERVER -

Use wisely, if you are not going to get huge numbers than you don’t need to use big int.
Use Integer for SK of medium size dimension. You can use Tinyint for static
dimension’s Sks.
For Numeric or Decimal Data Types
Try to use Numeric fields.
Avoid using Float data types for financial data type as they are approximate number data
types and round off values. Use of numeric with correct precision and scale
will help here.
For Date and Datetime Data Types
Simple if you don’t need timestamp or don’t have timestamp don’t use date time field,
just use date filed.

Normalization/ De Normalization – When it comes to data warehousing

Now lot of places you will find of normalization definitions some of them are very technical and some seems to go beyond radar make you more confuse –

Here I have given it a try for making it simple –

For me normalization is simple, reduce redundancy, save some storage space and maintain consistency.
Removing redundancy helps OLTP system, as they have lot of UPDATES. Less rows to updates gives offcourse more performance.

De Normalization
Though we don’t have any proper definition of term “De normalization” yet it’s been used frequently.
Let say de normalization will involve creating more attributes (columns) in one table. This will reduce the number join and can give better performance.

Star/Snow Flake and Normalized/De Normalized – What is what??
For me Star is less normalized so I’ll call it De Normalized.
If in real time system you have to maintain a Star Schema you will have to manage big/ huge dimensions.

Snow Flak is a Normalized;
Your decompose relation ships here and create smaller structures. Example – Adventure work DW product dimension.

So Big Q? - What to do when design a DW –
Well I prefer to go with……….wait for it……………Reporting. Yes you heard it.

It’s not about normalization or de normalization. Its about reporting and its performance.
You End user don’t care whether you normalized or de normalized data, they just want to see their reports well with right numbers and with a click on it.

So see what Reports you have in the list and wether normalization or de normalization will benefit it.