Thursday 30 July 2015

SSIS : Lookup Vs Merge Join


In many interviews you will be asked about the difference between the Merge Join and Lookup components.
Many people think Merge Join and Lookup are the same. If that were the case, Microsoft would not have created two separate components.

Lookup and Merge Join are two different transformations: they work in different ways and are meant for different purposes.

Comparison by area (Merge Join vs Lookup)

Work function
  • Merge Join: Joins two data sets on one or more key columns. It behaves like a T-SQL join (internally a sort-merge join over the two sorted inputs, not a hash join).
  • Lookup: References data in the lookup table (or query) and returns one and only one match per input row.
Input
  • Merge Join: Both inputs need to be sorted on the join keys; needs two input sources.
  • Lookup: Input need not be sorted; needs one input.
Output
  • Merge Join: Only one output, containing the merged (joined) rows of the two input data sets.
  • Lookup: Two outputs, matched and non-matched.
On Match
  • Merge Join: Matched rows flow into the output pipeline. All matching rows from both sources are passed on.
  • Lookup: Matched rows flow into the matched output. If the lookup table has multiple matching rows, only the first matched row from the lookup table is selected.
On Non Match
  • Merge Join: Defined by the join type setting:
      Inner: non-matched rows are discarded.
      Left: non-matched rows of the left source (first input) are kept; non-matched rows of the other source are discarded.
      Right: non-matched rows of the right source (second input) are kept; non-matched rows of the other source are discarded.
      Full: non-matched rows of both sources are kept in the output.
  • Lookup: A non-match output is available. When it is enabled:
      Only non-matched rows of the source flow into that output.
      Non-matched rows of the lookup table are not considered at all.
Null Values
  • Merge Join: A setting specifies whether NULLs are treated as equal. When enabled, a NULL value matches another NULL value; otherwise the row is treated as non-matched.
  • Lookup: A NULL in the source matches a NULL in the lookup table if both have one. If only the source has a NULL, the row finds no match and, unless non-matched rows are redirected, the component returns an error. A NULL in the lookup table alone has no impact.
Usage
  • Merge Join: When two data sets need to be joined (left, right, full).
  • Lookup: When data needs to be referenced, for example getting a PK from a master table based on a name.
Specific requisite
  • Merge Join: Inputs need to be sorted.
  • Lookup: NA
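
To make the comparison concrete, here is a minimal Python sketch of the two behaviours. It is not how SSIS is implemented; the function names (`merge_join`, `lookup`) and the sample rows are made up for illustration. The merge join walks two inputs that are assumed to be already sorted on the key and emits every matching pair, while the lookup keeps only the first reference row per key and splits the source into matched and non-matched outputs.

```python
def merge_join(left, right, key):
    """Inner sort-merge join. Both inputs must already be sorted on `key`."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1  # left row without a partner: dropped by an inner join
        elif lk > rk:
            j += 1  # right row without a partner: dropped by an inner join
        else:
            # Find the run of right rows that share this key...
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # ...and pair it with every left row that has the same key.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out


def lookup(source, reference, key):
    """Return (matched, no_match); the first reference row per key wins."""
    cache = {}
    for row in reference:
        cache.setdefault(row[key], row)  # ignore later duplicates of a key
    matched, no_match = [], []
    for row in source:
        ref = cache.get(row[key])
        if ref is None:
            no_match.append(row)
        else:
            matched.append({**row, **ref})
    return matched, no_match


# Hypothetical sample data, both lists sorted on cust_id.
orders = [
    {"cust_id": 1, "order": "A"},
    {"cust_id": 2, "order": "B"},
    {"cust_id": 4, "order": "C"},
]
customers = [
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 2, "name": "Bob"},
    {"cust_id": 2, "name": "Bobby"},  # duplicate key in the reference data
    {"cust_id": 3, "name": "Cy"},
]

joined = merge_join(orders, customers, "cust_id")
matched, no_match = lookup(orders, customers, "cust_id")
print(len(joined), len(matched), len(no_match))
```

With the duplicate key in `customers`, the merge join returns order B twice (once per matching customer row), while the lookup returns it once and sends order C to the non-matched output.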

I hope the above comparison helps you understand the difference between SSIS Lookup and Merge Join.

Monday 20 July 2015

Normalization/ De Normalization in data warehousing


You will find normalization defined in a lot of places. Some definitions are very technical, and some go over your head and leave you more confused.
Here I have tried to keep it simple.

Normalization 
For me normalization is simple: reduce redundancy, save some storage space and maintain consistency.
Removing redundancy helps OLTP systems, as they have a lot of UPDATEs. Fewer rows to update means, of course, better performance.

De Normalization 
Though there is no formal definition of the term “De normalization”, it is used frequently.
Let us say de normalization involves keeping more attributes (columns) in one table. This reduces the number of joins and can give better read performance.
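
The update side of this trade-off can be shown with a small sketch. It uses Python's built-in `sqlite3` module as a stand-in database, and the table and column names (`orders_flat`, `customer`, `orders`) are invented for the example. The same change of city touches every order row in the de normalized table but only one row in the normalized one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- De normalized: the customer's city is repeated on every order row.
    CREATE TABLE orders_flat (
        order_id INTEGER PRIMARY KEY, customer TEXT, city TEXT, amount REAL);
    INSERT INTO orders_flat VALUES
        (1, 'Ann', 'Pune', 10), (2, 'Ann', 'Pune', 20), (3, 'Ann', 'Pune', 30);

    -- Normalized: the city is stored once; orders only reference the customer.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Ann', 'Pune');
    INSERT INTO orders VALUES (1, 1, 10), (2, 1, 20), (3, 1, 30);
""")

# The customer moves: count how many rows each design has to rewrite.
flat_updates = conn.execute(
    "UPDATE orders_flat SET city = 'Mumbai' WHERE customer = 'Ann'"
).rowcount
norm_updates = conn.execute(
    "UPDATE customer SET city = 'Mumbai' WHERE name = 'Ann'"
).rowcount

print(flat_updates, norm_updates)
```

The price of the single-row update is that reading an order together with its city now needs a join between `orders` and `customer`.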

Star/Snow Flake and Normalized/De Normalized – What is what??

For me Star is less normalized, so I’ll call it De Normalized.
If you have to maintain a Star Schema in a real system, you will have to manage big, wide dimensions.

Snow Flake is Normalized.
Here you decompose relationships and create smaller structures. Example: the Product dimension in AdventureWorks DW.
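
As a rough illustration, the sketch below builds the same product hierarchy both ways and runs the same report against each. It again uses Python's `sqlite3` module; the tables are invented and only loosely modelled on a product / subcategory / category hierarchy, not copied from any real warehouse. Both queries return the same totals, but the star version needs one join and the snowflake version needs three.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (product_id INTEGER, amount REAL);
    INSERT INTO fact_sales VALUES (1, 100), (2, 50), (3, 25);

    -- Star: one wide dimension, category names repeated per product.
    CREATE TABLE dim_product_star (
        product_id INTEGER PRIMARY KEY, product TEXT,
        subcategory TEXT, category TEXT);
    INSERT INTO dim_product_star VALUES
        (1, 'Road-150', 'Road Bikes', 'Bikes'),
        (2, 'Mountain-100', 'Mountain Bikes', 'Bikes'),
        (3, 'Sport Helmet', 'Helmets', 'Accessories');

    -- Snowflake: the same hierarchy split into three smaller tables.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_subcategory (
        subcategory_id INTEGER PRIMARY KEY, subcategory TEXT, category_id INTEGER);
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY, product TEXT, subcategory_id INTEGER);
    INSERT INTO dim_category VALUES (1, 'Bikes'), (2, 'Accessories');
    INSERT INTO dim_subcategory VALUES
        (1, 'Road Bikes', 1), (2, 'Mountain Bikes', 1), (3, 'Helmets', 2);
    INSERT INTO dim_product VALUES
        (1, 'Road-150', 1), (2, 'Mountain-100', 2), (3, 'Sport Helmet', 3);
""")

star_sql = """
    SELECT d.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product_star d ON d.product_id = f.product_id
    GROUP BY d.category ORDER BY d.category
"""
snowflake_sql = """
    SELECT c.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_subcategory s ON s.subcategory_id = p.subcategory_id
    JOIN dim_category c ON c.category_id = s.category_id
    GROUP BY c.category ORDER BY c.category
"""

star_totals = conn.execute(star_sql).fetchall()
snowflake_totals = conn.execute(snowflake_sql).fetchall()
print(star_totals == snowflake_totals)
```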


So the big question: what to do when designing a DWH?

Well, I prefer to go with……….wait for it……………Reporting. Yes, you heard it.
It’s not about normalization or de normalization. It’s about reporting and its performance.
Your end users don’t care whether you normalized or de normalized the data; they just want to see their reports with the right numbers in the fastest way possible. Today only a handful of organizations, customers or users care about storage space; all they want is the quickest reporting or analytics.


There is no fixed rule that going Normalized is better, or vice versa. The choice should always be based on your reporting and analytic requirements.

Wednesday 15 July 2015

Using Custom assembly in SSAS


Sometimes the default functions and features of SSAS are not sufficient to fulfil a requirement. Therefore, Analysis Services lets you add assemblies to an Analysis Services instance or database. Assemblies let you create external, user-defined functions using any common language runtime (CLR) language, such as Microsoft Visual Basic .NET or Microsoft Visual C#. You can also use Component Object Model (COM) Automation languages such as Microsoft Visual Basic or Microsoft Visual C++.


The following is an example where we used an assembly to create a custom drillthrough.

Requirement
The requirement is to display a huge text column (BLOB) when doing a drillthrough in SSAS, using Excel as the client tool.

Problem statement
  1. By default, a column included in a drillthrough needs to be present in the cube. If the column is not in the cube, we cannot see it in the drillthrough.
  2. Having a huge text column in the cube will increase cube storage and degrade cube performance. It will also increase cube processing time.
If we keep the required drillthrough columns in the cube, cube performance will suffer; if we don’t, we won’t be able to drill through to them using the default drillthrough functionality.

Solution
Using an assembly
We came up with a solution where we do not keep the required BLOB columns in the cube, but use an assembly to query the underlying SQL database and send the result back to Excel.
  
  • A .NET assembly with a function that calls a database stored procedure or T-SQL. This function takes a parameter built with MDX syntax as input.

  • To call this assembly, use the Actions tab of the cube designer with the following kind of syntax:

Call MyAssembly.MyClass.MyFunction(a, b, c)

  • This function can internally (within the assembly) call multiple other functions.
  • The function holds a connection string to the SQL database.
  • In the Actions tab, select dataset as the return type. This lets you display the returned result in a new Excel sheet.
  • The assembly and its function can be used in multiple ways: instead of returning a result set, it can also return a web page URL or an SSRS report URL.
I used the assembly published at "https://asstoredprocedures.codeplex.com/" as a basic template and modified it for my own purpose.
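
The flow inside such a function can be sketched as follows. This is only an illustration of the idea in Python, with `sqlite3` standing in for the SQL Server database; a real SSAS assembly has to be written in a .NET or COM language as described above, and the names used here (`get_drillthrough_notes`, `order_notes`, `order_key`) are hypothetical. The function receives the key that the cube action passes in, runs a parameterized query against the relational database, and hands back the column names and rows.

```python
import sqlite3


def get_drillthrough_notes(conn, order_key):
    """Return (column_names, rows) for the key passed in by the cube action."""
    cur = conn.execute(
        # Parameterized query: the key is never concatenated into the SQL text.
        "SELECT order_key, notes FROM order_notes WHERE order_key = ?",
        (order_key,),
    )
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


# Stand-in for the relational database that holds the big text column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_notes (order_key INTEGER, notes TEXT)")
conn.executemany(
    "INSERT INTO order_notes VALUES (?, ?)",
    [(1, "a very long free-text note"), (2, "another note")],
)

columns, rows = get_drillthrough_notes(conn, 1)
print(columns, rows)
```

The point of the design is that the large text never enters the cube: only the key travels from the cube to the function, and the text is read from the database at drillthrough time.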


Deploy
This assembly can be deployed using SQL Server Management Studio.

SSMS -> expand the Assemblies folder -> right click -> click New Assembly -> the Register Server Assembly window opens -> choose the type and add your assembly through File Name

Security
Whenever we use a custom assembly we also face security challenges.
SSAS provides the following permission settings:

                  
Permission Setting and Description
  • Safe: Provides internal computation permission. This permission bucket does not assign permissions to access any of the protected resources in the .NET Framework. This is the default permission bucket for an assembly if none is specified with the PermissionSet property.
  • ExternalAccess: Provides the same access as the Safe setting, with the additional ability to access external system resources. This permission bucket does not offer security guarantees (although it is possible to secure this scenario), but it does give reliability guarantees.
  • Unsafe: Provides no restrictions. No security or reliability guarantees can be made for managed code running under this permission set. Any permission, even a custom permission included by the administrator, is granted to code running at this level of trust.

Conclusion 
Using a custom assembly is very easy. It lets you customize your cube/MDX in multiple ways.

References