Saturday, 2 June 2012

Microstrategy


Components of Microstrategy


There is no need to learn all these components. We cant be masters in all the components. Choose few and gain knowledge. I will take you through the below components.

Desktop
Web
Command Manager
Object Manager
Architect
Administrator

Microstrategy BI Architecture



Microstrategy applications use the MD database to translate user requests into sql queries and to translate the results of those queries back into Microstrategy objects like reports and documents.

Microstrategy MD stores the below types of data.

Application Objects:  Reports and Report related objects data built in Desktop.

Configuration Objects:  Administrative and connectivity related objects built in Microstrategy Architect or in the Microstrategy Desktop Administrator console.

Schema Objects:  Logical objects that relate application objects to physical data warehouse structures. These objects are built exclusively in Microstrategy Architect.

Two Tier Architecture Overflow: Two tier BI architecture comprises the data warehouse, metadata and Desktop. In this type of configuration Desktop directly queries the data warehouse and the metadata without Intelligent Server as intermediate tier. This configuration can be helpful primarily for troubleshooting purposes, small demos and test projects.

Three tier Architecture Overflow: This architecture comprises the data warehouse, metadata, intelligence server and Desktop.

Four tier Architecture Overflow: This architecture comprises the data warehouse, metadata, intelligence server, Desktop and Web.


Note: There is no difference between a report you run in three tier and the same report run in four tier. In both the instances, almost all of the tasks involved in report generation are completed by Intelligence server. Only client(Desktop or Web) can perform the process of formatting data while displaying the report results. Both Desktop and web gets result set from intellegence server.


Normally reports are developed on either Desktop or Web. What is report? What objects are used to develop these reports?

A report is a request for specific, formatted data from the data warehouse. A report consists of a template, a filter and a view.

Attribute, metrics, filters, prompts etc will be used in reports. Hence we should know about these objects first. The object development done under project. This project will be created under Project source(Intelligence server).

Now we got Project under Project source. Will start to know about schema objects first.  Because application objects will be created using schema objects. 


What is Project Source?
Project?


About Desktop

 

Everything we can do through Desktop only like configuring administrator, architect works, creating schema and public objects, scheduling and subscribing and s on.

Install free Microstrategy software available from the official site.  You can observe few projects,  project sources, schema objects(attributes, facts etc), public objects(Metrics, filter, reports etc) and so on.

By using schema and public objects you can create reports. This free software comes with access database, hence you  will get data.

Lets start to know the objects  one by one before start creating own objects.


Schema Objects

Attributes
Facts
Hierarchies
Partition Mappings
Tables
Transformations

Public /Application objects

AutoStyles
Consolidations
Custom Groups               
Documents
Drill Map
Filters
Metrics
Prompts
Reports
Searches

Start with attribute creation. For creating attribute we require columns and tables. Just think we don’t imported any tables to the Desktop. Then how can we create objects.

Before importing tables first we should connect our Microstrategy desktop with database. For connecting any tool with database we need to establish DSN(Data Source Network).
Steps:
            1. Open DSN from your OS. 
            2. Select ADD -> Give any name ->select database -> Finish
            3. Open MSTR Desktop -> Right click on project -> Select Project  configuration
            4. Select Database Instance -> Click New tab -> Give a name for instance
            5. Select New tab -> Give connection name -> Select the DSN you created in 1.
            6. Click OK
Now the connection established between the project and database.
            4. Select Warehouse catalog ->

Reporting Tools


I hope now you got basic idea on BI, the final phase in BI will be the Reporting tools. There are many Reporting tools offered by different vendors are available. Checkout the few below

Business Objects
Microstrategy
Siebel Analytics
Cognos
WebFOCUS
OBI

Many of us will ask one question while choosing Microstrategy, I,e. SQL . I recommend one should know SQL before entering into the MSTR. We don’t show much interest after learning but we surely regret once we start developing reports or dashboards. Few things you should get grip on which will play key role in getting accurate date from desired tables.

Aggregation functions
Joins

Thursday, 17 May 2012

Data Warehouse


Data Warehouse



A Data warehouse is a relational database containing fact and dimension tables.


A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making proces                                                                                                                                                                                                                                                                                                                                                - Bill Inmon

Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject.

Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.

Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even older data from a data warehouse. This contrasts with a transactions system, where often only the most recent data is kept. For example, a transaction system may hold the most recent address of a customer, where a data warehouse can hold all addresses associated with a customer.

Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.

A data warehouse is a copy of transaction data specifically structured for query and analysis.                                                                                                                                                                                                                                                                                                                                                   - Ralph Kimball

We will know how the data warehouse going to build in general

Any database or data warehouse development starts and ends with below steps.

 Requirement gathering - Gathering requirements from business documents, discussion with functional teams, analysts and end user to identify reporting / analysis requirements. What they actually looking for, which data they required for analyzing etc. In this phase they will do the recovery plan for data warehouse in case of disasters.

Data Modeling - This model is the foundation of any data warehouse. By the requirements gathered in first step, conceptual model will be derived. Based on the conceptual model Logical model will be designed. Finally it turns into physical data model.

 e.g., Conceptual Model - Includes all major entities, relationships and does not contain much detailed level of information about attributes.

         Logical Model - A logical data model describes the data in as much detail as possible, without regard to how they will be physical implemented in the database. Includes all required entities, relationships, attributes, key groups that represent business information and define business rules.

       Physical Model - How the model will be built in the database. A physical database model shows all table structures, including column name, column data type, column constraints, primary key, foreign key, and relationships between tables.



Once the data modeling has been done the next phase is the Schema design. 


Here Schema is nothing but structure of the database. How the table structure should be to fit for the relationships between dimensions and facts tables to facilitate for query performance. This will be constructed by dividing the tables.


Star Schema:

A star schema is the one in which a central fact table is sourrounded by denormalized dimensional tables. A star schema can be simple or complex. A simple star schema consists of one fact table where as a complex star schema have more than one fact table.



Snow Flake Schema:


A snow flake schema is an enhancement of star schema by adding additional dimensions. Snow flake schema are useful when there are low cardinality attributes in the dimensions.


Galaxy/ Integrated/ Hybride Schema:

Galaxy schema contains many fact tables with some common dimensions (conformed dimensions). This schema is a combination of many data marts.



Fact Constellation Schema:

The dimensions in this schema are segregated into independent dimensions based on the levels of hierarchy. For example, if geography has five levels of hierarchy like teritary, region, country, state and city; constellation schema would have five dimensions instead of one.



The schema's are formed by the fact and dimension tables. There are many types of dimension tables are there which you can find in google. Here I am explaining only few.



Dimension Table

Dimension tables are used to describe descriptive information (dimension) of the field of a fact table.  Think dimensions as things or objects. If you take product as example, product should have some dimensions like shape, colour, name etc.
e.g.,  The time dimension would contain every hour, day, week, month, quarter and year that has occurred since you started your business operations.

Conformed Dimension:

Conformed Dimensions are dimensions that has the same values for all areas of the business.

e.g., Suppose you have a “shipping” data mart (telling you what you’ve shipped to whom and when) and a “sales” data mart (telling you who has purchased what and when). Both marts require a “customer” dimension and a “time” dimension. Since April usually means the same thing to every department in the organization.

Junk Dimension


The junk dimension is simply a structure which is used to store un wanted data which is not useful for report generating purpose. the junk dimensions like text or flags, non-generic comments or yes/no or true/false indicators.

e.g., consider a trade company which consists of fact about trading that take places in a share trading firm. In these there may be  some facts like mode of trade(which indicates whether the user  is trading through phone or online)which will be not related to any of the dimensions such as account,date,indices,amount of share etc.
so these unrelated facts are removed from the fact table and stored as a separate dimension as a junk dimension which will be useful to provide extra information.

Degenerated Dimension


As The name interprets , Its not a Dimension. But rather can be Quoted as a primary key of the fact, This key is not a foreign key to any other dimensions rather belongs solely as a primary key to the Fact.

e.g., Let's take example of a Sales Invoices. Each invoice contains the following details
      
Invoice Date
Customer ID
Products within the Invoice
Reference to Order Number(s)
Invoice Number
Invoice Line Numbers (which are multiple lines in single Invoice)
Invoice Line Amount
Invoice Total Amount
When we model the above following Dimensional Modeling standards, we get following distinct Dimensions:
Time Dimension - representing the Invoice Date
Customer Dimension - representing Customer ID
Product Dimension - representing Products within the Invoice
Order Dimension - representing Orders
Invoice Dimension representing Invoice Number & Invoice Line Numbers
We can observe that Only candidate attributes are Invoice Number and Invoice Line Numbers. But, this is at the granularity of the FACT, which stores references to all above said Dimensions as well as the measures i.e. Invoice Line Amount, Invoice Total Amount.  We can degenerate these two into facts.
Def: If a table contains the values, which are neither dimension nor measures is called degenerate dimensions.
                        Ex : invoice number, empno

Slowly Changing Dimensions


 Dimensions that changing over time are known as SCD.  Prices of products, names of people or address may change over time for some reason. These are few examples in SCD.

These are categorized into three types namely SCD1, SCD2 and SCD3.

SCD1: The new record replaces the original record. No trace of the old record exists.In other words no history is kept.

          Ex:
The below one is the Original table

Key
Name
State
1001
Tanuj
A.P

After Tanuj moved from AP to Karnataka, the new information replaces the new record, and we have the following table:

Key
Name
State
1001
Tanuj
Karnataka

Advantages:

- This is the easiest way to handle the SCD problem, since there is no need to keep track of the old information.

Disadvantages:

- All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the company would not be able to know that Tanuj lived in AP before.

SCD2: Creating an another additional record. Therefore, both the original and the new record will be present. The newe record gets its own primary key.

The below one is the Original table

Key
Name
State
1001
Tanuj
A.P

After Tanuj moved from AP to Karnataka, the new information added as a new record.

Key
Name
State
1001
Tanuj
A.P
1002
Tanuj
Karnataka

Advantages:

- This allows us to accurately keep all historical information.

Disadvantages:

- This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.

- This necessarily complicates the ETL process


SCD3: Adding two columns to indicate the particular attribute of interest, one indicating the original value, and one indicating the current value. 


To accommodate SCD3, we will now have the following columns:

(Customer Key, Name, Original State, Current State, Current State) 

After Tanuj moved from AP to KA, the original information gets updated, and we have the following table (assuming the effective date of change is January 15, 2010):

Key
Name
Original State
Current State
Current State
1001
1001
A.P
Karnataka
Jan -15 - 2010
 
Advantages:

- This does not increase the size of the table, since new information is updated.                                - This allows us to keep some part of history
.
Disadvantages:

- SCD3 will not be able to keep all history where an attribute is changed more than once. For example, if Tanuj later moves to Kerala on December 15, 2010, the Karnataka information will be lost.




Fact Table
         
Fact tables contain keys to dimension tables as well as measurable facts.
e.g .,If you take same example product, which can have quantity, weight, price etc . All these values are measurable.

Fact table store different types of measures.

Additive facts: Measures that can be added across dimensions.

      e.g.,      
                                     
Date
Store
Product
Sales_amount


The purpose of this table is to record the sales for each product in each store on a daily basis. Sales_Amount is the fact. In this case, Sales_Amount is an additive fact, because you can sum up this fact along any of the three dimensions present in the fact table -- date, store, and product. For example, the sum of Sales_Amount for all 7 days in a week represent the total sales amount for that week.

Semi Additive facts: Measures can be added across few dimensions not with others

     e.g.,


Date
Account
Current_bal
Profit_margin

      Current Balance of single account cannot be added for month. Here current balance will be for the particular day. At end of the each day you can sum up current balance. If you try to sum up current balance for month it will give wrong data, because day by day the balance will change. There will be no meaning, but can be summed up across all the accounts.

Non Additive facts: Measures cannot be added across dimensions
      e.g., Avg, Ratio, variance & Percentage



Hence, the relation between a fact and dimension table is one to many.

Factless Fact Table

       A factless fact table is a fact table that does not have any measures.

e.g.,

      How many students attended a particular class on a particular day?
       
One can easily answer the above questions with the use of factless fact table. In the above question we can model three types of facts. Class ID, Date ID and Student ID.

Wednesday, 16 May 2012

ETL


What is ETL?

ETL is used to migrate data from one database to another, to form data marts and data warehouses and also to convert databases from one format or type to another.

As we know the data coming from different places may be unrelated. So first it should be Extract from all the sources and Tranform into a common format and load it to data warehouse.




ETL is short for extract, transform, load. These three are database functions combined into one tool to pull data out from one DB and place into another.


           Extract is the process of reading the data from different sources like Relational  databases, flat files, IMS etc. Data will be extracted to one area before transforming. Which is called Staging area.


            Transform is the process of converting the extracted data from its previous form into the required  format.  First cleansing will done on source data. Cleansing is nothing but cleaning unwanted data or junk data.

 e.g., Like nulls, zeros, zero length strings, and corrupted rows, apostrophe or a comma or an extra space.


This kind of data will affect reports generated from the data warehouse via OLAP tools. The ultimate goal of data cleansing is to improve the organization's confidence in their data.


Transformation occurs by using certain rules to meet the business and technical needs of the target database.


·         e.g., Selecting only certain columns to load (or selecting null columns not to load).


·         Translating coded values (if the source system stores 1 for male and 2 for       female in one source, A for male, B for female in other source, but the warehouse stores M for male and F for female. Hence this will make common format).


·         11th March 2007 can be stored as '03/11/07' or '11/03/07' among other formats. A data warehousing project would require the different date formats to be transformed to a uniform standard before being entered in the data warehouse




There are many rules in transformation but I mentioned only to as an example. Google if you want to know more.


               Load is the process of writing the data into the target database. The information from data sources are loaded and stored in a form of tables. There are two types of tables in the database structure: fact tables and dimensions tables. Before loading into the tables its very important to find the aggregate values.


Performance improvement will purely depends on the aggregatables. Finding the aggregates and storing into their fact tables is the first step. All related dimension are stored into the dimension tables. Aggregations will help in speeding up the query time in business decisions.


·         e.g., Sales is a numeric value in fact table. Total sales revenue is the aggregate value. Before loading aggregate values should define and move to the tables.

ETL tools available in the market as of now are

Commercial ETL Vendors


  • Informatica
  • Information Server
  • Ab Initio
  • BusinessObjects Data Integrator
  • Oracle Data Integreator
  • Oracle Warehouse Builder
  • Integration Services
  • Sybase ETL
  • SAS ETL
  • Pervasive ETL
  • Adeptia ETL
  • DBSoftLab
Open Source ETL Vendors

  • Pentaho Data Integration
  • Jasper
  • Talend
  • Clover ETL



** If you find this ETL stuff as interesting, then go deep about ETL and chose your ETL tool and be successful.**

Monday, 14 May 2012

About Business Intelligence



Every organization's main aim is to stay ahead in this thriving competitive market place. To become successful one needs right decision making at the right time. For Taking right decisions one need right information. The information plays a key role here, which should be easily accessible, understandable and analyzable at anytime and anywhere. While in the process of business the following questions should be considered. The monitoring, analyzing and planning lies in these questions for any decision

·             What has happened?
  
                 Ex: Performance of business in the past like sales for particular region.

·           What is happening?

·           Why is this happening?

·           What will happen?

·            How to make this happen?


To acheive all this organization needs applications and tools.





§  To provide accurate and current information, organization requires a powerful RDBMS which is nothing but Data warehouse.
§  To gain deeper business insight and facilitate better decision making, reporting tools are required.



Business intelligence (BI) is a collection of applications and tools for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions and meet their requirements.


Architecture of BI




Before learning any ETL or Reporting tool one should know the Architecture of Business Intelligence. The above image will describe how the application and tools are interconnected to form a BI architecture. Let us start knowing briefly about each component before moving to Microstrategy reporting tool.

What is Data Source?


A Data Source is a facility for storing data. These data sources in turn stores organization's data in different forms like flat files, databases, data steam etc. These data sources hold day to day transaction level data. Every organization having different departments and each department may use different kinds of database for day transactions.

EX: Take any Banking sector. There should be many departments like Accounts, Loans, Cards, Investments, Insurance etc. All these departments may use different data sources for tracking the data. Account departments may use oracle DB, Cards department may use flat files, Insurance data coming from DB2 etc. This kind of data usually known as Transactional databases or OLTP(On line Transactional Processing).  All these data sources contribute seemingly unrelated data because all are in disparate formats.



In order to support reporting needs and take management decisions the unrelated data coming from different sources should be consolidated and put into one common place. The common place should be large, strong and robust. Which is called Data Warehouse or OLAP. We will discuss about OLAP next page. Transferring the data coming from different sources into the data warehouse  could be done by ETL.

What is ETL?