Wednesday, December 19, 2007

Management Studio and Excel

Maybe I am easily amused - but I wish I had found out about this feature sooner. SSMS has an option setting that enables copying and pasting of column headers from grid results. This comes in nifty when copying and pasting data to Excel - saving me some typing time.

Wednesday, November 21, 2007

What an Amazing Run!

Today is a special day for me - it is my consulting team’s last day on our current project.  I have been working on this OI (operation intelligence) project for more than 15 months.  I do not recall being on a lengthier consulting engagement anywhere else. 

The project has indeed come a long way - several personnel changes, three reorganizations, and a complete management change-over.  I remember we started out with the inheritance of one DW.  Since then we have built and implemented a few more, and acquired a dozen more as a result of our more recent restructure.  (Yes, we operate as many disparate DW's as the systems we source from.)  Despite the increase in scope, we have largely maintained about the same level of resources this whole time.  I guess we are managing it somehow.

Nonetheless, it has been an rewarding challenge and a great learning experience.  I know what I am going to miss the most is the smart and driven people I work with. 

Monday, November 19, 2007

Files in RSTempFiles Directory

Nothing beats running into a production issue starting the work week. One of our SSRS servers encountered an assembly loading error early this morning – making the reports unavailable to our users. The error was caused by the fact that the disk it is running on did not have enough space.

It happened that the volume of the content in the Program Files\Microsoft SQL Server\MSSQL.#\Reporting Services\RSTempFiles directory has grown over time and ultimately filled up the disk. Unlike SSRS logs (in the LogFiles directory), these temporary snapshots are not being cleaned up automatically. The "CleanupCycleMinutes" setting in RSReportServer.config appears to work only with storage in database – not file system.

In the interest of getting SSRS back online ASAP, we manually deleted the older snapshot files - reclaiming a decent chunk of space. Going forward, we will need to put in place a script or some sort to automatically and regularly keep the directory clean. Sounds like an action item.

Tuesday, November 13, 2007

SSIS Raw File Reader

Several months ago while I was working a solution to move data across network domains, I came across this nifty utility called “Raw File Reader.” This tool allows me to read and view the data contained in raw files produced by SSIS. It is currently a freeware and more information can be found at http://sqlblogcasts.com/blogs/simons/archive/2007/01/11/SSIS-Rawfile-viewer---now-available.aspx .

Thursday, October 11, 2007

64K Limit

The Problem

I can’t believe I am still talking about SQL Server 2000 when SQL Server 2008 is coming around the corner. But here it is.

At my current project, my team and I operate a portfolio of DWBI solutions implemented on SQL Server 2000 and 2005 platforms. A couple days ago, the ETL processing of one of our older DWs (i.e. SQL Server 2000) failed out of the blue. Upon brief investigation, I determined that the failure was caused by a dimension table containing 64,400 plus member records. In Analysis Services 2000, a dimension member can’t have more than 64,000 children. This applies to the “All” member as well. Anything more, as in our case, causes the processing of the dimension to fail.

The Solution

To get around this limitation, we considered few options and their viabilities. Bear in mind that the dimension in question is a degenerated dimension derived from one of the facts in the DW.

  1. Reduce the total number of members going into the dimension in the cube.
  2. Reduce the total number of members going into the dimension in both the relational DW and the cube.
  3. Create a dimension hierarchy to reduce the number of children per parent – with no changes to the total number of members.

In this particular case our customers and I settled on option 2. We purged all of the fact records older than a cut-off date thereby trimming down the number of members going into the dimension. (We effectively condensed the volume of history available in the DW.)

I must admit we took the quick and dirty approach. Option 3 suggested a more elegant and longer-term solution however it is also much more involved. Furthermore the nature of the data does not allow us to build a dimension hierarchy with meaningful or other logical groupings (Rolodex-style breakdown). As far as option 1 is concerned, since we preferred the data modifications to be consistent in both relational DW and OLAP tiers, the option is out.

The Lesson

I do feel that running into this limitation forces our customers and my development team to re-evaluate the business needs, growth requirements and usability of the DWBI solution. A few thousand children per parent already present a usability problem – that’s a lot of rows for users to drill down into a dimension. I simply can’t imagine having 64,000 items to select from in a drop-down list. Moreover some client tools may not be able to handle the volume gracefully.

Saturday, September 29, 2007

The Source is Ready

On a number of occasions, our ETL applications needed to determine whether a SQL Server database has come back online after being restored from a backup. Such requirement arises when the systems we are sourcing from create copy of their data and restore it onto a separate server dedicated to all downstream consumption. These source systems are often mission-critical or LOB applications that restrict direct ETL operations - fearing performance impact on user queries.

A quick and simple way to find out the availability of a SQL Server database is to use the DATABASEPROPERTYEX function. This T-SQL function returns the current setting of the specified database option or property for the specified database. The SQL is listed below.

The WHILE loop checks to see if the "status" of the "SourceDB" database is online and available for query, if not, it waits for a minute then checks again.

To maximize reusability and uniformity of our ETL implementation, we wraps this piece of code in its own separate SSIS or DTS package. The package in turn can be executed as first step of a job or within another package.

Wednesday, September 12, 2007

SSIS Configuration File for Manageability

Using SSIS configuration file (.dtsConfig) to manage environment variables, such as server connections and source file paths, can contribute greatly to portability and manageability of an ETL application that comprised of SSIS packages.

We utilize configuration files extensively in our DWBI implementations to alleviate the complexity of promoting builds (DEV, TEST, PROD), server migration and deployment to multiple ETL servers. SSIS solutions can be readily deployed to different environments with simple changes to these XML-based configuration files without having to modify a single package.

For instance, one of our DWBI implementations employs 2 separate servers to "load balance" the processing of flat files (web logs) made available on a central file server. The same SSIS solution is deployed on these 2 ETL/staging servers (SQL Server, SQL Server Agent, Integration Services). But the solution instances are driven by slightly different configuration file specifications. Below is the over-simplified data flow.

The configuration file on each server specified the same source file server, destination DW server, and other control variables with the exception of one flag to indicate whether to process odd- or even- hour source files and another to indicate which one of the 2 target ETL servers to stage the extracted data.

Thursday, September 6, 2007

Please.

Sunday, August 12, 2007

SSAS Database Synchronization for Scalability & Availability

Database synchronization can be leveraged as a convenient solution to promote scalability and availability for the OLAP tier in a DWBI implementation.  It allows for a scale-out strategy by deploying copy of the SSAS database to multiple servers to load-balance user queries.  It increases availability by reducing the duration of time the SSAS database is offline or inaccessible due to processing.  In many scenarios, synchronization takes less time to perform than processing.

Traditionally, in Analysis Services 2000, synchronization implies backing up and restoring the SSAS database.  Nonetheless I had mixed success of getting msmdarch.exe to work properly with larger .cab files and across servers.  Analysis Services 2005 simplifies the task of synchronizing SSAS databases by making available the Synchronize XMLA command and the Synchronize Database Wizard which is a UI that ultimately creates and issues the Synchronize XMLA command.  Of course, in 2005, we can still back up and restore the .abf file.  But the synchronization capability is much more straight-forward.  

Several of our DWBI implementations take advantage of this synchronization capability Analysis Services 2005 has to offer.  Our solution performs processing of the cubes on a staging SSAS server and deploys (or sync's) the cubes to the presentation SSAS servers dedicated for user queries.  The diagram below illustrates the processing-flow of our solution.

Synchronization to the production SSAS servers is achieved by the means of the Synchronize XMLA command.  A DDL script in the form of XML specifies the command and all of its parameter elements (i.e. Source database, compression option) in an .xmla file.  (XMLA provides a declarative approach to specifying requirement and effectively eliminates the need of writing more complicated code - consequently enhancing the maintainability of the solution.) 

The Synchronize XMLA command is to be issued on the target SSAS server as opposed to the source SSAS server.  It can be manually executed in SSMS.  Or it can be run programmatically as in our solution using the Analysis Services Execute DDL task in a SSIS package. 

In order for the Synchronize command to work successfully, make sure that the Windows account under which the target SSAS server runs has administrative rights to the source database.  The target server executes the Synchronize command under its service account.

Sunday, July 29, 2007

Let's Get Started

Welcome
Hi Everyone. Welcome to my first post on this newly launched blog of mine. I'll be updating this blog regularly with my real-life day-to-day experience working in the vast and exciting world of data warehousing and business intelligence (DWBI).

As an IT professional, I have worn many different hats over the years – salesman, manager, architect, developer, analyst, support and user. Though DWBI hasn't been the only thing I’ve done, it is certainly one of the most challenging and exciting expertise IMHO.

By no means will I try to come across as all-knowing DWBI guru. Because I am not. (And I have not met one yet.) Nonetheless I am definitely not new to the field. Here I am simply hoping to share some of my experience – good ones and bad ones. I might rant a little bit too.

Why the blog?
I literally woke up one day about a month ago and said, “I should start a journal like this. It would be fun. I think this would be a great medium to share and learn.”

I still remember hating writing as a kid. The only time I’ve actually written a journal was when I was backpacking in Europe right after I graduated from college. The experience was of such great immensity that I had trouble putting it on paper.