When a cube is presented to the end user (thru Excel or Reporting services), I would like to limit the available dimension members to only those which have at least one row in fact table. This is to simplify what user sees as a list of choices under a dimension attribute/hierarchy and not be overwhelmed with all the dimension members for which no data may exist in reality in the fact table. Does Analysis Services 2005 have any easy way to accomplish this as part of UDM/Cube design?
Let me give you an example. Take "Locations" role playing dimension. In ocean transportation business, "Locations" are of different types such as Port of Load (sea-port), Port of Discharge (sea-port), Inland Point of origin, Inland Point of destination etc. Sea-ports are only few (5 to 10) where as inland locations are in thousands. The entire list of locations, which includes locations which may not have had any shipment booked till date, may be even bigger. To keep the ETL simple, we want to keep one central dimension table for all location roles. However, for cube query selection purposes, users want to see a simple and compact list of Port of Load locations for which bookings have been made (i.e., data exists in fact table with those Port Of Load locations) - for example.
One possible solution is... to use one view for each location role in the UDM and setup relationships from fact table column to these individual location views, where the view joins Location dimension table with the fact table to select only those locations that exist in the fact table. Is this a good approach? Will there be any performance impact in Cube processing or end user cube query performance because of using views like this (since view is run every time it is referenced on the fly to generate the result set)?
Another solution is... to have separate dimension tables for each location role with only those members that are used in fact table records. But this complicates the ETL due to lot of redundancy. Will it give any performance advantage since it uses a permanent table versus view in UDM (data source view)?
Please share your insights on the best ways of handling this problem in SQL Server 2005. Thank you!
Hello. The standard behaviour you are looking for is the standard behaviour in a TSQL inner join. So if you make a report in SSRS2005 and use TSQL with your data mart / data warehouse as a source, you will only see connected records in the dimensions and the fact table.
A cube is more like a crossjoin in TSQL. If you crossjoin two tables in TSQL you will get all combinations of members in the tables. This is standard behaviour in SSAS2005.
But within a dimension table in SSAS2005 you have a new feature, only within the dimension, called autoexist, that will only show combinations of members like a TSQL inner join of members. Within a dimension, in the previous version of SSAS (2000), you would get a TSQL-crossjoin of all members within a dimension.
If you combine sea-ports and inland destinations, in the same dimension(in SSAS2005), you will get an inner join of dimension member combinations(autoexists). If you put them as separate dimensions you will miss autoexist.
To reduce empty cells in the grid(in a SSAS2005 cube) you will have to add NonEmpty or Non Empty in MDX in order to not show empty cells. Many clients can help help you with this by having a button to hide empty rows and columns in the grid.
There are many more superior MDX experts, on this forum, that can explain this even further.
HTH
Thomas Ivarsson
|||Hello. Auto exist does not have anything to do with data in the fact table. Auto exist is a dimensional concept only. Auto-exist only pertains to the attributes in the same dimension. If we separate the datasources (either using view or permanent table) for each location role dimension, then with in any single location role dimension, "auto exist" is still effective across the attributes of that dimension. My problem has to do with not wanting to see the dimension members if they (dimension key) are not present in the fact table data. Thanks for the reply. Appreciate it.
|||I have tried to tell you about the behavior of Autoexist if you read my post carefully. You have also an explanation for the dimension and fact problem.
Autoexist is not new to me either.
Regards
Thomas Ivarsson
|||
Shan,
If I understand your requirement correctly, you have one Locations dimension table, but you use that single dimension within your UDM in different roles. In each of its roles, you want the dimension to have a different set of members based on whether or not the members exist within a related fact table or not. Is that correct?
If that is the case, then I believe your best option is to create different views (or named queries in the DSV) on top fo the Locations dimension table that handle the inner join to each related fact table. Then, in the UDM, set up separate dimensions on top of each view and relate each dimension to each fact table/measure group as appropriate.
There will be some performance implications to doing this. For example, you'll be running similar (or even the exact same) queries agains the Locations dimension table as you process each of the separate dimensions in the UDM. So, if you have a lot of them and/or if the joins to the fact tables within the views or named queries are expensive, you'll be adding processing time. Once processing is complete, however, I don't think query performance against any given measure group within the UDM will be any different than if you had a single dimension that was being used as a role-playing dimenision within the cube.
HTH,
Dave Fackler
|||Dave,
That's correct summary of my requirement. We will proceed with this recommended solution. Thank you. I was not sure whether Analysis Services has some built-in support to meet this common need which does not require separating data sources (view or separate table) since that approach takes some additional work! Trying to find an easier or better way since this question applies to many dimension tables in our data warehouse (both simple and role playing dimensions)... :-) Now I understand that there is no such easy way (out of the box support).
If I am right, the added performance time you mentioned impacts the cube processing step time during ETL job run. That's okay.
No comments:
Post a Comment