BI   Design   Traps     
       
          Chasm and Fan Traps  

 

 

 

Introduction

 

If you’ve been using some other BI reporting tool, such as Business Objects, and are new to OBIEE then you may be wondering how OBIEE handles chasm and fan traps.

 

The “Oracle Business Intelligence Server Administration Guide” does not discuss either of these traps – but then Oracle’s documentation is rather lightweight when it comes to concepts, so this absence should not be considered an omen for good or evil!  In this article, we’ll work through some examples and see what OBIEE does in practice.

 

Before we start, it’s worthwhile asking, “What should a good BI tool do when it comes to design traps?”  The answer is that either it should detect the trap and handle it automatically, or else it should generate an error and not return any data to the end user.  A bad BI tool is one that will return invalid data to the end user – a BI tool where the only line of defence against report corruption is the thoroughness of the QA that follows repository design.

 

As resolving traps involves splitting computations into components, we’d also like a BI tool to generate a single SQL statement to perform the computations and then merge the results together in the database, rather than sending multiple SQL statements to the database and then merging the results in the BI server.  So now that we know what we're looking for let’s get started.

 

 

Chasm Traps

 

Definitions of a chasm trap vary.  We’ll use the definition used by Business Objects.  A chasm trap occurs when two “many to one” joins converge on a single table, the query includes measures from both leaf tables, and multiple rows are returned from the tables in processing the query.  For example, let’s suppose we record sales and orders for customers in two separate leaf tables.  In the database, our foreign-key, master-detail diagram would be as follows:

*
Database Diagram

 

Here’s the SQL to create the tables:

*
Create Table SQL

 

and here’s some sample data:

*
Insert Data SQL

 

If we want to calculate, per customer, the sum of column “quantity_sold” from the “sales” table and the sum of column “order_value” from the “orders” table, then we can use the following queries:

*
Correct Component Queries

 

and we’ll receive the following – correct – results:

 

Customer NameTotal Quantity Sold
Carrington30
De Vere60

 

and:

 

Customer NameTotal Order Value
Carrington300
De Vere600
Sivert300

 

However, if instead we use the following query to aggregate both measures simultaneously:

*
Incorrect Combined Query

 

we get the following – incorrect – results:

 

Customer NameTotal Quantity SoldTotal Order Value
Carrington60600
De Vere1801800

 

The values for “Carrington” are multiplied by a factor of two, those for “De Vere” are multiplied by a factor of three, and those for “Sivert” are nowhere to be seen due to the inner join between the leaf tables.

 

 

How OBIEE handles a Chasm Trap

 

Let’s see how OBIEE will handle this chasm trap.  In the physical layer we’ll create the same relationship between the imported tables as in the database, so that our physical layer diagram looks like this:

*
Physical Layer Diagram

 

We’ll drag our physical tables across to the logical layer to create an identical business model diagram:

*
Business Model Diagram

 

If we define the default aggregation rule for the logical measure columns a ”Sum” and drag the relevant logical columns across to the presentation layer we get the following repository design (we won’t bother creating a hierarchy):

        *
    Completed Repository

 

If we save the repository and run a consistency check we get no errors, so evidently OBIEE doesn’t think the design includes a trap that it can’t handle.

 

If we start up Answers & Dashboards and create a request for the catalog items “Customer Name”, “Quantity Sold”, and “Order Value” we get the following result:

*
Request Results

 

So OBIEE has detected the chasm trap, has worked around it, and has “got its sums right”.

 

But what about the efficiency of the generated SQL.  Has OBIEE generated two SQL statements, one for each component of the request?  If we view the query log we can see that the SQL sent to the database is as follows:

        *
    Generated SQL

 

So OBIEE has performed the query in an efficient manner by sending a single SQL statement to the database, using subquery factorization to calculate the two components independently, and then using an outer join to merge the component rowsets together in the database.

 

 

Fan Traps

 

Definitions of a fan trap vary.  We’ll use the definition used by Business Objects.  A fan trap occurs when two “many to one” joins follow one another in master-detail form, and the query includes a measure from both the leaf table and its immediate master.  For example, let’s suppose we record customers, orders, and order lines in three separate detail tables.  In the database, our foreign-key, master-detail diagram would be as follows:

*
Database Diagram

 

Here’s the SQL to create the tables:

*
Create Table SQL

 

and here’s some sample data:

*
Insert Data SQL

 

If we want to calculate, per customer, the sum of column “order_value” from the “orders” table and the sum of column “order_quantity” from the “order_lines” table, then we can use the following queries:

*
Correct Component Queries

 

and we’ll receive the following – correct – results:

 

Customer NameTotal Order Value
Carrington300
De Vere600
Sivert300

 

and:

 

Customer NameTotal Order Quantity
Carrington60
De Vere90
Sivert60

 

However, if instead we use the following query to aggregate both measures simultaneously:

*
Incorrect Combined Query

 

we get the following – incorrect – results:

 

Customer NameTotal Order ValueTotal Order Quantity
Carrington40060
De Vere70090
Sivert40060

 

The “total order quantity”, corresponding to the leaf measure table, is calculated correctly, but the “total order value”, corresponding to the measure held in its master, is not.

 

 

Default Fan Trap Design

 

Let’s see how OBIEE will handle this fan trap if we try to mimic the database relationship between the tables.  In the physical layer we’ll create the same relationship between the imported tables as in the database so that our physical layer diagram looks like this:

*
Physical Layer Diagram

 

We’ll drag our tables across to the logical layer to create an identical business model diagram:

*
Business Model Diagram

 

If we save the repository and run a consistency check we get no error, so evidently OBIEE doesn’t think the design includes a trap that it can’t handle!

    *
  Completed Repository

 

If we start up Answers & Dashboards and create a request for catalog items “Customer Name”, “Order Value”, and “Order Quantity” we get the following error message:

*
Repository Design Error Revealed

 

Despite OBIEE’s protestations to “Please fix the metadata consistency warnings”, there were no such warnings when we checked the repository.  Clearly the consistency checker needs to be improved!  However, the important point is that OBIEE has returned an error rather than an invalid set of results that might well go unnoticed by end users.

 

 

How OBIEE handles a Fan Trap

 

So why did we get an error with our default fan trap design?  The answer is that the logical layer in the repository requires that measures belong to measure tables, and measure tables exist as leaf nodes in the business model diagram.  If we look at the logical layer:

      *
   Completed Repository

 

we can see that while logical table “Order Lines” is a measure table (yellow), logical table “Orders” is a dimension table (white).

 

So physical table “orders” needs to play two separate roles, (1) as a measure table so that we can aggregate column “order_value”, and (2) as a link table that allows us to navigate from table “customers” to table “order_lines”.  As is the case with many other BI Tools, the solution is to clone table “orders” using a table alias, so that we can reference it separately in these two different contexts (see the notes at the end of this section for other approaches).

 

If we create an alias of table “orders” and call it “order_lines_link”, then we can construct the following physical layer diagram:

*
Physical Layer Diagram

 

We’ll drag our tables across to the logical layer to create an identical business model diagram:

*
Business Model Diagram

 

If we define the default aggregation rule for the logical measure columns as “Sum” and drag the relevant logical columns across to the presentation layer we get the following repository design (again, we won’t bother creating a hierarchy):

      *
   Completed Repository

 

In the logical layer both “Orders” and “Order Lines” are now measure tables, corresponding to leaf nodes in the hierarchy.

 

If we save the repository and run a consistency check we get no errors, so evidently OBIEE doesn’t think that this modified design includes a trap that it can’t handle.

 

If we start up Answers & Dashboards and create a request for catalog items “Customer Name”, “Order Value”, and “Order Quantity” we get the following:

*
Request Results

 

So now that we’ve persuaded OBIEE to do its sums, we find that it’s “got its sums right”.

 

But what about the efficiency of the generated SQL.  Has OBIEE generated two SQL statements for each component of the request?  If we view the log we can see that the SQL sent to the database is as follows:

        *
    Generated SQL

 

So once again OBIEE has performed the query in an efficient manner by sending a single SQL statement to the database.

 

Note that in this example the two measures are independent: “order_value” is not an aggregation of “order_quantity”.  If one was the aggregation of the other, we’d need to set the level of aggregation content so that the most efficient choice of measure could be used depending on the level of drill down within any hierarchy that we might create.

 

Note that we could avoid introducing an alias table and create two business models instead.  However, we would then need to split the two measures between two different catalogs, which would prevent them from being selected in the same query by end users.

 

Note that if the “orders” table also had dimensional attributes that we were interested in, such as column “order_id”, we could hang both the “orders” and “order_lines” tables under the “order_lines_link” alias table – there would be a “one to one” relationship between “orders” and “order_lines_link”:

*
Physical Layer Diagram

 

The “order_lines_link” and “customers” tables would then be a source for a single logical dimensional hierarchy, with “orders” and “order_lines” acting as leaf measure tables.  We could either mimic the physical layer diagram in the logical layer – forming a snowflake schema in the logical layer – or create a single logical dimension table which had as its source the join between tables “customers” and “order_lines_link” – forming a star schema in the logical layer.  The latter organization is preferred by Oracle.  With either of these structures we can drill down using the dimensional attributes in both the “customers” and “order_lines_link” tables to both of the measure tables.  For example, if we created a dimensional hierarchy consisting of “Customer Name” and “Order Id” we could drill-down as follows:

*
Drill-Down Request Results

 

 

Summary

 

So to answer our initial question, OBIEE automatically handles chasm traps using the default database design, but it requires an alias table to be introduced in the physical layer to handle a fan trap.

 

OBIEE satisfies our definition of a good BI tool – at least in these examples: it either handles a trap automatically or raises an error when the design is invalid.  OBIEE does not return invalid data to the end user, so if the repository developer’s QA is less than thorough then disaster will be averted.

 

OBIEE also handles these multi-component queries in an efficient manner by sending a single SQL statement to the database, using subquery factorization to calculate the component rowsets and then merging the rowsets within the database.