Fact to Fact

santoshkumar · Post by **santoshkumar** » Thu Jan 25, 2007 1:43 pm

Hi Again,

I have a mindset where I am tempted to join two facts.

A fact to fact join.

Am I committing suicide?!

Thanks.

narasimha · Post by **narasimha** » Thu Jan 25, 2007 1:49 pm

When you say fact, you mean a fact table, right?
I don't see any harm in joining two fact tables if that is what your requirement is.

DSguru2B · Post by **DSguru2B** » Thu Jan 25, 2007 2:17 pm

santoshkumar wrote: Am I committing suicide?!

Not unless your fact tables are in the terabytes.

trobinson · Post by **trobinson** » Thu Jan 25, 2007 2:35 pm

I assume your design is in the classic Kimball model and the compound key to the fact table is, in fact (hah!), the foreign keys to the dimensions. Therefore joining fact tables is really joining by dimension across fact tables - A federated Data Warehouse!
Naturally all facts are at the same level of granularity...

ray.wurlod · Post by **ray.wurlod** » Thu Jan 25, 2007 4:13 pm

Not if your database is Red Brick, which supports efficient fact-to-fact joins, constrained on as many dimensions as you wish, via a multi-table index called a star index.

It is worth adding that this technique - indeed the entire database now known as Red Brick - was originally designed by Ralph Kimball and his colleagues.

Red Brick is now owned by IBM, but I don't think anyone in IBM realizes what a jewel they've got.

kcbland · Post by **kcbland** » Thu Jan 25, 2007 6:48 pm

Heresy! Fact to fact joins, arrrrgghh. Curse ye data modelers. Something went horribly wrong somewhere. If I were to get a tattoo, it just might say STAR SCHEMA.

Folks often mistakenly use a join for what is really a 'merge' or 'union' operation. For example, say you have a bunch of sales order line items in one table, and a bunch of payment line items in another. You're asked to build a result set that tracks order metrics and payment metrics by item. The first thing folks do with SQL is join the two tables.

Consider instead an operation where you "map" each sales order line item row into a row whose columns match your final result column set, putting nulls or zeroes in the columns that pertain to payment line items. Do the same for the payment line item rows, putting zeroes or nulls in the sales order columns. Now the union of those two virtual sets of data can be grouped (by item, in this example) and aggregated, using a MAX derivation for all of the attributes.
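A minimal Python sketch of the map/union/aggregate idea, assuming hypothetical column names (item, qty, amount, customer, method) that are not from the original post. Metrics are summed and attributes take a SQL-style MAX that ignores nulls; the "union" is simply one stream over both mapped row sets, so no join is performed.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical target column set: metrics are summed, attributes take MAX.
METRICS = ("order_qty", "order_amount", "paid_amount")
ATTRIBUTES = ("customer", "pay_method")

def map_order(row):
    # Order line item -> target shape; payment columns get zero/None.
    return {"item": row["item"], "order_qty": row["qty"], "order_amount": row["amount"],
            "paid_amount": 0, "customer": row["customer"], "pay_method": None}

def map_payment(row):
    # Payment line item -> target shape; order columns get zero/None.
    return {"item": row["item"], "order_qty": 0, "order_amount": 0,
            "paid_amount": row["amount"], "customer": None, "pay_method": row["method"]}

def max_ignoring_none(a, b):
    # SQL-style MAX: a NULL never wins over a real value.
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)

def union_and_aggregate(orders, payments):
    # One accumulator per item, starting at zero metrics and None attributes.
    result = defaultdict(lambda: {**dict.fromkeys(METRICS, 0), **dict.fromkeys(ATTRIBUTES)})
    # Single pass over the union of both mapped streams, grouped by item.
    for row in chain(map(map_order, orders), map(map_payment, payments)):
        agg = result[row["item"]]
        for col in METRICS:
            agg[col] += row[col]
        for col in ATTRIBUTES:
            agg[col] = max_ignoring_none(agg[col], row[col])
    return dict(result)
```

With two order lines for item A and one payment for A, the order amounts are summed once and the payment is counted once. A naive join on item would instead pair the payment with each order line and double-count it, which is the grain disparity the post refers to.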

You've avoided join operations and all of that nastiness, scanned each fact table in a single pass, and aggregated your results in a way that deftly handles grain disparities, arriving at a result set much faster. You can also count the rows from each source table that contributed to a final row, which tells you whether you have an order with payments or just payments. Filtering rows on those counts simulates an inner join.
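The row-counting trick can be sketched in Python as follows, again assuming hypothetical input rows keyed by an "item" column. Each row is tagged with its source table during the same single pass, and keeping only keys with rows from both sources gives the inner-join equivalent.

```python
from collections import Counter
from itertools import chain

def source_counts(orders, payments):
    # Tag each row with its source table and count rows per (item, source).
    tagged = chain((("orders", r["item"]) for r in orders),
                   (("payments", r["item"]) for r in payments))
    counts = {}
    for source, item in tagged:
        counts.setdefault(item, Counter())[source] += 1
    return counts

def classify(counts):
    # A Counter returns 0 for a source that contributed no rows.
    if counts["orders"] and counts["payments"]:
        return "order with payments"
    if counts["payments"]:
        return "payments only"
    return "order only"

def inner_join_items(orders, payments):
    # Keep only keys that received rows from both sources.
    return sorted(item for item, c in source_counts(orders, payments).items()
                  if c["orders"] and c["payments"])
```

Filtering on "payments only" or "order only" instead would simulate the unmatched side of an outer join in the same way.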

This technique is a great way to handle fact tables at different grains and merge them into aggregates that move to datamarts.

ray.wurlod · Post by **ray.wurlod** » Thu Jan 25, 2007 7:01 pm

That, too.

trobinson · Post by **trobinson** » Fri Jan 26, 2007 8:05 am

"I'm intrigued by your views and would like to subscribe to your newsletter."

DSguru2B · Post by **DSguru2B** » Fri Jan 26, 2007 8:09 am

Very smart analysis and suggestion by Ken. Two thumbs up, Ken

DSXchange
