Design Question

amsh76 · Post by **amsh76** » Sat May 22, 2004 10:31 pm

Can anyone Tell me how to Design a job, where few coluns in the target files need Joins beweent table, few need Sum result and few need the record count for particular where condition in join. can you desin this within one job ?

ogmios · Post by **ogmios** » Sun May 23, 2004 4:45 am

amsh76 wrote:Can anyone Tell me how to Design a job, where few coluns in the target files need Joins beweent table, few need Sum result and few need the record count for particular where condition in join. can you desin this within one job ?

You would need to give some more information what you actually want to achieve, ... For example when you would be working with Oracle you can use Analytic functions to do stuff like you need.

But as in most things the devil is in the details.

Ogmios

chulett · Post by **chulett** » Sun May 23, 2004 6:51 am

As Ogmios pointed out, that's pretty darn vague. Depending on exactly what you meant, I could see two ways to approach this...

1. Do your joins in the driving query or at worst case in a reference lookup inline. Use the Aggregator stage after that to do your 'sum result and record count' work. Check your online help.

2. Do all of the work - joins, sums, counts - in the driving query and then use DataStage to simply shove the results somewhere.

This is all real basic stuff, so if you can and you haven't already, go through the Tutorial that comes with the product. Attending a Basics class would be even better, they are even offered in an online 'at your own pace' format now. Here is a link to their 'Training Options' website, check it out.

amsh76 · Post by **amsh76** » Sun May 23, 2004 4:44 pm

Thanks Craig,

But my problem is the tables on which i m doing join, are different from the tables that I will use to find Sum. So how can i use all these tables in the Driving stage. I need to have some design where I have to use two different DB stages.
Suppose I have 4 tables, where i need records after performin join between A and B...And Join between C and D for 5 columns in the target table. And yeah, all these tables are loading the same table in warehouse.
60% are loaded by table A, 20% by B and 20% by aggregation on join between C and D. I hope i am clear

chulett · Post by **chulett** » Sun May 23, 2004 5:32 pm

Still a little fuzzy, but let's see...

There must be someway to join these four tables together or you are not going to get very far.

If they can't all be joined together at the same time using some kind of complex SQL statement, then it sounds like you may need to do it in two steps: first join A to B in the initial DB stage to get the 80% of your columns you mentioned and then using a lookup to a second DB stage, add the remaining 20% of the fields from the joining of C to D. Somehow you'll need to use a field (or fields) from the first join as 'keys' to constrain the lookup to C & D so you get a useful result.

As to where the aggregation comes in, I'm not sure what you mean by 'aggregation on join between C and D'. Perhaps you can do sums in the lookup SQL when joining C & D so you pull out aggregated fields, or perhaps you need to take the results from both sets of joins and then aggregate certain fields in a later step, I can't really tell. In any case, as long as the data supports it, it should be perfectly do-able either way.

And yes, you can use multiple DB stages in a job. It may not be the most efficient way to do something like this, however. Consider doing the C & D join (and possible sum/aggregation) using a DB stage that builds a Hash file, keyed by the fields you need to join to the A & B result set. The subsequent lookups to the Hash Stage would be much faster and would require only one query back to the source tables instead of one query for every row. All of this can be done in one job.

Hope this all makes sense.

DSXchange

Design Question

Design Question

Re: Design Question