Logic Issue

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


sriec12
Participant
Posts: 56
Joined: Mon Nov 01, 2010 5:34 pm

Logic Issue

Post by sriec12 »

I want to cross-check my concepts with the experts :)

1) I am doing a lookup against a table which has 1 row, but from the source I get 100M records.

Well, according to my understanding the Lookup stage just passes the source data through, and since the lookup table has only one row there will be no memory issue. The Lookup stage has about 2GB of space available for its reference data.

2) I need to get the max date and populate 3 keys.

Ex: SSN, Policy ID, Last name, Trns DT

Here the data is such that there will be multiple transactions per key, but I need to get only the latest one.

Procedure:

From the source I will get around 80 million records:
Step 1: I sort all the records with a Sort stage on SSN, Policy ID, Last name, Trns DT.

Step 2: After the Sort stage, I use a Remove Duplicates stage with keys SSN, Policy ID, Last name, set to retain the last duplicate so the latest Trns DT survives.

I tested with a small amount of data and it works for me.
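
Just to show the idea outside DataStage, here is a rough Python sketch of the same sort-then-dedup logic (the column names and sample rows are made up for illustration, not from the actual job):

Code:

from operator import itemgetter

# Illustrative rows: (SSN, Policy ID, Last name, Trns DT) -- sample data only.
rows = [
    ("111-22-3333", "P001", "Smith", "2011-01-05"),
    ("111-22-3333", "P001", "Smith", "2011-03-20"),
    ("444-55-6666", "P002", "Jones", "2011-02-14"),
]

# Step 1: sort on SSN, Policy ID, Last name, Trns DT (all ascending),
# mirroring the Sort stage keys.
rows.sort(key=itemgetter(0, 1, 2, 3))

# Step 2: remove duplicates on SSN, Policy ID, Last name, keeping the
# last row in each group -- on ascending Trns DT that is the latest
# transaction (the same effect as retaining "Last" in Remove Duplicates).
latest = {}
for row in rows:
    latest[row[:3]] = row   # later rows overwrite earlier ones per key

for row in latest.values():
    print(row)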


Is there anything else I should take care of, like partitioning and nodes?
chandra.shekhar@tcs.com
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

What you have described looks fine.
Instead of using the Sort and Remove Duplicates stages, you can use just a Transformer stage to achieve the same result.
Thanx and Regards,
ETL User
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The data would still need to be sorted for the Transformer logic to work correctly.
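
For what it's worth, the Transformer-style approach over key-sorted input looks roughly like this in Python (a loose analogy for stage variables holding the previous key, with made-up rows):

Code:

def keep_latest(sorted_rows):
    """Yield the last row of each (SSN, Policy ID, Last name) group.

    Assumes the input is already sorted on the three keys plus Trns DT,
    which is exactly why the Transformer approach still needs sorted data.
    The held 'prev' row plays the role of a Transformer stage variable.
    """
    prev = None
    for row in sorted_rows:
        if prev is not None and row[:3] != prev[:3]:
            yield prev          # key changed: emit the latest row of the group
        prev = row              # remember the current row
    if prev is not None:
        yield prev              # flush the final group

# Made-up sample rows, sorted before deduplication.
rows = sorted([
    ("111-22-3333", "P001", "Smith", "2011-03-20"),
    ("111-22-3333", "P001", "Smith", "2011-01-05"),
    ("444-55-6666", "P002", "Jones", "2011-02-14"),
])
print(list(keep_latest(rows)))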

What role is the lookup playing in all this? I'm guessing it is providing some kind of static data for the run, but you didn't clarify. And the real test would be a full load; have you done that yet?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If you're doing a lookup against a single-row table you're effectively returning a constant. You could generate this in a Transformer stage or in a Column Generator stage.
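
In other words, the single-row lookup reduces to fetching one value and attaching it to every source row; a hypothetical Python sketch (the date and rows are invented):

Code:

# A one-row reference table effectively holds a single value, e.g. the
# weekly as-of date. Fetch it once (hypothetical value shown)...
as_of_date = "2011-04-02"

def add_as_of_date(source_rows, as_of_date):
    """Attach the constant as-of date to every source row,
    which is all the single-row lookup achieves."""
    for row in source_rows:
        yield row + (as_of_date,)

source_rows = [("111-22-3333", "P001", "Smith"),
               ("444-55-6666", "P002", "Jones")]
print(list(add_as_of_date(source_rows, as_of_date)))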
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sriec12
Participant
Posts: 56
Joined: Mon Nov 01, 2010 5:34 pm

Post by sriec12 »

1) Regarding the Lookup stage, it's not a static value.

It's a weekly load; I need to load the as-of date for every week. In this case, is it fine to do a lookup of one row against 80M records?

I apologize for the confusion
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

sriec12 wrote: 1) Regarding the Lookup stage, it's not a static value. It's a weekly load; I need to load the as-of date for every week.
So, as I said, static over the course of the run. Sure, that's "fine" since it is cached into and read from memory, but why not simply pass that in as a job parameter?
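
The job-parameter version just means the as-of date arrives from outside the job instead of from a table; a rough Python analogy (the parameter name is invented; in DataStage it would be an ordinary job parameter set by the sequence or scheduler):

Code:

import argparse

# Rough analogy only: in DataStage this would be a job parameter supplied
# at run time; the name --as-of-date is made up here.
parser = argparse.ArgumentParser(description="Weekly fact load (sketch)")
parser.add_argument("--as-of-date", required=True,
                    help="As-of date for this weekly run, e.g. 2011-04-02")
args = parser.parse_args()

# Every output row simply carries the parameter value -- no lookup needed.
print(f"Loading facts with as-of date {args.as_of_date}")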
-craig

"You can never have too many knives" -- Logan Nine Fingers
sriec12
Participant
Posts: 56
Joined: Mon Nov 01, 2010 5:34 pm

Post by sriec12 »

@chulett --- So is it fine to do a lookup of 1 row against 80M records?

Well, it's a great point... it can be created as a job parameter.

The reason why it's not parameterized:

1) This is a fact job, and very few jobs need this as-of date.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Odd reason in my book; we have a ton of "single job" parameters. We use them for data needed during a run that changes on a daily basis, without regard to how many jobs may (or may not) need each one.
-craig

"You can never have too many knives" -- Logan Nine Fingers