I want to cross-check my understanding with the experts here.
1) I am doing a lookup against a table that has 1 row, while the source delivers 100 million records.
Well, as I understand it, the Lookup stage just passes the source rows through, and since the lookup table has only one row there will be no memory issue. The Lookup stage will have 2 GB of space.
2) I need to get the max date for each combination of 3 keys.
Columns: SSN, Policy ID, Last name, Trns DT (the first three are the keys)
The data contains multiple transactions per key, but I need only the latest one.
Procedure:
The source delivers around 80 million records.
Step 1: Sort all records with a Sort stage on SSN, Policy ID, Last name, Trns DT.
Step 2: After the Sort stage, use a Remove Duplicates stage with keys SSN, Policy ID, Last name.
I tested this with a small amount of data and it works.
Is there anything else I should take care of, such as partitioning and nodes?
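A minimal Python sketch of Steps 1 and 2, and of why partitioning matters, under stated assumptions: records are hypothetical `(ssn, policy_id, last_name, trns_dt)` tuples, and `hash_partition` / `round_robin` are illustrative stand-ins for a parallel job splitting data across nodes, not DataStage APIs. With Trns DT sorted ascending, the record to keep is the last one in each key group (in a Remove Duplicates stage that means retaining the last duplicate, not the first). Deduplicating each partition on its own is only correct when all rows sharing SSN, Policy ID and Last name land on the same node, which hash partitioning on those keys guarantees and round-robin does not.

```python
from datetime import date

# Hypothetical record layout: (ssn, policy_id, last_name, trns_dt)
Record = tuple[str, str, str, date]


def key_of(rec: Record) -> tuple[str, str, str]:
    """Dedupe key: SSN, Policy ID, Last name (Trns DT is not part of it)."""
    return rec[:3]


def hash_partition(records: list[Record], nodes: int) -> list[list[Record]]:
    """Route every record with the same key to the same partition."""
    parts: list[list[Record]] = [[] for _ in range(nodes)]
    for rec in records:
        parts[hash(key_of(rec)) % nodes].append(rec)
    return parts


def round_robin(records: list[Record], nodes: int) -> list[list[Record]]:
    """Deal records out in turn, ignoring the key (for comparison only)."""
    parts: list[list[Record]] = [[] for _ in range(nodes)]
    for i, rec in enumerate(records):
        parts[i % nodes].append(rec)
    return parts


def latest_per_key(partition: list[Record]) -> list[Record]:
    """Step 1: sort on SSN, Policy ID, Last name, Trns DT (ascending).
    Step 2: keep the LAST row of each key group, i.e. the latest Trns DT."""
    kept: list[Record] = []
    for rec in sorted(partition):  # tuple order = the four sort columns
        if kept and key_of(kept[-1]) == key_of(rec):
            kept[-1] = rec  # a later date replaces the earlier one
        else:
            kept.append(rec)
    return kept


def run(records: list[Record], partitioner, nodes: int = 2) -> list[Record]:
    """Dedupe each partition independently, as parallel nodes would."""
    out: list[Record] = []
    for part in partitioner(records, nodes):
        out.extend(latest_per_key(part))
    return sorted(out)


# Hypothetical sample data: three transactions for one key, one for another.
rows: list[Record] = [
    ("111", "P1", "Smith", date(2011, 1, 5)),
    ("111", "P1", "Smith", date(2011, 3, 1)),
    ("111", "P1", "Smith", date(2011, 2, 9)),
    ("222", "P7", "Rao", date(2011, 1, 1)),
]
```

With these rows, `run(rows, hash_partition)` leaves one row per key, while `run(rows, round_robin)` on two nodes leaves two rows for SSN 111, because each node only sees part of that key's transactions.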
Logic Issue
The data would still need to be sorted for the Transformer logic to work correctly.
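Assuming the Transformer logic in question is the usual previous-key comparison (a stage variable holding the last row's key), this small Python sketch shows why sorting matters: the comparison only looks at adjacent rows, so it finds group boundaries correctly only when rows with the same key arrive together. Here the sort is key ascending and Trns DT descending, so the first row of each group is the latest one; the function names and tuple layout are illustrative.

```python
from datetime import date


def first_of_each_group(rows):
    """Keep a row only when its key differs from the previous row's key."""
    kept = []
    prev_key = None  # plays the role of a "previous key" stage variable
    for row in rows:
        key = row[:3]  # hypothetical key: SSN, Policy ID, Last name
        if key != prev_key:  # key change: first row of a new group
            kept.append(row)
        prev_key = key
    return kept


def sort_latest_first(rows):
    """Sort by key ascending and Trns DT descending."""
    return sorted(rows, key=lambda r: (r[:3], -r[3].toordinal()))


rows = [
    ("111", "P1", "Smith", date(2011, 1, 5)),
    ("222", "P7", "Rao", date(2011, 1, 1)),
    ("111", "P1", "Smith", date(2011, 3, 1)),
]

sorted_result = first_of_each_group(sort_latest_first(rows))
unsorted_result = first_of_each_group(rows)
```

On the unsorted rows every key change looks like a new group, so SSN 111 comes out twice, once with the older date.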
What role is the lookup playing in all this? I'm guessing it is providing some kind of static data for the run but you didn't clarify. And the real test would be a full load, have you done that yet?
-craig
"You can never have too many knives" -- Logan Nine Fingers
If you're doing a lookup against a single row table you're effectively returning a constant. You could generate this in the Transformer stage or in a Column Generator stage.
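To illustrate the point in plain Python (all names hypothetical): a keyless lookup against a one-row reference gives every source row the same values, exactly like adding a constant column. In this sketch only the single reference row is held in memory, while the source rows are streamed through a generator one at a time.

```python
def lookup_single_row(source, reference):
    """Keyless lookup against a one-row reference: every row gets the same values."""
    if len(reference) != 1:
        raise ValueError("expected exactly one reference row")
    ref = reference[0]  # the only data held in memory
    for row in source:  # source rows are streamed, one at a time
        yield {**row, **ref}


def add_constant(source, **constants):
    """Same result without a lookup: attach the constant(s) directly."""
    for row in source:
        yield {**row, **constants}


source = [{"ssn": "111"}, {"ssn": "222"}]
reference = [{"as_of_date": "2011-06-03"}]  # hypothetical one-row table
```

Both generators produce identical rows, which is why the lookup can be replaced by a derivation in a Transformer or a Column Generator.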
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sriec12 wrote: 1) Regarding the Lookup stage, it's not a static value. It's a weekly load; I need to load the as-of date every week.
So as I said, static over the course of the run. Sure that's "fine" as it is cached into and read from memory but why not simply pass that in as a job parameter?
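The job-parameter approach, sketched as a Python analogy: the weekly as-of date is supplied on each run instead of being read from a one-row table. The `--as-of-date` argument and the `stamp` function are hypothetical; in DataStage the value would arrive as a job parameter set by whatever launches the weekly run.

```python
import argparse
from datetime import date


def parse_args(argv=None):
    """Read the run's as-of date; a new value is supplied each week."""
    parser = argparse.ArgumentParser(description="Weekly load (sketch)")
    # Hypothetical parameter name; converted and validated as an ISO date.
    parser.add_argument("--as-of-date", type=date.fromisoformat, required=True)
    return parser.parse_args(argv)


def stamp(rows, as_of_date):
    """Attach the run's as-of date to every row (no lookup needed)."""
    for row in rows:
        yield {**row, "as_of_date": as_of_date}
```

For example, `parse_args(["--as-of-date", "2011-06-03"])` yields `as_of_date == date(2011, 6, 3)`; a malformed date makes argparse exit with an error before any rows are processed.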
-craig
"You can never have too many knives" -- Logan Nine Fingers