How to update a record based on the previous record
Sir,
I am new to DataStage. I have a scenario like this; please help me step by step.
col1 col2 date1
111 111 2004-05-05
111 222 2004-06-06
222 555 2004-07-07
222 666 2004-06-06
If col1 is the same for 2 records, I need the latest record.
Thanks
Narasa
Hi Nasara,
If you want to keep only the last record for each value of col1, a simple way would be to use a hashed file with col1 as the key and write all the records to it. At the end you will have the desired records in the hashed file; later on you can write those records wherever you want.
Whether this approach is suitable depends on how many records you are dealing with and where you want to store them. It is fine as long as you are not dealing with humongous volumes.
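Outside DataStage, the same last-one-wins idea can be sketched with awk's associative array acting as the hashed file (the file name input.txt and the space-delimited layout are assumptions for illustration; inside a job the hashed file stage does this for you):

```shell
# Sample data from the thread: col1 col2 date1
cat > input.txt <<'EOF'
111 111 2004-05-05
111 222 2004-06-06
222 555 2004-07-07
222 666 2004-06-06
EOF

# Sort by key then date so the latest row arrives last, then let each
# write to last[$1] overwrite the previous one -- last one in wins,
# just like writing to a hashed file keyed on col1.
sort -k1,1 -k3,3 input.txt |
awk '{ last[$1] = $0 } END { for (k in last) print last[k] }'
```

Note that awk's `for (k in last)` emits keys in no particular order; pipe through `sort` afterwards if the output order matters.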
Kris~
Yup, that would be a simple approach that leverages the 'destructive overwrite' of hash files - last one in wins for each combination of the key fields. It would require that your data be sorted properly.
Another option would be to sort the file descending on the key fields and then take the first record from each key group. This is easier than trying to recognise group changes and taking the last one from the previous group. It also solves the "how do I know I'm at the end" dilemma. Stage Variables would be used to store the 'old' values and give you something to compare against.
Don't forget about the Filter option in the Sequential File stage if your source is a flat file. It makes it pretty simple to leverage your operating system to do the sorting 'on the fly' as the data is brought into the job, eliminating an extra 'pre-job' step.
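As a rough command-line sketch of that sort-descending-then-take-first approach (file name and space-delimited layout are assumptions; in a job the sort command could go in the Sequential File stage's Filter option):

```shell
# Sample data: col1 col2 date1
cat > input.txt <<'EOF'
111 111 2004-05-05
111 222 2004-06-06
222 555 2004-07-07
222 666 2004-06-06
EOF

# Sort ascending on the key and descending on the date, so the latest
# row leads each key group; '!seen[$1]++' is true only the first time
# a key appears, keeping exactly one (the latest) row per key.
sort -k1,1 -k3,3r input.txt | awk '!seen[$1]++'
# -> 111 222 2004-06-06
#    222 555 2004-07-07
```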
-craig
"You can never have too many knives" -- Logan Nine Fingers
Welcome aboard! :D
Run the input through an Aggregator stage, grouping by Column 1 and specifying Last as the aggregate function for the other columns. If the input is sorted, note on the Input link to the Aggregator stage that this is the case; the job will run much more efficiently.
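A command-line analogue of grouping with Last, in case it helps make the idea concrete (an illustrative sketch, not the Aggregator stage itself; assumes space-delimited data already sorted on the grouping column and the date):

```shell
# Sample data, pre-sorted by col1 and date1
cat > input.txt <<'EOF'
111 111 2004-05-05
111 222 2004-06-06
222 666 2004-06-06
222 555 2004-07-07
EOF

# Whenever the group key changes, emit the saved last row of the
# finished group; the END rule flushes the final group. This mirrors
# taking 'Last' per group on sorted input.
awk 'NR > 1 && $1 != key { print row } { key = $1; row = $0 } END { print row }' input.txt
# -> 111 222 2004-06-06
#    222 555 2004-07-07
```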
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
There's always more than one way to solve problems here.
If you do take this route and 'latest record' means 'max' date or something to that effect, make sure you sort the file first if there is any doubt about the order they will come in. Otherwise you will literally get the last record for that sort group, which may or may not be the 'latest'.
-craig
"You can never have too many knives" -- Logan Nine Fingers
Thanks, all.
Sainath.Srinivasan wrote: If the processing is only to find the max, you can do a 'sort -ruk' from Unix itself.
My business logic is like this: I have a Customer Ownership table with 60 million rows, and I need to pull one record per household ID (there are a number of records per HHID), keeping the most recently purchased vehicle.
HHID VIN PURCHASEDATE
100 aa 2004-10-10
100 bb 2004-12-10
I need the 2nd record in this scenario.
Please help me with how to do this in DataStage.
Thanks a lot for helping, guys.
Can you guys help with Stage Variables and how to use them for storing previous records?
Thanks,
Narasa
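On the Stage Variables question: the pattern is to hold the previous row's key in a stage variable and output a row only when the incoming key differs from it. A rough sketch of that compare-with-previous logic outside DataStage (file name and space-delimited layout are assumptions for illustration):

```shell
# Sample data from the post: HHID VIN PURCHASEDATE
cat > ownership.txt <<'EOF'
100 aa 2004-10-10
100 bb 2004-12-10
EOF

# Sort descending by date within each HHID, then emit a row only when
# its key differs from the previous row's key -- 'prev' plays the role
# of a stage variable holding the prior key value.
sort -k1,1 -k3,3r ownership.txt | awk '$1 != prev { print } { prev = $1 }'
# -> 100 bb 2004-12-10
```

Because the ISO dates sort correctly as text, the descending sort puts the most recent purchase first in each household group, and the key-change test keeps exactly that row.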
kollurianu wrote: Hi Sai,
Can you please let me know what you meant by saying that sharing this work area across would be best for large volumes?
Hello Sirs,
Some households will have 100 records as well; how will rowproc work then?
Please don't assume there are only 2 records per household ID.
Please give details on how to deal with 60 million records.
Thanks
Narasa
Thank you,