based on previous record how to update record

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
reddy
Premium Member
Premium Member
Posts: 168
Joined: Tue Dec 07, 2004 12:54 pm

based on previous record how to update record

Post by reddy »

Sir,

i am new to Datastage, i have a scenario like this please help me by step by step.

col1 col2 date1
111 111 2004-05-05
111 222 2004-06-06
222 555 2004-07-07
222 666 2004-06-06

if col1 is same for 2 records i need latest records.

Thanks
Narasa
kris
Participant
Posts: 160
Joined: Tue Dec 09, 2003 2:45 pm
Location: virginia, usa

Post by kris »

Hi Nasara,

If you want to persist the last record based on the col1 as a key, simple way would be having a hashfile with 'col1' as a key and write all the records to that hashfile. Finally you will have desired records in the hashfile. later on you can write those records wherever you want.

But taking this approach would depend on how many records you are dealing with and where do you want store them. It is good as long as you are not dealing with humongous volumes.

Kris~
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yup, that would be a simple approach that leverages the 'destructive overwrite' of hash files - last one in wins for each combination of the key fields. It would require that your data be sorted properly.

Another option would be to sort the file descending on the key fields and then take the first record from each key group. This is easier than trying to recognise group changes and taking the last one from the previous group. It also solves the "how do I know I'm at the end" dilemma. Stage Variables would be used to store the 'old' values and give you something to compare against.

Don't forget about the Filter option in the Sequential File stage if your source is a flat file. It makes it pretty simple to leverage your operating system to do the sorting 'on the fly' as the data is brought into the job, eliminating an extra 'pre-job' step.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ketfos
Participant
Posts: 562
Joined: Mon May 03, 2004 8:58 pm
Location: san francisco
Contact:

Post by ketfos »

Hi,
Can there be situtation in the input file where
column1 is same for more than two records
.ie
111 5111 2004-05-05
111 5111 2004-06-06
111 5111 2004-07-05
111 5111 2004-08-06

If yes, what row will be like to see?

Ketfos
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard! :D
Run the input through an Aggregator stage, grouping by Column 1 and specifying Last as the aggregate function for the other columns. If the input is sorted, note on the Input link to the Aggregator stage that this is the case; the job will run much more efficiently.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

There's always more than one way to solve problems here. :wink:

If you do take this route and 'latest record' means 'max' date or something to that effect, make sure you sort the file first if there is any doubt about the order they will come in. Otherwise you will literally get the last record for that sort group, which may or may not be the 'latest'.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

If the processing is only to find the max, you can do a 'sort -ruk' from Unix itself.
reddy
Premium Member
Premium Member
Posts: 168
Joined: Tue Dec 07, 2004 12:54 pm

Post by reddy »

Sainath.Srinivasan wrote:If the processing is only to find the max, you can do a 'sort -ruk' from Unix itself.
Thanks for all,

My Business logic is like this I have a Customer Ownership table having 60 million rows i need to pull one record per each householdid (There is no.of records per hhid) based on keeping on recent vehicle purchased.

HHID VIN PURCHASEDATE
100 aa 2004-10-10
100 bb 2004-12-10

i need 2nd record in this scenario.

Please help how to do it in datastage.

Thanks a lot for helping guys.

Can yoou guys help on Stage variables and how to use it for storing previous records.

thansk
narasa
martin
Participant
Posts: 67
Joined: Fri Jul 30, 2004 7:19 am
Location: NewJersy

Post by martin »

Hi Reddy,

Try RowProcCompareWithPreviousValue.

All The Best
Martin
martin
Participant
Posts: 67
Joined: Fri Jul 30, 2004 7:19 am
Location: NewJersy

Post by martin »

Define RowProcCompareWithPreviousValue(DSLink3.HHID) = 1 as Transformation Statage CNSTRAINT This will retun 2nd record.
Remeber This Retun 2nd record not Last Record.

Martin
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

As the source volume is 60m rows, sharing this work-load across different areas will be the best.
kollurianu
Premium Member
Premium Member
Posts: 614
Joined: Fri Feb 06, 2004 3:59 pm

Post by kollurianu »

Hi Sai,

Can u please let me know what did u mean by sharing this work area across would be best for large volumes.


Thank you,
reddy
Premium Member
Premium Member
Posts: 168
Joined: Tue Dec 07, 2004 12:54 pm

Post by reddy »

kollurianu wrote:Hi Sai,

Can u please let me know what did u mean by sharing this work area across would be best for large volumes.

Hello Sirs,

Some households will have 100 records also then how rowproc will work.
Please don't consider there is only 2 records per householdid.
Please give details how to deal with 60 million records.
Thanks
Narasa
Thank you,
martin
Participant
Posts: 67
Joined: Fri Jul 30, 2004 7:19 am
Location: NewJersy

Post by martin »

It doest care 100 rows, but RowProcCompareWithPreviousValue = 1 Will Pick 2nd row.I tested with your Sample input, Its picking 2nd row.

Martin
martin
Participant
Posts: 67
Joined: Fri Jul 30, 2004 7:19 am
Location: NewJersy

Post by martin »

Sorry.........This Will Not Pick 1st Row, In 100 rows Scenario with Similar Data This will Omitt 1st Row and retuns all rows from 2nd row(99 Rows).

Martin
Post Reply