DSXchange

Posted: **Wed Apr 27, 2005 3:04 pm**

Sir,

i am new to Datastage, i have a scenario like this please help me by step by step.

col1 col2 date1
111 111 2004-05-05
111 222 2004-06-06
222 555 2004-07-07
222 666 2004-06-06

if col1 is same for 2 records i need latest records.

Thanks
Narasa

Posted: **Wed Apr 27, 2005 4:48 pm**

Hi Nasara,

If you want to persist the last record based on the col1 as a key, simple way would be having a hashfile with 'col1' as a key and write all the records to that hashfile. Finally you will have desired records in the hashfile. later on you can write those records wherever you want.

But taking this approach would depend on how many records you are dealing with and where do you want store them. It is good as long as you are not dealing with humongous volumes.

Kris~

Posted: **Wed Apr 27, 2005 5:01 pm**

Yup, that would be a simple approach that leverages the 'destructive overwrite' of hash files - last one in wins for each combination of the key fields. It would require that your data be sorted properly.

Another option would be to sort the file descending on the key fields and then take the first record from each key group. This is easier than trying to recognise group changes and taking the last one from the previous group. It also solves the "how do I know I'm at the end" dilemma. Stage Variables would be used to store the 'old' values and give you something to compare against.

Don't forget about the Filter option in the Sequential File stage if your source is a flat file. It makes it pretty simple to leverage your operating system to do the sorting 'on the fly' as the data is brought into the job, eliminating an extra 'pre-job' step.

Posted: **Wed Apr 27, 2005 5:19 pm**

Hi,
Can there be situtation in the input file where
column1 is same for more than two records
.ie
111 5111 2004-05-05
111 5111 2004-06-06
111 5111 2004-07-05
111 5111 2004-08-06

If yes, what row will be like to see?

Ketfos

Posted: **Wed Apr 27, 2005 8:30 pm**

Welcome aboard! :D
Run the input through an Aggregator stage, grouping by Column 1 and specifying Last as the aggregate function for the other columns. If the input is sorted, note on the Input link to the Aggregator stage that this is the case; the job will run much more efficiently.

Posted: **Wed Apr 27, 2005 9:35 pm**

There's always more than one way to solve problems here.

If you do take this route and 'latest record' means 'max' date or something to that effect, make sure you sort the file first if there is any doubt about the order they will come in. Otherwise you will literally get the last record for that sort group, which may or may not be the 'latest'.

Posted: **Thu Apr 28, 2005 5:43 am**

If the processing is only to find the max, you can do a 'sort -ruk' from Unix itself.

Posted: **Thu Apr 28, 2005 12:02 pm**

Sainath.Srinivasan wrote:If the processing is only to find the max, you can do a 'sort -ruk' from Unix itself.

Thanks for all,

My Business logic is like this I have a Customer Ownership table having 60 million rows i need to pull one record per each householdid (There is no.of records per hhid) based on keeping on recent vehicle purchased.

HHID VIN PURCHASEDATE
100 aa 2004-10-10
100 bb 2004-12-10

i need 2nd record in this scenario.

Please help how to do it in datastage.

Thanks a lot for helping guys.

Can yoou guys help on Stage variables and how to use it for storing previous records.

thansk
narasa

Posted: **Thu Apr 28, 2005 3:55 pm**

Hi Reddy,

Try RowProcCompareWithPreviousValue.

All The Best
Martin

Posted: **Thu Apr 28, 2005 6:08 pm**

Define RowProcCompareWithPreviousValue(DSLink3.HHID) = 1 as Transformation Statage CNSTRAINT This will retun 2nd record.
Remeber This Retun 2nd record not Last Record.

Martin

Posted: **Fri Apr 29, 2005 3:03 am**

As the source volume is 60m rows, sharing this work-load across different areas will be the best.

Posted: **Fri Apr 29, 2005 4:59 am**

Hi Sai,

Can u please let me know what did u mean by sharing this work area across would be best for large volumes.

Thank you,

Posted: **Fri Apr 29, 2005 8:36 am**

kollurianu wrote:Hi Sai,

Can u please let me know what did u mean by sharing this work area across would be best for large volumes.

Hello Sirs,

Some households will have 100 records also then how rowproc will work.
Please don't consider there is only 2 records per householdid.
Please give details how to deal with 60 million records.
Thanks
Narasa
Thank you,

Posted: **Fri Apr 29, 2005 9:38 am**

It doest care 100 rows, but RowProcCompareWithPreviousValue = 1 Will Pick 2nd row.I tested with your Sample input, Its picking 2nd row.

Martin

Posted: **Fri Apr 29, 2005 9:44 am**

Sorry.........This Will Not Pick 1st Row, In 100 rows Scenario with Similar Data This will Omitt 1st Row and retuns all rows from 2nd row(99 Rows).

Martin

DSXchange

based on previous record how to update record

based on previous record how to update record