Simple job - Extract, check dup, write header and trailer

Post questions here relating to DataStage Server Edition, for areas such as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

newkid
Participant
Posts: 4
Joined: Sat May 01, 2004 12:40 am

Simple job - Extract, check dup, write header and trailer

Post by newkid »

Hi, I am new to DS. I need to finish this real fast, otherwise I will be fired.

I have extracted the data, but I need to sort it so that I keep only the most current record, then send out just those records.
The header (first record) needs to carry the hard-coded value 'Header' plus the current yyyymm, and the trailer (last record) needs
the hard-coded value 'Trailer' plus the record count (the total count for the file, excluding the header and trailer).

I am almost done, except for sorting out the duplicates and keeping only the
most current one. What would be the cleanest way to get this
done other than "select max(xxx) from xxx where ... group by ..."? I have spent two days trying to make that work and it did not. I think
it is something in the company setup between VMS and DS, where the date is not converting correctly. Therefore, I am looking for any method other than select
max(date) where xxx group by yyy to take care of the duplicate records.
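For reference, the kind of query I have been fighting with is a variation on this (the table and column names here are made up, not our real ones):

Code: Select all

    -- Keep only the most recent row per provider: a correlated
    -- subquery picks out the latest date for each key.
    SELECT p.*
    FROM   provider p
    WHERE  p.eff_date = (SELECT MAX(p2.eff_date)
                         FROM   provider p2
                         WHERE  p2.provider_no = p.provider_no);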

Thanks for all.
Gazelle
Premium Member
Posts: 108
Joined: Mon Nov 24, 2003 11:36 pm
Location: Australia (Melbourne)

Post by Gazelle »

Newkid now, but perhaps not for long :D
Have you looked at using the "Remove Duplicates" and "Sort" stages?
newkid
Participant
Posts: 4
Joined: Sat May 01, 2004 12:40 am

Post by newkid »

How? Can you give me more details?

I am currently able to add a new column whose derivation is MAX(x_field) to
eliminate the duplicates, but it is not working; it still returns all the rows.

When I run the same SQL outside DS, it works. Does anyone know whether there
is a difference between SQL in DS and SQL elsewhere?
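Could it be something like this? (Column names made up for illustration.)

Code: Select all

    -- This works outside DS: one row per provider, with the latest date.
    SELECT provider_no, MAX(eff_date)
    FROM   provider
    GROUP  BY provider_no;

    -- But if the date column itself ends up in the generated GROUP BY,
    -- every (provider_no, eff_date) pair is its own group, so all
    -- the rows come back.
    SELECT provider_no, eff_date, MAX(eff_date)
    FROM   provider
    GROUP  BY provider_no, eff_date;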

Thanks.
Gazelle
Premium Member
Posts: 108
Joined: Mon Nov 24, 2003 11:36 pm
Location: Australia (Melbourne)

Post by Gazelle »

Sorry. I missed some vital information about your environment.
I'm using v7.x. I do not know if v6 has a "Remove Duplicates" stage.

Someone else will have to help you on this one.

Have you tried the derivation options of the Aggregator stage?
newkid
Participant
Posts: 4
Joined: Sat May 01, 2004 12:40 am

Post by newkid »

Gazelle wrote: Sorry. I missed some vital information about your environment.
I'm using v7.x. I do not know if v6 has a "Remove Duplicates" stage.

Someone else will have to help you on this one.

Have you tried the derivation options of the Aggregator stage?
Not yet. But tomorrow I think I will try the Aggregator with a group by on
the provider number and a max on the date (basically, there are a lot of provider information fields, but I only want the most recent record), and that should work. I have SQL which does that, but it is not working in DS.
I think this two-step approach should do it.
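In plain SQL, the two steps would be something like this (table and column names are placeholders, not my real ones):

Code: Select all

    -- Step 1: find the latest date per provider.
    -- Step 2: join back to the extract to pick up the full row.
    SELECT p.*
    FROM   provider p
    INNER JOIN (SELECT provider_no, MAX(eff_date) AS max_date
                FROM   provider
                GROUP  BY provider_no) latest
           ON  p.provider_no = latest.provider_no
           AND p.eff_date    = latest.max_date;

In the job, I am thinking the Aggregator does the inner query and a lookup back against the extract does the join.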

Thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If "duplicate" means "same key", write to a hashed file, then read from the hashed file. Updates to hashed files are destructive overwrites, so this is an easy way to remove duplicates.

If you need to record duplicates, this is slightly more complex, in that you have to check the same hashed file to determine whether you've seen this key before. Select "lock for update" in the HF stage that writes to the hashed file. Both HF stages access the same hashed file.

Code: Select all

                      HFStage1
                         |
                         V
            ------>  Transformer  ----->
                         |
                         V
                      HFStage2
In this design, the output link is constrained on the result of the lookup:
  • if the lookup failed, this key has not been seen yet
  • if the lookup succeeded, this key has already been seen
Use two output links if you need both streams.
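For example, if the reference link from HFStage1 were named lkpSeen (the name is purely illustrative), the constraints would be along these lines:

Code: Select all

    * "Not seen before" output link: the reference lookup failed.
    lkpSeen.NOTFOUND

    * "Duplicate" output link: the reference lookup succeeded.
    Not(lkpSeen.NOTFOUND)
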
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.