Page 1 of 1

Simple job - Extract, check dup, write header and trailer

Posted: Sat May 01, 2004 12:03 pm
by newkid
Hi, I am new in DS. I need to finish this real fast otherwise I am fired.

I extracted the data but I need to sorted it with the most current record, then just send out those records.
The header (first record) need to have a hard-coded value 'Header' and a current yyyymm, the trailer (last record) need to have a hard-code
the 'Trailer' and the record countds (total count of this file excluded header and trailer).

I almost done except the sorting out the duplicate and get the
most current one out. What seems to be the cleanest way to get this
done except using "select max(xxx) from xxx where ...group by...." because I have spent 2 day to make this work but it did not. I think
it is the company setup between VMS and DS, it is something the date is not converting correctly. Therefore, all other method except select
max(date) where xxx group by yyy to take care of the duplicate record.

Thanks for all.

Posted: Mon May 03, 2004 7:01 pm
by Gazelle
Newkid now, but perhaps not for long :D
Have you looked at using the "Remove Duplicates" and "Sort" stages?

Posted: Mon May 03, 2004 10:26 pm
by newkid
How ? Can you give me more details ?

I currently able to put a new column which is putting MAX(x_field) to
elimate the duplicate but not working, it still returns all the rows.

but when I ran it outside DS, it is working. Does anyone know is there
a different between SQL in DS and others.

Thanks.

Posted: Mon May 03, 2004 11:48 pm
by Gazelle
Sorry. I missed some vital information about your environment.
I'm using v7.x. I do not know if v6 has a "Remove Duplicates" stage.

Someone else will have to help you on this one.

Have you tried the derivation options of the Aggregator stage?

Posted: Tue May 04, 2004 12:12 am
by newkid
Gazelle wrote:Sorry. I missed some vital information about your environment.
I'm using v7.x. I do not know if v6 has a "Remove Duplicates" stage.

Someone else will have to help you on this one.

Have you tried the derivation options of the Aggregator stage?
No yet. But I think tomorrow, I will try the aggregator with group by
the provider number and max on date. (basically, a lot of provider information field but I only want the most recent record) and that should work. I have a SQL which does that but it is not working in DS.
I think this 2 step should do it.

Thanks.

Posted: Tue May 04, 2004 12:16 am
by ray.wurlod
If the meaning of "duplicate" means "same key", write to hashed file then read from hashed file. Updates to hashed files are destructive overwrites, so this is an easy way to remove duplicates.

If you need to record duplicates, this is slightly more complex, in that you have to check the same hashed file to determine whether you've seen this key before. Select "lock for update" in the HF stage that writes to the hashed file. Both HF stages access the same hashed file.

Code: Select all

                      HFStage1
                         |
                         V
            ------>  Transfomer  ----->
                         |
                         V
                      HFStage2
Iin this design, the output link is constrained on the result of the lookup
  • if the lookup failed, this key has not been seen yet
    if the lookup succeeded, this key has already been seen
Use two outputs if needed.