Timestamp sort issue

Post questions here related to DataStage Enterprise/PX Edition in areas such as parallel job design, parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore


Post by prasson_ibm »

Hi All,

I have a job design where I need to sort the data on a timestamp column (microsecond precision) and, after the sort, apply transformation logic in a Transformer that should pick the maximum timestamp and populate it for the rest of the records.
The stage variable derivation is:


If srt_to_tfm.LastChgDateTime>= svEFFDT then srt_to_tfm.LastChgDateTime else svEFFDT
where svEFFDT is initialized to '2009-07-11 13:17:39.810'.

Instead of getting the maximum timestamp across all input records, I am getting the wrong output.

It seems the timestamp column is sorted within each partition, so the stage variable picks up that partition's maximum timestamp, and the next partition ends up with a different value. I want the maximum timestamp across all input records.
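The per-partition behaviour described above can be reproduced with a small Python model. This is a sketch, not DataStage code: `running_max` mirrors the stage-variable derivation, `round_robin_partition` is a hypothetical stand-in for whatever partitioner the job actually uses, and the sample timestamps are invented.

```python
from datetime import datetime

def ts(s):
    # Parse 'YYYY-MM-DD HH:MM:SS.fff' strings like the one in the post.
    return datetime.fromisoformat(s)

def running_max(rows, init):
    # Mirrors: If LastChgDateTime >= svEFFDT Then LastChgDateTime Else svEFFDT
    sv_effdt = init
    out = []
    for row in rows:
        sv_effdt = row if row >= sv_effdt else sv_effdt
        out.append(sv_effdt)
    return out

def round_robin_partition(rows, n):
    # Hypothetical partitioner (an assumption): row i goes to partition i % n.
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def final_sv_per_partition(rows, n, init):
    # Each partition sorts its own rows and runs its own copy of the
    # stage variable, so each one ends with only its local maximum.
    return [running_max(sorted(p), init)[-1]
            for p in round_robin_partition(rows, n) if p]

INIT = ts("2009-07-11 13:17:39.810")
rows = [ts("2009-07-12 08:00:00.000"), ts("2009-07-15 10:30:00.500"),
        ts("2009-07-13 09:00:00.000"), ts("2009-07-14 07:45:00.250")]

print(final_sv_per_partition(rows, 1, INIT))  # one node: a single, global max
print(final_sv_per_partition(rows, 2, INIT))  # two nodes: two different maxima
```

With one partition there is exactly one final value, which is why the job works on a single node; with two, each copy of the stage variable only ever sees half the data.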

The job works fine on a single node.
Kindly help me find a solution to this issue.

Thanks
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

The results you are seeing are exactly how the product works. To obtain the maximum value across ALL records, you have two options:

1) Run the transformer itself in sequential mode

2) Split out the timestamps into a separate stream, process them sequentially to get the maximum value (for instance, use an aggregator stage running in sequential mode) and then join the result back to the main data.
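Option 2 can be sketched in Python to show the shape of the pattern: one single-stream reduction to a maximum, then that one value attached to every record. This is illustrative only; `stamp_global_max` and the `EffDt` output column are hypothetical names, and the key a real Join stage would need is not modelled.

```python
from datetime import datetime

def stamp_global_max(records, key="LastChgDateTime", out_col="EffDt"):
    """Sketch of option 2: reduce the timestamp column to one maximum
    (the sequential Aggregator's role), then copy that single value onto
    every record (the join back). out_col is a hypothetical column name."""
    if not records:
        return []
    # Step 1: one pass over ALL rows, so exactly one maximum exists.
    global_max = max(r[key] for r in records)
    # Step 2: the aggregate is a single row, so the "join" amounts to
    # attaching that value to every main-stream record.
    return [{**r, out_col: global_max} for r in records]

records = [
    {"id": 1, "LastChgDateTime": datetime(2009, 7, 12, 8, 0, 0)},
    {"id": 2, "LastChgDateTime": datetime(2009, 7, 15, 10, 30, 0, 500000)},
    {"id": 3, "LastChgDateTime": datetime(2009, 7, 13, 9, 0, 0)},
]
for r in stamp_global_max(records):
    print(r["id"], r["EffDt"])
```

Only the small aggregate stream has to be single-partition; the main data can stay partitioned until it meets the joined-back value.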

Depending upon your data quantity, either option can noticeably impact performance. However, you have already experienced the fact that to get the correct result, the data you're capturing must be processed in a single partition.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
prasson_ibm

Post by prasson_ibm »

Hi,
Thanks for the reply.

I am trying to implement solution 2 and will keep you updated.


Thanks
prasson_ibm

Post by prasson_ibm »

Hi,

I am planning to use aggregator stage.

I want to take a separate stream, create a dummy column with the value 1, and aggregate the data to get the maximum timestamp value.
In this case I'll hash partition on the dummy column, so do I still need to run the Aggregator in sequential mode?
jwiles

Post by jwiles »

You would essentially be performing the same processing as if you just ran the aggregator in sequential mode, except that you are adding additional, unnecessary overhead to your job by adding the dummy column and re-partitioning. Also, this wastes system resources by having multiple instances of the aggregator (it's running in parallel) while only one of them does any work.
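Why hash partitioning on a constant key leaves all but one partition idle can be shown with a toy hash partitioner. This is a sketch under stated assumptions: `zlib.crc32` is an arbitrary deterministic stand-in for DataStage's own hash function, and `hash_partition` / `partition_load` are hypothetical helpers.

```python
import zlib
from collections import Counter

def hash_partition(keys, n):
    # Assign each row to a partition from a hash of its key. crc32 is only
    # a deterministic stand-in; DataStage's actual hash is not modelled.
    return [zlib.crc32(str(k).encode()) % n for k in keys]

def partition_load(keys, n):
    # Number of rows each of the n partitions receives.
    counts = Counter(hash_partition(keys, n))
    return [counts.get(p, 0) for p in range(n)]

# Every row carries the same dummy key (1), so every row hashes alike:
# one partition gets all 1000 rows and the other three get none.
print(partition_load([1] * 1000, 4))
```

Whatever the hash function, equal keys always land in the same partition, so the result is sequential processing plus the cost of the extra column and the repartition.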

Regards,
- james wiles


prasson_ibm

Post by prasson_ibm »

The data volume in this job will be small, so I'll go with option 1.

As suggested, I am making the Transformer run sequentially, while the Sort stage will continue to run in parallel mode.

But do you think that, because the sort is partitioned, the Transformer (running sequentially) could receive the records in the wrong sort order?
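The concern in this question can be modelled in a few lines of Python. This is only a sketch: integers stand in for timestamps, the two-partition split is invented, and the two collector functions are hypothetical models of a round-robin collection versus a merge on the sort key (DataStage's Sort Merge collector method is intended for the latter case). Which collector the job actually uses depends on its settings and is not modelled here.

```python
import heapq
from itertools import zip_longest

def sort_within_partitions(partitions):
    # A parallel sort orders the rows inside each partition only.
    return [sorted(p) for p in partitions]

def round_robin_collect(partitions):
    # Take one row from each partition in turn, with no key comparison.
    out = []
    for group in zip_longest(*partitions):
        out.extend(x for x in group if x is not None)
    return out

def sort_merge_collect(partitions):
    # k-way merge on the key: the result is globally ordered provided
    # every input partition is already sorted.
    return list(heapq.merge(*partitions))

# Integers stand in for timestamps; the two-partition split is invented.
parts = sort_within_partitions([[5, 1, 9], [4, 8, 2]])
print(round_robin_collect(parts))  # [1, 2, 5, 4, 9, 8] (not globally sorted)
print(sort_merge_collect(parts))   # [1, 2, 4, 5, 8, 9]
```

Each partition is correctly sorted in both cases; whether the sequential stage sees a global order depends entirely on how the partitions are collected.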