Re-generating sequence numbers within list

wahi80
Participant
Posts: 214
Joined: Thu Feb 07, 2008 4:37 pm


Post by wahi80 »

Hi,

I have data as follows:

Code: Select all

Jersey City,NJ
Princeton, NJ
Houston,TX
Dallas,TX
LA,CA
Miami,FL
I need to assign a sequence number to each city within its state, so my output should look like this:

Code: Select all

1,Jersey City,NJ
2,Princeton, NJ
1,Houston,TX
2,Dallas,TX
1,LA,CA
1,Miami,FL
The sequence should restart for each state.
I think I need to sort the data by state first and use the Sort stage's Create Key Change Column option. But how do I restart the sequence?

Regards
Wah
Minhajuddin
Participant
Posts: 467
Joined: Tue Mar 20, 2007 6:36 am
Location: Chennai

Post by Minhajuddin »

You can declare a stage variable in a Transformer stage placed after your Sort stage and use it as a counter.

Code: Select all

Input=====>sort===============>Transformer=========>output
    (create key change      (Use the stageVar given
          on state)               below to generate counts)

Code: Select all

counterVariable==> if not(ip.keyChange) then (counterVariable + 1) else 1
Minhajuddin
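
For readers who want to see the counter logic outside DataStage, here is a minimal Python sketch of the same idea. It is not DataStage code: the function name `number_within_state` and the local `key_change` flag are made up for illustration, and the flag only imitates what the Sort stage's keyChange column provides. It assumes the rows are already grouped by state.

```python
def number_within_state(rows):
    """Yield (seq, city, state) for rows that are already grouped by state."""
    previous_state = None
    counter = 0
    for city, state in rows:
        # Stand-in for the keyChange column: true on the first row of each state.
        key_change = state != previous_state
        # Same rule as the stage variable: reset on a key change, else increment.
        counter = 1 if key_change else counter + 1
        previous_state = state
        yield counter, city, state


rows = [
    ("Jersey City", "NJ"),
    ("Princeton", "NJ"),
    ("Houston", "TX"),
    ("Dallas", "TX"),
    ("LA", "CA"),
    ("Miami", "FL"),
]

numbered = list(number_within_state(rows))
for seq, city, state in numbered:
    print(f"{seq},{city},{state}")
```

The counter only behaves if equal states are adjacent, which is why the sort (or an already sorted input) matters.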

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

This is the correct approach, and requires also that the data are partitioned and sorted by state.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
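
As a rough illustration of why partitioning by state matters, the Python sketch below hashes the state to choose a partition, so every row for a given state lands in the same partition and a per-partition counter sees the whole group. This is only a model of key-based partitioning: `hash_partition` is a hypothetical helper, and `zlib.crc32` is a stand-in, not the hash function DataStage actually uses.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, num_partitions):
    """Assign each (city, state) row to a partition chosen by hashing the state."""
    partitions = defaultdict(list)
    for city, state in rows:
        # crc32 is deterministic, so equal states always map to the same index.
        index = zlib.crc32(state.encode("utf-8")) % num_partitions
        partitions[index].append((city, state))
    return partitions


rows = [
    ("Jersey City", "NJ"),
    ("Princeton", "NJ"),
    ("Houston", "TX"),
    ("Dallas", "TX"),
    ("LA", "CA"),
    ("Miami", "FL"),
]

partitions = hash_partition(rows, num_partitions=4)

# Record which partitions each state ended up in; it should be exactly one.
homes = defaultdict(set)
for index, part in partitions.items():
    for _, state in part:
        homes[state].add(index)

for state, indexes in sorted(homes.items()):
    print(state, sorted(indexes))
```

If rows for one state were spread across partitions, each partition would start its own count at 1 and the numbers would repeat within the state.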
wahi80
Participant
Posts: 214
Joined: Thu Feb 07, 2008 4:37 pm

Post by wahi80 »

ray.wurlod wrote: This is the correct approach, and requires also that the data are partitioned and sorted by state. ...
Hi,
The keyChange column from the Sort stage is not being generated properly; I think it is due to a partitioning error. I set up the first half of the job as follows:

Code: Select all

InputSeq------------->Sort Stage------------->OutputSeq
                     (Hash partitioned and             (Sort Merge Collector)   
                        sorted by State)
Is there anything I'm missing?

Regards
Wah
vidya_6_2000
Participant
Posts: 10
Joined: Wed Apr 16, 2008 7:39 am
Location: USA

Post by vidya_6_2000 »

I have the same situation, but my file is already sorted, so I cannot use the Sort stage. In that case, how do I generate a variable that changes value when the key value changes and otherwise remains the same?

In my server job, I could use the RowProcCompareWithPreviousValue routine that IBM ships with the tool. There is no such routine for parallel jobs.

Regards,
Vidya Iyer
wahi80
Participant
Posts: 214
Joined: Thu Feb 07, 2008 4:37 pm

Post by wahi80 »

Hi,

There were some spaces in the fields that needed to be trimmed.
The numbers are now generated in the right order.

Thanks for the help!!

Wah
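
A small Python sketch of why stray spaces break the numbering: a key with a leading space compares as a different value, so the counter resets where it should increment. The sample values are an assumption modeled on the " NJ" in the data posted above, and `sequence_numbers` is a hypothetical helper, not a DataStage function.

```python
def sequence_numbers(states):
    """Return a per-state running count, resetting whenever the key changes."""
    previous = None
    counter = 0
    result = []
    for state in states:
        counter = counter + 1 if state == previous else 1
        previous = state
        result.append(counter)
    return result


# The second key carries a leading space, as in "Princeton, NJ".
raw_states = ["NJ", " NJ", "TX", "TX", "CA", "FL"]

untrimmed = sequence_numbers(raw_states)
trimmed = sequence_numbers([s.strip() for s in raw_states])

print(untrimmed)  # [1, 1, 1, 2, 1, 1]
print(trimmed)    # [1, 2, 1, 2, 1, 1]
```

Trimming has to happen before the key is used for partitioning and sorting, otherwise "NJ" and " NJ" are still treated as separate groups.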
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The Sort stage is still the right tool when the data are already sorted. Set the key's Sort Key Mode property to "Don't Sort (Previously Sorted)". This prevents DataStage from inserting a tsort operator into the step (= job).
vidya_6_2000
Participant
Posts: 10
Joined: Wed Apr 16, 2008 7:39 am
Location: USA

Post by vidya_6_2000 »

Oh, thank you! That helped! Makes sense.

Appreciate everybody's time and effort.

Regards,
Vidya Iyer