Re-generating sequence numbers within list

Post by **wahi80** » Wed Jun 18, 2008 9:30 pm

Hi,

I have data as follows:

Jersey City,NJ
Princeton, NJ
Houston,TX
Dallas,TX
LA,CA
Miami,FL

I need to assign a sequence number to each city within its state, so my output should look like this:

1,Jersey City,NJ
2,Princeton, NJ
1,Houston,TX
2,Dallas,TX
1,LA,CA
1,Miami,FL

The sequence should restart for each state.
I think I need to sort the data by state first and use the Sort stage's "create key change column" option. But how do I regenerate the sequence?

Regards
Wah

Post by **Minhajuddin** » Wed Jun 18, 2008 11:40 pm

You can declare a stage variable in a Transformer stage placed after your Sort stage and use it as a counter.

Input=====>sort===============>Transformer=========>output
    (create key change      (Use the stageVar given
          on state)               below to generate counts)

counterVariable==> if not(ip.keyChange) then (counterVariable + 1) else 1
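A minimal Python sketch of the pipeline above, assuming the Sort stage's key change column is 1 on the first row of each state group and 0 otherwise. The function names are hypothetical and only mimic the Sort and Transformer stages; the state values are assumed to be already trimmed.

```python
from typing import List, Tuple

Row = Tuple[str, str]  # (city, state)


def sort_with_key_change(rows: List[Row]) -> List[Tuple[str, str, int]]:
    """Mimic a Sort stage keyed on state with a key change column:
    keyChange is 1 on the first row of each state group, else 0."""
    out = []
    prev_state = None
    # sorted() is stable, so cities keep their input order within a state
    for city, state in sorted(rows, key=lambda r: r[1]):
        out.append((city, state, 1 if state != prev_state else 0))
        prev_state = state
    return out


def number_within_state(rows: List[Row]) -> List[Tuple[int, str, str]]:
    """Mimic the Transformer stage variable:
    counter = if not(keyChange) then counter + 1 else 1"""
    counter = 0
    result = []
    for city, state, key_change in sort_with_key_change(rows):
        counter = counter + 1 if not key_change else 1
        result.append((counter, city, state))
    return result


data = [("Jersey City", "NJ"), ("Princeton", "NJ"), ("Houston", "TX"),
        ("Dallas", "TX"), ("LA", "CA"), ("Miami", "FL")]

for seq, city, state in number_within_state(data):
    print(f"{seq},{city},{state}")
```

Running it prints 1,LA,CA / 1,Miami,FL / 1,Jersey City,NJ / 2,Princeton,NJ / 1,Houston,TX / 2,Dallas,TX: the requested numbering, but in state order rather than input order, because the sort comes first.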

Post by **ray.wurlod** » Thu Jun 19, 2008 12:36 am

This is the correct approach, and requires also that the data are partitioned and sorted by state.
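Why the partitioning matters can be sketched by simulating two parallel partitions in Python. This is a simplified model, not DataStage itself: crc32 stands in for the hash partitioner, and all helper names are hypothetical. With hash partitioning on state, every row for a given state lands in the same partition, so a per-partition counter is correct; with round-robin partitioning, the NJ and TX rows are split up and each fragment restarts at 1.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, str]  # (city, state)
Partitioner = Callable[[int, Row], int]  # (row index, row) -> partition number


def number_partition(rows: List[Row]) -> List[Tuple[int, str, str]]:
    """Sort one partition by state and restart the counter at 1 on each key change."""
    out, counter, prev = [], 0, None
    for city, state in sorted(rows, key=lambda r: r[1]):
        counter = 1 if state != prev else counter + 1
        out.append((counter, city, state))
        prev = state
    return out


def run_parallel(rows: List[Row], n: int, partitioner: Partitioner):
    """Split rows into n partitions, number each independently, then collect."""
    parts: Dict[int, List[Row]] = defaultdict(list)
    for i, row in enumerate(rows):
        parts[partitioner(i, row) % n].append(row)
    result = []
    for p in sorted(parts):
        result.extend(number_partition(parts[p]))
    return result


def hash_on_state(i: int, row: Row) -> int:
    # Deterministic hash of the key: equal states always map to the same partition
    return zlib.crc32(row[1].encode())


def round_robin(i: int, row: Row) -> int:
    # Ignores the key: consecutive rows go to different partitions
    return i


data = [("Jersey City", "NJ"), ("Princeton", "NJ"), ("Houston", "TX"),
        ("Dallas", "TX"), ("LA", "CA"), ("Miami", "FL")]

print(sorted(run_parallel(data, 2, hash_on_state)))
print(sorted(run_parallel(data, 2, round_robin)))  # every row gets 1
```

The collector only affects the order in which the rows come back, not the numbers themselves, which is why the sketch simply concatenates the partitions.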

Post by **wahi80** » Thu Jun 19, 2008 8:41 am

ray.wurlod wrote: This is the correct approach, and requires also that the data are partitioned and sorted by state. ...

Hi,
The keyChange column from the Sort stage is not being generated properly. I think it is due to a partitioning error. I did the following for the first half of the job:

InputSeq------------->Sort Stage------------->OutputSeq
                     (Hash partitioned and             (Sort Merge Collector)   
                        sorted by State)

Is there anything I'm missing?

Regards
Wah

Post by **vidya_6_2000** » Thu Jun 19, 2008 9:46 am

I have the same situation, but my file is already sorted, so I cannot use the Sort stage. In that case, how do I generate a variable that changes value when the key value changes and otherwise remains the same?

In my server jobs, I could use the RowProcCompareWithPreviousValue routine that IBM ships with the tool itself. There is no such routine for parallel jobs.

Regards,
Vidya Iyer
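For input that is already sorted or grouped on the key, a common Transformer idiom is to compare each row with a stage variable holding the previous row's key. The sketch below models that in Python; it assumes stage variables are evaluated top to bottom for each row and keep their values between rows, and the names svKeyChange, svGroup and svPrevState are hypothetical. The flag is 1 whenever the key differs from the previous row's key, and the group number changes value only at those points.

```python
from typing import Iterable, Iterator, Optional, Tuple


def flag_key_changes(keys: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (key, key_change, group) for input already sorted/grouped on key."""
    prev: Optional[str] = None  # svPrevState: nothing seen yet
    group = 0                   # svGroup: changes value only on a key change
    for key in keys:
        # Order matters: compare with the previous key BEFORE overwriting it
        key_change = 1 if key != prev else 0  # svKeyChange
        group += key_change
        yield key, key_change, group
        prev = key                            # svPrevState = current key


for row in flag_key_changes(["NJ", "NJ", "TX", "TX", "CA", "FL"]):
    print(row)
```

Because it only looks at the previous row, this needs the data grouped on the key, not necessarily sorted; in a parallel job the data must still be partitioned on the key so that no group is split across partitions.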

Post by **wahi80** » Thu Jun 19, 2008 10:23 am

Hi,

Some of the fields contained spaces that needed to be trimmed.
Now the numbers are generated in the right order.

Thanks for the help!!

Wah
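One plausible way stray spaces break the key change, using the "Princeton, NJ" row from the sample data: split naively on the comma, its state is " NJ" (with a leading space), which does not equal "NJ" and even sorts ahead of "CA". A minimal Python sketch, assuming a simple comma split and a hypothetical helper name; in a DataStage job the equivalent fix would be trimming the column (for example with Trim() in a Transformer) before the Sort stage.

```python
from typing import List, Tuple

raw = ["Jersey City,NJ", "Princeton, NJ", "Houston,TX",
       "Dallas,TX", "LA,CA", "Miami,FL"]


def number_by_state(lines: List[str], trim: bool) -> List[Tuple[int, str, str]]:
    """Parse 'city,state' lines, sort by state, and number rows within each state."""
    rows = []
    for line in lines:
        city, state = line.split(",", 1)
        if trim:
            city, state = city.strip(), state.strip()
        rows.append((city, state))
    out, counter, prev = [], 0, None
    for city, state in sorted(rows, key=lambda r: r[1]):
        counter = 1 if state != prev else counter + 1  # restart on key change
        out.append((counter, city, state))
        prev = state
    return out


for seq, city, state in number_by_state(raw, trim=False):
    print(seq, city, repr(state))  # Princeton gets 1: ' NJ' != 'NJ'
print()
for seq, city, state in number_by_state(raw, trim=True):
    print(seq, city, repr(state))  # Princeton now gets 2
```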

Post by **ray.wurlod** » Thu Jun 19, 2008 10:54 pm

The Sort stage is perfect to use even if the data are already sorted. Set the sort key mode property to "Don't Sort (Previously Sorted)". This prevents DataStage from inserting a tsort operator into the step (that is, the job).

Post by **vidya_6_2000** » Fri Jun 20, 2008 7:38 am

Oh, thank you! That helped, and it makes sense.

I appreciate everybody's time and effort.

Regards,
Vidya Iyer