
Re-generating sequence numbers within list

Posted: Wed Jun 18, 2008 9:30 pm
by wahi80
Hi,

I have data as follows:

Jersey City,NJ
Princeton, NJ
Houston,TX
Dallas,TX
LA,CA
Miami,FL

I need to assign a sequence number to each city within its state, so my output should look like this:

1,Jersey City,NJ
2,Princeton, NJ
1,Houston,TX
2,Dallas,TX
1,LA,CA
1,Miami,FL

The sequence should restart for each state.
I think I need to sort the data by state first and use the Sort stage's Create Key Change Column option. But how do I regenerate the sequence?

Regards
Wah

Posted: Wed Jun 18, 2008 11:40 pm
by Minhajuddin
You can declare a stage variable in a transformer after your sort stage which can be used as a counter.

Input=====>sort===============>Transformer=========>output
    (create key change      (Use the stageVar given
          on state)               below to generate counts)

counterVariable==> if not(ip.keyChange) then (counterVariable + 1) else 1
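To make the two pieces concrete, here is a rough Python emulation of the Sort stage (with a key change column) feeding the Transformer counter. It is only a sketch of the logic, not DataStage code; the function names are made up for illustration, and the sample data has the stray space in " NJ" already trimmed.

```python
def sort_with_key_change(rows):
    """Sort (city, state) rows by state and flag the first row of each
    state with key_change=1, like the Sort stage's key change column."""
    out = []
    prev_state = object()  # sentinel that never equals a real state
    for city, state in sorted(rows, key=lambda r: r[1]):  # stable sort
        key_change = 1 if state != prev_state else 0
        out.append((city, state, key_change))
        prev_state = state
    return out


def add_sequence(rows_with_key_change):
    """Emulate the stage variable: reset to 1 on a key change, else add 1."""
    counter = 0
    result = []
    for city, state, key_change in rows_with_key_change:
        counter = 1 if key_change else counter + 1
        result.append((counter, city, state))
    return result


data = [("Jersey City", "NJ"), ("Princeton", "NJ"), ("Houston", "TX"),
        ("Dallas", "TX"), ("LA", "CA"), ("Miami", "FL")]
for row in add_sequence(sort_with_key_change(data)):
    print(row)
```

Note that the states come out in sorted order (CA, FL, NJ, TX) rather than the input order, because the sort is what makes the key change detection possible.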

Posted: Thu Jun 19, 2008 12:36 am
by ray.wurlod
This is the correct approach, and requires also that the data are partitioned and sorted by state.
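Why partitioning matters can be shown with a small simulation: each node runs its own sort and counter, so every row of a state has to land on the same node. The Python sketch below compares a hash partition on state with a round-robin partition over two hypothetical nodes. The use of crc32 as the hash is an assumption standing in for DataStage's own hash function.

```python
import zlib

NUM_PARTITIONS = 2  # hypothetical degree of parallelism (number of nodes)


def hash_partition(rows, n):
    """Send every row with the same state to the same partition.
    crc32 stands in for DataStage's own hash function (an assumption)."""
    parts = [[] for _ in range(n)]
    for city, state in rows:
        parts[zlib.crc32(state.encode("utf-8")) % n].append((city, state))
    return parts


def round_robin_partition(rows, n):
    """Deal rows out in turn, ignoring the state key."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def number_partition(part):
    """What each node does independently: sort by state, count within state."""
    counter, prev, out = 0, None, []
    for city, state in sorted(part, key=lambda r: r[1]):
        counter = 1 if state != prev else counter + 1
        prev = state
        out.append((city, counter))
    return out


def sequence_by_city(parts):
    return {city: seq for part in parts for city, seq in number_partition(part)}


data = [("Jersey City", "NJ"), ("Princeton", "NJ"), ("Houston", "TX"),
        ("Dallas", "TX"), ("LA", "CA"), ("Miami", "FL")]
print(sequence_by_city(hash_partition(data, NUM_PARTITIONS)))
print(sequence_by_city(round_robin_partition(data, NUM_PARTITIONS)))
```

With round robin, Princeton and Dallas both come out as 1, because each one ends up on a different node from the first row of its state.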

Posted: Thu Jun 19, 2008 8:41 am
by wahi80
ray.wurlod wrote: This is the correct approach, and requires also that the data are partitioned and sorted by state. ...
Hi,
The keyChange column from the Sort stage is not being generated properly; I think it is due to some partitioning error. I did the following for the first half of the job:

InputSeq------------->Sort Stage------------->OutputSeq
                     (Hash partitioned and             (Sort Merge Collector)   
                        sorted by State)
Is there anything I'm missing?

Regards
Wah

Posted: Thu Jun 19, 2008 9:46 am
by vidya_6_2000
I have the same situation, but my file is already sorted, so I cannot use the Sort stage. In that case, how do I generate a variable that changes value when the key value changes and otherwise remains the same?

In my server jobs I could use the RowProcCompareWithPreviousValue routine that IBM ships with the tool. There is no such routine for parallel jobs.
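A common hand-rolled alternative to a compare-with-previous routine is a pair of stage variables: one that compares the current key with the previous one, and one that saves the current key. This only works if the compare is evaluated before the save for each row, which is why the order of stage variables matters. The Python below is a sketch of that pattern under the assumption that the input is already grouped by key; the names sv_key_change and sv_prev_key are hypothetical.

```python
def key_change_flags(keys):
    """Emulate two stage variables evaluated in order for each row:
       sv_key_change = 1 if the current key differs from sv_prev_key, else 0
       sv_prev_key   = current key (must come *after* the compare)
    Assumes the input is already sorted or grouped by key."""
    sv_prev_key = None  # initial value; assumes no real key is None
    flags = []
    for key in keys:
        sv_key_change = 1 if key != sv_prev_key else 0
        sv_prev_key = key
        flags.append(sv_key_change)
    return flags


print(key_change_flags(["NJ", "NJ", "TX", "TX", "CA", "FL"]))
```

If the two assignments were swapped, every row would compare its key with itself and the flag would never fire.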

Regards,
Vidya Iyer

Posted: Thu Jun 19, 2008 10:23 am
by wahi80
Hi,

There were some spaces in the fields that needed to be trimmed.
The numbers are now generated in the right order.
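The effect of an untrimmed key is easy to reproduce: presumably, as in the sample row "Princeton, NJ", the state arrives as " NJ", which is a different key from "NJ", so it gets its own group and its own sequence. This Python sketch (the helper sequence_per_state is hypothetical) numbers rows per state with and without trimming the key first.

```python
def sequence_per_state(rows, trim):
    """Sort (city, state) rows by state and number them within each state,
    optionally stripping surrounding whitespace from both fields first."""
    if trim:
        rows = [(city.strip(), state.strip()) for city, state in rows]
    counter, prev, out = 0, None, {}
    for city, state in sorted(rows, key=lambda r: r[1]):
        counter = 1 if state != prev else counter + 1
        prev = state
        out[city] = counter
    return out


data = [("Jersey City", "NJ"), ("Princeton", " NJ"),
        ("Houston", "TX"), ("Dallas", "TX")]
print(sequence_per_state(data, trim=False))  # " NJ" != "NJ": Princeton restarts at 1
print(sequence_per_state(data, trim=True))   # Princeton continues NJ with 2
```

In the job itself the equivalent would be trimming the key column before it reaches the Sort stage, so the key change column sees one value per state.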

Thanks for the help!!

Wah

Posted: Thu Jun 19, 2008 10:54 pm
by ray.wurlod
The Sort stage is perfect to use if the data are already sorted. Specify the sort mode property as "don't sort, already sorted". This prevents DataStage from inserting a tsort operator into the step (= job).

Posted: Fri Jun 20, 2008 7:38 am
by vidya_6_2000
Oh, thank you! That helped, and it makes sense.

Appreciate everybody's time and effort.

Regards,
Vidya Iyer