External Filter as Data Source - Add row number

debrujr
Participant
Posts: 56
Joined: Fri Jul 31, 2009 1:05 pm
Location: South


Post by debrujr »

I have a requirement to add a row number to input files as they are read in. Maybe better stated: I need to know the row number within each input stream when multiple files are read at once. The initial design was a Sequential File stage that reads multiple wildcarded files and then adds a row number through the built-in function. This works for as many files as you have partitions, but once there are too many files it will eventually append files together.

My thought now is to use an awk command (below) in an External Filter stage to read the files in and add the column at the beginning, and then parse the column back out as usual. When I run it this way I can see that it recognizes the files and reads the rows, but it does not seem to write the data to STDOUT and therefore does not present the data to DS. It merely sees empty entries. I tried faking it with a grep but still got nothing. Am I misunderstanding the capabilities of using the EF stage as a source? Thoughts?

awk '{ print FNR "," $0 }' Input_Directory*
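To check outside DataStage what the command above should emit, here is a minimal sketch run against two throwaway files. The function name `number_rows` and the temporary file names are made up for illustration; only the awk program itself comes from the post.

```shell
# Hypothetical helper wrapping the awk command from the post.
# FNR restarts at 1 for each input file, whereas NR keeps counting across files.
number_rows() {
    awk '{ print FNR "," $0 }' "$@"
}

# Demo data: two small throwaway files (names are assumptions)
demo_dir=$(mktemp -d)
printf 'a\nb\n' > "$demo_dir/file1.txt"
printf 'c\nd\ne\n' > "$demo_dir/file2.txt"

# Prints each row prefixed with its row number within its own file
number_rows "$demo_dir"/file*.txt
```

If the numbered rows show up on the terminal here, the command itself is writing to standard output as expected. Since the row number alone does not say which file a row came from, awk's built-in `FILENAME` variable could be printed as a further column if that is needed.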

My last resort is to use a script as a preprocessor to add the row number. I have written it just in case, but I would rather do it all in DS than maintain separate code.
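The preprocessor script itself was not posted. A rough sketch of what such a step might look like follows, assuming each input file gets a numbered copy in a separate directory; the function name `add_row_numbers` and both directory arguments are hypothetical.

```shell
# Hypothetical preprocessor: write a row-numbered copy of each input file.
# Usage: add_row_numbers SRC_DIR OUT_DIR
add_row_numbers() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for f in "$src_dir"/*; do
        # Skip subdirectories and the literal pattern left by an unmatched glob
        [ -f "$f" ] || continue
        # awk sees one file per call, so NR is the row number within that file
        awk '{ print NR "," $0 }' "$f" > "$out_dir/$(basename "$f")" || return 1
    done
}
```

The job would then read the numbered copies from the output directory instead of the originals, at the cost of an extra pass over the data and the separate code the poster would rather avoid.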
debrujr
Participant
Posts: 56
Joined: Fri Jul 31, 2009 1:05 pm
Location: South

Post by debrujr »

Is this something that can/should not be done? Lack of response worries me...
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I'm not clear what the problem is using the Row Number Column property with the Sequential File stage. Could you please clarify?

Otherwise, use a sequential-mode Transformer or Column Generator stage downstream of the reading stage to add the row number.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
debrujr
Participant
Posts: 56
Joined: Fri Jul 31, 2009 1:05 pm
Location: South

Post by debrujr »

The requirement is to read multiple files using a wildcard. The Sequential File stage forces me to run in parallel when I use the multi-file wildcard option.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Yes, so use my second suggestion. Perhaps use Sort-Merge as the collection algorithm, if it makes any sense to do so.