How to implement this in DataStage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
vij
Participant
Posts: 131
Joined: Fri Nov 17, 2006 12:43 am

How to implement this in DataStage

Post by vij »

Hi all,

I have the following requirement:

I have to query a SQL SERVER DB table and get a maximum value a particular column, say count and add this count column to an exsisting file columns.

the general flow is like this:

table ----> file ----> newfile
(get the maximum (add the column
value of count) count as a column
to the exsiting column in the file)

I would want to know how to implement this in DataStage.

Also is it possible to make this in a single job?

please help me.

Thanks in advance!
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Sure you can do this in one job but tell me this, when you get a max, you will get one value. Your file might have x records. Do you want the max from database to be with all the records of your file?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

To paraphrase, do you want the overhead of selecting the (same) max value for every row you process? If so, you can do it in one job.

A more efficient approach would be to select the max value in a prior job, park it somewhere (for example the prior job's user status area), then feed this value into your main job as a job parameter. That way you only select the max value once, regardless of how many rows your main job processes.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Having posted that, I had second thoughts. The Framework is likely to be intelligent enough to do this just once, because the Lookup stage is a composite operator; first it loads the virtual data set, then it performs lookups against the virtual data set. Use Entire partitioning.

Code: Select all

          Select Max(col) From table;
                     |
              (virtual Data Set)
                     |  
                     V
      ------>  Lookup Stage  ------------>
I predict you'll see one row processed on the reference input link.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vij
Participant
Posts: 131
Joined: Fri Nov 17, 2006 12:43 am

Post by vij »

Thanks for your reply, DSguru2B.

yes, exactly.Thats what i want.

Suppose if the maximum from the table is 35 and I want this value to be populated as a column in the file, for all the records, ie., i want the same value, 35.

additional information - For a single run, only once i will query that table and get the value and then use that same value for all the records in the file.

Actually I know that I can query the table, move that to a file and use a Joiner stage and join both the files, but I wanted to know is there any other way I can do this, which can simplify this job, with minimum use of resources.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Ray has explained it all.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Post Reply