How to implement this in DataStage

vij · Post by **vij** » Wed Jan 31, 2007 10:10 pm

Hi all,

I have the following requirement:

I have to query a SQL SERVER DB table and get a maximum value a particular column, say count and add this count column to an exsisting file columns.

the general flow is like this:

table ----> file ----> newfile
(get the maximum (add the column
value of count) count as a column
to the exsiting column in the file)

I would want to know how to implement this in DataStage.

Also is it possible to make this in a single job?

please help me.

Thanks in advance!

DSguru2B · Post by **DSguru2B** » Wed Jan 31, 2007 10:25 pm

Sure you can do this in one job but tell me this, when you get a max, you will get one value. Your file might have x records. Do you want the max from database to be with all the records of your file?

ray.wurlod · Post by **ray.wurlod** » Wed Jan 31, 2007 11:14 pm

To paraphrase, do you want the overhead of selecting the (same) max value for every row you process? If so, you can do it in one job.

A more efficient approach would be to select the max value in a prior job, park it somewhere (for example the prior job's user status area), then feed this value into your main job as a job parameter. That way you only select the max value once, regardless of how many rows your main job processes.

ray.wurlod · Post by **ray.wurlod** » Wed Jan 31, 2007 11:19 pm

Having posted that, I had second thoughts. The Framework is likely to be intelligent enough to do this just once, because the Lookup stage is a composite operator; first it loads the virtual data set, then it performs lookups against the virtual data set. Use Entire partitioning.

Code: Select all

          Select Max(col) From table;
                     |
              (virtual Data Set)
                     |  
                     V
      ------>  Lookup Stage  ------------>

I predict you'll see one row processed on the reference input link.

vij · Post by **vij** » Thu Feb 01, 2007 12:09 am

Thanks for your reply, DSguru2B.

yes, exactly.Thats what i want.

Suppose if the maximum from the table is 35 and I want this value to be populated as a column in the file, for all the records, ie., i want the same value, 35.

additional information - For a single run, only once i will query that table and get the value and then use that same value for all the records in the file.

Actually I know that I can query the table, move that to a file and use a Joiner stage and join both the files, but I wanted to know is there any other way I can do this, which can simplify this job, with minimum use of resources.

DSguru2B · Post by **DSguru2B** » Thu Feb 01, 2007 8:29 am

Ray has explained it all.