Inserting MaxValue only to a file without using Aggregator

nivas · Post by **nivas** » Fri Jul 23, 2004 4:55 pm

Hi

I am getting a sequece of values From one input column. After completion of job I would like to insert only the max of input column value into a seq file or a table without using aggregator. Could any body please give me suggestion.

thanks
NIvas

ketfos · Post by **ketfos** » Fri Jul 23, 2004 5:15 pm

Hi,
You can put these list of values into a table.
Then select MAX(column name) from table

Ketfos

ray.wurlod · Post by **ray.wurlod** » Fri Jul 23, 2004 5:56 pm

If the source is a text file, pre-process it (use Filter in the Sequential File stage, or ExecSH as a before-stage or before-job subroutine). Sort the file then grab the final line. sort filename | tail -1

nivas · Post by **nivas** » Fri Jul 23, 2004 6:46 pm

Hi

The source is a table. After reading all the input records and finding the max value I would like to insert into table once . I would like to avoid multiple times contacting database. Source has 16 million records . Please suggest.

thanks

Nivas

chulett · Post by **chulett** » Fri Jul 23, 2004 7:23 pm

I'd suggest using a hash file with a fixed value key and a data field for your max value. Use a Stage Variable to keep track of the max value, in essence putting a new value in it only when it is greater than the previous value. Write the stage variable to the hash with a constant key value (like '1' for instance) so there is only one record in it at all times. You could make use of a contraint to only write to the hash when the value changes, if you are worried about writing to it on each row.

After the first part of the job completes, read from the hash and write it to your database.

nivas · Post by **nivas** » Fri Jul 23, 2004 8:07 pm

Hi

My idea is first get a max value in a stage variable. I would like to execute a script which runs after successiful execution of job and inserts the max value into database. I would like to define a job parameter and assign the stage variable value into Job parameter, there by I can pass the Job parameter as parameter to the shell script. My question is, is there any way to assign the stage variable value into Job parameter inside the job? Please suggest.

By the above I can avoid the hash stage. Infuture we can change it to a parallel job. So please suggest me how to assign the stage variable to a job parameter inside the job.

thanks
Nivas

ray.wurlod · Post by **ray.wurlod** » Fri Jul 23, 2004 8:23 pm

Ah, an easy one.

Job parameter values can not be changed once the job has started.

. My question is, is there any way to assign the stage variable value into Job parameter inside the job?

So the answer is no.

nivas · Post by **nivas** » Fri Jul 23, 2004 8:39 pm

Hi

Is there any way by providing a constraint and send the the max value to output link, after end of reading and processing of all input records.
Please suggest.

thanks
Nivas

chulett · Post by **chulett** » Fri Jul 23, 2004 9:19 pm

What about making use of the UserStatus area? Once written to, a following Routine Activity stage (via a Sequencer Job) could automatically take the $UserStatus as an input parameter and use that to write out to a flat file or as input to your script. Or another job could use it to write to your target database.

Or is that too 'Server' of an answer again?

nivas · Post by **nivas** » Fri Jul 23, 2004 9:40 pm

Hi

I did not understand the link between the max value I got in the server job and the $userstatus in the next activity stage. How do I pass the max value I got in the first server job and how do I get that value in the next job through $userstatus.

By the way can we find the end of processing of records inthe server job and we use that in the constrain so that we can tell the datastage to send the value to the output link. Please suggest me.

thanks
Nivas

chulett · Post by **chulett** » Fri Jul 23, 2004 11:05 pm

Hmmm... have you created Sequencer jobs yet? They allow you to control the 'flow' of several related jobs / routines / activities and help automate the passing of parameters between or to them. When you add an Activity stage to a Sequencer, you can automate the passing of parameters from either the Sequencer job itself or the 'output' from any activity that has run before.

So, say for example your 'FirstJob' wrote this max value to its USERSTATUS area. Your 'SecondJob' could have a job parameter called MaxValue defined in it. In the Sequencer, using the 'External Parameter Helper' from the Properties of 'SecondJob', you could set the value of MaxValue to FirstJob.$UserStatus. This would 'automatically' retrieve the current value from the USERSTATUS area of the first job and pass it into the second job.

There have been quite a number of posts here on the subject of USERSTATUS, searching for them should help with your understanding. It is written to using the DSSetUserStatus function and you'll need to 'wrapper' it in a routine to call it from a derivation. I even posted the two lines of code needed a little while back.

On the subject of 'end of processing', there is no indicator or anything available in the engine itself that would let you know you are on the last row in a Server job. You know it was the last row when the job completes.

One nice thing about the User Status area is it works just like the hash game I described earlier, mostly because it is one that each job has associated with it. It follows the 'destructive overwrite' philosophy as well, such that only the last thing you wrote to it will be available when the job ends.

nivas · Post by **nivas** » Sat Jul 24, 2004 8:12 am

Chulett

Thanks for your support and information. I donot have remote access. As you suggested I am going to develop jobs on Monday at office.

thanks
NIvas

chulett · Post by **chulett** » Sat Jul 24, 2004 8:25 am

No problem, glad to help! Let us know if you have more questions when you get into the thick of it.