Global variable

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

MAT
Participant
Posts: 65
Joined: Wed Mar 05, 2003 8:44 am
Location: Montréal, Canada

Post by MAT »

Hi,

I see 4 possibilities:
1. You can declare an environment variable in the Administrator and have your jobs read from it.
2. If you are using job sequences, put the variable on the top-level sequence and have each job refer to it.
3. You could use a routine that reads the values from a parameter file at run time.
4. Parameter Manager, available from the company that runs this forum, supposedly handles global variables, but I have never tried it.

Hope this helps

MAT
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Yet another approach is to use a hashed file. Load it up with a key value that is constant (for example "X") and the value you need in another field (column). Use a reference input link (in as many jobs as you please) with a reference key expression consisting solely of the string "X" (or whatever you chose for your key value).
You can have one hashed file with as many of these "global constant" rows as you please.
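As a rough sketch, a routine along these lines could fetch such a constant from the hashed file (the file name GLOBAL.CONSTANTS and the field layout are made up for illustration; Arg1 is the constant key, e.g. "X"):

```basic
* Hypothetical DataStage BASIC routine body: look up a "global constant"
* from a hashed file keyed on a constant string such as "X".
* GLOBAL.CONSTANTS and the one-value-per-record layout are assumptions.
      Open "GLOBAL.CONSTANTS" To ConstPtr Then
         Read ConstRec From ConstPtr, Arg1 Then
            Ans = ConstRec<1>   ;* value stored in the first field
         End Else
            Ans = ""            ;* key not found
         End
      End Else
         Ans = ""               ;* hashed file could not be opened
      End
```

Called from a derivation as, say, GetGlobalConstant("X"), although a reference input link as described above avoids the per-row routine call.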

(And, yes, Parameter Manager does simulate "global" parameters. See www.datastagexchange.com/PM)

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Yet another way is to use @USER0, @USER1, etc. The @USERn variables are available as long as the session is in effect. You can also use what is called named common. Named common is a UniVerse BASIC trick; you can look it up on IBM's website. If you have a problem then let me know.

@USER1 = 'my value'
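The named common trick mentioned above can be sketched like this (the common block name /MyGlobals/ and the variable names are hypothetical):

```basic
* UniVerse BASIC named common: variables in a named COMMON block keep
* their values between routine calls within the same process.
* /MyGlobals/ and gMyValue are made-up names for illustration.
      Common /MyGlobals/ gMyValue

      If Unassigned(gMyValue) Then gMyValue = ""   ;* first call in this process
      If Arg1 # "" Then gMyValue = Arg1            ;* set when an argument is passed
      Ans = gMyValue                               ;* always return the current value
```

Any routine compiled with the same COMMON declaration sees the same value, but, like the @USERn variables, only within one process.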

Thanks Kim.


Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Two points about Kim's post.

First, there are only five of these variables, @USER0 through @USER4.
There is also @USER.RETURN.CODE.

Second, these are only available within one process. Each process gets a separate set of system variables. Emma's requirement was to have the values available in all jobs called; jobs run in separate processes. Therefore an approach using system variables is at best partial.
trobinson
Participant
Posts: 208
Joined: Thu Apr 11, 2002 6:02 am
Location: Saint Louis
Contact:

Post by trobinson »

I want to use some of these user variables in a routine, transform and/or in a stage variable but DataStage won't let me. What's the workaround without resorting to a common variable?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

DataStage WILL let you use the system variables indicated in all the places you specified.

However, be aware that system variables are per-process; to use them to transfer values they must be in the same process. Typically each Transformer stage in a server job runs in a separate process, which can execute only:
(a) that stage's before-stage subroutine (if any)
(b) any expressions within the stage
(c) any Routines called from expressions within the stage
(d) that stage's after-stage subroutine (if any)

Note also that there are only five system variables of the @USERn style, @USER0 through @USER4. Others, like @USER.RETURN.CODE, are also available.

If you want to make values available to downstream stages, pass them along links in the job design as additional columns. This is almost cost-free.

If you want to implement communication between simultaneously executing jobs, I'd suggest you use the named pipe capability of the sequential file stage.
trobinson
Participant
Posts: 208
Joined: Thu Apr 11, 2002 6:02 am
Location: Saint Louis
Contact:

Post by trobinson »

I am using 5.2.1. As an example, I am trying to build the following string within the derivation of a stage variable:
@USER0 : '' : Time() : ''
I'm not worried about spanning processes. I would be quite happy if I were able to assign @USER0 a value in one stage variable of a Transformer and retrieve it in another. I am trying to build an XML string of times for each Transformer in a single-process job. I am doing this to record the performance of each stage. My concern is limiting the performance impact of taking performance measurements. If I have to use a common variable, I'll have to call a routine in every stage. I'd rather keep appending to a @USERn variable in a stage variable.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

You have to do it in a routine.

@USER0 := Time() : @FM

would be best. I would pass @INROWNUM in as Arg1, then:

If Arg1 = 1 Then @USER0 := Time() : @FM

The @FM separators turn @USER0 into a dynamic array: the first time lands in field 1, the next in field 2, and so on, so you can pull them back out later with Field() or the <n> extract operator.
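Put together as a transform routine body, that idea might look like this (RecordStageTime is a hypothetical name; Arg1 is meant to receive @INROWNUM from the calling derivation):

```basic
* Hypothetical DataStage transform routine body: append the current
* time to @USER0 on the first row only, then return the accumulated
* list. @USERn variables are per-process, so this only works while
* every Transformer involved runs in the same process.
      If Arg1 = 1 Then @USER0 := Time() : @FM
      Ans = @USER0
```

A derivation such as RecordStageTime(@INROWNUM) in each Transformer would then append one timestamp per stage.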

Thanks Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You don't need system variables at all; you can just use stage variables, passing the result along an output link as an additional column. For example, imagine a stage variable called Times in each Transformer stage, initialized to a zero-length string, where Time() is evaluated only for the first input row.
In the first Transformer stage, derivation of Times could be:

If @INROWNUM = 1 Then TIME() Else Times

In subsequent transformer stages, derivation of Times could be:

If @INROWNUM = 1 Then InLink.Times : @FM : TIME() Else Times

There are lots of possibilities and alternatives. I have shown you only one of them.
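If the accumulated Times string eventually lands in a sequential file, a small routine could turn it back into readable timestamps; this sketch assumes the @FM-delimited layout described above and uses only standard UniVerse BASIC functions (the routine itself is hypothetical):

```basic
* Hypothetical helper: convert an @FM-delimited list of internal
* times (as built up in the Times stage variable) into a
* comma-separated list of hh:mm:ss values.
      NumTimes = Dcount(Arg1, @FM)
      Ans = ""
      For i = 1 To NumTimes
         OneTime = Arg1<i>                    ;* extract field i
         Ans := Oconv(OneTime, "MTS")         ;* internal time -> hh:mm:ss
         If i < NumTimes Then Ans := ","
      Next i
```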
trobinson
Participant
Posts: 208
Joined: Thu Apr 11, 2002 6:02 am
Location: Saint Louis
Contact:

Post by trobinson »

There is a need to balance the impact on the design and function of any of our DataStage jobs. I do not want to intermingle the performance functionality with the ETL functionality. The method I am seeking is a single stage variable in every Transformer of a DS job. This unobtrusive stage variable could remain in any job or be added/deleted at will. It would be very simple to use and would reside in the same place in every Transformer stage. Additionally, the derivation of this stage variable will call a routine which will build the string and, in the final Transformer, write it to a sequential file. A Java app will parse the file and subtract the various Transformer stage times to get the time, to the millisecond, that it takes between any two Transformers. Naturally, this will only work on NT and not HP-UX.
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

trobinson

I think there are some flaws in your thought process. First of all, the first row comes through after the select is done, so you lose that time. How do you know when a link finishes? If you are using the start time of the next link, then that time will include the time it took to do its select. The times in the DataStage Director, when you monitor a job, are the best times. What I do is run a process after a job finishes that writes parameter history, row history and job history to six hash files. You run a TCL command like:

DWN_RT_ROW_COUNT.uvb JobName

This program is included with DsWebMon. Ken Bland had an article about profiling DataStage processes to get performance trends on jobs: if a job took 6 hours to run, why does it all of a sudden take 12 hours? If you are doing performance tuning, then what are your slowest running jobs in rows per second? What are your longest running jobs? I run this program after each job sequence finishes, once I start to get an idea of which jobs to try to optimize.

I would like to share this code but I may need Ascential's approval because it shows how the internal files of DataStage work. It is very complex at a low level and their routines do not give you the same results. Like:

StartedDateTime = DSGetJobInfo(RunHandle, DSJ.JOBSTARTTIMESTAMP)
iStartedDate = iconv(StartedDateTime[1,10], 'D4-YMD')
iStartedTime = iconv(StartedDateTime[12,8], 'MT')

Plus:

StageNames = DSGetJobInfo(RunHandle, DSJ.STAGELIST)
NoStages = Dcount(StageNames, ',')
For i = 1 To NoStages
   StageName = Field(StageNames, ',', i)
   * ------------------------------------------------------------------
   * Get link names per stage
   * ------------------------------------------------------------------
   StageType = DSGetStageInfo(RunHandle, StageName, DSJ.STAGETYPE)
   LinkNames = DSGetStageInfo(RunHandle, StageName, DSJ.LINKLIST)
   NoLinks = Dcount(LinkNames, ',')
   For j = 1 To NoLinks
      LinkName = Field(LinkNames, ',', j)
      RowCount = DSGetLinkInfo(RunHandle, StageName, LinkName, DSJ.LINKROWCOUNT)
      * ElapsedTime must be derived elsewhere, e.g. from the job's
      * start and end timestamps shown above
      If ElapsedTime > 0 And ElapsedTime # '' Then
         RowsPerSec = (RowCount / ElapsedTime)
         RowsPerSec = Oconv(RowsPerSec, 'MD0')
      End
   Next j
Next i

This code will get you closer to your desired result. You need to fill the gaps. Unless someone from Ascential like Ernie says I can show you the real way.

Thanks Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
trobinson
Participant
Posts: 208
Joined: Thu Apr 11, 2002 6:02 am
Location: Saint Louis
Contact:

Post by trobinson »

My jobs are completely realtime, not batch. I don't care about volume; I care about speed, about the milliseconds to do a single row. If batch is a telescope, realtime is a microscope. Mine is an ODS in realtime, baby. My "select" is when an MQ Series message arrives on a queue and the DS job browses for it. I guess I agree that I lose the time it takes for the message to arrive in the queue and the time it takes my job to start processing that message, but I think that is MQ => network => DS time and I can live with that. My focus is simply the bottleneck of the actual stages in the job. This ain't batch. It is a completely different mindset.

For example, I've done this type of performance testing during our POC. It took .002 seconds to read a message, .5 seconds to parse the XML and .002 seconds to write the message to Oracle. A total of .504 seconds. Sounds good? Nope! The XML Reader stage is a total dog. Now, if I were running a gazillion rows through this job, it might appear acceptable. However, I want to populate my ODS in less than one second from the time I "see" the message. Half my processing "window" is taken up with parsing the message. This is valuable information. Director times blow. I need sub-second accuracy. Can the Director logs give me that? Do the Director logs add other time variability over which I have no control? I also agree that those times are for the stages. Can I get it down to the link level? Maybe, but again that is co-mingling the ETL with the performance, which I won't do. I have yet to hear anybody discuss realtime performance and DataStage. It is not architected for it. I don't care if DataStage can win the Indianapolis 500; I want it to rule the quarter mile.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Todd - have you checked out the announcement on Monday of DataStage version 7?

http://www.ascential.com/news/2003/june/06162003.htm

I realize this doesn't help you now, but that will be the first version to incorporate any form of 'real time' processing. Curious what you think wrt this new functionality.

-craig
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

trobinson

Then what I said about metadata is very important. I worked with a guy named Mark Baldridge, who was head of QA for UniVerse. He had great ideas that directly impact what you are doing: he would take a loop, time it for one function call at a time, and write the results into a hash file, like this:

Open "FunctionTimes" To FunctionTimesPtr Else Stop
TestDate = "2000-03-01 00:00:00"
TestEndDate = "2000-04-01 00:00:00"
Cnt = 10000
StartTime = Time()
For i = 1 To Cnt
   * do again for each alternative being compared
   Mon1 = Iconv(TestDate, "DM-YMD")
   * Mon1 = Field(TestDate, "-", 2)
   * Mon1 = TestDate[6,2]
   Mon2 = Iconv(TestEndDate, "DM-YMD")
   * Mon2 = Field(TestEndDate, "-", 2)
   * Mon2 = TestEndDate[6,2]
   Result = Mon2 - Mon1
Next i
EndTime = Time()
Rec = ""
Rec<1> = StartTime
Rec<2> = EndTime
Rec<3> = Cnt
Write Rec On FunctionTimesPtr, "iconv"
End

You need to do this in BASIC to get a good comparison. When milliseconds count then do this or hire Mark.

Thanks Kim.

Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com