Real part of a file in a routine

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
samp
Participant
Posts: 25
Joined: Thu Nov 06, 2003 4:34 am

Real part of a file in a routine

Post by samp »

hi,

i've got a problem.

i have got a table (ORACLE) wich contains the connexion of utilisators to an application (uti_id, connexion_time, deconnexion_time - here are the fields).

i must plot the maximal number of connexion by day, and the interval of time when the maximum is reach.

I would like to do it by datastgae.

my idea :
1 - sort the table by connexion_time (ascending).
2 - select the first line as reference and test the connexion_time of all the other lines of the table. (but it might be too long)

so i am listening all kind of ideas...
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

DataStage is not a reporting tool.

Use the correct tool, which is probably as simple as a query against your Oracle database. Use a CASE construct to form your intervals, and group by these.

DataStage can certainly calculate your intervals as part of an ETL process, but will not generate a report for you. This must be accomplished out of the target table.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
samp
Participant
Posts: 25
Joined: Thu Nov 06, 2003 4:34 am

Post by samp »

I know that datastage is not a reporting tool.

my question was not to construct intervals but finding a way in datastage (using routine or other things) to generate a file wich indicates the number of simultaneous connections .

my first idea was to create a routine and test for each line all the other line of my table verifying if the "deconnection_time" of the tested line is smaller than the "connection_time" of the reference line. Using a variable like 'i=i+1' if my condition is verified.

at the end datastage give me a file with the number of simultaneaous connections in front of each line of my file...

but it appears that it's too long. because my table as 30 000 rows. it's not performing at all.

so i think : perhaps someone as an other idea or could give me an idea to resolve my problem, or has yet find a solution for the same type of problem.

(excuse my english, i'm french)
chucksmith
Premium Member
Premium Member
Posts: 385
Joined: Wed Jun 16, 2004 12:43 pm
Location: Virginia, USA
Contact:

Post by chucksmith »

How are your times measured, (hours, minutes, seconds)? What is the interval used to measure simultaneous connects (all connections existing at this minute of the day)?

My though is you could to load the table to a hash file, adding a new multi-valued (delimited) field, with a value for each connection interval. For example:

Start time: 01:01:01
End time: 01:10:13
Interval length: 1 minute
Delimited Intervals: 01:01 @VM 01:02 @VM 01:03 @VM 01:04 @VM 01:05 @VM 01:06 @VM 01:07 @VM 01:08 @VM 01:09 @VM 01:10

Write a routine to build the Delimited Intervals from the Start time and End time.

Next, reading from the hash file, choose to normalize based upon the new multi-valued field.

Now, you have one row for each connection/interval combination. Just sort and count, or aggregate, them, and you should have your results.

Code: Select all

                          (normalize)
OCI ----> XFR ----> HASH -------------> XFR ----> AGGR ----> SEQ
 
Post Reply