Real part of a file in a routine

samp · Post by **samp** » Thu Aug 12, 2004 7:51 am

hi,

i've got a problem.

i have got a table (ORACLE) wich contains the connexion of utilisators to an application (uti_id, connexion_time, deconnexion_time - here are the fields).

i must plot the maximal number of connexion by day, and the interval of time when the maximum is reach.

I would like to do it by datastgae.

my idea :
1 - sort the table by connexion_time (ascending).
2 - select the first line as reference and test the connexion_time of all the other lines of the table. (but it might be too long)

so i am listening all kind of ideas...

ray.wurlod · Post by **ray.wurlod** » Thu Aug 12, 2004 9:42 pm

DataStage is not a reporting tool.

Use the correct tool, which is probably as simple as a query against your Oracle database. Use a CASE construct to form your intervals, and group by these.

DataStage can certainly calculate your intervals as part of an ETL process, but will not generate a report for you. This must be accomplished out of the target table.

samp · Post by **samp** » Fri Aug 13, 2004 6:45 am

I know that datastage is not a reporting tool.

my question was not to construct intervals but finding a way in datastage (using routine or other things) to generate a file wich indicates the number of simultaneous connections .

my first idea was to create a routine and test for each line all the other line of my table verifying if the "deconnection_time" of the tested line is smaller than the "connection_time" of the reference line. Using a variable like 'i=i+1' if my condition is verified.

at the end datastage give me a file with the number of simultaneaous connections in front of each line of my file...

but it appears that it's too long. because my table as 30 000 rows. it's not performing at all.

so i think : perhaps someone as an other idea or could give me an idea to resolve my problem, or has yet find a solution for the same type of problem.

(excuse my english, i'm french)

chucksmith · Post by **chucksmith** » Fri Aug 13, 2004 7:24 am

How are your times measured, (hours, minutes, seconds)? What is the interval used to measure simultaneous connects (all connections existing at this minute of the day)?

My though is you could to load the table to a hash file, adding a new multi-valued (delimited) field, with a value for each connection interval. For example:

Start time: 01:01:01
End time: 01:10:13
Interval length: 1 minute
Delimited Intervals: 01:01 @VM 01:02 @VM 01:03 @VM 01:04 @VM 01:05 @VM 01:06 @VM 01:07 @VM 01:08 @VM 01:09 @VM 01:10

Write a routine to build the Delimited Intervals from the Start time and End time.

Next, reading from the hash file, choose to normalize based upon the new multi-valued field.

Now, you have one row for each connection/interval combination. Just sort and count, or aggregate, them, and you should have your results.

Code: Select all

                          (normalize)
OCI ----> XFR ----> HASH -------------> XFR ----> AGGR ----> SEQ