Transformer, help!
Moderators: chulett, rschirm, roy
Re: Transformer, help!
In the simple case where you have a database stage linking to a transformer linking to something else:
The query is executed in the source database. There is a period while the database is "thinking" about it (before it starts returning rows); if you monitor the job during that time, the rows/s will be 0.
Then, after a while, rows start coming back from the source database. They are fed into the transformer, which applies whatever transformations you put in there, and then you can write the result over any link you want.
There are 2 big "internal" slowdowns for transformers:
- The amount of data being processed: everything you transform needs to go from the database to DataStage and then back to wherever you write it. So if you have a slow link or extreme volumes (measured in bytes, not rows) this will be slow. I had "move" jobs that would run for 5 hours; replacing them with a UNIX script to do the move (local to the machine, db2load, ...) cut that to 15 minutes.
- The number of output links you have from the transformer: every time you add an output link, the resulting BASIC code has to do an extra if statement. Sometimes it's faster to duplicate jobs and have them process separate pieces of data than to add output links.
So if you directly see rows being returned in the monitor, although very slowly, this usually means the amount of data or a slow network is the culprit.
Ogmios
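As a rough illustration of the second slowdown, the generated code effectively evaluates one constraint per output link for every row. This is a hypothetical Python sketch of that pattern (the real engine generates DataStage BASIC, not Python, and the row layout and constraints here are invented for the example):

```python
# Hypothetical sketch: a transformer tests one constraint per output
# link, for every single row that passes through it.
def route_row(row, links):
    """links is a list of (constraint_fn, output_list) pairs."""
    for constraint, output in links:
        if constraint(row):      # one extra 'if' per output link, per row
            output.append(row)

matched, unmatched = [], []
links = [
    (lambda r: r["email"] is not None, matched),
    (lambda r: r["email"] is None, unmatched),
]

rows = [{"id": 1, "email": "a@x.com"}, {"id": 2, "email": None}]
for row in rows:
    route_row(row, links)
```

Each additional link adds another test to that inner loop, which is why splitting the work across duplicated jobs can come out faster at high volumes.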
besco wrote: I have a DataBase Stage and a Sequential File Stage linking to a transformer. In the transformer, I bind the two fields coming from my two links. I started the job at 12:00 and it's still running.
Aha, so your database stage is actually a lookup (it uses a striped line). If this is the case, you execute your query for every line in the sequential file.
And this then gets written to a single table?
Is this correct?
Ogmios
Try the following:
Make a job that loads your sequential file into the Oracle database.
Write a query that joins this table with the data you're processing now (but for all required customers at the same time). Use this query as input and do the rest of your job as in the current case (so you only have a database stage as input).
Currently a query is launched against the database for every row you read, and since you're not binding parameters you cause a lot of hard parsing, ...
Ogmios
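To make the contrast concrete, here is a small sketch using Python's sqlite3 as a stand-in for Oracle (the table names, columns, and customer ids are invented for the example). The first loop pastes a different literal into the SQL text each time, which is what forces a hard parse per row; the second approach loads the customers once and lets a single set-based join do all the work:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE email (custid INTEGER, email TEXT)")
con.execute("INSERT INTO email VALUES (1, 'a@x.com'), (2, 'b@x.com')")

customers = [1, 2, 3]  # stand-in for the rows of the sequential file

# Per-row lookup with literals concatenated into the statement text:
# every iteration is a brand-new SQL string the database must parse.
slow = []
for c in customers:
    cur = con.execute(f"SELECT email FROM email WHERE custid = {c}")
    slow.append((c, cur.fetchone()))

# Load the customers once, then run one join for the whole set.
con.execute("CREATE TABLE customer (custid INTEGER)")
con.executemany("INSERT INTO customer VALUES (?)", [(c,) for c in customers])
fast = con.execute(
    "SELECT customer.custid, email.email "
    "FROM customer LEFT OUTER JOIN email ON customer.custid = email.custid "
    "ORDER BY customer.custid"
).fetchall()
```

The join also returns the customers with no e-mail (as NULL), which matters for the matching/non-matching question discussed below in the thread.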
besco wrote: After my transformer, I have one Sequential File Stage with two links. I feed one file with the lines which match and another with the lines which don't match. So if I understand well, you recommend that I load data into Oracle and do a join statement. How do I manage the lines which do not match?
There are several solutions. My favorite is to split the job in 2: one job uses the input database stage to select only the matching rows, and another job handles the non-matching ones.
Alternatively do a left outer join and check on a field with IsNull (using 2 output links).
Ogmios
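A minimal sketch of that second option, again using sqlite3 as a stand-in for Oracle (schema and data invented): the left outer join keeps every customer, and the NULL test plays the role of the transformer's IsNull constraint on the two output links:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (custid INTEGER, name TEXT);
    CREATE TABLE email (emailid INTEGER, email TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO email VALUES (1, 'ann@x.com');
""")

# Left outer join: every customer survives; email is NULL when
# there is no matching row on the optional side.
rows = con.execute(
    "SELECT customer.name, email.email FROM customer "
    "LEFT OUTER JOIN email ON customer.custid = email.emailid"
).fetchall()

# The two output-link constraints: Not(IsNull(email)) and IsNull(email).
matching = [r for r in rows if r[1] is not None]
non_matching = [r for r in rows if r[1] is None]
```

One query thus feeds both the "match" file and the "no match" file.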
besco wrote: I match a sequential file with ORAOCI8 in a transformer. I do a request in ORAOCI8 to get all emails in a table. The problem is: this request takes about one hour to finish.
hi besco
How about putting the data you are comparing with from the ORAOCI8 stage into a HASH stage before the transform stage, and then using the HASH stage as the reference/lookup to the transform?
The one thing to be wary of here is the number of records in the HASH stage: too many can defeat the object; it will take too long to create the HASH stage file.
dnzl
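The hashed-file idea amounts to paying for the slow query once, then resolving every row of the sequential file against an in-memory key lookup. A hedged Python sketch, with sqlite3 standing in for the ORAOCI8 query and a dict standing in for the hashed file (all names and data invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE email (custid INTEGER, email TEXT)")
con.execute("INSERT INTO email VALUES (1, 'a@x.com'), (2, 'b@x.com')")

# One pass over the e-mail query builds the lookup, the way a hashed
# file is populated once before the transformer starts.
lookup = {custid: email
          for custid, email in con.execute("SELECT custid, email FROM email")}

# Each row from the sequential file is then resolved in memory,
# without hitting the database again; misses come back as None.
seq_file = [1, 2, 3]
resolved = [(c, lookup.get(c)) for c in seq_file]
```

This is also why dnzl's warning matters: the dict (hashed file) must be built in full before any row is processed, so a huge reference set pushes the cost back to the front of the job.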
"what the thinker thinks, the prover proves" - Robert Anton Wilson
besco wrote: ogmios, can you explain this: "Alternatively do a left outer join and check on a field with IsNull (using 2 output links)."
An alternative design, if I understand what you want to do:
- Instead of loading the e-mails to a hash file, first load the sequential file with your customers into an Oracle table. This is a separate job.
- In your "main" job, instead of having a sequential file and an Oracle stage, just use an Oracle stage whose main query is the select on the new customer table, and then do a left outer join with the query you're now using for the mails. Something like:
Code:
select customer.name, email.email
from customer_table
left outer join email on (joining your customer with the rest of the tables)
However, this would imply that your customer name is stored somewhere in a table as a column, and not as part of the name of the table (as you now seem to be doing in your other thread).
Ogmios
In Oracle, outer joins have a different (more difficult) syntax; from memory it is:
Code:
select customer.name, email.email
from customer, email
where customer.custid = email.emailid (+)
The (+) indicates which side of the WHERE join is the outer or optional side. This will bring in all your customer records, while the email field will sometimes be NULL.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
You've got it right, Vincent. The plus sign syntax is the older Original Recipe way of doing this in Oracle and goes on the 'outer' side. The new 'left outer join' syntax is the Extra Crispy version.
You can also do 'right outer' and 'full outer' joins.
-craig
"You can never have too many knives" -- Logan Nine Fingers
I didn't know Oracle 8 supported the "left outer" syntax; I thought it still required the (+). I also thought that Oracle 8 didn't support the extended ANSI standard where the join statement and join criteria can be included on the same line; instead, you had to use the basic ANSI standard where the join clause is separate from the WHERE clause, making for a much messier SQL statement.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn