Transformer, help!
Moderators: chulett, rschirm, roy
Re: Transformer, help!
In the simple case where you have a database stage linking to a transformer linking to something else:
The query is executed in the source database. There is a period while the database is "thinking" about it (before it starts returning rows); if you monitor the job during that time, the rows/s will be 0.
Then, after a while, rows start coming back from the source database. They are fed into the transformer, which applies whatever transformations you put in there, and then you can write the result over any link you want.
There are 2 big "internal" slowdowns for transformers:
- The amount of data being processed: everything you transform needs to go from the database to DataStage and then back to wherever you write it. So if you have a slow link or extreme volumes (measured in bytes, not rows) this will be slow. I had "move" jobs that would run for 5 hours; replacing them with a UNIX script to do the move (local to the machine, db2load, ...) cut that to 15 minutes.
- The number of output links you have from the transformer: every time you add an output link, the resulting BASIC code has to do an extra if statement. Sometimes it's faster to duplicate jobs and have them process separate pieces of data than to add output links.
So if you directly see rows being returned in the monitor, although very slowly, this usually means the amount of data or a slow network is the culprit.
Ogmios
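As a rough illustration of the second slowdown, the generated code effectively evaluates one constraint per output link for every row. This is a hypothetical Python sketch of that pattern (the real engine generates DataStage BASIC, not Python, and the row layout and constraints here are invented for the example):

```python
# Hypothetical sketch: a transformer tests one constraint per output
# link, for every single row that passes through it.
def route_row(row, links):
    """links is a list of (constraint_fn, output_list) pairs."""
    for constraint, output in links:
        if constraint(row):      # one extra 'if' per output link, per row
            output.append(row)

matched, unmatched = [], []
links = [
    (lambda r: r["email"] is not None, matched),
    (lambda r: r["email"] is None, unmatched),
]

rows = [{"id": 1, "email": "a@x.com"}, {"id": 2, "email": None}]
for row in rows:
    route_row(row, links)
```

Each additional link adds another test to that inner loop, which is why splitting the work across duplicated jobs can come out faster at high volumes.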
besco wrote: I have a DataBase Stage and a Sequential File Stage linking to a transformer. In the transformer, I bind the two fields coming from my two links. I started the job at 12:00 and it's still running.
Aha, so your database stage is actually a lookup (it uses a striped line). If this is the case, you execute your query for every line in the sequential file.
And this then gets written to a single table?
Is this correct?
Ogmios
Try the following:
Make a job that loads your sequential file into the Oracle database.
Write a query that joins this table with the data you're processing now (but for all required customers at the same time). Use this query as input and do the rest of your job as in the current case (so you only have a database stage as input).
Currently a query is launched against the database for every row you read, and since you're not binding parameters you cause a lot of hard parsing, ...
Ogmios
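To make the contrast concrete, here is a small sketch using Python's sqlite3 as a stand-in for Oracle (the table names, columns, and customer ids are invented for the example). The first loop pastes a different literal into the SQL text each time, which is what forces a hard parse per row; the second approach loads the customers once and lets a single set-based join do all the work:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE email (custid INTEGER, email TEXT)")
con.execute("INSERT INTO email VALUES (1, 'a@x.com'), (2, 'b@x.com')")

customers = [1, 2, 3]  # stand-in for the rows of the sequential file

# Per-row lookup with literals concatenated into the statement text:
# every iteration is a brand-new SQL string the database must parse.
slow = []
for c in customers:
    cur = con.execute(f"SELECT email FROM email WHERE custid = {c}")
    slow.append((c, cur.fetchone()))

# Load the customers once, then run one join for the whole set.
con.execute("CREATE TABLE customer (custid INTEGER)")
con.executemany("INSERT INTO customer VALUES (?)", [(c,) for c in customers])
fast = con.execute(
    "SELECT customer.custid, email.email "
    "FROM customer LEFT OUTER JOIN email ON customer.custid = email.custid "
    "ORDER BY customer.custid"
).fetchall()
```

The join also returns the customers with no e-mail (as NULL), which matters for the matching/non-matching question discussed below in the thread.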
besco wrote: After my transformer, I have one Sequential File Stage with two links. I feed one file with the lines which match and another with the lines which don't match. So if I understand well, you recommend that I load data into Oracle and do a join statement. How do I manage the lines which do not match?
There are several solutions. My favorite is to split the job in 2: one job uses the input database stage to select only the matching rows, and another job handles the non-matching ones.
Alternatively do a left outer join and check on a field with IsNull (using 2 output links).
Ogmios
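A minimal sketch of that second option, again using sqlite3 as a stand-in for Oracle (schema and data invented): the left outer join keeps every customer, and the NULL test plays the role of the transformer's IsNull constraint on the two output links:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (custid INTEGER, name TEXT);
    CREATE TABLE email (emailid INTEGER, email TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO email VALUES (1, 'ann@x.com');
""")

# Left outer join: every customer survives; email is NULL when
# there is no matching row on the optional side.
rows = con.execute(
    "SELECT customer.name, email.email FROM customer "
    "LEFT OUTER JOIN email ON customer.custid = email.emailid"
).fetchall()

# The two output-link constraints: Not(IsNull(email)) and IsNull(email).
matching = [r for r in rows if r[1] is not None]
non_matching = [r for r in rows if r[1] is None]
```

One query thus feeds both the "match" file and the "no match" file.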
besco wrote: I match a sequential file with ORAOCI8 in a transformer. I do a request in ORAOCI8 to get all emails in a table. The problem is: this request takes about one hour to finish.
hi besco
How about putting the data you are comparing with from the ORAOCI8 stage into a HASH stage before the transform stage, and then using the HASH stage as the reference/lookup to the transform?
The one thing to be wary of here is the number of records in the HASH stage: too many can defeat the object; it will take too long to create the HASH stage file.
dnzl
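The hashed-file idea amounts to paying for the slow query once, then resolving every row of the sequential file against an in-memory key lookup. A hedged Python sketch, with sqlite3 standing in for the ORAOCI8 query and a dict standing in for the hashed file (all names and data invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE email (custid INTEGER, email TEXT)")
con.execute("INSERT INTO email VALUES (1, 'a@x.com'), (2, 'b@x.com')")

# One pass over the e-mail query builds the lookup, the way a hashed
# file is populated once before the transformer starts.
lookup = {custid: email
          for custid, email in con.execute("SELECT custid, email FROM email")}

# Each row from the sequential file is then resolved in memory,
# without hitting the database again; misses come back as None.
seq_file = [1, 2, 3]
resolved = [(c, lookup.get(c)) for c in seq_file]
```

This is also why dnzl's warning matters: the dict (hashed file) must be built in full before any row is processed, so a huge reference set pushes the cost back to the front of the job.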
"what the thinker thinks, the prover proves" - Robert Anton Wilson
besco wrote: ogmios, can you explain this: "Alternatively do a left outer join and check on a field with IsNull (using 2 output links)."
An alternative design, if I understand what you want to do:
- Instead of loading the e-mails to a hash file, first load the sequential file with your customers into an Oracle table. This is a separate job.
- In your "main" job, instead of having a sequential file and an Oracle stage, just use an Oracle stage whose main query is the select on the new customer table, and then do a left outer join with the query you're now using for the mails. Something like:
Code:
select customer.name, email.email
from customer_table
left outer join email on (joining your customer with the rest of the tables)
However, this would imply that your customer name is stored somewhere in a table as a column, and not as part of the name of the table (as you now seem to be doing in your other thread).
Ogmios
In Oracle, outer joins have a different (more difficult) syntax; from memory it is:
Code:
select customer.name, email.email
from customer, email
where customer.custid = email.emailid (+)
The (+) indicates which side of the WHERE join is the outer or optional side. This will bring in all your customer records, while the email field will sometimes be NULL.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn
You've got it right, Vincent. The plus sign syntax is the older Original Recipe way of doing this in Oracle and goes on the 'outer' side. The new 'left outer join' syntax is the Extra Crispy version.
You can also do 'right outer' and 'full outer' joins.
-craig
"You can never have too many knives" -- Logan Nine Fingers
I didn't know Oracle 8 supported the "left outer" syntax; I thought it still required the (+). I also thought that Oracle 8 didn't support the extended ANSI standard where the join statement and join criteria can be included on the same line; instead, you had to use the basic ANSI standard where the join clause is separate from the WHERE clause, making for a much messier SQL statement.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn:Vincent McBurney LinkedIn