Hi,
Somebody in my project designed a job with two Transformer stages in it.
The first Transformer uses Hash partitioning and the second uses Same partitioning. Everything else is a straight move (columns copied through unchanged).
Job design is like this :
Ext Src Stg---->Tfm--->Remdup---->OraConnector---->Tfm---->OraConnector
Can anyone suggest how to improve performance in this scenario?
Thanks
hk
Transformer Stage Replacement
Moderators: chulett, rschirm, roy
Re: Transformer Stage Replacement
If the Transformers only handle partitioning, there is actually no need for them; partitioning can be set on any other stage.
For example, for the Remove Duplicates stage, put a Sort stage before it with Hash partitioning, then set the partitioning on the Remove Duplicates stage to Same.
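The hash-partition, sort, then remove-duplicates pattern can be sketched in plain Python (this is illustrative only, not DataStage syntax; the column names and partition count are made up). Rows with the same key hash to the same partition, each partition is sorted on the key, and only the first row per key is kept:

```python
# Sketch of hash partitioning + sort + remove-duplicates semantics.
# Rows with the same key land in the same partition, so each
# partition can be deduplicated independently and in parallel.

rows = [
    {"cust_id": 3, "amt": 10},
    {"cust_id": 1, "amt": 5},
    {"cust_id": 3, "amt": 7},
    {"cust_id": 2, "amt": 9},
]

NUM_PARTITIONS = 2  # hypothetical degree of parallelism

# Hash-partition on the key column.
partitions = [[] for _ in range(NUM_PARTITIONS)]
for row in rows:
    partitions[hash(row["cust_id"]) % NUM_PARTITIONS].append(row)

# Per partition: sort on the key, then keep the first row per key,
# which is what a Remove Duplicates stage retaining "first" does.
deduped = []
for part in partitions:
    part.sort(key=lambda r: r["cust_id"])
    last_key = object()  # sentinel that matches no real key
    for row in part:
        if row["cust_id"] != last_key:
            deduped.append(row)
            last_key = row["cust_id"]

print(sorted(r["cust_id"] for r in deduped))  # one row per cust_id
```

Because the hash guarantees all rows for a key are in one partition, downstream Same partitioning preserves that grouping without a costly repartition.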
Soumya
FYI: There is typically little need to explicitly specify Same partitioning. When partitioning is set to Auto, the engine will normally not repartition unless it determines a need to do so.
If all that the transformers are doing is copying input columns to the output (no null checking, datatype conversion, column generation, etc.), then they are unnecessary and you can do as soumya has suggested.
What type of data volume is being processed and what do the Oracle connectors in the job do, especially the first one?
Regards,
- james wiles
All generalizations are false, including this one - Mark Twain.
10 million rows through transformers is not an issue in your case, so long as you are doing as little in them as you have stated. That leaves the Oracle Connectors, Remove Duplicates and the External Source stage. So, another series of questions as we try to narrow down.
1) How long does this job run when processing 10 million records?
2) What is External Source calling to provide the data?
3) (as asked before) What do the Oracle Connectors do in this job?
4) What degree of parallelism is the job being run at?
5) How wide are the records? (Average bytes per record)
Potential bottlenecks are the External Source stage, the Oracle Connectors, and the Remove Duplicates stage.
1) Determine how quickly the External Source stage runs by creating a job with just it dumping to a Copy stage: ExtSrc-->Copy. Compile and run to see how long it takes to provide the 10 million records.
2) The Remove Duplicates probably has a Sort inserted in front of it at runtime. Depending on the total size of the data being processed, it may have a minor impact.
3) The Oracle connectors: What does each one do (I suspect the first is an upsert of some sort and the second loads rejects)? Do they load to the same table or separate tables? Do they use custom SQL statements? Are the tables being loaded to indexed, and are the indexes kept up to date?
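As an aside on why the index question matters: an upsert must probe the target table by key for every incoming row, and without an index on that key each probe is a full scan. A minimal sketch of the upsert pattern, using sqlite3 as a stand-in for Oracle (the table and column names are invented for illustration):

```python
import sqlite3

# Illustrative only: sqlite3 stands in for Oracle, and the schema is
# made up. An upsert probes the target by key for every row, so the
# key column needs an index (here the PRIMARY KEY provides one).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tgt (cust_id INTEGER PRIMARY KEY, amt INTEGER)")

incoming = [(1, 5), (2, 9), (1, 7)]  # key 1 repeats: second write updates

for cust_id, amt in incoming:
    # Insert-or-update on the key: the upsert pattern.
    conn.execute(
        "INSERT INTO tgt (cust_id, amt) VALUES (?, ?) "
        "ON CONFLICT(cust_id) DO UPDATE SET amt = excluded.amt",
        (cust_id, amt),
    )
conn.commit()

print(conn.execute("SELECT cust_id, amt FROM tgt ORDER BY cust_id").fetchall())
```

If the target table's key is not indexed (or the index is stale), each of those probes degrades badly at 10 million rows, which is why the connector doing the upsert is a prime suspect.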
- james wiles
All generalizations are false, including this one - Mark Twain.