Job taking more than 7 hours in production
Moderators: chulett, rschirm, roy
Hi,
I have one job which is taking more than 7 hours in production. Client wants to reduce the run time as it is effecting the business.
Job Design :
Source - Oracle Connector Stage--> Lookup
Reference - Oracle Connector Stage --> Lookup (Sparse)
Lookup o/p --> Filter
Source - Filter --> Lookup
Reference - Oracle Connector Stage --> Lookup (Sparse)
Lookup o/p --> Filter
Source - Filter --> Lookup
Reference - Oracle Connector Stage --> Lookup (Sparse)
Lookup o/p --> Transformer
Transformer --> XML Stage
XML Stage --> Sequential File (Target)
Analysis :
From the source around 57 lakh (5.7 million) records are fetched, and from the reference around 9 crore (90 million) records are fetched; the further reference lookups also fetch records in the crores. The original job used sparse lookups. I thought this was wrong, so I changed to a normal lookup and then to a Join and tried both, but the result was the same, in fact it took more time than the original job. Later I realized the sparse lookup is right in this case.
Note: the job uses a 4-node configuration file, but in the Oracle Connector stages the option 'Enable partitioned reads' was not enabled, so those stages run in sequential mode. I tried to enable that option, but I have no idea about the further options inside 'Enable partitioned reads'.
Expected :
I want to reduce the production run time. Can you suggest any performance tuning techniques? Thanks in advance.
Kiran Kumar
...ALL IS WELL...
Sparse lookups are slow compared to a "normal" lookup. They make sense when a job writes to the table used for the sparse lookup, so a second lookup on that record would need to retrieve the new value. Since in this case the output goes to XML and not back to the table, I would question the real need for the sparse lookup and get rid of it. When I translate "lac" and "crore" I get 5.7 million source and 90 million reference records; is this correct?
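To illustrate the cost difference described above, here is a minimal sketch in plain Python. It is not DataStage internals; the names (`sparse_lookup`, `normal_lookup`, `query_db`) are invented stand-ins for a per-row database round trip versus an in-memory reference.

```python
# Illustrative sketch (not DataStage internals): a sparse lookup pays one
# database round trip per source row, while a normal lookup loads the
# reference once and probes it in memory.

reference = {i: f"val_{i}" for i in range(1000)}  # stands in for the Oracle table

def sparse_lookup(source_rows, query_db):
    # One "database call" per row: cheap only when the source is tiny,
    # or when a just-written row must be re-read from the table.
    return [query_db(r) for r in source_rows]

def normal_lookup(source_rows, reference):
    # Receive the reference once, then probe in memory.
    return [reference.get(r) for r in source_rows]

calls = {"n": 0}
def query_db(key):
    calls["n"] += 1          # each call models a network round trip
    return reference.get(key)

src = list(range(100))
assert sparse_lookup(src, query_db) == normal_lookup(src, reference)
assert calls["n"] == len(src)  # sparse cost grows with the source row count
```

With 5.7 million source rows, the sparse path issues 5.7 million round trips; that is the cost being questioned here.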
How many bytes is each lookup record? Are all the fields that you've defined in the job actually used? If not, get rid of unused fields.
While the job is running does the CPU usage go to 100%?
The Lookup stage is normally used when the reference data is small (normal lookup) or the source data is small (sparse lookup).
In your case neither the source nor the reference input is small.
Try using a Join stage instead of Lookup and then rerun the job.
@ArndW
5.7 million is the source count and 90 million is the reference count.
Thanx and Regards,
ETL User
chandra.shekhar@tcs.com
In case you haven't noticed, the OP mentioned trying a join. Also, reading 90 million records will create a lot of I/O.
IBMDS4KK,
Please monitor the CPU and I/O while the job is running. Also, is the column you are performing the lookup on indexed? And check when that table was last analyzed.
As Arnd mentioned, get rid of unused fields.
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.
@Priya,
Can you tell me what an OP is?
And where has Kiran mentioned that he used the Join stage?
The line below appears ambiguous to me:
Kiran wrote: "Actually they were used sparse lookup. I thought this was wrong and changed to Normal lookup and Join and tried."
And you wrote: "Also Reading 90 million records will create a lot of I/O". So you suggest using the Lookup stage then?
If you have a source with 5.7 million records and another with 90 million, a Lookup should not be used in any case. Only a Join should be used; joining huge data is its speciality.
Thanx and Regards,
ETL User
chandra.shekhar@tcs.com wrote: "If you have a source with 5.7 million and another source with 90 million records, Lookup should not be used in any case. Only Join has to be used which is its speciality of joining huge data."
I disagree with this statement. The correct answer here is not a simple cut-and-dried "lookups should not be used". If the 90 million row reference is, for instance, only checking the existence of an integer key column, then a lookup will be faster, more compact and more efficient than a join (and since we have a 2 GB limit on lookup reference data, the 90 million rows used cannot average more than about 23 bytes per row).
Since both data streams in a Join stage must be sorted (and partitioned) on the join key, using the OP's example the 5.7 million rows might need to be sorted and repartitioned three times if changing to joins. That could take longer than doing a (non-sparse) lookup.
What I believe isn't necessary here is the sparse lookup.
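A minimal sketch of the extra cost the join path pays: both inputs must be sorted on the key before merging, whereas an in-memory lookup skips that sort on the big stream. This is illustrative only (not the engine's implementation) and assumes unique keys on the reference side.

```python
# Illustrative: a merge join needs BOTH inputs sorted on the key first;
# an in-memory lookup avoids sorting the streams at all.

def merge_join(left, right):
    # left/right: lists of (key, value) pairs; assumes unique keys in right.
    left, right = sorted(left), sorted(right)   # the extra cost joins pay
    out, j = [], 0
    for k, lv in left:
        while j < len(right) and right[j][0] < k:
            j += 1
        if j < len(right) and right[j][0] == k:
            out.append((k, lv, right[j][1]))
    return out

left = [(3, "c"), (1, "a"), (2, "b")]
right = [(2, "x"), (3, "y"), (4, "z")]
assert merge_join(left, right) == [(2, "b", "x"), (3, "c", "y")]
```

In the real job those `sorted()` calls correspond to tsort operators inserted before each Join, potentially three times over the 5.7 million row stream.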
chandra.shekhar@tcs.com wrote: "So you suggest using Lookup Stage then? If you have a source with 5.7 million and another source with 90 million records, Lookup should not be used in any case. Only Join has to be used which is its speciality of joining huge data."
Not necessarily, as Arnd already mentioned. A lookup is much faster if the reference data fits in memory.
I would not suggest a normal lookup for the 90 million row reference if its rows are large. Instead, the OP should consider converting the one or two lookups with fewer records to normal lookups to strike the right balance.
He may also need to increase or reduce the number of nodes based on CPU utilization.
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.
Hi
If a sparse lookup is not needed to fulfill the functional requirements, then break this job into two parts.
The first job should do all the pulls from the Oracle databases (source and all references), since those reads run on one node only because partitioned reads are not enabled. Write the data to datasets, partitioned on the keys needed for the later lookups. Sort only if a functional requirement really calls for it; for a lookup it is better not to sort.
The second job then uses those datasets, with the same partitioning method, for your lookups.
This will let the job use the configuration file far better than the previous design.
If you think your data will not fit in memory while doing the lookup, then sort the data in the first job and use a Join stage in the second job instead.
Hope this helps!
Cheers.
- Thanks and Regards,
Abhinav Agarwal
Dear All (ArndW, chandra.shekhar@tcs.com, priyadarshikunal, abhinavagarwal)
I tried what you suggested but it is still taking time. I replaced the lookups with Join stages and applied hash partitioning. What I found from the log is that fetching records from the Oracle Connector stage is taking the most time. The configuration file is set to 4 nodes, but in the Oracle Connector stage 'Enable partitioned reads' is set to 'No', so it runs in sequential mode. I want to set the option to 'Yes'. There are several partitioned read methods: ROWID RANGE, HASH, ROUND ROBIN, MODULUS, ORACLE PARTITIONS. From Google I found that ROWID RANGE is the best option, but the source records are fetched from a view; a view does not support ROWID RANGE and hence the job aborts. Please advise.
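For views, a MODULUS partitioned read may still work, since it only needs a numeric column: each of the N reader processes takes the rows whose key modulo N equals its partition number. The sketch below models that split in plain Python; the WHERE clause in the comment is my assumption about the shape of the generated SQL, not copied from the connector.

```python
# Hypothetical sketch of a MODULUS partitioned read: each of 4 readers
# takes rows where key % 4 == its partition number. The connector's
# generated SQL is assumed to be roughly of the form
# "WHERE MOD(key_col, 4) = :partition" (illustration, not verified).

rows = list(range(20))          # stands in for the view's numeric key column
nodes = 4

partitions = {p: [r for r in rows if r % nodes == p] for p in range(nodes)}

# Every row lands in exactly one partition: none duplicated, none lost.
merged = sorted(r for part in partitions.values() for r in part)
assert merged == rows
assert all(len(part) == len(rows) // nodes for part in partitions.values())
```

The split is even only when the key values are evenly distributed, which is worth checking on the view before relying on it.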
The job also contains an XML Output stage. I am pasting one error related to that stage; please help me with this as well:
'Invalid character(s) ([xC7]) found converting string (code point(s): <?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns:xsi="http://www.w3.org/...) from codepage UTF-8 to Unicode, substituting'
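For what it's worth, that warning means a byte that is invalid in the declared codepage (UTF-8) was encountered and substituted. The behaviour can be reproduced outside DataStage (illustrative sketch; the sample bytes are invented):

```python
# The byte 0xC7 is a UTF-8 lead byte that is not followed by a valid
# continuation byte here, so decoding with substitution replaces it
# with U+FFFD, which is what the DataStage message reports it is doing.
raw = b'<?xml version="1.0" encoding="UTF-8"?>\xc7<tag/>'
decoded = raw.decode("utf-8", errors="replace")
assert "\ufffd" in decoded          # the substitution character
assert decoded.startswith("<?xml")  # the rest of the document survives
```

The fix is usually to correct the NLS/codepage settings or cleanse the offending column upstream, rather than relying on substitution.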
Kiran Kumar
...ALL IS WELL...
You really need to take a step back from solving the problem and look into what is actually causing it.
Did you check CPU load while the job was running? Can you remove the XML output stage and replace it with a PEEK stage and see if the performance changes?
Your last post contains a different problem not related to the original post, I suggest you move it to its own thread in order not to confuse issues.