Job taking more than 7 hours in production
Moderators: chulett, rschirm, roy
Hi,
I have one job which is taking more than 7 hours in production. Client wants to reduce the run time as it is effecting the business.
Job Design :
Source - Oracle Connector Stage--> Lookup
Reference - Oracle Connector Stage --> Lookup (Sparse)
Lookup o/p --> Filter
Source - Filter --> Lookup
Reference - Oracle Connector Stage --> Lookup (Sparse)
Lookup o/p --> Filter
Source - Filter --> Lookup
Reference - Oracle Connector Stage --> Lookup (Sparse)
Lookup o/p --> Transformer
Transformer --> XML Stage
XML Stage --> Sequential File (Target)
Analysis :
From the source around 57 lakh (5.7 million) records are fetched, and from the reference around 9 crore (90 million) records are fetched; the further reference lookups also fetch records in the crores. The original job used sparse lookups. I thought this was wrong, so I changed to a normal lookup and then to a Join and tried both, but the result was the same, in fact it took more time than the original job. Later I realized the sparse lookup is right in this case.
Note: the job uses a 4-node configuration file, but in the Oracle Connector stages the option 'Enable partitioned reads' was not enabled, so those stages run in sequential mode. I tried to enable that option, but I have no idea about the further options inside 'Enable partitioned reads'.
Expected :
I want to reduce the production run time. Can you suggest any performance tuning techniques? Thanks in advance.
Kiran Kumar
...ALL IS WELL...
Sparse lookups are slow compared to a "normal" lookup. They make sense when a job writes to the table used for the sparse lookup, so a second lookup on that record would need to retrieve the new value. Since in this case the output goes to XML and not back to the table, I would question the real need for the sparse lookup and get rid of it. When I translate "lac" and "crore" I get 5.7 million source and 90 million reference records; is this correct?
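To illustrate the cost difference described above, here is a minimal sketch in plain Python. It is not DataStage internals; the names (`sparse_lookup`, `normal_lookup`, `query_db`) are invented stand-ins for a per-row database round trip versus an in-memory reference.

```python
# Illustrative sketch (not DataStage internals): a sparse lookup pays one
# database round trip per source row, while a normal lookup loads the
# reference once and probes it in memory.

reference = {i: f"val_{i}" for i in range(1000)}  # stands in for the Oracle table

def sparse_lookup(source_rows, query_db):
    # One "database call" per row: cheap only when the source is tiny,
    # or when a just-written row must be re-read from the table.
    return [query_db(r) for r in source_rows]

def normal_lookup(source_rows, reference):
    # Receive the reference once, then probe in memory.
    return [reference.get(r) for r in source_rows]

calls = {"n": 0}
def query_db(key):
    calls["n"] += 1          # each call models a network round trip
    return reference.get(key)

src = list(range(100))
assert sparse_lookup(src, query_db) == normal_lookup(src, reference)
assert calls["n"] == len(src)  # sparse cost grows with the source row count
```

With 5.7 million source rows, the sparse path issues 5.7 million round trips; that is the cost being questioned here.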
How many bytes is each lookup record? Are all the fields that you've defined in the job actually used? If not, get rid of unused fields.
While the job is running does the CPU usage go to 100%?
The Lookup stage is normally used when the reference data is small (normal lookup) or the source data is small (sparse lookup).
In your case neither the source nor the reference input is small.
Try using a Join stage instead of Lookup and then rerun the job.
@ArndW
5.7 million is the source count and 90 million is the reference count.
Thanx and Regards,
ETL User
chandra.shekhar@tcs.com
In case you haven't noticed, the OP mentioned trying a join. Also, reading 90 million records will create a lot of I/O.
IBMDS4KK,
Please monitor the CPU and I/O while the job is running. Also, is the column you are performing the lookup on indexed? And check when that table was last analyzed.
As Arnd mentioned, get rid of unused fields.
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.
@Priya,
Can you tell me what an OP is?
And where has Kiran mentioned that he used the Join stage?
The line below appears ambiguous to me:
Kiran wrote: "Actually they were used sparse lookup. I thought this was wrong and changed to Normal lookup and Join and tried."
And you wrote: "Also Reading 90 million records will create a lot of I/O". So you suggest using the Lookup stage then?
If you have a source with 5.7 million records and another with 90 million, a Lookup should not be used in any case. Only a Join should be used; joining huge data is its speciality.
Thanx and Regards,
ETL User
chandra.shekhar@tcs.com wrote: "If you have a source with 5.7 million and another source with 90 million records, Lookup should not be used in any case. Only Join has to be used which is its speciality of joining huge data."
I disagree with this statement. The correct answer here is not a simple cut-and-dried "lookups should not be used". If the 90 million row reference is, for instance, only checking the existence of an integer key column, then a lookup will be faster, more compact and more efficient than a join (and since we have a 2 GB limit on lookup reference data, the 90 million rows used cannot average more than about 23 bytes per row).
Since both data streams in a Join stage must be sorted (and partitioned) on the join key, using the OP's example the 5.7 million rows might need to be sorted and repartitioned three times if changing to joins. That could take longer than doing a (non-sparse) lookup.
What I believe isn't necessary here is the sparse lookup.
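A minimal sketch of the extra cost the join path pays: both inputs must be sorted on the key before merging, whereas an in-memory lookup skips that sort on the big stream. This is illustrative only (not the engine's implementation) and assumes unique keys on the reference side.

```python
# Illustrative: a merge join needs BOTH inputs sorted on the key first;
# an in-memory lookup avoids sorting the streams at all.

def merge_join(left, right):
    # left/right: lists of (key, value) pairs; assumes unique keys in right.
    left, right = sorted(left), sorted(right)   # the extra cost joins pay
    out, j = [], 0
    for k, lv in left:
        while j < len(right) and right[j][0] < k:
            j += 1
        if j < len(right) and right[j][0] == k:
            out.append((k, lv, right[j][1]))
    return out

left = [(3, "c"), (1, "a"), (2, "b")]
right = [(2, "x"), (3, "y"), (4, "z")]
assert merge_join(left, right) == [(2, "b", "x"), (3, "c", "y")]
```

In the real job those `sorted()` calls correspond to tsort operators inserted before each Join, potentially three times over the 5.7 million row stream.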
chandra.shekhar@tcs.com wrote: "So you suggest using Lookup Stage then? If you have a source with 5.7 million and another source with 90 million records, Lookup should not be used in any case. Only Join has to be used which is its speciality of joining huge data."
Not necessarily, as Arnd already mentioned. A lookup is much faster if the reference data fits in memory.
I would not suggest a normal lookup for the 90 million row reference if its rows are large. Instead, the OP should consider converting the one or two lookups with fewer records to normal lookups to strike the right balance.
He may also need to increase or reduce the number of nodes based on CPU utilization.
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.
Hi
If a sparse lookup is not needed to fulfill the functional requirements, then break this job into two parts.
The first job should do all the pulls from the Oracle databases (source and all references), since those reads run on one node only because partitioned reads are not enabled. Write the data to datasets, partitioned on the keys needed for the later lookups. Sort only if a functional requirement really calls for it; for a lookup it is better not to sort.
The second job then uses those datasets, with the same partitioning method, for your lookups.
This will let the job use the configuration file far better than the previous design.
If you think your data will not fit in memory while doing the lookup, then sort the data in the first job and use a Join stage in the second job instead.
Hope this helps!
Cheers.
- Thanks and Regards,
Abhinav Agarwal
Dear All (ArndW, chandra.shekhar@tcs.com, priyadarshikunal, abhinavagarwal)
I tried what you suggested but it is still taking time. I replaced the lookups with Join stages and applied hash partitioning. What I found from the log is that fetching records from the Oracle Connector stage is taking the most time. The configuration file is set to 4 nodes, but in the Oracle Connector stage 'Enable partitioned reads' is set to 'No', so it runs in sequential mode. I want to set the option to 'Yes'. There are several partitioned read methods: ROWID RANGE, HASH, ROUND ROBIN, MODULUS, ORACLE PARTITIONS. From Google I found that ROWID RANGE is the best option, but the source records are fetched from a view; a view does not support ROWID RANGE and hence the job aborts. Please advise.
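For views, a MODULUS partitioned read may still work, since it only needs a numeric column: each of the N reader processes takes the rows whose key modulo N equals its partition number. The sketch below models that split in plain Python; the WHERE clause in the comment is my assumption about the shape of the generated SQL, not copied from the connector.

```python
# Hypothetical sketch of a MODULUS partitioned read: each of 4 readers
# takes rows where key % 4 == its partition number. The connector's
# generated SQL is assumed to be roughly of the form
# "WHERE MOD(key_col, 4) = :partition" (illustration, not verified).

rows = list(range(20))          # stands in for the view's numeric key column
nodes = 4

partitions = {p: [r for r in rows if r % nodes == p] for p in range(nodes)}

# Every row lands in exactly one partition: none duplicated, none lost.
merged = sorted(r for part in partitions.values() for r in part)
assert merged == rows
assert all(len(part) == len(rows) // nodes for part in partitions.values())
```

The split is even only when the key values are evenly distributed, which is worth checking on the view before relying on it.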
The job also contains an XML Output stage. I am pasting one error related to that stage; please help me with this as well:
'Invalid character(s) ([xC7]) found converting string (code point(s): <?xml version="1.0" encoding="UTF-8"?><TCRMService xmlns:xsi="http://www.w3.org/...) from codepage UTF-8 to Unicode, substituting'
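For what it's worth, that warning means a byte that is invalid in the declared codepage (UTF-8) was encountered and substituted. The behaviour can be reproduced outside DataStage (illustrative sketch; the sample bytes are invented):

```python
# The byte 0xC7 is a UTF-8 lead byte that is not followed by a valid
# continuation byte here, so decoding with substitution replaces it
# with U+FFFD, which is what the DataStage message reports it is doing.
raw = b'<?xml version="1.0" encoding="UTF-8"?>\xc7<tag/>'
decoded = raw.decode("utf-8", errors="replace")
assert "\ufffd" in decoded          # the substitution character
assert decoded.startswith("<?xml")  # the rest of the document survives
```

The fix is usually to correct the NLS/codepage settings or cleanse the offending column upstream, rather than relying on substitution.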
Kiran Kumar
...ALL IS WELL...
You really need to take a step back from solving the problem and look into what is actually causing it.
Did you check CPU load while the job was running? Can you remove the XML output stage and replace it with a PEEK stage and see if the performance changes?
Your last post contains a different problem not related to the original post, I suggest you move it to its own thread in order not to confuse issues.