DataStage Job Design

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

urshit_1983
Participant
Posts: 73
Joined: Wed Jun 28, 2006 3:27 pm
Location: NJ

Post by urshit_1983 »

Welcome aboard.

What does the job log look like in Director? Does it start and never stop, or does it run but pass no rows?
"Nobody is expert in Everything,
But Everybody is expert in Something."
narasimha
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

Is there a possibility that you could combine the records from the three tables using a single query, rather than using three separate ODBC stages?
If yes, run this query outside DataStage and check how long it takes to return rows.
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You don't really have 'three source tables'. You have one source table with two ODBC-based reference lookups. That means that for every record coming in from the first table, a query is executed against each of the other two tables.

As Narasimha notes, if these tables are in the same instance then you should be joining all three tables using one stage. If they're not, you should seriously consider loading the two secondary tables into hashed files. Your ODBC lookups are about the slowest way on the planet to build this job.

Are your lookup queries generated? User defined? Well tuned?
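If the three tables really are in the same instance, one source query along these lines replaces both lookups. This is just a sketch with made-up table and column names, since you haven't posted your SQL yet - substitute your real tables and join keys:

-- SRC is the driving table, REF1 and REF2 stand in for the two lookup tables (all names hypothetical)
SELECT SRC.ORDER_ID,
       SRC.CUST_ID,
       REF1.STATUS_CODE,
       REF2.ALLOWANCE_AMT
FROM   SRC
       LEFT OUTER JOIN REF1 ON REF1.ORDER_ID = SRC.ORDER_ID
       LEFT OUTER JOIN REF2 ON REF2.CUST_ID  = SRC.CUST_ID;

The LEFT OUTER JOINs behave like reference lookups - source rows with no match still come through, just with NULLs in the reference columns.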
-craig

"You can never have too many knives" -- Logan Nine Fingers
AthiraM
Participant
Posts: 25
Joined: Tue Feb 27, 2007 4:53 am

Post by AthiraM »

Hi,

The lookup queries are generated.

I tried using the Hashed File, still no response.

http://img90.imageshack.us/my.php?image=dsjob3tv3.jpg
http://img168.imageshack.us/my.php?image=dsjob4vh6.jpg

As seen in the log, should anything be done with the NLS settings?

I tried running this job (without the hashed files) on my local machine, where I have both the DS server and DS client installed.
Though not fast, the job does the work within 8 hours.

When I deployed the same job on the DS server at our client location (I tried it both with and without the hashed files), it shows no response.

Thanks
Athira
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sounds like you may need to tune your first query. Have you checked the explain plan? Gotten help from your DBA on how you can improve the performance of the query? Or asked them to verify that there are no blocking locks when you get 'no response'?
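For example, from SQL*Plus - a sketch, assuming Oracle and that you (or your DBA) have access to PLAN_TABLE and V$SESSION:

-- show the optimizer's plan for the source query (paste the full SELECT in place of the ...)
EXPLAIN PLAN FOR
SELECT ...;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- list blocked sessions and who is blocking them (Oracle 10g and later)
SELECT sid, serial#, blocking_session, event
FROM   v$session
WHERE  blocking_session IS NOT NULL;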
-craig

"You can never have too many knives" -- Logan Nine Fingers
AthiraM
Participant
Posts: 25
Joined: Tue Feb 27, 2007 4:53 am

Post by AthiraM »

http://img90.imageshack.us/my.php?image=dsjob3tv3.jpg
http://img147.imageshack.us/my.php?image=dstransss0.jpg

The SQL queries used in the 3 stages are as follows:

(a user defined query)
SELECT PACKAGE3.CMS_ORDER.ORDERID, PACKAGE3.CMS_ORDER.NINO, PACKAGE3.CMS_ORDER.ORDERED,
       PACKAGE3.CMS_ORDER.ORDSTATUS_TYP2, PACKAGE3.CMS_ORDER.MANUSTATUS_TYP1
FROM   PACKAGE3.CMS_ORDER
WHERE  PACKAGE3.CMS_ORDER.ORDERID LIKE '0%'
   OR  PACKAGE3.CMS_ORDER.ORDERID LIKE '1%'
   OR  PACKAGE3.CMS_ORDER.ORDERID LIKE '2%'
   OR  PACKAGE3.CMS_ORDER.ORDERID LIKE '3%'
   OR  PACKAGE3.CMS_ORDER.ORDERID LIKE '4%'
   OR  PACKAGE3.CMS_ORDER.ORDERID LIKE '5%'
   OR  PACKAGE3.CMS_ORDER.ORDERID LIKE '6%'
   OR  PACKAGE3.CMS_ORDER.ORDERID LIKE '7%'
   OR  PACKAGE3.CMS_ORDER.ORDERID LIKE '8%'
   OR  PACKAGE3.CMS_ORDER.ORDERID LIKE '9%';

(a generated query)
SELECT PACKAGE3.CMS_CAR.ORDERNO, PACKAGE3.CMS_CAR.GAVEUP, PACKAGE3.CMS_CAR.CHC FROM PACKAGE3.CMS_CAR;

(a generated query)
SELECT PACKAGE3.CMS_MERRIT.NINO, PACKAGE3.CMS_MERRIT.EINNO, PACKAGE3.CMS_MERRIT.ALLOWANCE FROM PACKAGE3.CMS_MERRIT;

I made changes to the query and used the hashed file, but the same condition persists.
Please do comment on this.


Thanks
Athira
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Athira,
start with a very simple job: just your data source to a transformer and output to a dummy sequential file. Get this working. Then add one reference lookup and get that working (again to the dummy output file). Finally add your last reference and, once that is working to your satisfaction, finalize the job to the real output.

It might be that your final output is causing issues. Are all 4 tables different ones? Or is the output table the same as one (or more) of the input/reference tables? If so, depending upon your ODBC options, you might be deadlocking yourself.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ArndW wrote:It might be that your final output is causing issues. Are all 4 tables different ones? Or is the output table the same as one (or more) of the input/reference tables? If so, depending upon your ODBC options, you might be deadlocking yourself.
Asked and Answered, your Honor. In the other thread on this issue. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

:? oops, I didn't realize that! :evil:
AthiraM
Participant
Posts: 25
Joined: Tue Feb 27, 2007 4:53 am

Post by AthiraM »

Hi,

Even though the performance is not laudable, there has been an improvement from the 'no response' status.
When I checked the Oracle database, I found that there was a lock due to an insert statement on the output table. I killed some of the sessions manually.
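(In Oracle that kind of manual kill amounts to something like the following - the sid and serial# pair here is just a placeholder; the real values come from V$SESSION:)

-- kill the blocking session identified in V$SESSION; '137,4521' is a placeholder sid,serial# pair
ALTER SYSTEM KILL SESSION '137,4521';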

http://img224.imageshack.us/my.php?image=dsjobdr0.jpg
http://img132.imageshack.us/my.php?image=dsjobia8.jpg

Please do suggest how I can improve the performance, which is now very, very poor... it took 12 hrs (11.55) to complete this job.

Thanks a lot..
Athira
narasimha
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

Some points I would like to mention here:

- Go for an Oracle OCI stage instead of the ODBC stage (there are more options to tune in the OCI stage).
- Break up your job design. Populate the hashed files in separate jobs and use them as lookups in the current job (this way you can identify the bottleneck easily).
- Tune your main query. Check how long it takes to return rows outside DataStage first, so you get a feel for how long it is going to take to run (see the sketch below).
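For instance, the ten OR'ed LIKE conditions in your user-defined query all say the same thing - ORDERID starts with a digit - so a more compact equivalent is possible. Just a sketch; check the explain plan before and after, because the rewritten predicate is not automatically faster:

SELECT O.ORDERID, O.NINO, O.ORDERED, O.ORDSTATUS_TYP2, O.MANUSTATUS_TYP1
FROM   PACKAGE3.CMS_ORDER O
WHERE  SUBSTR(O.ORDERID, 1, 1) BETWEEN '0' AND '9';

And if every ORDERID in that table already starts with a digit, the WHERE clause filters nothing and could simply be dropped - worth confirming with your DBA.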
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The stage has instructions for configuring it in the 'standard' documentation - both the 'Plug-In Installation and Configuration Guide' and the OCI pdf itself.

On UNIX it's a matter of editing the dsenv file; on Windows I *think* you just need to ensure your Registry entries are correct for Oracle. Either way, you need to have an Oracle Client installed on the DataStage Server. Do you?
-craig

"You can never have too many knives" -- Logan Nine Fingers