Replacing Tables with a Stage in Datastage

shank · Post by **shank** » Sun Nov 11, 2012 3:45 pm

Hello All,

I am pretty new to Datastage 8.5 and we are working on Sybase database. we are experiencing performance issues since we deal with so much of extraction queries in every jobs. I am wondering is there a way we can use a stage in Datastage as a table and refer to that stage everytime to do all Database operations (Insert, update , delete, select) to avoid lot of I/O with Sybase .

Let me explain it further.

We have a table A which has 10 columns.
there are some History tables - Table B and Table C which act as references for our data processing.

In the first job, we will populate incoming data in to Table A (only 4 columns we will populate)

Based on these 4 column values, we will lookup Table B and populate 5th and 6th column in Table A.

After that based on 4,5 and 6th column in Table A, we will lookup with Refernce Table C and populate the remaining columns in Table A.

Because of this, we are doing lot ot I/O and the process is very slow even though we have tuned our sybase queris.

My question here is , is there any way we can make these tables A, B and C into a stage in Datastage so that we only deal within Datastage server. We don't need to hit the database everytime.

Any thoughts?

chulett · Post by **chulett** » Sun Nov 11, 2012 4:07 pm

You said "in the first job"... does that imply you've broken this processing up into multiple jobs? I'm not seeing anything that tells me it can't all happen in a single job where you do the lookups before populating the target thus only inserting into "Table A" once.

Not sure what you mean by "replacing tables with a stage" as the stage is the table but perhaps you meant joining them... and if they are all in the same database there's nothing stopping you from joining all three in your source sql and skipping the lookups altogether.

shank · Post by **shank** » Sun Nov 11, 2012 4:45 pm

only after deriving column 5 .. we can refer to the other history table.
Its like .. if Column 5 is 'x' then, refer to HIstory table A, other wise refer to History table B for populating the rest of the columns in the table.
Its not pretty straight forward as we do joins in the queries.. fine.. let me put it in this way ...

let us take a dataset stage as example..

Can I treat a dataset(or anyother stage) as I am treating a sybase table? Can I perform all operations (select, min(), max(), update, delete etc) on that stage as I am doing in a sybase table? I would basically avoid hitting database and deal with only datastage for all my processing. Can I do that with any stage ?

chulett · Post by **chulett** » Sun Nov 11, 2012 5:01 pm

shank wrote:only after deriving column 5 .. we can refer to the other history table. Its like .. if Column 5 is 'x' then, refer to HIstory table A, other wise refer to History table B for populating the rest of the columns in the table.

That's fine. Do both lookups and then conditionally decide which result to use.

shank also wrote:Can I treat a dataset(or anyother stage) as I am treating a sybase table? Can I perform all operations (select, min(), max(), update, delete etc) on that stage as I am doing in a sybase table?

Somewhat. You can certain 'select' from it and then min/max would be handled by a downstream aggregator. You cannot update or delete from a dataset however but you could certanly build a new updated/deleted dataset as the result of operations performed on the original dataset.

shank · Post by **shank** » Sun Nov 11, 2012 5:05 pm

Ok got it. So What you are essentially saying is .. we cannot create a temporary table within datastage. Right ?

chulett · Post by **chulett** » Sun Nov 11, 2012 5:56 pm

Well... you could certainly create and maintain 'temporary' tables within a relational database via DataStage. Otherwise, objects like datasets or filesets or even flat files can be leveraged much like tables, temporary or otherwise.

chulett · Post by **chulett** » Mon Nov 12, 2012 12:01 pm

Shank... I'm not feeling really confident that your questions are being answered fully or to your satisfaction. Make sure you come back and let us know if you need any other help or clarification with this.

And I'm still wondering if the two lookups can be combined into the source SQL, looks to me like that should be possible if all of the tables are co-resident. Save the lookups for when you are dealing with disparate data sources and have no other choice.