Inserting to Multiple Oracle stages

nelc
Participant
Posts: 23
Joined: Wed Apr 16, 2003 4:34 am

Post by nelc »

Hi all,

I am facing a problem of jobs behaving differently in different database instances. Somehow a job that previously worked in DB1 (by that I mean the data were loaded correctly and successfully) couldn't work in other instances like DB2. The job hangs with no response or error messages from the Director console. Once the job hits such a problem, it 'becomes' unable to run in DB1 as well. After that, it just locks all the sequential files that the job writes out, and I cannot delete or overwrite them.

Initially I thought it was because the Oracle DB was not responding, but when I kill the session through Oracle Management Console, the Director then responds with a "session killed" error.

I am now wondering whether it is due to writing to too many targets in one job. The job writes to four Oracle targets, and I have ensured that the link order caters for the parent-child constraints.

Has anyone met with such a problem before? Any help is really appreciated. I really don't know how to proceed now.

Thanks in advance! [:)]
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Are you writing to multiple OCI stages (as seems to be implied by your header) or multiple links to a single OCI? This can make a *big* difference, especially when dealing with multiple tables and RI constraints.

I'd suggest trying a single OCI stage, unless you are working across multiple databases. That way all of the work is done as a discrete transaction through a single logon, you'll have an easier time with the parent/child relationship handling, and you can enable Transaction Grouping if desired. From my experience, relying on the link order in this situation will *only* work if you set your commit size to 1 so that every row is committed as it is written.

-craig
nelc
Participant
Posts: 23
Joined: Wed Apr 16, 2003 4:34 am

Post by nelc »

Hi chulett,

Thanks for the advice.

I am currently using one OCI stage for each table I need to insert into, meaning a total of four OCI stages for four tables.

How can I use a single OCI stage to insert into multiple tables?

Would appreciate further advice... [:)]
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As long as the targets are all accessible through the same logon, simply connect your four links to the same OCI stage. You'll still only enter your logon information once, but the Input tab will now have four link names in the dropdown. Configure each of them there in the same manner as you did in each individual stage. You'll also see a new option for 'Transaction Grouping' show up when using multiple links; read the help on that to see if you can make use of it.

In your case, simply click-and-hold on the end of existing link #2 and drag it to the #1 OCI stage. Repeat for links #3 and #4 and then delete the three orphaned OCI stages. Double-check all of your metadata as some may have been lost when the link was moved. After that, you should be good to go.

-craig
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Not to pick a nit, but you do realize that unless you specifically need transaction grouping, you have limited your ability to instantiate your job and achieve the greatest performance capability of DataStage Server Edition?

By putting four tables as targets and specifying link ordering, you have severely limited the execution capabilities of DataStage. If you have four separate jobs and execute them in the order of the referential integrity constraints, you achieve the same results but have vastly different options for tuning. Each job in itself can be instantiated to take advantage of as many CPUs and resources as are available to achieve the fastest loading times.
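
As a rough illustration only, the four loads could be driven in referential-integrity order from a bare-bones job control routine. This is just a sketch in DS BASIC; the job names are made up and the error handling is the minimum abort-on-failure check.

      * Run four hypothetical load jobs in referential-integrity order,
      * aborting the controlling job if any one of them fails.
      JobList = "LoadParent":@FM:"LoadChild1":@FM:"LoadChild2":@FM:"LoadChild3"
      JobCount = DCOUNT(JobList, @FM)

      For j = 1 To JobCount
         JobName = JobList<j>
         hJob = DSAttachJob(JobName, DSJ.ERRFATAL)
         ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
         ErrCode = DSWaitForJob(hJob)
         Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
         ErrCode = DSDetachJob(hJob)
         If Status <> DSJS.RUNOK And Status <> DSJS.RUNWARN Then
            Call DSLogFatal("Load job " : JobName : " failed", "JobControl")
         End
      Next j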

Good luck!

Kenneth Bland
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

In DataStage 6.0 server jobs you can also accomplish this by splitting the data stream with a Link Partitioner stage. Active stages (for example Transformer stages) on the separate links will execute as separate processes. It is also possible (and recommended) to take advantage of link row buffering to speed throughput.

Ray Wurlod
Education and Consulting Services
ABN 57 092 448 518
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well, in this case Transaction Grouping was just a 'bonus' (so to speak), pointed out as a new feature the OP may not have been aware of.

And the last points are well taken, but I'm not sure speed is an issue here. I could see where Ken's approach could be better for *large* data volumes, but depending on the source, having four separate jobs could be inefficient. Plus, saying it 'severely limits' the execution capabilities of DS seems a little... harsh to me. I also don't see how this has "limited your ability to instantiate your job"? [:I] Perhaps you could expand a little on the nit you are picking, Ken? I'd appreciate it.

-craig
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

To address Craig's concern, think of the situation the original poster is dealing with. They must have some sort of denormalized source data that they are having to normalize into a parent-child relationship so that they can load the targets in the appropriate order.

In this case, parents must be created and either inserted or updated prior to the children. In this design approach, the parent may repeat multiple times, therefore the SQL must be insert-else-update. The children are most likely unique, and therefore either insert or update, but only once. Many children have the same parent, so the parent probably has to either insert initially and then update many times, or else update every time. How would you go about instantiating this type of design? You cannot employ simple round-robin partitioning ( like MOD(@INROWNUM, InstanceCount) = InstanceNumber - 1 ) to carve up the source data, because you must keep all children together. So now you're talking about using some hashing algorithm. You would also have N jobs executing, each with its own set of database connections. The more database connections, the more you are limited by the throughput capabilities of the database.
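
Just to illustrate what I mean by a hashing algorithm: a tiny server routine, say KeyPartition(ParentKey, InstanceCount, InstanceNumber) (all made-up names), could return true only for the instance that owns a given parent key, so siblings stay together. A sketch of the routine body:

      * Crude but repeatable hash of the parent key: sum the character codes.
      HashVal = 0
      KeyLen = LEN(ParentKey)
      For CharPos = 1 To KeyLen
         HashVal = HashVal + SEQ(ParentKey[CharPos, 1])
      Next CharPos
      * Same MOD test as the round-robin constraint, but driven by the key
      * instead of @INROWNUM, so all children of one parent land in the
      * same instance.
      If MOD(HashVal, InstanceCount) = (InstanceNumber - 1) Then
         Ans = @TRUE
      End Else
         Ans = @FALSE
      End

You would then use KeyPartition(link.ParentKey, InstanceCount, InstanceNumber) as the constraint in each instance, in place of the @INROWNUM version.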

I would never design a process like this. I would take the approach of first picking out the parents from the denormalized dataset and processing them. You could resolve repeating parents quite easily (I'd even use "sort -u" if you absolutely know the parents repeat character for character across the children). I'd then process the parents into a staging hash file. This job is Instantiate-able to the Degree of Parallelism that your ETL server can handle. This also gives you the opportunity to retrieve or assign surrogate keys to the parents. Then I'd process the children and retrieve the surrogate key assigned for the parent. Again, this job is Instantiate-able to the Degree of Parallelism that your ETL server can handle.
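
As a rough sketch of the derivations involved (link, column and sequence names are all made up, and the SDK KeyMgtGetNextValue routine is just one way to hand out keys): in the parent staging job, assign a key only when the hashed file lookup misses, and carry a flag saying whether it was assigned or reused.

      * Derivation for SurrogateKey, assuming a hashed-file reference link
      * named lkpParent: reuse the key if the parent is already staged,
      * otherwise assign the next value from a key source.
      If lkpParent.NOTFOUND Then KeyMgtGetNextValue("PARENT_SEQ") Else lkpParent.SurrogateKey

      * Derivation for a KeyWasAssigned flag, used later to split the
      * insert-only and update-only streams.
      If lkpParent.NOTFOUND Then "Y" Else "N"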

Now the parents and children can be loaded, parents going first. The loading job can be instantiated to the DOP that the database can handle. Dare I suggest that knowing whether you assigned a surrogate key or retrieved one means you could prepare true insert-only and update-only load files. Each of these can be loaded using the highest-performing loading/instantiating means available. I'd then load the children, again choosing the appropriate DOP to instantiate.
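
The split itself can be nothing more than a pair of link constraints on the flag carried from the staging step (names again hypothetical), with each target's load step then configured as insert-only or update-only accordingly:

      * Constraint on the insert-only output link:
      in.KeyWasAssigned = "Y"

      * Constraint on the update-only output link:
      in.KeyWasAssigned = "N"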


Kenneth Bland
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ok, thanks - I see where you are coming from now. I guess I just didn't read as much into the original poster's situation as you did, approaching it from a simpler standpoint. [:)]

-craig