Load more tables -> loading order and one transaction

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

stivazzi
Participant
Posts: 52
Joined: Tue May 02, 2006 3:53 am

Load more tables -> loading order and one transaction

Post by stivazzi »

I'm designing a datamart that takes data from various sources and loads it into a couple of tables whose structures are linked by constraints. That means:
1) I need to load the tables in a certain order (master first, detail after)
2) I need to load the tables in a single transaction, to avoid inconsistent data in case of an exception.

In the past I did another datamart so designed:

ORA_Stage -> Tfm -> |
ORA_Stage -> Tfm -> | --> linkCollector -> Tfm -> ORA_Stage_Target
ORA_Stage -> Tfm -> |


The link collector has a flattened structure containing all the possible fields coming from the source tables. The collector orders the data by a certain field, then the transaction starts on the target table.
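The requirement described above - master rows first, detail rows second, both committed or rolled back together - can be sketched outside DataStage with any transactional client. A minimal illustration using Python's built-in sqlite3 as a stand-in (the thread's real targets are Oracle stages; the table names and driver here are assumptions for the example only):

```python
import sqlite3

def load_all_or_nothing(master_rows, detail_rows):
    """Insert master rows first, then detail rows, in ONE transaction.
    Any failure (e.g. a violated foreign key) rolls back both tables.
    Returns True on commit, False on rollback."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the master/detail constraint
    conn.execute("CREATE TABLE master (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE detail (id INTEGER PRIMARY KEY,"
                 " master_id INTEGER REFERENCES master(id), qty INTEGER)")
    try:
        # Loading order matters: master before detail.
        conn.executemany("INSERT INTO master VALUES (?, ?)", master_rows)
        conn.executemany("INSERT INTO detail VALUES (?, ?, ?)", detail_rows)
        conn.commit()      # single commit point: all or nothing
        return True
    except sqlite3.Error:
        conn.rollback()    # an exception anywhere undoes both loads
        return False
    finally:
        conn.close()

# A detail row pointing at a missing master id makes the whole load fail.
print(load_all_or_nothing([(1, "a")], [(10, 1, 5)]))   # True
print(load_all_or_nothing([(1, "a")], [(10, 99, 5)]))  # False
```

The key point is the single commit at the end: until it executes, nothing in either table is visible or durable, which is exactly the "unique transaction" requirement.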

Now my questions are:
a) is there any other way to do this that avoids the linkCollector? (this time I have MANY fields, and the common table structure would be unmanageable)
b) does DataStage let you load data in a certain order (link order?) coming from different Transformers?

Thank you very much in advance!
Andrea

P.S. Any advantage using a Parallel job?
naveen19
Participant
Posts: 52
Joined: Tue Mar 06, 2007 9:08 am
Location: India
Contact:

Re: Load more tables -> loading order and one transaction

Post by naveen19 »

stivazzi wrote:I'm designing a datamart that takes data from various sources and loads it into a couple of tables whose structures are linked by constraints ...

Hi

In a Server job you have to use the Link Collector.

If it is a PX (parallel) job, replace the Link Collector with a Funnel stage.

Regards
Naveen.K
vercama
Participant
Posts: 11
Joined: Thu Aug 07, 2008 3:21 am

Re: Load more tables -> loading order and one transaction

Post by vercama »

naveen19 wrote:In a Server job you have to use the Link Collector. If it is a PX (parallel) job, replace the Link Collector with a Funnel stage ...

Is there any other possibility to avoid the linkCollector and still set the order?
naveen19
Participant
Posts: 52
Joined: Tue Mar 06, 2007 9:08 am
Location: India
Contact:

Re: Load more tables -> loading order and one transaction

Post by naveen19 »

vercama wrote:Is there any other possibility to avoid the linkCollector and still set the order?

No, there are no other options in Server jobs...
vercama
Participant
Posts: 11
Joined: Thu Aug 07, 2008 3:21 am

Re: Load more tables -> loading order and one transaction

Post by vercama »

stivazzi wrote:I'm designing a datamart that takes data from various sources and loads it into a couple of tables whose structures are linked by constraints ...
What about link order? Is it possible to rely on the order of the links, the same order DataStage uses to insert data into the target tables?

Thanks,
Verena
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ugh. I use separate jobs for each table and run them in the proper order. Transaction Size is always 0, so there's no "inconsistent data": each load is all or nothing, and the jobs ensure that no "exceptions" we have control over ever occur.
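The run-them-in-order approach described here reduces to a simple sequencing rule: each step is an all-or-nothing load, and the first failure halts the stream at a known restart point. A hypothetical sketch of that control flow (the job names and the runner are illustrative, not any DataStage API):

```python
def run_stream(steps):
    """Run one all-or-nothing table load per step, in dependency order
    (parents before children). Stop at the first failure and report it;
    already-committed steps stay committed, and the next run can restart
    from the failed step."""
    for name, load in steps:
        if not load():
            return name          # restart point for the next run
    return None                  # whole stream succeeded

# Illustrative stream: the child load fails, so grandchild never runs.
done = []
steps = [
    ("load_parent", lambda: done.append("parent") or True),
    ("load_child",  lambda: False),                      # simulated failure
    ("load_grandchild", lambda: done.append("grandchild") or True),
]
print(run_stream(steps))  # load_child
print(done)               # ['parent']
```

The trade-off versus one giant transaction is visible here: the parent's commit survives the child's failure, which is acceptable only if (as Craig says) the jobs are designed so constraint violations cannot occur and a rerun from the failure point is cheap.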
-craig

"You can never have too many knives" -- Logan Nine Fingers
vercama
Participant
Posts: 11
Joined: Thu Aug 07, 2008 3:21 am

Post by vercama »

chulett wrote:Ugh. I use separate jobs for each table and run them in the proper order. Transaction Size is always 0 so there's no "inconsistent data", it is all or nothing and the jobs ensure no "exceptions" occur ...
By doing what you're saying, it seems to me you run the risk of loading (and committing) correct data into the first table (Transaction Size = 0) and wrong data into the second table (Transaction Size = 0 again), because of, for example, a violated constraint. Besides, I guess that's not what's being asked for, since there's one transaction per table access, isn't there? If so, your solution, chulett, is not valid.

Cheers,
Verena
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

My solution is perfectly valid; I have thousands of production jobs that do this very thing. The jobs are properly designed so they do not violate constraints of any kind.

I have no clue what you mean by "one transaction for each table access". :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
vercama
Participant
Posts: 11
Joined: Thu Aug 07, 2008 3:21 am

Post by vercama »

chulett wrote:My solution is perfectly valid, I have thousands of production jobs that do this very thing. The jobs are properly designed so they do not violate constraints of any kind.

I ha ...
My concern wasn't about your jobs, but about the implication of your post.
It's possible for an insert statement to fail, and in that case, reading your implementation chulett, the rollback on the first table (already populated by its own job) cannot be done.
What am I missing?
Verena
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

All target tables' rows carry some indication of which run inserted or updated them. By this means even committed changes can be undone if that is deemed necessary. The indication can be a unique "run batch" identifier and/or a timestamp.

Committed deletions are trickier - you need to build a more robust recovery mechanism to recover these. But it's not impossible.
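Ray's scheme - stamp every row with a run identifier so even a committed run can be backed out - can be sketched as follows. This uses sqlite3 purely as a stand-in, and the table and column names are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, val TEXT, batch_id TEXT)")

def load_batch(rows, batch_id):
    """Stamp every inserted row with the run's batch identifier."""
    conn.executemany(
        "INSERT INTO target (id, val, batch_id) VALUES (?, ?, ?)",
        [(i, v, batch_id) for i, v in rows])
    conn.commit()

def undo_batch(batch_id):
    """Back out a committed run by deleting everything it inserted.
    (Backing out committed updates/deletes would additionally need
    before-images, per Ray's note about a more robust mechanism.)"""
    conn.execute("DELETE FROM target WHERE batch_id = ?", (batch_id,))
    conn.commit()

load_batch([(1, "a"), (2, "b")], "run-001")
load_batch([(3, "c")], "run-002")
undo_batch("run-002")  # run-001's rows survive, run-002's are gone
print(conn.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # 2
```

A timestamp column works the same way, at the cost of needing the run's start/end times to identify its rows.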
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Of course they can be "undone" but I've never run into a situation where I've had to do anything of the sort in say, parent tables because of problems in the child tables. At least not in Production. :wink:

As noted, constraints are fully qualified / checked / handled in the job designs so that there is no way for violations to occur. So the occasional problems we are left with are database specific - lack of tablespace, for example, being the most common. We have them corrected and rerun the jobs, which is trivial (and automated for the Production Support people) because of the "all or nothing" design.

Now, with regards to your original design - have you looked into Transaction Grouping? That's a way to get records in multiple tables committed / rolled back as a group, but it works record by record and commits (or rolls back) each individual group of related records. It tends to be slow, but it does mean any problem with a child record automatically rolls back the parent record.
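Transaction Grouping as described - each parent row committed or rolled back together with its related child rows, group by group - behaves roughly like this sketch. sqlite3 is again a stand-in; in DataStage the feature is configured on the target stage, not written as code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE child (id INTEGER PRIMARY KEY,"
             " parent_id INTEGER REFERENCES parent(id))")

def load_grouped(groups):
    """Commit each (parent, children) group as its own small transaction.
    A bad child rolls back its own parent, but groups already committed
    stay committed - the record-by-record trade-off described above."""
    committed = 0
    for parent_id, child_rows in groups:
        try:
            conn.execute("INSERT INTO parent VALUES (?)", (parent_id,))
            conn.executemany("INSERT INTO child VALUES (?, ?)", child_rows)
            conn.commit()
            committed += 1
        except sqlite3.Error:
            conn.rollback()   # the parent goes away with its failed child
    return committed

# Second group's child references a nonexistent parent: only group 1 lands.
groups = [(1, [(10, 1)]), (2, [(20, 99)])]
print(load_grouped(groups))  # 1
```

The per-group commit is why it is slow compared to one bulk commit, and why a child failure can never strand an orphaned parent.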

I've not had much truck with the Link Collector and prefer not to use it. Instead, could you not join the related tables in a single Oracle source stage such that a single query brings in the related fields from all tables in one swell foop? I also prefer to land intermediate data, "stage" it to flat files and, as a general rule, not go directly from database to database, but instead split jobs into discrete "extract" and "load" designs - but that's a whole 'nuther discussion.

On the subject of Link Order - you have full control over that inside the Transformer Editor. So it is up to you to ensure records are written in the proper RI order across related tables. You may need to tighten down the Array Size to make it work properly, but it can be done. This is another reason I prefer to break the jobs up in a more modular fashion: then, since I'm working a single table at a time, I can pump up the Array Size or even bulk load changes if so desired - each table's load strategy can be tuned individually. And it tends to run faster (and is easier to support than giant Swiss Army Knife mongo jobs) in spite of the fact that "more jobs" are running.

Food for thought.
-craig

"You can never have too many knives" -- Logan Nine Fingers
vercama
Participant
Posts: 11
Joined: Thu Aug 07, 2008 3:21 am

Post by vercama »

chulett wrote:Of course they can be "undone" but I've never run into a situation where I've had to do anything of the sort in say, parent tables because of problems in the child tables. At least not in Production. :wink: ...
Hi,
reading all these posts, it seems to me that the only way to get an "all or nothing" strategy is to have a common Oracle stage load the data with Rows per Transaction = 0.
Actually, it's not clear how, with multiple (Server) jobs and one OraOCI stage per job, the "all or nothing" can be maintained, since it seems to me that at the end of each (sub-)job the transaction is committed if no failure has occurred.
So it seems to me that a "Swiss Army Knife" solution is the only applicable one. Bear in mind that (as a DataMart solution) it may be necessary to perform first a deletion, then inserts and updates, AND "all or nothing". That is difficult to maintain, since there's a common structure that includes all the possible columns involved...
Why can't DataStage have a sort of "Transaction Stage" that can be used to open and close the transaction and pass it to other subjobs?! :(

Thanks,
Verena
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Your "the only way for a "all or nothing" strategy is to have a common Oracle Stage to load data with Rows per Transaction = 0" would not work for me - I am using DB2.

Read again what Craig and I have said. It's robust and reliable.

Then get yourself some heavy duty protection because your DBA is going to come down on you like the proverbial ton of bricks for demanding such huge transactions.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:lol: Mine understand. Need more redo/undo/transaction space? Just ask.

A common stage that includes all the possible columns involved? No. Just plain old stages with whatever columns you need, nothing special.
vercama wrote:Actually, it's not clear how, with multiple (Server) jobs and one OraOCI stage per job, the "all or nothing" can be maintained, since it seems to me that at the end of each (sub-)job the transaction is committed if no failure has occurred.
Because we ensure that all jobs always run successfully. And because it is modular, a failure only affects a particular portion of the load and backs itself out cleanly. The problem gets resolved and the job stream starts back up from the failure point.
vercama wrote:Why DataStage cannot have a sort of "Transaction Stage" that can be used to open and close the transaction and pass it to other subjobs?!
Serious question - how long have you been working with relational databases? How many different ones? There's no database on the planet that I've ever seen that supports or persists transactions across discrete processes, so what you are asking for cannot be done. Well, unless Ray comes back and tells us Redbrick does. Wouldn't surprise me. :wink:

I've said about all I care to say on this subject, with repeats along the way. Ultimately you have to choose your own path, of course, go whatever way you feel is best... and then live with supporting it. And your descendants as well. An important point to consider.
-craig

"You can never have too many knives" -- Logan Nine Fingers