2 ODBC stages' Open commands not deleting each other's data

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

highpoint
Premium Member
Posts: 123
Joined: Sat Jun 19, 2010 12:01 am
Location: Chicago

2 ODBC stages' Open commands not deleting each other's data

Post by highpoint »

Hi,

It looks like we found a bug in our code, but it was not affecting the data. I would appreciate the gurus' advice on why this was not affecting the data:

In my job we have 2 ODBC stages as targets, both writing to the same table.

And each ODBC stage uses an "Open Command" that deletes rows from the table before the stage loads anything into it.
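For illustration, a minimal sketch of what each stage's Open command might contain (the table and column names here are hypothetical):

-- Hypothetical Open command: clear the target rows before this stage writes anything.
DELETE FROM target_table WHERE batch_col = 'abc';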


How the process works:

The 1st ODBC stage runs, i.e. it executes its Open command (the DELETE statement) and then inserts.
The 2nd ODBC stage runs, i.e. it executes its Open command (the DELETE statement) and then inserts.

My understanding was that whichever ODBC stage executes later would delete the data inserted by the earlier ODBC stage, but that does not appear to be happening.

Appreciate your help in understanding this.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Open command executes when the stage opens. Unless there's some designed-in reason not to, every stage opens when the job opens.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
highpoint
Premium Member
Posts: 123
Joined: Sat Jun 19, 2010 12:01 am
Location: Chicago

Post by highpoint »

ray.wurlod wrote:Open command executes when the stage opens. Unless there's some designed-in reason not to, every stage opens when the job opens.
Thanks Ray Wurlod!!

I was under the assumption that the ODBC stage which receives records first would open first. Thanks for the clarification.

So, in my case, having the same DELETE statement as the Open command in one ODBC stage or in several makes no difference, since they are executed at the same time, right? Will having the DELETE in one ODBC stage serve the same purpose as having it in multiple, or is there any difference?

And how does the Close command work, then? Do all stages close at the same time?

Appreciate your reply.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Again, that depends on the job design. Stages such as Sort and Aggregator (Hash mode) can block execution of downstream stages. In an unblocked design stages close as close to simultaneously as whatever is controlling the processes can manage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
highpoint
Premium Member
Posts: 123
Joined: Sat Jun 19, 2010 12:01 am
Location: Chicago

Post by highpoint »

ray.wurlod wrote:Again, that depends on the job design. Stages such as Sort and Aggregator (Hash mode) can block execution of downstream stages. In an unblocked design stages close as close to simultaneously as whatever is controlling the processes can manage.
Thanks, Ray!! I am still not very clear.

Say we don't have any Sort or Aggregator stage in the job; then ideally both ODBC Open command DELETE statements would run simultaneously. How can two DELETE statements run on the same table at the same time? Each statement has to get an exclusive lock on the table before it can process its delete. How is this handled with simultaneous deletes?

And if one runs after the other, say a few milliseconds later, then we run the risk of losing the data inserted by the 1st ODBC stage before the 2nd ODBC stage's delete happens.


So what would be the best option to handle this?
We want to delete the data before any ODBC stage in the job writes to the table.
And once any ODBC stage writes data to the table, we don't want to lose it.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Too many variables with your current approach. Why not do the delete / truncate as a separate operation before this job runs?
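For example (a sketch only; the script name, table, and predicate are hypothetical), the clear-down could live in its own SQL script run as a separate step before the load job:

-- prejob_clear.sql (hypothetical): run once, before the load job starts.
DELETE FROM target_table WHERE batch_col = 'abc';
COMMIT;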
-craig

"You can never have too many knives" -- Logan Nine Fingers
highpoint
Premium Member
Posts: 123
Joined: Sat Jun 19, 2010 12:01 am
Location: Chicago

Post by highpoint »

chulett wrote:Too many variables with your current approach. Why not do the delete / truncate as a separate operation before this job runs?
Shall we do it in a before-job subroutine?
If so, do we have to write a separate shell script for this?

Or is there any other way to achieve this?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Nothing's different in DataStage - it's just another client as far as the database is concerned. If you issue two DELETE statements on the same table, the first to start executing will take an exclusive table lock and the second will wait until that lock is released (or a timeout error occurs).
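To sketch that interaction (session labels, table, and predicate are hypothetical):

-- Session 1: first ODBC stage's Open command
DELETE FROM target_table WHERE batch_col = 'abc';  -- acquires the exclusive lock
COMMIT;                                            -- lock released here

-- Session 2: second ODBC stage's Open command, issued while session 1 holds the lock
DELETE FROM target_table WHERE batch_col = 'abc';  -- waits for session 1, or times out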
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
highpoint
Premium Member
Posts: 123
Joined: Sat Jun 19, 2010 12:01 am
Location: Chicago

Post by highpoint »

ray.wurlod wrote:Nothing's different in DataStage - it's just another client as far as the database is concerned. If you issue two DELETE statements on the same table, the first to start executing will take an exclusive table lock and the second will wait until that lock is released (or a timeout error occurs).
So, in this case, say the 1st ODBC stage takes the exclusive table lock, deletes the records, issues a commit, and then starts loading data.

Then say the 2nd ODBC stage opens later, perhaps because it has an Aggregator and a Sort on its input link. The 2nd ODBC stage will then delete the data loaded by the 1st.

Is this scenario possible? If yes, which way would you suggest to avoid any risk of losing data?
Should we continue having the Open command in each ODBC stage, use a before-job subroutine, or is there any other way to reduce the risk?

Appreciate your reply.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Yet again, it depends. If process 1 is busily updating, then it will be taking at least record level locks that will prevent process 2 from obtaining a table level lock. So how are the deletes being accomplished? Through SQL (which will start with record level locks but will probably end up promoting them) or truncate command (which is not transactional and will take a table level lock as it starts)?
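As a sketch of that contrast (table name hypothetical; exact locking behavior varies by database):

-- Transactional: begins with record-level locks, which will probably be
-- promoted to a table lock as the delete proceeds.
DELETE FROM target_table WHERE batch_col = 'abc';
COMMIT;

-- Not transactional: takes a table-level lock as soon as it starts.
TRUNCATE TABLE target_table;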
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
highpoint
Premium Member
Posts: 123
Joined: Sat Jun 19, 2010 12:01 am
Location: Chicago

Post by highpoint »

The deletes are being accomplished using the SQL DELETE command, something like this:

DELETE FROM table WHERE col = 'abc';

Appreciate a reply.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Sentence #2 of my previous reply.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.