partition question

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
dnat
Participant
Posts: 200
Joined: Thu Sep 06, 2007 2:06 am

partition question

Post by dnat »

Hi,

I have a simple job that reads from a file and updates a particular field in an Oracle database.

The file has around 20 million records and the commit count is 10,000. The partition type is Auto (I haven't changed anything).

The job aborted in the middle because the Oracle instance went down. It had updated around 12 million records.

If we re-run the job now, all 20 million records would be taken for update. But since the partitioning is Auto, we are not sure whether the updated records are the first 12 million records in the file, which we could then remove and re-run the job with just the remaining 8 million.

Can anyone comment on this?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

If you are reading the file in sequential mode, then you can skip the first 12 million rows (or whatever the number is) with impunity.
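In a sequential read, "skipping the first N rows" is just counting line terminators before starting to emit records. A minimal Python sketch of that idea (the helper name `remaining_rows` and the filtering approach are illustrative, not a DataStage feature — within DataStage you would do this with a filter or a transformer constraint):

```python
def remaining_rows(path, rows_already_loaded):
    """Yield only the rows after the first `rows_already_loaded` lines.

    Hypothetical helper to illustrate the restart strategy: skip the
    rows a previous run already committed, emit the rest.
    """
    with open(path) as f:
        for line_number, line in enumerate(f, start=1):
            if line_number > rows_already_loaded:
                yield line
```

With 20 million input rows and 12 million already committed, `remaining_rows(path, 12_000_000)` would yield only the last 8 million.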

If you are reading the file in parallel mode (perhaps more than one reader per node), then it's still possible, but you need to check how many rows have been processed on each node (from Monitor, perhaps, or DSGetLinkInfo() with the "partition row count" option). You may need to back off a little, say to 10 million just to be safe.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

And you could only be reading the file 'in parallel mode' if it is a fixed-width file, so my money is on sequential.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Not true any more. Multiple readers per node will work with delimited formats.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not true any more as of when? With which version? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

7.1 at a guess, maybe even a minor 7.0 release.

Obviously it's more difficult with delimited data, but it's certainly possible (locate the percentage point then scan forward for a line terminator).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

ray.wurlod wrote: Not true any more. Multiple readers per node will work with delimited formats.

I tried it at 8.0 and it won't:

Sequential_File_33: The multinode option requires fixed length records.
Post Reply