
Switch Versus Transformer for splitting data

Posted: Fri Dec 08, 2006 8:56 am
by clarcombe
I am having trouble loading a table because I run out of log space. While the DBA gets his a*se in gear, I will split the dataset into smaller chunks and load them that way.

In the Transformer I used @OUTROWNUM in the output link constraints to break the data down, i.e.

1st output link: @OUTROWNUM <= 50000
2nd output link: @OUTROWNUM > 50000 AND @OUTROWNUM <= 100000

and so on, but strangely enough it didn't work (am I being too Server-y in my approach?). Is it due to node distribution?

So I thought I would use the Switch stage instead, but I can't seem to use @OUTROWNUM in the Case expressions.

Any ideas at all?

Thanks

Posted: Fri Dec 08, 2006 8:59 am
by DSguru2B
How often do you commit? Also, are you performing any deletes before doing your inserts? What database is it?

Posted: Fri Dec 08, 2006 9:41 am
by thebird
@OUTROWNUM will not work the way it does in the Server edition, because in Parallel jobs the Transformer executes on each of the nodes. So if you set a constraint of @OUTROWNUM <= 50000, each node sends you up to 50,000 records - on a 4-node configuration that could be 200,000 rows in total, not 50,000.

System variables like @OUTROWNUM can only be used in a Transformer stage; they are not supported by other stages such as the Switch stage.
Go through the chapter on the Switch stage in the Parallel Job Developer's Guide to see how it functions. I don't think Switch is a viable option in this case.

Why not set up a counter in your Transformer to number your records, then use a Filter stage to split them into different target Data Sets?
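Something like this, as a rough sketch (svRowNum, RowId and the split points are just illustrative names and values). In the Transformer, define a stage variable and map it to an output column:

    svRowNum (initial value 0)    derivation:  svRowNum + 1
    output column RowId           derivation:  svRowNum

then in the Filter stage give each output link its own Where clause:

    RowId <= 50000                            (output link 0)
    RowId > 50000 AND RowId <= 100000         (output link 1)

Bear in mind the counter is per-partition if the Transformer runs in parallel, so either run it in sequential mode or make the counter partition-aware.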

Aneesh

Posted: Fri Dec 08, 2006 9:54 am
by DSguru2B
Read this post by Vincent. It will help you use @INROWNUM along with @PARTITIONNUM and @NUMPARTITIONS.
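The usual expression looks something like this (a sketch - GlobalRowNum is just an illustrative column name):

    GlobalRowNum = (@INROWNUM - 1) * @NUMPARTITIONS + @PARTITIONNUM + 1

This gives every row a unique number across all partitions, and with round robin partitioning the numbering is also contiguous from 1 to N, so constraints like GlobalRowNum <= 50000 behave the way @OUTROWNUM would in a Server job.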

Posted: Fri Dec 08, 2006 4:17 pm
by ray.wurlod
If these are inserts only, why not use a direct write (bulk load)?

Can you partition your data onto more processing nodes, so that smaller transactions come from each node? That way you don't have to worry about all the logic - the engine can perform it for you.
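For reference, the degree of parallelism comes from the configuration file pointed to by $APT_CONFIG_FILE. A minimal sketch of a two-node file (host name and paths are placeholders):

    {
        node "node1"
        {
            fastname "etl_host"
            pools ""
            resource disk "/ds/data" {pools ""}
            resource scratchdisk "/ds/scratch" {pools ""}
        }
        node "node2"
        {
            fastname "etl_host"
            pools ""
            resource disk "/ds/data" {pools ""}
            resource scratchdisk "/ds/scratch" {pools ""}
        }
    }

With N nodes each node handles roughly 1/N of the rows, so the transaction each node sends to the database is correspondingly smaller.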