Switch Versus Transformer for splitting data

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Switch Versus Transformer for splitting data

Post by clarcombe »

I am having trouble loading a table because I run out of log space. While the DBA gets his a*se in gear, I will split the dataset into smaller chunks and load them that way.

I used @OUTROWNUM in the Transformer constraint to break the data down, i.e.
1st output link: @OUTROWNUM <= 50000
2nd output link: @OUTROWNUM > 50000 and @OUTROWNUM <= 100000
etc., but strangely enough it didn't work (am I being too Server-y in my approach?). Is it due to node distribution?

So I thought I would use the Switch stage, but I can't seem to use @OUTROWNUM in the Case part.

Any ideas at all?

Thanks
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

How often do you pass a commit? And also, are you performing any deletes before doing your inserts? What database is it?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
thebird
Participant
Posts: 254
Joined: Thu Jan 06, 2005 12:11 am
Location: India

Post by thebird »

@OUTROWNUM will not work as it does in the Server edition, because in Parallel jobs the Transformer executes on each of the nodes. So if you set a constraint of @OUTROWNUM <= 50000, each of the nodes will send you 50000 records.
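To make the per-node counting concrete, here is a minimal Python simulation of that behavior. The partition count, row count, and round-robin distribution are assumptions for illustration; the point is only that each node keeps its own counter, just as each node in a Parallel job maintains its own @OUTROWNUM.

```python
# Simulate how a per-node "@OUTROWNUM <= 50000" constraint behaves
# when every partition keeps an independent counter.
NUM_PARTITIONS = 4        # assumed node count
TOTAL_ROWS = 120_000      # assumed input volume
LIMIT = 50_000

sent_per_partition = [0] * NUM_PARTITIONS
for row in range(TOTAL_ROWS):
    p = row % NUM_PARTITIONS            # assumed round-robin partitioning
    if sent_per_partition[p] < LIMIT:   # each node applies the limit locally
        sent_per_partition[p] += 1

total_on_first_link = sum(sent_per_partition)
print(sent_per_partition)   # [30000, 30000, 30000, 30000]
print(total_on_first_link)  # 120000 - every row passes, not the intended 50000
```

With 120,000 rows spread over 4 nodes, no node ever reaches 50,000, so the first link receives the entire dataset instead of one chunk.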

System variables like @OUTROWNUM can only be used in a Transformer stage; they are not supported by other stages such as the Switch stage.
Go through the chapter on Switch in the Parallel Job Developer's Guide to see how it functions. I don't think Switch is a viable option in this case.

Why not set a Parallel counter in your transformer to count your records and then use a filter stage to split these into different target Data Sets?
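The counter-plus-filter idea can be sketched as follows. This is a Python illustration, not DataStage syntax: the chunk size and the routing dictionary are assumptions standing in for the filter stage's output links.

```python
# Route each row to a chunk based on a job-wide row number.
CHUNK_SIZE = 50_000

def chunk_id(rownum: int) -> int:
    """0 for rows 1..50000, 1 for rows 50001..100000, and so on."""
    return (rownum - 1) // CHUNK_SIZE

# Boundary rows land in the expected chunks (stand-ins for output links).
outputs = {}
for rownum in (1, 50_000, 50_001, 100_000, 100_001):
    outputs.setdefault(chunk_id(rownum), []).append(rownum)
print(outputs)  # {0: [1, 50000], 1: [50001, 100000], 2: [100001]}
```

The integer division keeps the boundaries clean: row 50,000 stays in the first chunk and row 50,001 starts the second, avoiding the gap-at-the-boundary problem in the original constraints.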

Aneesh
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Read the post by Vincent. It will help you use @INROWNUM along with @PARTITIONNUM and @NUMPARTITIONS.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

If these are inserts only, why not use a direct write (bulk load)?

Can you partition your data onto more processing nodes, so that smaller transactions come from each node? That way you don't have to worry about all the logic - the engine can perform it for you.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.