Key points for PX development

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Key points for PX development

Post by clarcombe »

Bit of a woolly topic, I know, but I have just landed my first PX contract and start tomorrow :? It is a 3-month migration of MVS flat-file data to MILOS or something like that (a SAP load tool), also with flat-file output.

I did the PX training a couple of years ago and have reached a relatively proficient level in Server. I have been rereading the PX server guide, although I am only at page 79 of 1130 :cry:. I have also just looked at the Modify training tip (could do with a few more like that).

If there were a list of 10 PX commandments, what would they be?

Thanks
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

(This could be an interesting thread if everyone would post 1 additional commandment; we'll certainly get more than 10 but it will be interesting!)

#1 Don't think or design in terms of the server jobs you know
The designer canvas and identical looking designer objects lull you into the false sense of security that you really aren't doing anything very different from server jobs when you design parallel jobs. Without exception the inferior PX jobs that I've seen to date are badly written because of this basic error, by developers who came from server and were not given sufficient training or experimentation time to realize the differences.
Always remember that you are working on a different product that requires a different mindset in designer and later on when executing jobs.
Since the number of concurrent parallel streams that your job will run in depends upon the configuration file and, to some extent, on the parallelism of your databases, you need to understand that a certain amount of implicit repartitioning and interaction is going to happen to your data.
Always design with those parallel streams in mind, even if your job will most likely end up running on a 1-node configuration. I am sure that each and every one of us who has worked in PX has, at one time, done a lookup, neglected to think about the partitioning, and needed to revisit and fix that stage sometime during the testing phase when the results weren't quite right.
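
To make the lookup pitfall concrete, here is a minimal sketch in plain Python (not DataStage code). It assumes each node only sees its own slice of both the stream and the reference data; the `orders` and `customers` rows, the helper names and the 4-node count are all made up for illustration.

```python
from zlib import crc32


def round_robin(rows, nodes):
    """Deal rows to partitions in turn, ignoring their content."""
    parts = [[] for _ in range(nodes)]
    for i, row in enumerate(rows):
        parts[i % nodes].append(row)
    return parts


def hash_partition(rows, nodes, key):
    """Send every row with the same key value to the same partition."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[crc32(str(row[key]).encode()) % nodes].append(row)
    return parts


def partitioned_lookup(stream_parts, ref_parts, key):
    """Look up each stream row against the reference rows on the same node only."""
    matched = []
    for stream, ref in zip(stream_parts, ref_parts):
        ref_by_key = {r[key]: r for r in ref}
        for row in stream:
            if row[key] in ref_by_key:
                matched.append({**row, **ref_by_key[row[key]]})
    return matched


# Hypothetical data: every order has a matching customer.
orders = [{"cust": c, "order": n} for n, c in enumerate([3, 1, 4, 1, 5, 2, 6, 5])]
customers = [{"cust": c, "name": f"customer-{c}"} for c in range(1, 7)]

NODES = 4
# Neither side partitioned on the key: matching rows can land on different nodes.
naive = partitioned_lookup(
    round_robin(orders, NODES), round_robin(customers, NODES), "cust"
)
# Both sides hash-partitioned on the lookup key: matching rows always meet.
keyed = partitioned_lookup(
    hash_partition(orders, NODES, "cust"), hash_partition(customers, NODES, "cust"), "cust"
)
# The same careless design on a 1-node configuration hides the problem.
one_node = partitioned_lookup(round_robin(orders, 1), round_robin(customers, 1), "cust")

print(len(orders), len(naive), len(keyed), len(one_node))
```

On one node the careless version finds every match, which is why the mistake tends to surface only once the job runs on a multi-node configuration during testing.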
kris007
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

Having made the transition from server to parallel recently, here are 3 of the many points I noticed.
1. Thou shalt not perform any functions without properly handling NULLs (this one will be more useful for you as you are dealing with flat files in both source and target).
2. Thou shalt perform explicit type conversion whenever possible and not trust DataStage to do type conversion (PX is stricter about data types; a strongly typed environment, if I may say so).
3. Remember thou to follow dsxchange and use it wisely (that's what I have been doing :) ).
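
The first two points can be sketched in plain Python (again, not DataStage code) for a flat-file field. The null tokens, the field width and the function names below are assumptions made for illustration only.

```python
from decimal import Decimal, InvalidOperation

# Assumed ways a null shows up in the source flat file.
NULL_TOKENS = {"", "NULL"}


def to_decimal(text, default=None):
    """Convert explicitly; map null tokens to a default instead of failing later."""
    value = text.strip()
    if value in NULL_TOKENS:
        return default
    try:
        return Decimal(value)
    except InvalidOperation as exc:
        raise ValueError(f"not a decimal: {text!r}") from exc


def format_amount(amount, width=10):
    """Write a null as blanks of the field width rather than the text 'None'."""
    if amount is None:
        return " " * width
    return f"{amount:>{width}.2f}"


# Hypothetical input fields as they might arrive from a flat file.
for raw in ["  12.5 ", "NULL", ""]:
    print(repr(format_amount(to_decimal(raw))))
```

The point of the sketch is only that the null case and the conversion are both decided explicitly at the edge, so nothing downstream has to guess.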

Amen,
HTH
Kris

Where's the "Any" key?-Homer Simpson
pigpen
Participant
Posts: 38
Joined: Thu Jul 13, 2006 2:51 am

Post by pigpen »

Think carefully about the selection of partition keys when creating a dataset. A poor choice may hinder the performance of jobs that use the dataset afterwards.
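
One way a poor key can hurt is skew. The sketch below is plain Python, not DataStage code, and the rows, column names and 4-node count are invented for illustration. It counts how many rows each partition receives when hashing on a low-cardinality column versus a near-unique one.

```python
from collections import Counter
from zlib import crc32


def rows_per_partition(rows, key, nodes):
    """Hash-partition on `key` and return the row count for each partition."""
    counts = Counter()
    for row in rows:
        counts[crc32(str(row[key]).encode()) % nodes] += 1
    return [counts[n] for n in range(nodes)]


# Hypothetical data: 1000 accounts, 90% of them in a single country.
rows = [{"account_id": i, "country": "DE" if i % 10 else "FR"} for i in range(1000)]

by_country = rows_per_partition(rows, "country", 4)
by_account = rows_per_partition(rows, "account_id", 4)

print("partitioned by country:   ", by_country)
print("partitioned by account_id:", by_account)
```

With only two distinct country values, at most two of the four partitions receive any rows and one of them holds at least 900, so any job reading that dataset with the same partitioning leaves most nodes idle.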
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

1. Thou shalt remember that PX comes with Server as well, so decide when to do it as pure PX and when not.
2. Thou shalt remember not to hog the server nodes until running full stress tests.
3. Thou shalt use a wimpy node pool during minor test runs so that Director doesn't crawl and (-14) errors don't occur.
4. Thou shalt remember that it is not a sign of skill to reach the limit of stages allowed in a single job design.
5. Thou shalt remember that even though you're doing a one-time data conversion project, your standards and practices should still be good, proper programming: no variables named x, y, or z.

and I've run out of pointers... :cry:
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

Many thanks for your comments. Luckily I discovered that the project is at an advanced stage, so there are norms and rules in place, some of which have been mentioned here.
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

The 4th point given by Kenneth should be noted.
Adding consecutive active stages wouldn't affect performance much in a Server job, but the same doesn't hold in PX. People might happily add more stages, which directly affects performance, even when combinable operators are enabled.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
rwierdsm
Premium Member
Posts: 209
Joined: Fri Jan 09, 2004 1:14 pm
Location: Toronto, Canada

Post by rwierdsm »

I would add that you need to remember in your estimates that PX jobs WILL take longer to code.

The server Transformer stage was like a Swiss Army knife. Now you have a whole drawer full of utensils, none of which does everything the Swiss Army knife did, but each of which does its own task better.

Rob
Rob Wierdsma
Toronto, Canada
bartonbishop.com