Server to Parallel

A forum for discussing DataStage<sup>®</sup> basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

sam334
Premium Member
Posts: 145
Joined: Mon Aug 26, 2013 7:42 pm

Post by sam334 »

All,
Hope you are doing well. I need some advice. I have been working with DataStage for almost 3 years, currently on 9.1, but my current project uses only server jobs, not parallel jobs. I basically started my DataStage career with server jobs. It seems that all the companies in the market are using parallel jobs. Can you please suggest how I can move from server jobs to parallel? I know shell scripting and PL/SQL, which are useful for server jobs. But what should I do to really move to a parallel environment?

Thanks,

Appreciate your thought.

Sam.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

You could start by reading all the available product documentation. :)
Choose a job you love, and you will never have to work a day in your life. - Confucius
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Exactly...and just start building some Parallel Jobs and playing with it. See what's different...and what is the same. In addition to the documentation, review the threads here...do some searching --- there are resources on how to build parallel Jobs and why/what, etc. all over the place...not just in this forum.

...and don't drop your Server skills...they will come in very handy.....there are times when you will want Server Jobs alongside, or instead of, EE Jobs.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Once config files are set up, PX jobs are not much different. It is a slightly different mindset. Sort only when you need to. Set your partitioning and leave it the same. Only a few stages, like Merge, require sorted input. So you need hash partitioning at the sort, and Same partitioning from the sort until the Merge stage.
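To picture why the partitioning has to stay put between the sort and the Merge, here is a rough Python sketch. It is not DataStage code: `hash_partition`, `merge_sorted` and `partitioned_merge` are made-up names, the CRC32 hash is just a stand-in for whatever the engine uses, and keys are assumed unique on each input. The point is only that rows with the same key land in the same partition on both inputs, so each partition can be sorted and merged independently.

```python
import zlib


def hash_partition(rows, key, num_partitions):
    """Place each row in a partition chosen by hashing its key value."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        idx = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        parts[idx].append(row)
    return parts


def merge_sorted(master, update, key):
    """Two-pointer merge of two key-sorted lists (keys assumed unique per side)."""
    out, i, j = [], 0, 0
    while i < len(master) and j < len(update):
        mk, uk = master[i][key], update[j][key]
        if mk == uk:
            out.append({**master[i], **update[j]})
            i += 1
            j += 1
        elif mk < uk:
            i += 1
        else:
            j += 1
    return out


def partitioned_merge(master, update, key, num_partitions):
    """Hash-partition both inputs the same way, then sort and merge per partition."""
    m_parts = hash_partition(master, key, num_partitions)
    u_parts = hash_partition(update, key, num_partitions)
    result = []
    for p in range(num_partitions):
        m_sorted = sorted(m_parts[p], key=lambda r: r[key])
        u_sorted = sorted(u_parts[p], key=lambda r: r[key])
        result.extend(merge_sorted(m_sorted, u_sorted, key))
    return result


master = [{"id": i, "name": f"n{i}"} for i in range(10)]
update = [{"id": i, "amount": i * 10} for i in range(0, 10, 2)]
merged = partitioned_merge(master, update, "id", 4)
```

If one input were repartitioned differently after its sort, matching keys could end up in different partitions and silently drop out of the merge, which is the kind of problem that is hard to debug later.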

The biggest mindset change is to never land the data. One big job is usually better than landing the data and running a lot of little jobs. Grab your data, do everything you need, and land it in the target. One big job. A sort done in the database, especially on multiple keys, often does not match what DataStage wants, so sort it in your job when you need it. Debugging this can be a nightmare.

More stages in PX is not necessarily a bad thing. PX jobs will optimize out Copy stages. A trick I learned from IBM developers was to have a Copy stage right before a Join, Merge or Lookup stage. Throw in a Peek stage to see why your join, merge or lookup is not working like you think.

You need to think about how memory is used. Lookups are held entirely in memory on each node, so keep as few fields as possible. If you have an 8 node config and millions of rows on a lookup, then you might need to change to a join or a merge. Write your job both ways and test it. Look at memory usage. Remember production probably has a lot more jobs all running at the same time. So one job in DEV may run great and in PROD it kills the server.
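A back-of-the-envelope calculation helps here. The sketch below is plain arithmetic, not a DataStage API, and it assumes the worst case where every node holds its own full copy of the reference data. The row count and byte widths are invented numbers, and `lookup_memory_bytes` is a hypothetical helper.

```python
def lookup_memory_bytes(row_count, bytes_per_row, nodes):
    """Rough upper bound: every node keeps a full copy of the lookup data."""
    return row_count * bytes_per_row * nodes


rows = 5_000_000
nodes = 8

# Wide reference link: carrying every column, roughly 200 bytes per row (assumed).
wide = lookup_memory_bytes(rows, 200, nodes)

# Narrow reference link: key plus the one column you need, roughly 24 bytes per row (assumed).
narrow = lookup_memory_bytes(rows, 24, nodes)

print(f"wide:   {wide / 1e9:.2f} GB")
print(f"narrow: {narrow / 1e9:.2f} GB")
```

Real usage depends on the platform and how the reference link is partitioned, so treat this as a way to decide when a lookup deserves a second look, not as a measurement.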

Just because something runs without warnings does not mean it is accurate or optimized. Your lookups or joins could be failing because of something you did not think about, like metadata. If one side is trimmed and mapped to a varchar and the other side is not trimmed or is mapped to a char, then you are not going to get the results you expect.
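The char versus varchar trap is easy to reproduce outside DataStage. This Python sketch only imitates it: the fixed-width side is simulated by space-padding keys to 10 characters, and `lookup_matches` is a made-up helper doing an exact key match. How a real job pads and compares depends on your column definitions and settings.

```python
def lookup_matches(source_keys, reference_keys):
    """Return the source keys that find an exact match in the reference set."""
    reference = set(reference_keys)
    return [k for k in source_keys if k in reference]


# One side arrives as fixed-width CHAR(10)-style values, padded with spaces.
char_side = [k.ljust(10) for k in ["A100", "B200", "C300"]]

# The other side arrives trimmed, VARCHAR-style.
varchar_side = ["A100", "B200", "C300"]

# No warnings, no errors, and no matches either.
untrimmed_hits = lookup_matches(char_side, varchar_side)

# Trimming up front makes both sides comparable.
trimmed_hits = lookup_matches([k.rstrip() for k in char_side], varchar_side)
```

Nothing fails loudly in the untrimmed case, which is exactly why a Peek on both inputs is worth the extra stage.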

PX is a lot more picky about metadata. It will do implicit conversions when you are sloppy. You either need to convert your column types up front or right before insert/update. Be consistent. It will save you lots of work debugging.

All outputs need reject links. You never know if an insert or update failed without reject links. You want to trap all rejects so you know how many failed.

Remember PX is not that much different than server. If you are good in one you can be good in the other. Try to imagine what is happening when it splits a stream into multiple partitions. Sometimes 4 nodes will outperform 6 nodes if you can do lookups without sorts compared to merge with sorts. The same is true for options on stages. Just because it will run with an array size of 30,000 does not mean that it is faster. Maybe 5,000 is faster in production because the database is not so overworked creating rollback segments and monitoring locks.

Most of this is what we in Texas call "common sense". A term we grew up with meaning think it through, it might not be as simple as you thought.
Mamu Kim
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not sure how common 'common sense' is any more, Kim. :wink:

All great advice. From a reading standpoint, I highly recommend this IBM Redbook: InfoSphere DataStage Parallel Framework Standard Practices.
-craig

"You can never have too many knives" -- Logan Nine Fingers
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Reading many of the manuals is not extremely helpful. Quite a few are extremely repetitive (boring!) and are self referential ("the Buffer button is used to set Buffers" - Doh!). They are a good reference if you know what you are looking for.

I'd say a better place to start is the Redbook IBM InfoSphere DataStage Data Flow and Job Design.

http://www.redbooks.ibm.com/abstracts/sg247576.html
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yah... that one too. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
rkashyap
Premium Member
Posts: 532
Joined: Fri Dec 02, 2011 12:02 pm
Location: Richmond VA

Post by rkashyap »

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Wowzers... that brings back memories. 8)
-craig

"You can never have too many knives" -- Logan Nine Fingers
sam334
Premium Member
Posts: 145
Joined: Mon Aug 26, 2013 7:42 pm

Post by sam334 »

Thanks everybody. I will start reading the materials very soon.

Craig, thanks for your suggestion to post it in general blog. Worked great. :)

Also, if somebody can open the non visible part for a while that will be great. As said before, my premium membership is not activated yet though I paid almost a month back.

Thanks....