Server vs. EE

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

DS_MJ
Participant
Posts: 157
Joined: Wed Feb 02, 2005 10:00 am

Server vs. EE

Post by DS_MJ »

I am migrating Server jobs to EE jobs, and I am noticing that in EE jobs one has to take care of null values or else warnings are produced. For example:

COL NAME = COL_DT
TYPE = CHAR
LENGTH = 10
NULLABLE = Yes

Code:

In the Server job, the Transformer derivation is:
COL_DT[1,10]

In the EE Transformer it becomes:
If IsNull(COL_DT) Then SetNull() Else COL_DT[1,10]
This is the only difference between the Server and the Parallel job, yet the row counts loaded by the two jobs differ by 3: the Parallel job loads 3 more rows than the Server job.

I have verified that the date parameters passed to the Server and the Parallel job are the same, and everything else is identical. There are no constraints, stage variables, or anything else involved.

I would appreciate it if somebody could explain the difference, or tell me what I am missing.
Thanks in advance,
Minhajuddin
Participant
Posts: 467
Joined: Tue Mar 20, 2007 6:36 am
Location: Chennai
Contact:

Post by Minhajuddin »

A few differences:

In parallel jobs you have many more stages to work with.

And everything that happens under the hood is different from server jobs: parallel jobs are executed by the Parallel Engine.

The differences are too many to list here. :wink:
Minhajuddin

kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

Just a guess (because I know nothing about your job design, which, if provided, would help us understand where the problem is occurring): your partitioning is off. If you are performing any lookups or joins and the partitioning is wrong, it can produce Cartesian products in your data. Provide the job design for further insight.
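
For illustration, here is the kind of setting to check when a Server lookup has been reproduced with a PX Join stage. This is only a sketch: the stage layout and the key name ACCT_ID are assumptions, not details from the original post.

Code:

Join stage, each input link -> Input page -> Partitioning tab:
    Partition type = Hash
    Key            = ACCT_ID      <- hypothetical join key; use your own
    Perform sort   = Yes (on the same key)

With (Auto) partitioning a Join will usually behave, but an explicit hash on the join key rules out matching rows landing on different partitions.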
us1aslam1us
Charter Member
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

You can compare the results from the Server and the Parallel job, isolate those three records, and check what is going on with them. My guess is that those records are getting dropped in the Server job due to nulls or non-printable characters in the COL_DT field. But your job design information would provide an insight into what you are doing.
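
One way to do that comparison inside DataStage itself is a small PX job with a Change Capture stage. A minimal sketch, assuming each job's output has first been landed to a dataset (the dataset names and the key column here are assumptions):

Code:

server_output.ds ──► (before link) ─┐
                                    Change_Capture ──► diff.ds
px_output.ds     ──► (after link)  ─┘

Change Capture stage:
    Key                  = COL_DT    <- plus any other business keys
    Drop Output for Copy = True      (keep only the rows that differ)

The surviving rows carry a change code column telling you whether each one is an insert, delete or edit relative to the Server output.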
I haven't failed, I've found 10,000 ways that don't work.
Thomas Alva Edison(1847-1931)
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Parallel jobs also must be precise about the format of dates, times and timestamps. The default formats are set up in the Administrator client (project properties). If you deviate from these (for example in functions or in job parameter values), then you must handle the deviation by specifying what the different format is.
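
For instance, if COL_DT arrives as a string, an explicit conversion in a parallel Transformer might look like the sketch below. The format string is an assumption; match it to how your data is actually laid out:

Code:

If IsNull(COL_DT) Then SetNull()
Else StringToDate(COL_DT[1,10], "%yyyy-%mm-%dd")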

And, as Keith notes, you must be very wary that the partitioning you choose (even if you use (Auto), the default) is compatible with what you are trying to achieve: for sorting, grouping, de-duplication and so on, every key value must be able to find all its matching values on the same partition.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.