Server vs. EE

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

DS_MJ
Participant
Posts: 157
Joined: Wed Feb 02, 2005 10:00 am

Server vs. EE

Post by DS_MJ »

I am migrating Server jobs to EE jobs, and I am noticing that in EE jobs one has to take care of null values or else warnings are produced. For example:

COL NAME = COL_DT
TYPE = CHAR
LENGTH = 10
NULLABLE = Yes

Code:

In the Server job, the Transformer derivation is:
COL_DT[1,10]

In the EE Transformer it becomes:
If IsNull(COL_DT) Then SetNull() Else COL_DT[1,10]
This is the only difference between the Server and the Parallel job, yet the row counts loaded by the two jobs differ by 3: the Parallel job loads 3 more rows than the Server job.

I have verified that the date parameters passed to the Server and the Parallel job are the same, and everything else is identical. There are no constraints, stage variables, or anything else involved.

I would appreciate it if somebody could explain the difference, or tell me what I am missing.
Thanks in advance,
Minhajuddin
Participant
Posts: 467
Joined: Tue Mar 20, 2007 6:36 am
Location: Chennai
Contact:

Post by Minhajuddin »

A few differences:

In parallel jobs you have many more stages to work with.

And everything that happens under the hood is different from server jobs: parallel jobs are executed by the Parallel Engine.

The differences are too many to list here. :wink:
Minhajuddin

kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

Just a guess (because I know nothing about your job design, which, if provided, would help us understand where the problem is occurring): your partitioning is off. If you are performing any lookups or joins and the partitioning is wrong, it can produce Cartesian products in your data. Provide the job design for further insight.
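
For illustration, here is the kind of setting to check when a Server lookup has been reproduced with a PX Join stage. This is only a sketch: the stage layout and the key name ACCT_ID are assumptions, not details from the original post.

Code:

Join stage, each input link -> Input page -> Partitioning tab:
    Partition type = Hash
    Key            = ACCT_ID      <- hypothetical join key; use your own
    Perform sort   = Yes (on the same key)

With (Auto) partitioning a Join will usually behave, but an explicit hash on the join key rules out matching rows landing on different partitions.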
us1aslam1us
Charter Member
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

You can compare the results from the Server and the Parallel job, isolate those three records, and check what is going on with them. My guess is that those records are getting dropped in the Server job due to nulls or non-printable characters in the COL_DT field. But your job design information would provide an insight into what you are doing.
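
One way to do that comparison inside DataStage itself is a small PX job with a Change Capture stage. A minimal sketch, assuming each job's output has first been landed to a dataset (the dataset names and the key column here are assumptions):

Code:

server_output.ds ──► (before link) ─┐
                                    Change_Capture ──► diff.ds
px_output.ds     ──► (after link)  ─┘

Change Capture stage:
    Key                  = COL_DT    <- plus any other business keys
    Drop Output for Copy = True      (keep only the rows that differ)

The surviving rows carry a change code column telling you whether each one is an insert, delete or edit relative to the Server output.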
I haven't failed, I've found 10,000 ways that don't work.
Thomas Alva Edison(1847-1931)
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Parallel jobs also must be precise about the format of dates, times and timestamps. The default formats are set up in the Administrator client (project properties). If you deviate from these (for example in functions or in job parameter values), then you must handle the deviation by specifying what the different format is.
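
For instance, if COL_DT arrives as a string, an explicit conversion in a parallel Transformer might look like the sketch below. The format string is an assumption; match it to how your data is actually laid out:

Code:

If IsNull(COL_DT) Then SetNull()
Else StringToDate(COL_DT[1,10], "%yyyy-%mm-%dd")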

And, as Keith notes, you must be very wary that the partitioning you choose (even if you use (Auto), the default) is compatible with what you are trying to achieve: for sorting, grouping, de-duplication and so on, every key value must be able to find all its matching values on the same partition.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.