Transformer output rows reduced!
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 21
- Joined: Wed Oct 01, 2003 11:53 am
Hi,
I have a Transformer stage with 'x' rows on its input stream.
There is no constraint on the Transformer.
Even then, the output is 'y' rows, where y < x.
There are no corresponding warnings or messages in the log file.
Any pointers on this?
Thanks in advance.
Regards,
Nitin
Nitin
If you are talking about the DataStage Monitor, then I have never seen that before. If you are talking about doing counts on the tables before and after a job runs, then you could have two records with the same key that both update the same record in the target table.
If it's the first option, then what OS, what version of DataStage, and what database?
Kim.
Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
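To illustrate the duplicate-key scenario Kim describes: if two source rows share the same key and the target load uses an update strategy, both rows land on the same target record, so a before/after count on the table shows fewer rows than the input. A minimal Python stand-in (not DataStage; the data and key values are made up):

```python
# Hypothetical source rows: two of them share key 101.
source_rows = [
    {"key": 100, "amount": 10},
    {"key": 101, "amount": 20},
    {"key": 101, "amount": 25},  # duplicate key
]

# An Insert/Update (upsert) load keyed on "key": the second row
# with key 101 overwrites the first instead of adding a new record.
target = {}
for row in source_rows:
    target[row["key"]] = row  # insert if new, update if the key exists

print(len(source_rows))  # 3 rows went in
print(len(target))       # only 2 records exist in the target
```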
Maybe we should check the update strategy in the target stage. This can happen if it had an Insert/Update or Update/Insert strategy.
Badhri ...
Badhrinath Krishnamoorthy
www.cognizant.com
Okay, further details are below:
Source stage: Data Set (or an output stream from a previous Lookup stage)
Target stage: Data Set
I am viewing the number of records flowing through each link using either of the following:
- the 'Show performance statistics' option
- the (job) Monitor
Both essentially give the same data.
I suspect this issue of records being dropped may be related to NULLs in some field values?
Please suggest.
Regards,
Nitin
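On the NULL suspicion: in a parallel Transformer, a derivation that applies a function to a nullable column without explicit null handling can cause the row to be dropped (or sent to a reject link, if one is attached). A rough Python sketch of that behaviour, with made-up data and a made-up derivation standing in for something like UpCase(name):

```python
# Made-up input: one row has a NULL (None) in "name".
input_rows = [
    {"id": 1, "name": "alpha"},
    {"id": 2, "name": None},   # NULL field value
    {"id": 3, "name": "gamma"},
]

output_rows = []
for row in input_rows:
    try:
        # A derivation that uses a nullable column without null
        # handling -- the stand-in for UpCase(name) in a Transformer.
        derived = row["name"].upper()
        output_rows.append({"id": row["id"], "NAME": derived})
    except AttributeError:
        # The NULL row fails the derivation and is dropped, so the
        # output link carries fewer rows than the input link.
        pass

print(len(input_rows), len(output_rows))  # 3 in, 2 out
```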
Nitin
If that was true then you would get a warning in the log. Also, what type are your source and target: ODBC, OCI, or something else?
Kim.
Kim Duke
DwNav - ETL Navigator
www.Duke-Consulting.com
Nitin, you're going to have to be VERY explicit with your posts. There are over 2,000 DataStage Server installations out there and only tens of PX, so everyone assumes Server-based questions unless PX is stated.
What is your partitioning scheme? Have you specified unique? Did you switch node pools in between?
Kenneth Bland
I understand, Kenneth.
I will try to be more explicit now.
- Partitioning scheme: it is set to 'Auto' in all prior stages in that job. The Data Sets being used are set with Preserve partitioning = 'Default (Propagate)'.
- Job node pools: 4 nodes, i.e. 4x4.
No, I am not switching node pools in between. I am keeping 4x4 nodes for all jobs.
By the way, I did not understand your question:
Have you specified unique?
Regards,
Nitin
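For what it's worth, on a 4-node configuration the link count the Monitor reports should be the sum of the per-partition counts, and repartitioning by itself never changes that total: rows are only moved between nodes, not added or dropped. A small Python sketch of that invariant, with invented rows and a plain hash partitioner standing in for the engine's:

```python
# 12 made-up rows, hash-partitioned across 4 nodes by a key column.
rows = [{"key": k} for k in range(12)]
NODES = 4

partitions = [[] for _ in range(NODES)]
for row in rows:
    partitions[hash(row["key"]) % NODES].append(row)  # hash partitioning

# Partitioning distributes rows; it does not add or drop any, so the
# per-partition counts must still sum to the input total.
per_partition = [len(p) for p in partitions]
print(per_partition, sum(per_partition))  # sums to 12
```

If the link counts do not sum back to the input total, something between the stages is dropping rows, not the partitioning itself.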
Based on the ongoing discussion, I have some related questions:
- When we talk of a Transformer, there is no key as such, is there? So where does the question of two records being the same come in? Is it something related to the Data Set?
I even tried having a Sequential File as the output from the Transformer, and even that has the same problem (receiving fewer records than the input, with no constraint).
What is the solution? My requirement is to have all the records in the input stream (unique or not) sent to the output stream of the Transformer.
Could you please clarify the concept if I am missing something.
Regards,
Nitin
I would refer you to www.datastagexchange.com, where there's a PX-specific forum moderated by bigpoppa. You'll probably get the best answers there.
Kenneth Bland
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
It's really difficult to diagnose this without an explicit explanation of the job design. For example, you did not mention whether there is any constraint on the Transformer stage's output link. If there were, it would be expected to limit the number of rows output.
Were I consulting to solve this, I would need to look at the job in detail, either on site or by having had an export of the job plus sample data mailed to me.
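To spell out why a constraint matters here: a constraint on a Transformer output link is just a boolean expression, and rows for which it evaluates false simply never appear on that link; no warning is logged. A Python stand-in with invented data and an invented constraint:

```python
# Made-up input rows.
rows = [{"qty": 5}, {"qty": 0}, {"qty": 7}]

# A link constraint such as "qty > 0": only rows satisfying it appear
# on the output link; the rest are filtered out silently.
output_link = [row for row in rows if row["qty"] > 0]

print(len(rows), len(output_link))  # 3 in, 2 out
```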