Does sort drop some records?

datastagedw
Participant
Posts: 53
Joined: Fri Mar 07, 2008 1:17 am

Does sort drop some records?

Post by datastagedw »

Hello All,

I have a simple job that extracts data from a flat file, passes it through a Sort stage and a Transformer, and finally loads it into a database. It is a reusable component, and I run multiple instances of the job. Strangely, the job runs fine for all the flat files except one, even though the files all have the same format and schema. What I observed is that some records are getting dropped after the sort. Allow duplicate is set to False true actually. However, the data does not have any duplicates anyway.

Does sort drop any records? I have tried Auto partitioning for this. Even if I remove the Sort stage and use the built-in sort in the Transformer stage, I face the same issue. Is something wrong with my data, or is this the behaviour of the stage?

I am working on a 4-node machine and the DS version is 8.0.

Thanks
ETL DEVELOPER
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Sort stage ignores duplicates and does not drop them.

What is the sort in the transformer?

Without full design, metadata and sample data it is not possible to comment.

Try to locate the missing / dropped records and find the reason.
miwinter
Participant
Posts: 396
Joined: Thu Jun 22, 2006 7:00 am
Location: England, UK

Post by miwinter »

Just to add further, sort actually can deduplicate if defined to do so. See the Allow Duplicates option which controls this based on sort key. It does, however, default to true, so duplicates are retained unless this has been explicitly changed.

I'm not sure, however, how to read this:
"Allow duplicate is set to False true actually"
as the two are mutually exclusive.

Does the monitor confirm that the records are dropped in the sort, or does it show that this is happening in the transformer? The transformer can effectively act as a filter if constraints have been applied.
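
To make the Allow Duplicates point concrete, here is a rough Python sketch. It is not DataStage code. It assumes that, with duplicates disallowed, one row is kept per sort-key value; the function `sort_rows` and the sample rows are made up purely for illustration.

```python
def sort_rows(rows, key, allow_duplicates=True):
    """Sort rows by `key`; optionally keep only the first row per key value.

    Illustrative model only (an assumption, not the DataStage implementation).
    """
    ordered = sorted(rows, key=lambda r: r[key])  # Python's sort is stable
    if allow_duplicates:
        return ordered
    seen = set()
    unique = []
    for row in ordered:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


# Hypothetical sample rows: two of them share the sort key "B2".
rows = [
    {"id": "B2", "val": 1},
    {"id": "A1", "val": 2},
    {"id": "B2", "val": 3},
]
kept_all = sort_rows(rows, "id", allow_duplicates=True)
deduped = sort_rows(rows, "id", allow_duplicates=False)
print(len(kept_all), len(deduped))
```

In this model a plain sort returns as many rows as it receives; the row count only shrinks when deduplication on the key is switched on.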
Mark Winter
Nothing appeases a troubled mind more than good music
datastagedw
Participant
Posts: 53
Joined: Fri Mar 07, 2008 1:17 am

Post by datastagedw »

Sainath.Srinivasan wrote: Sort stage ignores duplicates and does not drop them.

What is the sort in the transformer?

Without full design, metadata and sample data it is not possible to comment.

Try to locate the missin ...
Hello,
I think I have confused you. I am using a Sort stage followed by a Transformer. I want to sort on a column, and in the Sort stage I have 'allow duplicates = true'. I have 72 records coming from the SFS to the Sort stage, but only 55 appear to go from the sort to the next stage. This is what I saw in the performance statistics. The column I am sorting on is Varchar 255 with nullability set to No. The column contains character data, numeric data and alphanumeric values. I don't think that is the problem, though, because the job works fine for other files with similar data.

My concern is why these 17 records are getting dropped, or are they silently passing through? After the sort and the transformer I have the ODBC Connector stage; there, too, 55 records are inserted, and then the job aborts saying that nulls cannot be inserted into the database table. However, the column is defined as nullable No. How can nulls get in?

Please let me know if more clarification is required. Looking forward to your responses.

Thanks
ETL DEVELOPER
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Sort stage (not performing a unique sort) does not drop records. How are you handling nulls in the sort keys?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
datastagedw
Participant
Posts: 53
Joined: Fri Mar 07, 2008 1:17 am

Post by datastagedw »

ray.wurlod wrote: Sort stage (not performing a unique sort) does not drop records. How are you handling nulls in the sort keys? ...
Actually, the sort key column is defined as not nullable (Nullable = No).
ETL DEVELOPER
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

Did you check your Director log for the job run? It is possible that you can locate them there.

Do you have any empty Varchar?

Include a copy stage before your sort and write the same into another sequential file.

Do the same after aggregator. This will give you an idea of rows passing.

Btw, scope of null check is not identical in PX in all stages. So you may have to test them in pieces before assembling them back.
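
Once the rows before and after the sort have been written to sequential files as suggested above, the comparison can be done outside DataStage. The following is a minimal Python sketch, assuming comma-delimited files with the sort key in the first column. The helper names (`read_rows`, `missing_rows`, `empty_key_rows`) and the in-memory sample data are hypothetical stand-ins for the real dump files.

```python
import csv
import io
from collections import Counter


def read_rows(handle, delimiter=","):
    """Read every row of a delimited file as a tuple of strings."""
    return [tuple(row) for row in csv.reader(handle, delimiter=delimiter)]


def missing_rows(before, after):
    """Return rows present in `before` but absent from `after`.

    Counter subtraction respects duplicates, so a row that appears twice
    before and once after is reported once.
    """
    diff = Counter(before) - Counter(after)
    return sorted(diff.elements())


def empty_key_rows(rows, key_index=0):
    """Return rows whose key column is missing, empty or whitespace only."""
    return [
        row for row in rows
        if len(row) <= key_index or not row[key_index].strip()
    ]


# Hypothetical sample data standing in for the two dump files.
before_text = "A1,10\n,20\nB2,30\n"
after_text = "A1,10\nB2,30\n"

before = read_rows(io.StringIO(before_text))
after = read_rows(io.StringIO(after_text))

dropped = missing_rows(before, after)
print(dropped)
print(empty_key_rows(before))
```

With real files, replace the `io.StringIO` objects with `open(path, newline="")` and pass whatever delimiter the Sequential File stage was configured to write. If the dropped rows turn out to be the ones with an empty key, that points at null or empty-string handling rather than at the sort itself.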