Does sort drop some records?

datastagedw · Post by **datastagedw** » Thu Aug 20, 2009 5:24 am

Hello All,

I have a simple job that extracts data from a flat file, passes it through a sort and a transformer, and finally loads it into a DB. This is a reusable component, and I will be running multiple instances of the job. Strangely, I am able to run this job fine for all the flat files except one. Of course, the files have the same format and schema. What I observed is that after the sort some records are getting dropped. Allow duplicate is set to False true actually. However, the data does not have any duplicates anyway.

Does sort drop any records? I have tried Auto partitioning for this. Even if I remove the sort and use the built-in sort in the transformer stage, I face the same issue. Is something wrong with my data, or is this the behaviour of the stage?

I am working on a 4-node machine and the DS version is 8.0.

Thanks

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Thu Aug 20, 2009 5:28 am

The Sort stage ignores duplicates and does not drop them.

What is the sort in the transformer?

Without full design, metadata and sample data it is not possible to comment.

Try to locate the missing / dropped records and find the reason.

miwinter · Post by **miwinter** » Thu Aug 20, 2009 5:36 am

Just to add further, sort actually can deduplicate if defined to do so. See the Allow Duplicates option which controls this based on sort key. It does, however, default to true, so duplicates are retained unless this has been explicitly changed.
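To make the effect of that option concrete, here is a minimal Python sketch of the semantics described above. It is only an illustration, not how the Sort stage is implemented; the function name `sort_rows`, the `allow_duplicates` parameter and the sample rows are all made up for the example.

```python
from itertools import groupby
from operator import itemgetter


def sort_rows(rows, key, allow_duplicates=True):
    """Sort rows on `key`; optionally keep only the first row per key value.

    Hypothetical helper that mimics the idea of an "allow duplicates"
    switch on a keyed sort. It is not DataStage code.
    """
    ordered = sorted(rows, key=itemgetter(key))  # sorted() is stable
    if allow_duplicates:
        return ordered
    # Keep the first row of each run of equal key values.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


# Invented sample data: two rows share the key "B2".
rows = [
    {"id": "B2", "val": 1},
    {"id": "A1", "val": 2},
    {"id": "B2", "val": 3},
]

kept_all = sort_rows(rows, "id", allow_duplicates=True)
deduped = sort_rows(rows, "id", allow_duplicates=False)
print(len(kept_all), len(deduped))
```

With duplicates allowed the row count in equals the row count out; only when they are disallowed can the count shrink, and then only for rows that share a sort key.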

I'm not sure, however, what is meant by:

Allow duplicate is set to False true actually

as the two are mutually exclusive.

Does the monitor confirm that the records are dropped in the sort, or does it show that this is happening in the transformer instead? The transformer can effectively act as a filter if constraints have been applied.

datastagedw · Post by **datastagedw** » Thu Aug 20, 2009 8:16 pm

Sainath.Srinivasan wrote: Sort stage ignores duplicates and does not drop them.

What is the sort in the transformer?

Without full design, metadata and sample data it is not possible to comment.

Try to locate the missin ...

Hello,
I think I have confused you. I am using a sort stage followed by a transformer. I want to sort on one column, and in the sort stage I have 'allow duplicates = true'. I have 72 records coming from the SFS to the sort stage, and only 55 appear to be going from the sort to the next stage. This is what I saw in the performance statistics. The column I am sorting on is varchar 255 with nullability set to NO. The column contains character data, numeric data and alphanumeric values. I don't think that's the problem, because the job works fine for other files with similar data.

My concern is why these 17 records are getting dropped, or are they passing through silently? After the sort and transformer I have the ODBC connector stage. There, too, only 55 records are inserted, and the job aborts saying that nulls cannot be inserted into the database table. However, the column is defined as nullable NO. How can nulls get in?

Please let me know if more clarification is required. Looking forward to your responses.

Thanks

ray.wurlod · Post by **ray.wurlod** » Thu Aug 20, 2009 9:14 pm

Sort stage (not performing a unique sort) does not drop records. How are you handling nulls in the sort keys?

datastagedw · Post by **datastagedw** » Fri Aug 21, 2009 12:13 am

ray.wurlod wrote: Sort stage (not performing a unique sort) does not drop records. How are you handling nulls in the sort keys? ...

Actually, the sort key is a non-nullable column (nullable = No).

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Fri Aug 21, 2009 2:19 am

Did you check your Director log for the job run? It is possible that you can locate them there.

Do you have any empty Varchar?

Include a copy stage before your sort and write the same into another sequential file.

Do the same after aggregator. This will give you an idea of rows passing.

Btw, the scope of the null check in PX is not identical across all stages. So you may have to test them in pieces before assembling them back.
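Once the two sequential files from the Copy stages exist, the before/after comparison can be done outside DataStage. The following is a rough Python sketch under several assumptions: the files are comma-delimited, the sort key is the first column, and the helper names (`missing_rows`, `empty_key_rows`, `read_lines`) are made up here. The inline sample rows are invented so the snippet runs on its own; for the real check, load the two files with `read_lines` instead.

```python
from collections import Counter


def read_lines(path):
    """Read a text file into a list of rows without trailing newlines."""
    with open(path, encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in fh]


def missing_rows(before_lines, after_lines):
    """Return {row: count} for rows seen before the sort but not after it."""
    diff = Counter(before_lines) - Counter(after_lines)  # keeps positive counts only
    return dict(diff)


def empty_key_rows(lines, key_index=0, delimiter=","):
    """Return rows whose key field is empty once whitespace is stripped."""
    return [ln for ln in lines if ln.split(delimiter)[key_index].strip() == ""]


# Invented sample rows standing in for the two capture files.
before = ["A1,x", ",y", "B2,z", ",w"]
after = ["A1,x", "B2,z"]

dropped = missing_rows(before, after)
print(dropped)                  # rows that did not make it past the sort
print(empty_key_rows(before))   # rows with an empty Varchar key
```

If the rows reported as missing turn out to be the same rows that have an empty key, that points at null or empty-string handling rather than at the sort itself. If they differ, the Director log is the next place to look.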