Hello All,
I have a simple job extracting data from a flat file, running it through a Sort stage and a Transformer, and finally loading it into a DB. This is a reusable component, and I would be running multiple instances of the job. Strangely, I am able to run this job fine for all the flat files except one. Of course the files have the same format and schema. What I observed is that after the sort, some records are getting dropped. Allow duplicate is set to False true actually. However, the data does not have any duplicates anyway.
Does sort drop any records? I have tried Auto partitioning for this. Even if I remove the Sort stage and use the built-in sort in the Transformer stage, I face the same issue. Is there something wrong with my data, or is this the behaviour of the stage?
I am working on a 4-node machine and the DataStage version is 8.0.
Thanks
Does sort drop some records?
ETL DEVELOPER
Just to add further: sort can actually deduplicate if defined to do so. See the Allow Duplicates option, which controls this based on the sort key. It does, however, default to True, so duplicates are retained unless this has been explicitly changed.
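As a rough illustration of how key-based duplicate removal in a sort shrinks a record stream (a Python sketch of the concept, not DataStage itself; the field names are made up):

```python
# Sketch: sorting with "allow duplicates" on vs. off.
# Records that are distinct as whole rows can still share the
# sort key, so key-based dedup drops some of them.
records = [
    {"key": "A", "val": 1},
    {"key": "A", "val": 2},   # distinct row, but same sort key
    {"key": "B", "val": 3},
]

# Allow Duplicates = True: a plain stable sort, nothing dropped.
kept_all = sorted(records, key=lambda r: r["key"])

# Allow Duplicates = False: keep only the first record per key.
seen, deduped = set(), []
for r in kept_all:
    if r["key"] not in seen:
        seen.add(r["key"])
        deduped.append(r)

print(len(kept_all), len(deduped))  # 3 2
```

So if the option really were set to remove duplicates, the output count would drop even though no two rows are identical as whole records.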
I'm not sure, however, how:
"Allow duplicate is set to False true actually"
as the two are mutually exclusive.
Does the monitor confirm that the record droppage occurs in the Sort, or does it instead show that it is happening in the Transformer? The Transformer can effectively act as a filter if constraints have been applied.
Mark Winter
<i>Nothing appeases a troubled mind more than <b>good</b> music</i>
Hello,
Sainath.Srinivasan wrote:
Sort stage ignores duplicates and does not drop them.
What is the sort in the transformer?
Without full design, metadata and sample data it is not possible to comment.
Try to locate the missin ...
I think I have confused you. The thing is, I am using a Sort stage followed by a Transformer. I want to sort on a column, and in the Sort stage I have Allow Duplicates = True. I have 72 records coming from SFS to the Sort stage, and from the sort only 55 appear to be going to the next stage; this is what I saw through the performance statistics. The column I am sorting on is Varchar(255) with nullability set to No. However, the column contains both character and number data, and also alphanumerics. I don't think that's the problem, because the job works fine for other files with similar kinds of data.
My concern is why these 17 records are getting dropped, or are they silently passing? After the Sort and Transformer I have the ODBC Connector stage; there too only 55 records are getting inserted, and the job aborts saying that nulls cannot be inserted into the database table. However, the column's nullability is No. How can nulls enter?
Please let me know in case more clarification is required. Looking forward to your responses.
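One quick way to check, outside the job, whether the 17 missing records share sort-key values is to count distinct keys in the input file: if 72 rows collapse to 55 distinct key values, a key-based dedup would exactly explain the drop. A Python sketch with stand-in data (the real column values would come from the flat file):

```python
from collections import Counter

# Stand-in for the sort-column values read from the flat file.
rows = ["abc", "abc", "x1", "x1", "x1", "y"]

counts = Counter(rows)
print(len(rows), len(counts))  # 6 total rows, 3 distinct keys

# Keys that occur more than once are the rows a dedup would remove.
dupes = {k: n for k, n in counts.items() if n > 1}
print(dupes)  # {'abc': 2, 'x1': 3}
```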
Thanks
ETL DEVELOPER
Did you check your Director log for the job run? It is possible that you can locate them there.
Do you have any empty Varchars?
Include a Copy stage before your sort and write the same data into another sequential file.
Do the same after the aggregator. This will give you an idea of the rows passing.
Btw, the scope of null checking is not identical across all PX stages, so you may have to test them in pieces before assembling them back.