Distinct of a query not retrieving correct result

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Abhyankar
Participant
Posts: 15
Joined: Thu Jan 01, 2009 11:04 pm

Distinct of a query not retrieving correct result

Post by Abhyankar »

Hi,
I did a distinct on a single table on User id. I am still getting duplicate user_ids. Not sure why. (I am selecting only User id )
Even when i trim the spaces off and do a distinct, it is giving me duplicate rows.
shalini11
Participant
Posts: 74
Joined: Thu Jan 22, 2009 3:00 am

Post by shalini11 »

Can you post the query you are using.
Abhyankar
Participant
Posts: 15
Joined: Thu Jan 01, 2009 11:04 pm

Post by Abhyankar »

Hi i am just doing a
select distinct <column_name> from <Table> in DB2.

I am getting few column names more than once.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Assuming that the data type is string, check that the "same" values don't have the same number of leading or trailing space characters.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
adi_2_chaos
Participant
Posts: 96
Joined: Fri Apr 17, 2009 5:58 am
Location: Delhi

Post by adi_2_chaos »

Check if there arent any junk chars in the column. In case there are any, then TRIM wont help
Abhyankar
Participant
Posts: 15
Joined: Thu Jan 01, 2009 11:04 pm

Post by Abhyankar »

Hi,
Yes, it is a char 15 column.
I did a ltrim as well as rtrim on that column.
Also, i checked the length of it..
Basically i did a
"select length(ltrim(rtrim(<column_name>))) from table
where <column_name> like '%000051%'"

Surprisingly it gave me two values 15 and 6. Now the value is 6 so it should give 6, but why is 15 coming? Is it garbage value? How can i remove that?
Abhyankar
Participant
Posts: 15
Joined: Thu Jan 01, 2009 11:04 pm

Post by Abhyankar »

How can i check for junk char values?
adi_2_chaos
Participant
Posts: 96
Joined: Fri Apr 17, 2009 5:58 am
Location: Delhi

Post by adi_2_chaos »

A very simple way is to copy paste a sample ouput to some text editor.
Lets say
'ABC ' is ur output.
In case the trailing characters are white spaces, they'll be inevitably removed by TRIM function. If not, then they're junk chars for sure.

I'm not sure how u can remove junk chars with any fucntion. but i got a vague idea.
Lets say u have two values ....
'ABCDEF' and
'ABCDEF ' (padded with 15 chars).
And you're sure that the former one is correct, then simply update the rest of them usign substr function. The residual condition should be all the rows where the length of that particular column is not equal to 6.
Lets hope this works.
Abhyankar
Participant
Posts: 15
Joined: Thu Jan 01, 2009 11:04 pm

Post by Abhyankar »

Thanks, but just wanted to know why this happened?
Is it because the column length is character 15 and my column value is just 6?
Actually, i populate this table through Datastage. I am doign a substring in my transformer and take this column value.
I run this Job every tuesday. It never happened till now, but somehow happened yesterday!
adi_2_chaos
Participant
Posts: 96
Joined: Fri Apr 17, 2009 5:58 am
Location: Delhi

Post by adi_2_chaos »

Even I'm relatively new to Datastage. Cant suggest a lot of things right now. If you can dig down into things, start with the source data .. maybe there's something wrong with it.

Thanks.
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

Don't think copy and paste won't do much good if they are non-printable characters.

Try select <column>, hex(<column>) from <table> where <column> like %00051%...

What results do you get? At a guess one has been padded with nulls
Abhyankar
Participant
Posts: 15
Joined: Thu Jan 01, 2009 11:04 pm

Post by Abhyankar »

Hi,
The value for '000051'
came as '303030303531000000000000000000' and '303030303531202020202020202020'

Seems the second is padded with space. Thanks. But then why am i not getting the correct result when i am doing a ltrim and rtrim?
Abhyankar
Participant
Posts: 15
Joined: Thu Jan 01, 2009 11:04 pm

Post by Abhyankar »

Hi guys,
The problem has been resolved.
It was padding with space issue itself. Dont know why ltrim and rtrim didnt work, but when i did 'REPLACE(Some_Field, ' ', '')' it gave me proper distinct records.
Thanks a lot to all!
datastage_learner
Participant
Posts: 15
Joined: Thu May 07, 2009 9:50 pm

Post by datastage_learner »

Abhyankar wrote:Hi guys,
The problem has been resolved.
It was padding with space issue itself. Dont know why ltrim and rtrim didnt work, but when i did 'REPLACE(Some_Field, ' ', '')' it gave me proper distinct records.
Thanks a lot to all!


ecjjactly...mate...when you have distinct issues you have improper sql formation issues too. take care next time :roll: :wink:

Also...mate when u enter the username values in the table before retrieving it, can't you trim the data first before inserting in the table?

Secondly, char datatype tends to do such thing...use varchar2 henceforth! :roll:
Thanks,
DS_Learner
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Please strive for a professional standard of written English on DSXchange. Put yourself in the position of someone whose first language is not English wasting time looking up "ecjjactly" in a translation dictionary.

And please refrain from frequent use of ...mate... - it does not contribute anything.

Finally, as noted elsewhere, the second person personal pronoun in English is spelled "you", not "u".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply