Distinct of a query not retrieving correct result

Abhyankar · Post by **Abhyankar** » Tue May 05, 2009 5:37 am

Hi,
I did a distinct on a single table on User id. I am still getting duplicate user_ids. Not sure why. (I am selecting only User id )
Even when i trim the spaces off and do a distinct, it is giving me duplicate rows.

shalini11 · Post by **shalini11** » Tue May 05, 2009 6:07 am

Can you post the query you are using.

Abhyankar · Post by **Abhyankar** » Tue May 05, 2009 10:58 pm

Hi i am just doing a
select distinct <column_name> from <Table> in DB2.

I am getting few column names more than once.

ray.wurlod · Post by **ray.wurlod** » Tue May 05, 2009 11:32 pm

Assuming that the data type is string, check that the "same" values don't have the same number of leading or trailing space characters.

adi_2_chaos · Post by **adi_2_chaos** » Wed May 06, 2009 12:06 am

Check if there arent any junk chars in the column. In case there are any, then TRIM wont help

Abhyankar · Post by **Abhyankar** » Wed May 06, 2009 12:10 am

Hi,
Yes, it is a char 15 column.
I did a ltrim as well as rtrim on that column.
Also, i checked the length of it..
Basically i did a
"select length(ltrim(rtrim(<column_name>))) from table
where <column_name> like '%000051%'"

Surprisingly it gave me two values 15 and 6. Now the value is 6 so it should give 6, but why is 15 coming? Is it garbage value? How can i remove that?

Abhyankar · Post by **Abhyankar** » Wed May 06, 2009 12:11 am

How can i check for junk char values?

adi_2_chaos · Post by **adi_2_chaos** » Wed May 06, 2009 12:28 am

A very simple way is to copy paste a sample ouput to some text editor.
Lets say
'ABC ' is ur output.
In case the trailing characters are white spaces, they'll be inevitably removed by TRIM function. If not, then they're junk chars for sure.

I'm not sure how u can remove junk chars with any fucntion. but i got a vague idea.
Lets say u have two values ....
'ABCDEF' and
'ABCDEF ' (padded with 15 chars).
And you're sure that the former one is correct, then simply update the rest of them usign substr function. The residual condition should be all the rows where the length of that particular column is not equal to 6.
Lets hope this works.

Abhyankar · Post by **Abhyankar** » Wed May 06, 2009 12:39 am

Thanks, but just wanted to know why this happened?
Is it because the column length is character 15 and my column value is just 6?
Actually, i populate this table through Datastage. I am doign a substring in my transformer and take this column value.
I run this Job every tuesday. It never happened till now, but somehow happened yesterday!

adi_2_chaos · Post by **adi_2_chaos** » Wed May 06, 2009 12:44 am

Even I'm relatively new to Datastage. Cant suggest a lot of things right now. If you can dig down into things, start with the source data .. maybe there's something wrong with it.

Thanks.

Kryt0n · Post by **Kryt0n** » Wed May 06, 2009 1:01 am

Don't think copy and paste won't do much good if they are non-printable characters.

Try select <column>, hex(<column>) from <table> where <column> like %00051%...

What results do you get? At a guess one has been padded with nulls

Abhyankar · Post by **Abhyankar** » Wed May 06, 2009 3:27 am

Hi,
The value for '000051'
came as '303030303531000000000000000000' and '303030303531202020202020202020'

Seems the second is padded with space. Thanks. But then why am i not getting the correct result when i am doing a ltrim and rtrim?

Abhyankar · Post by **Abhyankar** » Wed May 06, 2009 3:48 am

Hi guys,
The problem has been resolved.
It was padding with space issue itself. Dont know why ltrim and rtrim didnt work, but when i did 'REPLACE(Some_Field, ' ', '')' it gave me proper distinct records.
Thanks a lot to all!

datastage_learner · Post by **datastage_learner** » Thu May 07, 2009 10:51 pm

Abhyankar wrote:Hi guys,
The problem has been resolved.
It was padding with space issue itself. Dont know why ltrim and rtrim didnt work, but when i did 'REPLACE(Some_Field, ' ', '')' it gave me proper distinct records.
Thanks a lot to all!

ecjjactly...mate...when you have distinct issues you have improper sql formation issues too. take care next time

Also...mate when u enter the username values in the table before retrieving it, can't you trim the data first before inserting in the table?

Secondly, char datatype tends to do such thing...use varchar2 henceforth!

ray.wurlod · Post by **ray.wurlod** » Thu May 07, 2009 11:45 pm

Please strive for a professional standard of written English on DSXchange. Put yourself in the position of someone whose first language is not English wasting time looking up "ecjjactly" in a translation dictionary.

And please refrain from frequent use of ...mate... - it does not contribute anything.

Finally, as noted elsewhere, the second person personal pronoun in English is spelled "you", not "u".