duplicate record for key column in ODBC stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
san_deep
Participant
Posts: 33
Joined: Wed Sep 27, 2006 6:10 am
Location: india

duplicate record for key column in ODBC stage

Post by san_deep »

Hi All,

I am facing a problem while using ODBC stage in my job.
i am getting duplicate records while retrieving the data through ODBC stage for a primary key column in database.
In data base the column is the key column and i am checking the key option in ODBC stage.
I am getting the same value of the column after 14 records,and the issue is same when i am changing the order of the column also in ODBC stage.

Waiting for some useful inputs as i have to revert back to client with probable problem and soluntion for that ie. whether its a stage problem or Schema problem.

with regards,
sandeep
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

IMPORT the table definition and use that. Chances are that you've missed one key column, or more than one.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
san_deep
Participant
Posts: 33
Joined: Wed Sep 27, 2006 6:10 am
Location: india

Post by san_deep »

i tried with importing the table definition and it is still gigving the same problem of duplicates.
My table has only one primary key.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

By definition duplicates are impossible in a primary key. Execute the following query in the source table:

Code: Select all

SELECT Key, COUNT(*) FROM tablename GROUP BY Key HAVING COUNT(*) > 1;
If this returns any rows, castigate your DBA for failing to create a UNIQUE index on the primary key.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
san_deep
Participant
Posts: 33
Joined: Wed Sep 27, 2006 6:10 am
Location: india

Post by san_deep »

Ray, Thank yiu for your response. When I am executing the query in the database I am not getting any rows, meaning there are no duplicates in the table. But then, the odbc stage is giving me the same value for the PK after 40 rows, other column values are changing.

I dont know why this is happening, any thoughts?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Odd. Are you selecting from more than one table by chance? Can you post your SQL?
-craig

"You can never have too many knives" -- Logan Nine Fingers
san_deep
Participant
Posts: 33
Joined: Wed Sep 27, 2006 6:10 am
Location: india

Post by san_deep »

SELECT
CLMNT_ID,
SBSCR_ID,
CLMNT_FST_NM,
CLMNT_LST_NM,
DEPN_NBR,
TO_CHAR(CLMNT_BTH_DT,'YYYYMMDD') CLMNT_BTH_DT,
GDR_CD,
CLMNT_REL_CD,
DATA_SOURCE_CD,
MIDL_NM,
INDV_ID,
MBR_ID
FROM
Tablename

the above is the SQL query i am giving through User defined sql in ODBC stage.
CLMNT_ID is the KEy column in the database, but ODBC is giving me same duplicate records after around 40 distinct records.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

As noted, I don't see how in the heck that is possible. What other stages are you using in the job? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

What happens when you use generated sql?
Also try this: Define two columns key, count and execute Rays query in the odbc stage. What do you see?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Are you using Entire partitioning, by any chance?

Even if you're not, the ODBC Enterprise stage emulates parallelism by creating as many connections to the data source as there are partitions for it to run in. This explains the duplicates; you are running the same query on each connection.

Change the execution mode of the ODBC Enterprise stage to sequential and/or the partitioning algorithm (on the input of the downstream parallel stage) to Hash on the key.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
san_deep
Participant
Posts: 33
Joined: Wed Sep 27, 2006 6:10 am
Location: india

Post by san_deep »

thanks for all iputs you have given.
i was able to resolve this issue by converting that to char by TO_CHAR in the user defined sql.
I don't know why it behaving like that, but i am bein gable to get the required result.
thnaks a lot to all of you.
san_deep
Participant
Posts: 33
Joined: Wed Sep 27, 2006 6:10 am
Location: india

Post by san_deep »

thanks for all the inputs you have given.
i was able to resolve this issue by converting that to char by TO_CHAR in the user defined sql.
I don't know why it behaving like that, but i am bein gable to get the required result.
thnaks a lot to all of you.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Were you perchance using Entire partitioning? That will definitely cause duplicates.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply