We have discovered a bug/flaw in how DataStage loads duplicates into a target MULTISET table. I am wondering if anyone else has noticed this and, if so, whether you found a way around it or know of any patches that fix it. And yes, we are currently working with IBM... but they are pushing us to upgrade to v8, and we don't have the time, $, or bandwidth to do that yet.
Here is the problem: The source data contains duplicates that need to be loaded into the target table. The table is defined as MULTISET to allow the duplicates (tables designated SET will not allow dups; MULTISET will). What we have seen is that the duplicates are silently dropped as the target table is loaded, with no warning, reject count, or notification of any kind.
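For anyone who wants to reproduce the SET vs. MULTISET behavior outside of DataStage, here's a minimal sketch (table and column names are just for illustration):

    CREATE SET TABLE dup_test_set (
        order_id INTEGER,
        amount   DECIMAL(10,2)
    ) PRIMARY INDEX (order_id);

    CREATE MULTISET TABLE dup_test_multi (
        order_id INTEGER,
        amount   DECIMAL(10,2)
    ) PRIMARY INDEX (order_id);

    INSERT INTO dup_test_set   VALUES (1, 9.99);
    INSERT INTO dup_test_set   VALUES (1, 9.99);  -- fails: duplicate row error (2802)
    INSERT INTO dup_test_multi VALUES (1, 9.99);
    INSERT INTO dup_test_multi VALUES (1, 9.99);  -- succeeds; both rows are kept

Note that an INSERT ... SELECT into a SET table drops the dups *silently*, with no error at all, which is exactly the kind of quiet behavior we're seeing from the load.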
According to IBM and Teradata, the issue is a combination of things. Here's how it all works:
* DataStage uses FastLoad to write data to Teradata tables.
* FastLoad requires the target table to be empty.
* To ensure an empty table, DataStage creates a "ghost" table that is a DDL copy of the actual target.
* The data is loaded to the ghost table, then copied (via SQL) to the target table.
* The catch: the ghost table evidently gets created as SET even when the real target is MULTISET, so the dups are gone before the copy-back ever happens (see the sketch below).
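My working theory (unconfirmed, so treat this as a sketch) is that the generated ghost DDL simply omits the SET/MULTISET keyword, and Teradata then falls back to the session default: SET in Teradata (BTET) mode, MULTISET in ANSI mode. Something like:

    /* If the generated DDL looks like this, with no table kind...    */
    CREATE TABLE ghost_tbl (        -- ghost_tbl is a made-up name
        order_id INTEGER,
        amount   DECIMAL(10,2)
    ) PRIMARY INDEX (order_id);

    /* ...then a Teradata-mode session creates it as SET, even if the
       real target is MULTISET. You can check what you actually got: */
    SHOW TABLE ghost_tbl;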
Does anyone know of a way to force the ghost table to be created with the same table kind as the target (SET if the target is SET, MULTISET if the target is MULTISET)?
For now, we are just adding a record counter/surrogate key to keep the records unique. What other workarounds have people used? I'd rather not have to add a "useless" key that has to be dropped later on, especially when the volumes are high; one extra column across millions of rows adds up to a lot of wasted space.
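For what it's worth, the strip-the-key step on our end is just an INSERT ... SELECT that leaves the counter column behind (all names here are hypothetical):

    /* stage_tbl carries rec_seq, the surrogate counter that keeps the
       rows unique through the load; the copy-back simply omits it.   */
    INSERT INTO target_tbl (order_id, amount)
    SELECT order_id, amount
    FROM   stage_tbl;

    DROP TABLE stage_tbl;   -- discard the staging copy and its "useless" key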
Thanks!
Brad.