Lookup not producing expected results

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
BradJoss
Premium Member
Premium Member
Posts: 10
Joined: Thu Jun 04, 2009 9:37 am
Location: Winnipeg, Manitoba, Canada

Lookup not producing expected results

Post by BradJoss »

Hi All,

I dealing with an issue that's not making sense to me and I was hoping someone could shed some light.

In a previous job I extract lookup data (172k rec) which contains 2 fields (1 decimal and 1 Varchar(30)) from an Oracle DB and place it into a dataset which has been partitioned with Entire.

In the job in question I have a dataset (Sourced from Netezza and hash partitioned on a different column) which contains a Varchar(20) field that I want to use in my lookup for equality on the Varchar(30) from the Oracle extract dataset. I have 7 out of 7 source test records in which I know the field in question is identical and should flow through the stream link but are instead going to the reject link.

I've already tried to ensure that both fields are identical by extracting the data using TRIM and SUBSTR functions on both Sources and changed the length of the Oracle extract column to 20 but I'm still facing the same result.
My underrstanding of how the lookup stage works is that the data does not need to be sorted becuse all the data is placed into memory in order to perform the lookup.

Also note: I'm dealing with 100+Million source records and do not want to have any Transformers in my job.

Any help or insight would be appriciated.

Thanks,
Brad
manoj_23sakthi
Participant
Posts: 47
Joined: Tue Feb 23, 2010 12:16 am
Location: CHENNAI

Re: Lookup not producing expeceted results

Post by manoj_23sakthi »

Hi,
Better we go for join stage..
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Brad,

first I would start and not partition the dataset "entire", just make sure that the reference to the lookup has "entire" specified, that makes it possible to use different APT_CONFIG files since the entire partitioning happens dynamically at runtime.

I've found that often my unexplained lookup issues were caused by padding and spaces; by ensuring both sets of keys are TRIMmed this problem tends to go away. This is particularly true when dealing with ORACLE, as varchar2 automatically trims trailing spaces, so despite what you posted I feel that this is the cause of the problem.

Do the text columns contain special characters or ones that outside of the standard LATIN-1 character set?
ankursaxena.2003
Participant
Posts: 96
Joined: Mon May 14, 2012 1:30 pm

Post by ankursaxena.2003 »

And also try to see the file in Notepad++ or textpad. I think there must be some weird character which you might not be able to see in unix or Notepad.

Can you show the Test Data and Test LKP.
BradJoss
Premium Member
Premium Member
Posts: 10
Joined: Thu Jun 04, 2009 9:37 am
Location: Winnipeg, Manitoba, Canada

Post by BradJoss »

Thanks everyone for the reply's.

I was working on a few of the suggestions made here, specifically taking the output and copying them to a sequential stage to see if any extra charaters were encountered. We did note that the Netezza field had a 2 byte length component but in the end it was not causing the issue. We also tried using DB fucntions to manipulate the output from both Netezza and Oracle to conform the data but this also didn't solve our problem.

After making some modifications such as
1. saving the table definition from the source data set, deleting the columns then loading the table definition back in and
2. Originally I had a modify stage immediately after the Netezza dataset in which I modified the column in question using the format -> newColName:string[20] = ColName which I switched to KEEP and placed a copy stage after to change the column name.
The lookup is behaving as expected.

I'm not sure if these 2 things infact corrected the issue or not. We have seen unexpected behavior before with matadata mismatch between our RMDBS metadata repository and the UNIVERSE metadata repository so did change 1 correct the issue or was it the modify stage.

Also, manoj_23sakthi, I need to use the Lookup stage because I need to capture the rejects and persist them but thanks for the suggestions.
Post Reply