Lookup not producing expected results

BradJoss · Post by **BradJoss** » Wed Jul 11, 2012 10:56 am

Hi All,

I dealing with an issue that's not making sense to me and I was hoping someone could shed some light.

In a previous job I extract lookup data (172k rec) which contains 2 fields (1 decimal and 1 Varchar(30)) from an Oracle DB and place it into a dataset which has been partitioned with Entire.

In the job in question I have a dataset (Sourced from Netezza and hash partitioned on a different column) which contains a Varchar(20) field that I want to use in my lookup for equality on the Varchar(30) from the Oracle extract dataset. I have 7 out of 7 source test records in which I know the field in question is identical and should flow through the stream link but are instead going to the reject link.

I've already tried to ensure that both fields are identical by extracting the data using TRIM and SUBSTR functions on both Sources and changed the length of the Oracle extract column to 20 but I'm still facing the same result.
My underrstanding of how the lookup stage works is that the data does not need to be sorted becuse all the data is placed into memory in order to perform the lookup.

Also note: I'm dealing with 100+Million source records and do not want to have any Transformers in my job.

Any help or insight would be appriciated.

Thanks,
Brad

manoj_23sakthi · Post by **manoj_23sakthi** » Wed Jul 11, 2012 11:09 am

Hi,
Better we go for join stage..

ArndW · Post by **ArndW** » Wed Jul 11, 2012 11:11 am

Brad,

first I would start and not partition the dataset "entire", just make sure that the reference to the lookup has "entire" specified, that makes it possible to use different APT_CONFIG files since the entire partitioning happens dynamically at runtime.

I've found that often my unexplained lookup issues were caused by padding and spaces; by ensuring both sets of keys are TRIMmed this problem tends to go away. This is particularly true when dealing with ORACLE, as varchar2 automatically trims trailing spaces, so despite what you posted I feel that this is the cause of the problem.

Do the text columns contain special characters or ones that outside of the standard LATIN-1 character set?

ankursaxena.2003 · Post by **ankursaxena.2003** » Wed Jul 11, 2012 11:13 am

And also try to see the file in Notepad++ or textpad. I think there must be some weird character which you might not be able to see in unix or Notepad.

Can you show the Test Data and Test LKP.

BradJoss · Post by **BradJoss** » Thu Jul 12, 2012 10:19 am

Thanks everyone for the reply's.

I was working on a few of the suggestions made here, specifically taking the output and copying them to a sequential stage to see if any extra charaters were encountered. We did note that the Netezza field had a 2 byte length component but in the end it was not causing the issue. We also tried using DB fucntions to manipulate the output from both Netezza and Oracle to conform the data but this also didn't solve our problem.

After making some modifications such as
1. saving the table definition from the source data set, deleting the columns then loading the table definition back in and
2. Originally I had a modify stage immediately after the Netezza dataset in which I modified the column in question using the format -> newColName:string[20] = ColName which I switched to KEEP and placed a copy stage after to change the column name.
The lookup is behaving as expected.

I'm not sure if these 2 things infact corrected the issue or not. We have seen unexpected behavior before with matadata mismatch between our RMDBS metadata repository and the UNIVERSE metadata repository so did change 1 correct the issue or was it the modify stage.

Also, manoj_23sakthi, I need to use the Lookup stage because I need to capture the rejects and persist them but thanks for the suggestions.

DSXchange

Lookup not producing expected results

Lookup not producing expected results

Re: Lookup not producing expeceted results