Parallel padding nulls for empty space !!

Posted: Fri Mar 02, 2007 3:07 pm
by ady
Hi,

I have a change capture job where I'm comparing data from a parallel job output to a server job output. The data are supposed to match. Every other field in the data matches perfectly, but I have a problem with this one field, whose data looks like:

0000050100805721
3437 1
0000059500560060
W 02008522
914880 914880
0000019400905240
7200 4132
00038 00038
1195 1195
0000071100110551
50123 51727
1607 1607
CHAIN04807997
0978478 0978478


The problem here is that the column has a length of 24 according to the metadata, but most of the values are shorter, with empty space filling up the gap.

I think the parallel job is filling the empty space with NULLs or some other character. How can I avoid this?

I am trying to use APT_STRING_PADCHAR, but it doesn't seem to solve my problem. The value I am using for the environment variable is "0x20" (space).

Is there any other approach for this, or am I missing something here? Please help.

Posted: Fri Mar 02, 2007 3:12 pm
by DSguru2B
It must be a fixed-width file with Char fields. It will pad empty characters with APT_STRING_PADCHAR. What do you want it to do? You don't want the empty characters? Then it won't be a fixed-width file.
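
To make the padding behaviour concrete, here is a minimal Python sketch (not DataStage code) of a short value being stored in a fixed-width Char(24) field. It assumes the default pad character is ASCII NUL (0x00) and that setting APT_STRING_PADCHAR to 0x20 switches it to a space; `pad_char_field` is a hypothetical helper invented for illustration only.

```python
NUL = "\x00"    # assumed default pad character (ASCII NUL)
SPACE = "\x20"  # pad character when APT_STRING_PADCHAR is set to 0x20


def pad_char_field(value: str, length: int, pad: str) -> str:
    """Right-pad value to a fixed length, the way a Char(length) field is filled."""
    if len(value) > length:
        raise ValueError("value is longer than the field")
    return value.ljust(length, pad)


# One of the sample values from the first post, stored in a Char(24) column.
nul_padded = pad_char_field("3437 1", 24, NUL)
space_padded = pad_char_field("3437 1", 24, SPACE)

# Both fields are 24 characters long and look alike in most viewers,
# but they are different strings, so a change capture would flag them.
print(len(nul_padded), len(space_padded))
print(nul_padded == space_padded)
```

The point of the sketch is that two jobs padding the same value with different characters produce fields that look identical on screen yet never compare as equal.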

Posted: Fri Mar 02, 2007 3:32 pm
by ady
The output is currently a dataset. Do I need to use a sequential file there to get APT_STRING_PADCHAR working?

Posted: Fri Mar 02, 2007 3:35 pm
by DSguru2B
No, you don't need to. Change the Chars to VarChars and see if you see any difference.

Posted: Fri Mar 02, 2007 4:08 pm
by ady
Dataset with varchars does not work ... :(

Posted: Fri Mar 02, 2007 4:49 pm
by ray.wurlod
Data Set with VarChar DOES work. I have lots of them.

VarChars are not padded. Only Chars are padded.

VarChars can be trimmed prior to comparison. You can use Modify stages for trimming if speed is important.
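
As a rough illustration of trimming before comparison, the Python sketch below (again, not DataStage code) strips trailing pad characters from both sides and then compares. The helper names `trim_padding` and `fields_match` are hypothetical, and the set of pad characters (NUL and space) is an assumption based on the symptoms described in this thread.

```python
def trim_padding(value: str, pad_chars: str = "\x00 ") -> str:
    """Remove trailing pad characters only; interior spaces are kept."""
    return value.rstrip(pad_chars)


def fields_match(left: str, right: str) -> bool:
    """Compare two field values after stripping trailing padding from each."""
    return trim_padding(left) == trim_padding(right)


# Same value, padded to 24 characters with NUL on one side and spaces on the other.
parallel_value = "CHAIN04807997" + "\x00" * 11
server_value = "CHAIN04807997" + " " * 11

print(parallel_value == server_value)              # raw comparison
print(fields_match(parallel_value, server_value))  # comparison after trimming
```

Only trailing characters are removed, so values with embedded spaces such as "7200 4132" keep their interior spacing.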

Posted: Fri Mar 02, 2007 5:23 pm
by ady
Oops, sorry if I got you guys confused.

Datasets with VarChar work in general, but in my case, when I use a dataset with VarChar the padding does not work, and if I use Char then my trim does not work.

My whole point is: I need to pad the values with spaces in my dataset or sequential file, but I also need to trim a few columns for comparison later in the job. How can I do it without a Transformer?

I've never done a trim with the Modify stage. How does that work?

Posted: Fri Mar 02, 2007 7:26 pm
by ray.wurlod
The modify operator (and therefore the Modify stage) supports a string_trim() function.

A fairly complete list of Modify stage functions may be had here and then, of course, there are the manuals.

Posted: Fri Mar 02, 2007 8:12 pm
by DSguru2B
That's a nice document, Ray. Did you prepare it?

Posted: Fri Mar 02, 2007 9:02 pm
by kumar_s
Better to change all your fields to VarChar and read the data again. In most cases you can expect the whitespace to be trimmed automatically, or at least with the functions provided by Ray. This way you also save the space used by your sequential file, if any.

Posted: Sat Mar 03, 2007 2:00 am
by ray.wurlod
DSguru2B wrote: That's a nice document, Ray. Did you prepare it?
Yes, it's part of an EE Transformation Techniques module I've developed for a Parallel Job Techniques class/tutorial. The tutorial as a whole is not ready yet; I will announce its availability and price when it is.

Posted: Sat Mar 03, 2007 2:02 am
by DSguru2B
Please do. I might be interested in getting your transition document. It seems to be very diverse and complete.