Remove prefix and leading zeros

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
times29
Premium Member
Premium Member
Posts: 202
Joined: Mon Sep 22, 2008 3:47 pm

Remove prefix and leading zeros

Post by times29 »

Hi,
I have column AA1 which looks like
KS10000000200001
KS1000100001
OR00000002000078
OR000000020000789

i want to Remove KS1 + leading 0s and write 200001 and 100001 in target and same for OR00000002000078 Remove OR + leading 0s

Target should be
200001
100001
2000078
20000789
ankursaxena.2003
Participant
Posts: 96
Joined: Mon May 14, 2012 1:30 pm

Post by ankursaxena.2003 »

Is there any pattern for the first few 0's.

For example, Number of 0's before the actual number. Or fixed length of the actual number. Or what is the maximum length of actual number.

Actual number is the number you are trying to extract i.e. 200001, 100001
ankursaxena.2003
Participant
Posts: 96
Joined: Mon May 14, 2012 1:30 pm

Post by ankursaxena.2003 »

You can try to remove the first 3 character and then trim of left 0's by using Convert function or Replace function.
times29
Premium Member
Premium Member
Posts: 202
Joined: Mon Sep 22, 2008 3:47 pm

Post by times29 »

how can i remove first three chracters?
bhasds
Participant
Posts: 79
Joined: Thu May 27, 2010 1:49 am

Post by bhasds »

Hi times29,

You can try this-

Code: Select all

Trim(Field("colname"[1,3] :" " :"colname" [4, Len("colname")-3 ]," ",2),"0","L")
times29
Premium Member
Premium Member
Posts: 202
Joined: Mon Sep 22, 2008 3:47 pm

Post by times29 »

This will take care of KS1 what about OR then
ankursaxena.2003
Participant
Posts: 96
Joined: Mon May 14, 2012 1:30 pm

Post by ankursaxena.2003 »

Then use the following command.

Code: Select all

Trim(column_name, "0","L")
The above command will get read of all the leading 0's after removing first 3 characters.

But, if you have data as KR11000001000001 then it will give you output as 1000001000001. So, you need to make sure that it can never have more than 3 characters in the first 3 places.
ankursaxena.2003
Participant
Posts: 96
Joined: Mon May 14, 2012 1:30 pm

Post by ankursaxena.2003 »

That will take care of OR0 as well. Because any how you will eliminating that 0 when you will do Trim(Column_name,"0","L").
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

If and only if the first zero is always the first leading zero to be removed, you can get the position of it in the string with Index:

Code: Select all

Index(MyString, "0", 1)
If you have non-repeating zeros you want to skip, and you are sure there will always be at least two leading zeros, the search string of "00" can be used.

Step one: extract the string from the index position to the end.
Step two: remove leading zeros.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
times29
Premium Member
Premium Member
Posts: 202
Joined: Mon Sep 22, 2008 3:47 pm

Post by times29 »

The issue is i have some other records too but i just want to strip
KS1 and OR records only

Source

COL1
KS10000000200001
KS1000100001
OR00000002000078
OR000000020000789
NVL00000002000078

COL 1 will be mapped to three columns in target

col1_ks1 col2_or col3_other

200001 20000789 NVL00000002000078
100001 20000789
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Gee, it would have been useful had you mentioned that originally.

Never mind, it doesn't add much complexity. How do you know that the end of a group has occurred?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bhasds
Participant
Posts: 79
Joined: Thu May 27, 2010 1:49 am

Post by bhasds »

Hi times29,

Can you please try the below -

Code: Select all

If col1 [1,3]="KS1" OR col1 [1,2]="OR" Then Trim(col1 [Index(col1,"0","1"),Len(col1)-Index(col1,"0","1")],"0","L") Else col1
I got the desired output with this.
PhilHibbs
Premium Member
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Post by PhilHibbs »

Code: Select all

IF link.col1[1,3]="KS1" Then link.col1[4,99]+0 Else If link.col1[1,2]="OR" Then col1[3,99]+0 Else link.col1
This will work as long as the "strip leading zeros" will only ever have to deal with numeric values. A value of OR0123X would break this.
If you have to deal with non-numerics after the leading zeros, then you need to find the first non-zero value. That's quite tricky. I'd say that that is a perfect case for a Parallel Routine that finds the first character that is not in a set of characters and returns the offset. The challeng is that Parallel Routines are difficult to make unicode-compatible. If your data is only ever going to be single-byte ASCII then you can do it easily, search for pxStrFirstCharList for a very similar function. All you'd need to do is reverse the test so it searches for the first character that is not in the list, and just pass in '0' as the character list.

*Update* D'oh, Trim() is exactly the function you need.

Code: Select all

IF link.col1[1,3]="KS1" Then Trim( link.col1[4,99], '0', 'L' ) Else If link.col1[1,2]="OR" Then Trim( link.col1[3,99], '0', 'L' ) Else link.col1
Phil Hibbs | Capgemini
Technical Consultant
Post Reply