Transformer Stage Functions

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

vick
Participant
Posts: 88
Joined: Sun Oct 30, 2005 2:06 am

Transformer Stage Functions

Post by vick »

Hello people,

I have designed a job that has

[code]

Oracle_Enterprise_stage -----> Tfr ------> Sequential_File_stage

[/code]

I have 20 columns in the source and same number of columns in the target.

I have to poulate a part of the string, from one of the columns say [b]"A"[/b] into another column say [b]"B"[/b]. The length of this string varies from record to record.

I have used a "FIELD" function in the transformer but not able to pull what I want from the string.

Field(DSLink_ColA, " ", "V-", 1)

[b]Data[/b] -

Col A

"3.2 L 24 tspa description V-8"

I need "V-8" in Col B.

Is there a way I can just pull V-8 into column B

Appreciate any pointers!

TIA
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

When you say you "need V-8" from that example field, what exactly do you mean? You need the last 'word' from the string? The sixth 'word'? The last three characters? Be more specific, please.
-craig

"You can never have too many knives" -- Logan Nine Fingers
vick
Participant
Posts: 88
Joined: Sun Oct 30, 2005 2:06 am

Post by vick »

Thanks Craig!

Actually the "V-%" is in different position in different records but majority of them are the last 3 characters.

vick
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

So you want to extract V- and whatever single character is with it? Is that correct. Give us a few examples, this is confusing.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
vick
Participant
Posts: 88
Joined: Sun Oct 30, 2005 2:06 am

Post by vick »

Yes your right DSguru2B!

The data looks something like this

[code]

ColA
5.0 L TSpark 24 V-8
3.0 L TSpark 24 V-12
3.2 L V-14 TSpark 24
1.8 L V-6 TSpark 24
5.0 L TSpark 24 V8
2.0 L TSpark 24 V6

[/code]

I want to grab "V-8", "V-12", "V6" so on and so forth

Appreciate your responses
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

See if this helps

Code: Select all

Field(in.Col[INDEX(in.Col,"V",1), Len(in.Col) - INDEX(in.Col,"V",1)+1]," ",1)
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
I_Server_Whale
Premium Member
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

If the number of characters that you need after "V-" doesn't exceed 2.
That is if the digits after "V-" are always one or two digits, ie., 8,16,6,12,14....etc...
Then, you can use:

Code: Select all

Trim(Arg1[Index(Arg1,"V-",1),4])
Whale.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
vick
Participant
Posts: 88
Joined: Sun Oct 30, 2005 2:06 am

Post by vick »

Thanks much.

Here's how the data looks. The string has different formats namely

1. "5.5 L T-spark 24-Valve"
2. "24-Valve Al V-8"
3. "3.5 L V6 24-Valve"
4. "24-Valve Al V12"
5. "5.5 L Twin-Spark"

I want to populate "V-?" (V-numeric say V-6 or V-8 or V-12) I want to populate "Null" for 1st and 5th formats.

If I use the following logic I m unable to populate correct value for the 3rd format as the "V6" occurrence is 1.


[code]

If Index(DSLink2.ENGINE_DESCRIPTION,"V-",2) Then Right(DSLink2.ENGINE_DESCRIPTION,4) Else (If Index(DSLink2.ENGINE_DESCRIPTION,"V",2) Then Right(DSLink2.ENGINE_DESCRIPTION,3) Else 'Null')

[/code]

I was wondering if all the formats can be captured in a single logic.
Am I right in using transformer?
Any pointers pl?

TIA
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Because some of the Vn descriptions are embedded (for example row #3) this is a non-trivial task. Do you want to handle V8 as well as V-8 (and, if so, do you want to handle them differently)?

This task is easier with a BASIC Transformer stage using a MatchField function. You could, of course, create a parallel routine or Build stage to do this work.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vick
Participant
Posts: 88
Joined: Sun Oct 30, 2005 2:06 am

Post by vick »

Thanks Ray for your inputs.

I want to handle them in a "single" format "V-numeric".

The string may have different format for for different records but I want to pull only the part with "V" and the numeric and format it later to "V-?".

TIA

V
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

But you wont be able to do that in a single line as the incoming data is not consistent. Index on "V" will have a value even if it gets "Valve" which is also present in your data. Index on "V-" will skip values like V8 or V12. Thats why Ray indicated that this can easily be done in a Basic transformer due to the presence of MatchField() function and also with "Matches". If you have to do this in px you need to write your own parsing routine in C.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

In a BASIC Transformer stage only (or in a server job):

Code: Select all

If InLink.Product Matches "0X'V'1N0N0X" Then MatchField(InLink.Product, "0X'V'0N0X", 3) Else If InLink.Product Matches "0X'V-'1N0N0X" Then MatchField(InLink.Product, "0X'V-'0N0X", 3) Else @NULL
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
vick
Participant
Posts: 88
Joined: Sun Oct 30, 2005 2:06 am

Post by vick »

Thanks a ton Ray!

I plugged in your code and concatenated with "V-" and it works great except for 5 & 6. (Values after ----------------> are the results)


[code]
1. "3.5 L V6 24-Valve DOHC" ---------------------> V-6
2. "3.8 L V-6 24-Valve DOHC" ---------------------> V-6
3. "4.6 L V8 w/4 valve per CYL" -------------------> V-8
4. "5.5 L 32-Valve Aluminum V-12"----------------> V-12
5. "5.9 L V-12 32-Valve Aluminum"----------------> V-
6. "3.9 L V6 32-Valve Aluminum"------------------> V-
7. "AMG 5,439-cc 24-Valve Supercharged"-------> Null

[/code]

How do you account for when the data is similar to 5 & 6 (When V6 or V-12 comes in before the word "Valve")

Any suugestions?

TIA
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

If you are using a server job or a basic transformer then do this

Code: Select all

If (in.Col Matches "...V-0N1N..." OR in.Col Matches "...V0N1N...") then Field(EREPLACE(in.Col, "Valve", "")[INDEX(EREPLACE(in.Col, "Valve", ""),"V",1), Len(EREPLACE(in.Col, "Valve", "")) - INDEX(EREPLACE(in.Col, "Valve", ""),"V",1)+1]," ",1) else @NULL
There is a lot going on there. Might even have some effect on your performance. I sure hope your data volume is low.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
vick
Participant
Posts: 88
Joined: Sun Oct 30, 2005 2:06 am

Post by vick »

Thanks! The no of records is 6056 but although the canvas on DS Designer shows 6056 if I do a wc -l <filename> it returns only 5104 records. It does not pull @null rows.

I have played around with the BASIC TFR stage with 6 different logics including a routine but I m not able to figure out the reason why is does'nt populate null values. When I test the routine on the DS Manager it gives the @null values.

Appreciate any suggestions

TIA
Post Reply