what stage to use?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
Jay
Participant
Posts: 105
Joined: Tue Nov 11, 2003 8:28 pm

what stage to use?

Post by Jay »

Hi All,

My requirement is this.

Split a column value , value1.value2,
to two separate columns containing value1 and value2.

e.x. Input col1=1234.5678 --> Output col1 = 1234 and Output col2=5678

What stage do i use other than the transformer?

DS ver = 7.5 EE(PX)

Thanks in advance
jay
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I took a look at the functions available to you in a modify stage and there is nothing that will do this for you on a numeric datatype. I would use a transform stage and (if it is a string datatype) using the FIELD function; if it is a decimal field then INT() and ({Original}-INT({Original})*{precision}
Jay
Participant
Posts: 105
Joined: Tue Nov 11, 2003 8:28 pm

Post by Jay »

Thanks. I was not getting any ideas on modify stage either.
Raog
Participant
Posts: 8
Joined: Thu Oct 13, 2005 8:53 am

Re: what stage to use?

Post by Raog »

HI,

U dont require any other stage other than transformer. U only require target to have 2 columns to store the value 1 and value 2.

Thx,
Rao.
Jay wrote:Hi All,

My requirement is this.

Split a column value , value1.value2,
to two separate columns containing value1 and value2.

e.x. Input col1=1234.5678 --> Output col1 = 1234 and Output col2=5678

What stage do i use other than the transformer?

DS ver = 7.5 EE(PX)

Thanks in advance
jay
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Rao,

Transform stages in Px can and should be avoided where possible. That is why Jay asked if it could be done any other way - the modify and copy stages are the most likely candidates; but the INDEX function isn't built into either of them, and their intrinsic function sets are more geared toward type conversions than content manipulations.
track_star
Participant
Posts: 60
Joined: Sat Jan 24, 2004 12:52 pm
Location: Mount Carmel, IL

Post by track_star »

Arnd is 110% correct regarding transformers in PX jobs. Avoid them like the plague if you are expecting any kind of performance out of your job (which is the reason we're all using PX in the first place, right?!?!?). Jay, my suggestion is this:

1. Create a shell script that uses sed and/or awk to parse the string and create the two output values, then create a wrapper in the GUI and wrap that script.

2. Create a simple C++ program in a buildop that uses builtin C++ string manipulation functions to parse the string and let the GUI create all of the necessary files and the stage.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

What is the source - text file or database table or other?

What is the data type that contains 1234.5678 ?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Jay
Participant
Posts: 105
Joined: Tue Nov 11, 2003 8:28 pm

Post by Jay »

Thanks for the suggestions. Here are more details.

The source will be an Oracle load table with a max of about 25 million records.

The datatype is string, Varchar2(20), to be exact. Its basically 2 ids in 1 column.

Target is also an Oracle table.But the columns all have their meaningful, respective, individual (etc etc) values.

Now, thinking about it, I can also do a substr() in Oracle stage itself.

Please let me know if there are any other ideas.

thanks
jay
ameyvaidya
Charter Member
Charter Member
Posts: 166
Joined: Wed Mar 16, 2005 6:52 am
Location: Mumbai, India

Post by ameyvaidya »

Do not know much about the performance aspects (vis-a-vis a transformer) of this, but cant a column import stage be used for this?
Amey Vaidya<i>
I am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand.</i>
<i>- Douglas Adams</i>
ThilSe
Participant
Posts: 80
Joined: Thu Jun 09, 2005 7:45 am

Post by ThilSe »

Hi,

You can also try using the "Column Import" stage with "." as a delimitter

Regards
Senthil
Jay
Participant
Posts: 105
Joined: Tue Nov 11, 2003 8:28 pm

Post by Jay »

thanks guys....i can see which one performs better....i'll keep u posted

my environment has not yet been set up....so might just take some time....
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
Since you say the source is in Oracle DB how about user define sql that will split the column in the select?
Check performance it might be good for you performance wise and eliminate C++ programing.

IHTH,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
Jay
Participant
Posts: 105
Joined: Tue Nov 11, 2003 8:28 pm

Post by Jay »

Yes. so performance is of vital importance....

also i will selecting from a load table and inserting into a partitioned table...

already there are problems with sub-partitions...it seems there is a bug with Oracle 9i ver 9.2.0.4.0

if you create subpartitions, then the data dictionary gets corrupted...the DBAs suggested going with list partioning on a composite key....

so lets see how many more hurdles arise....it should be fine as long as they are not too high...
jay
Jay
Participant
Posts: 105
Joined: Tue Nov 11, 2003 8:28 pm

Post by Jay »

Hi all,

I got the job to run. i have not done performance testing yet as i dont have a big enough test file.

In the Oracle stage, in the query i say:

nvl(to_number(substr(ZONE_ID,6,4),'9999'),'9999') as ZONE_CODE

ZONE_ID is varchar(20). But actual value is like 1234.5678.

Column ZONE_CODE is defined as Integer (4) in the Oracle stage.

i get a warning for implicit conversions.

SRC_ORA: When checking operator: When binding output interface field "ZONE_CODE" to field "ZONE_CODE": Implicit conversion; from source type "decimal[38,10]" to result type "int32": Possible range/precision limitation

Any ideas as to what i can do to avoid these warnings. This warning is in the Oracle stage itself.

How do i use the modify stage to do the implicit conversions or is there any other way?

Thanks
jay
ameyvaidya
Charter Member
Charter Member
Posts: 166
Joined: Wed Mar 16, 2005 6:52 am
Location: Mumbai, India

Post by ameyvaidya »

Hi Jay,

Recommend reading the column as a decimal 38,10 and converting it to an Int32 in a modify stage downstream.
Amey Vaidya<i>
I am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand.</i>
<i>- Douglas Adams</i>
Post Reply