How to split Array in datastage?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

pkll
Participant
Posts: 73
Joined: Thu Oct 25, 2012 9:45 pm

How to split Array in datastage?

Post by pkll »

Hi,
I Have Source file Like,

I/P
-----
custname,custno
KIRAN,[323~96.949900~7.210000][384~120.000000~0.000000][443~0.000000~0.000000][463~0.000000~0.000000][763~0.000000~0.000000][783~0.000000~0.000000][704~0.000000~0.000000]
I have more than 100 array elements.
I want output like o/p
----
custname,custno,amount,balance
KIRAN,323,96.949900,7.210000
KIRAN,384,120.000000,0.000000
---
---
---
like this

Could you please help me split this array?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

First let me get you into your own post and in the proper forum. No idea why you decided to jump on the end of this post but I've split yours out so you have control over the fate of the topic. Done!

Let me know if you're not on an 8.x release on a UNIX server.
-craig

"You can never have too many knives" -- Logan Nine Fingers
rameshrr3
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

Column 2 in your input looks like a variable-length "vector of subrecords."
You can use the restructure stages. Let me know what your data source is.


I have done similar work using the "Promote Subrecord" stage to convert a variable-length vector of subrecords to top-level fields. In fact, that structure had level 02 and level 03 fields; all were promoted to the top level using two Promote Subrecord stages in cascade (with a Copy stage in between for debugging).

The main stumbling block for you may be making DataStage ***recognize*** your input column as a vector of subrecords.

--Version 8.5 and above only:
Another option is Transformer stage looping, available in version 8.5 and above. You would need to know the number of subrecords in each input record and use that in the next Transformer's loop variables (put a Copy stage between the two Transformers for debugging).
Last edited by rameshrr3 on Fri Mar 08, 2013 11:39 am, edited 2 times in total.
sudha03_vpr
Participant
Posts: 34
Joined: Tue Feb 26, 2013 9:36 am
Location: Chicago

Post by sudha03_vpr »

Alternatively, you can use the Complex Flat File stage, which supports multiple record types. Which system generates this file?
rameshrr3
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

To read it as a vector of subrecords from a file, you MUST do away with the enclosing brackets '[' and ']', and add another column to your schema defined as the link field for the variable-length vector. It will hold the count of opening (or closing) brackets, which is effectively the number of elements in the vector; tag the link field as such.
If the source cannot change its format, use a Column Import stage to read the entire second column as long text and then do the rest in a Transformer.
For the vector column, define the link field with whatever name you gave it. In the next Transformer, replace the closing brackets ']' with a delimiter ('~') so that every field of your subrecord has a trailing delimiter. Define your subrecord columns (subcol1, subcol2, subcol3) at level 02, and declare the vector column as UNKNOWN datatype with the Variable checkbox set.
Use this as input to the Promote Subrecord stage and list the vector column as the subrecord data column. The output link should include Name, Subcol1, Subcol2 and Subcol3, with no level set. It should work fine.
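A rough Python sketch may make clearer what the bracket-stripping plus Promote Subrecord steps accomplish on one record. This is illustration only, not DataStage code; the column names (custname, custno, amount, balance) come from the sample in this thread, and the helper name is made up:

```python
def split_array_column(custname, packed):
    """Split '[a~b~c][d~e~f]...' into one (custname, custno, amount, balance) row per element."""
    rows = []
    # Dropping the enclosing brackets (as the bracket-removal step does)
    # leaves '][' as the boundary between subrecord elements.
    for element in packed.strip("[]").split("]["):
        custno, amount, balance = element.split("~")  # '~' is the subrecord delimiter
        rows.append((custname, custno, amount, balance))
    return rows

rows = split_array_column(
    "KIRAN",
    "[323~96.949900~7.210000][384~120.000000~0.000000]",
)
# rows[0] == ("KIRAN", "323", "96.949900", "7.210000")
```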
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi,

If you are on version 8.5 or above, it can be done easily.

Your job flow will be like this:

Seqfile ----> TSFM1 ----> TSFM2 ----> Output

Logic in TSFM1:

StageVar = Dcount(DSLink2.custno, "][")
Loop Condition: @ITERATION <= StageVar

Output Derivation:

Code: Select all

Custname = DSLink2.Custname
Custno = If @ITERATION = 1 Then Right(Field(DSLink2.custno, "][", @ITERATION), Len(Field(DSLink2.custno, "][", @ITERATION)) - 1) Else If @ITERATION = StageVar Then Left(Field(DSLink2.custno, "][", @ITERATION), Len(Field(DSLink2.custno, "][", @ITERATION)) - 1) Else Field(DSLink2.custno, "][", @ITERATION)

The first element carries a leading '[' and the last a trailing ']', so those two iterations trim one character.
Logic in TSFM2:

Svar = Dcount(DSLink27.CUSTNO, "~")


Loop Condition:

@ITERATION <= Svar


Derivation:

CUSTNAME = DSLink27.CUSTNAME
Custno = StringToDecimal(Field(DSLink27.CUSTNO, "~", @ITERATION))
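For reference, the two-Transformer loop can be modelled outside DataStage. The Python below is illustration only; the function and column names echo this thread, and TSFM2 is shown producing the three output columns (custno, amount, balance) that the expected output lists:

```python
def tsfm1(custname, packed):
    """One output row per ']['-delimited element, mirroring the @ITERATION loop."""
    elements = packed.split("][")          # StageVar = Dcount(custno, "][") elements
    n = len(elements)
    out = []
    for i in range(n):                     # @ITERATION = 1 .. StageVar
        e = elements[i]
        if i == 0:
            e = e.lstrip("[")              # first element: drop the leading '['
        if i == n - 1:
            e = e.rstrip("]")              # last element: drop the trailing ']'
        out.append((custname, e))
    return out

def tsfm2(custname, element):
    """Split one 'custno~amount~balance' element into the final columns."""
    custno, amount, balance = element.split("~")
    return (custname, float(custno), float(amount), float(balance))

rows = [tsfm2(name, e)
        for name, e in tsfm1("KIRAN", "[323~96.949900~7.210000][384~120.000000~0.000000]")]
# rows == [("KIRAN", 323.0, 96.9499, 7.21), ("KIRAN", 384.0, 120.0, 0.0)]
```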
pkll
Participant
Posts: 73
Joined: Thu Oct 25, 2012 9:45 pm

Post by pkll »

Hi Prasson,

Thanks for your reply. It is working fine.
However, from Transformer to Transformer stage I am getting 31,000 records per second, while from Sequential File to Transformer I am getting only 1,500 records per second. I need to reach at least 10,000 records per second.

I have used a 4-node configuration and tried different partitioning techniques. I set the $APT_MONITOR_TIME (value=5) and $APT_MONITOR_SIZE (value=100000) environment variables, increased the array size in the target Oracle Connector stage, and in the source Sequential File stage set "number of readers per node" = 4.

Could you please help me get more performance (10,000 records per second) from the source Sequential File to the Transformer stage?

Regards,
pkll..
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Start by ceasing to believe that rows/second is in any way meaningful. It includes all wait time (while the stages are starting up), it is rounded to the nearest sampling interval, and unless row sizes are identical you're not comparing apples with apples anyway.

What's so magic about the 10000 figure?

Use DataStage Director's Monitor view to determine how equitably your records are split over nodes. What processing takes place within the Transformer stages that needs two of them? Can you combine the logic into a single Transformer stage?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rameshrr3
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

TSFM1 in the logic you use needs read-ahead buffering; the output link is much faster because you are doing row splits and increasing the number of rows. If performance is an issue, you can still go with the Promote Subrecord stage - trust me, it works. Use a tr -d command to remove the opening and closing brackets '[' and ']' before you read the file, and define the subrecord delimiter as '~'. Try running your job with $APT_DISABLE_COMBINATION = TRUE and see if your rows/sec changes.