how to sum the number on the csv file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
divine
Premium Member
Premium Member
Posts: 149
Joined: Fri Oct 15, 2004 12:13 am
Location: Toronto,divine_auro@yahoo.co.in
Contact:

how to sum the number on the csv file

Post by divine »

I have a .csv file.
1,10
2,10,20
3,10,20,30
4,10,20,25,35
5,25,35,25,15,40
.
.
.
45,12,13,14,25,...........

I need to insert the records into a table. It has 2 columns. One is Trans_No and another is Total_Amt.

The values should be
Trans_No,Total_Amt
1,10
2,30
3,60
4,90
So on.
Can ay one help me how to sum the numbers on the basis of the data present in the 1st column in the .csv file.
With Regards
Biswajit
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Horizontal pivot followed by Aggregation.
-craig

"You can never have too many knives" -- Logan Nine Fingers
divine
Premium Member
Premium Member
Posts: 149
Joined: Fri Oct 15, 2004 12:13 am
Location: Toronto,divine_auro@yahoo.co.in
Contact:

Post by divine »

Hi,
I'll try for that. But here is only problem we do not predict the number of records.
Thanks for the suggestion and I'll update accordingly.
With Regards
Biswajit
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The number of records don't matter... do you know the maximum number of columns you'll have?
-craig

"You can never have too many knives" -- Logan Nine Fingers
divine
Premium Member
Premium Member
Posts: 149
Joined: Fri Oct 15, 2004 12:13 am
Location: Toronto,divine_auro@yahoo.co.in
Contact:

Post by divine »

Hi,
It is a dynamic file. Everytime I used to get different number of column. It depends upaon the no. of transaction . If the transaction is 2 than I only get 3 records ie.
2, 10,20.
If multiple transaction than accordingly file will be generate ie.
2.10,20
.
.
.
99, 2,45,23,67,.........
With Regards
Biswajit
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Still... any idea what the maximum number could be?
-craig

"You can never have too many knives" -- Logan Nine Fingers
divine
Premium Member
Premium Member
Posts: 149
Joined: Fri Oct 15, 2004 12:13 am
Location: Toronto,divine_auro@yahoo.co.in
Contact:

Post by divine »

HI,
Till now I have seen the file with max. of 26 column. First coulmn contains 25 and rest is the 25 values, all together 26 column.
With Regards
Biswajit
harshada
Premium Member
Premium Member
Posts: 92
Joined: Tue May 29, 2007 8:40 am

Post by harshada »

As can be seen from the file, the first column gives the number of values to be added. Call a routine. A for loop which counts for the number of times of this first column value should work.

If we want to do this in a Transformer, as Craig suggested if we know the maximum value in the first column, can seperate each value and can store each individual value in that many stage variables and then finally use a SvTotal stage variable to sum up these values. Field function should be enough in this case. Read the file in a single column and then seperate the other column values as suggested.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

This is an ideal application for the looping capability that exists in the version 8.5 Transformer stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: how to sum the number on the csv file

Post by SURA »

Hi

It may not be the straight forward way. But you can try something like below!

Use the sequence for looping. Using the DCOUNT function to find how many iteration it need to loop and based on that you can try to do something.

DS User
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Re: how to sum the number on the csv file

Post by ray.wurlod »

SURA wrote:Hi

It may not be the straight forward way. But you can try something like below!

Use the sequence for looping. Using the DCOUNT function to find how many iteration it need to loop and based on that you can try to do something.

DS User
That would be woefully inefficient. The task is to process the contents of a file, which DataStage does (without sequence) very efficiently.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bmsq
Premium Member
Premium Member
Posts: 32
Joined: Mon Oct 30, 2006 9:19 pm

Post by bmsq »

I've done similar logic in 7.5 without a pivot in the past. Can't remember the exact logic. If you leave each CSV record as a string, you can use a column import to pack each field into an array (or was it a vector column). Once in this format, you can use a combination of he restructure stages (including sub record) to perform a pivot wihout knowing the column count.

I can post the info later if your still stuck
divine
Premium Member
Premium Member
Posts: 149
Joined: Fri Oct 15, 2004 12:13 am
Location: Toronto,divine_auro@yahoo.co.in
Contact:

Post by divine »

Hi ,
Thanks to all. I have tried to do it without using any loop or routine. But DS can not read the file as the number of coulmn is not same for each row. But as suggested by all of you, I'll try to implement all the suggestion one at a time.Thanks a lot to all of you for the valuable suggestion. I'll keep update everybody.
With Regards
Biswajit
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Read the records as one long string field, you can parse out the individual elements from there especially seeing as how the first 'column' has the total number of columns in it.
-craig

"You can never have too many knives" -- Logan Nine Fingers
rameshrr3
Premium Member
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

Its a fairly well known problem in datastage - this is an example of horizontal variable pivot ( number of fields to be pivoted cannot be predicted in advance, but of you have a fair idea of the maximum - its pretty simple. Another step i'd recommend is to replace the first Comma with another delimiter , so that the valueset can be read as a seperate column from the code.

use Dcount to get count of delimiters from valueset column , and add 1 to this to get the number of fields in valueset column , derive another column witha hardcoded value of 1 or A to lookup with the output of a row generator that has 2 columns ( one is constant 1 or A to force the lookup to happen , and another is a column that cycles from 1 to MaxDelimiterCount , in increments of 1 ) . FOr Eg - if you assume the valueset will have a maximum of 10 values , MaxDelimiterCount =10 . force lookup to happen with A= A or 1=1, For every row from input file, you will now get 10 rows , with a column having values 1, 2, 3, 4, 5, etc upto 10.
Compare this column with the delimiter count derived in previous step and pass to output only thsoe rows where rownum ( 1,2,3,) etc <= Delimiter COunt for that input record, next use a field() function with valueset column as primary argument , and comma as delimiter , rownum as occurence and 1 as instance to parse the field out . USe suitable constraints on transformer to get only rows in the out put where output of field function is not null , then sort by CODe column and group and aggregate to get your output (this works on older editions of datastage where you cannot loop within a transformer) . I can also send a dsx file if poster requires it :)- cost is negotiable :twisted:
Post Reply