How to deal with multiple headers in the same file

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B

Can you give some sample data?
For example, does the header repeat for each Data1?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Gaurav.Dave
Premium Member
Posts: 62
Joined: Tue Sep 21, 2004 10:24 am
Location: IBM - Chicago Area

Post by Gaurav.Dave

Thanks for your quick response.

There will be a distinct header for each subset.

For example, in a single file I will be getting:

header1 -> "1Q05", "2Q05", "3Q05", "4Q05"
underlying Data1
header2 -> "1Q06", "2Q06", "3Q06", "4Q06"
underlying Data2


Data1 and Data2 will have different contents, and I expect about 100k rows.

Thanks,
Gaurav
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B

Is the number of headers going to be static, or will it change? Also, could you provide a snippet of the original data?
Gaurav.Dave
Premium Member
Posts: 62
Joined: Tue Sep 21, 2004 10:24 am
Location: IBM - Chicago Area

Post by Gaurav.Dave

The number of header records will not be fixed; it will change.

Here is some sample data from the file:


1Q05 2Q05 3Q05 4Q05
06TGT Client Team TELE OO Fed/Exce GMR PUI 804Top Valid Revenue 8.926070 41.575685 12.089471 10.110442
06TGT Client Team TELE OO Fed/Exce GMR GS 804Top Valid Revenue 8.926070 41.575685 12.089471 10.110442

1Q06 2Q06 3Q06 4Q06
0625TGT Op Ident OO Unassigned Fed/Exce GMR 5S CR Leads 686.801830 925.533652 940.060153 1605.605841
0625TGT Op Ident OO Unassigned Fed/Exce GMR CC CR Leads 160.566881 228.354777 106.455053 184.601798
martin
Participant
Posts: 67
Joined: Fri Jul 30, 2004 7:19 am
Location: NewJersy

Post by martin

Hi,

Read the data as a single column.
Write it to two output links:
Output link 1 with the constraint Col[1,5] = '06TGT'
Output link 2 with the constraint Col[1,7] = '0625TGT'
This way you can create two separate files.
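
As a rough illustration outside DataStage, the two constraints behave like the Python sketch below. It assumes each line arrives as one string column and that the prefixes '06TGT' and '0625TGT' are fixed; split_by_prefix is a hypothetical helper, not DataStage syntax. Each constraint is tested independently, so header and blank lines match neither and are dropped.

```python
def split_by_prefix(lines):
    """Mimic two Transformer output links, each with its own constraint.

    Hypothetical helper. Assumes every input line is one string column and
    that the prefixes '06TGT' and '0625TGT' never change.
    """
    link1, link2 = [], []
    for col in lines:
        # Col[1,5]: 5 characters starting at position 1 -> col[0:5] in Python
        if col[0:5] == "06TGT":
            link1.append(col)
        # Col[1,7]: 7 characters starting at position 1 -> col[0:7]
        if col[0:7] == "0625TGT":
            link2.append(col)
        # Header and blank lines satisfy neither constraint, so they are dropped.
    return link1, link2
```

The obvious limitation is that any row with a new prefix silently falls through both constraints.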

Good luck
Martin
rasi
Participant
Posts: 464
Joined: Fri Oct 25, 2002 1:33 am
Location: Australia, Sydney

Post by rasi

Hi

I think '06TGT' and '0625TGT' are not constant. A good approach is to read the file sequentially and use a stage variable to identify the pattern of the header record. Send data to link 1 until you detect a new header pattern, then send the output to link 2.
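
Here is a minimal Python analogue of the stage-variable idea (not Transformer derivation syntax). It assumes a header line can be recognised because every token looks like a quarter label such as 1Q05; that regex is an assumption based only on the sample data. A variable that persists across rows holds the current header's group, the way a stage variable keeps its value between rows. Since the number of headers varies, each header gets its own group instead of a hard-coded link 1 and link 2. The names is_header and route_by_header are hypothetical.

```python
import re

# Assumption: a header line consists only of quarter labels such as 1Q05 or 4Q06.
HEADER_TOKEN = re.compile(r"[1-4]Q\d{2}")


def is_header(line):
    tokens = line.split()
    return bool(tokens) and all(HEADER_TOKEN.fullmatch(t) for t in tokens)


def route_by_header(lines):
    """Group detail rows under the most recent header (hypothetical helper).

    Returns a list of (header, rows) pairs in file order, so any number
    of headers is handled.
    """
    groups = []
    current = None  # plays the role of the stage variable: rows of the last header seen
    for line in lines:
        if not line.strip():
            continue  # skip blank separator lines
        if is_header(line):
            current = []
            groups.append((line.strip(), current))
        elif current is not None:
            current.append(line)
        # Detail rows that appear before any header are ignored.
    return groups
```

In a parallel job this only works if rows reach the Transformer in file order, for example by running that stage in sequential mode.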

This should help you.
Regards
Siva

Listening to the Learned

"The most precious wealth is the wealth acquired by the ear. Indeed, of all wealth that wealth is the crown." - Thirukural by Thiruvalluvar
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod

Depending on exactly what you want the results to be, you might consider using the Rejects link in the Sequential File stage to capture the header lines, and process them after converting from raw format (if, indeed, you need to process them at all). This way only detail rows will appear on the main output of the Sequential File stage.
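
The reject-link idea can be sketched in Python roughly as follows. In the real stage, rows are rejected when they do not match the column definitions given to the Sequential File stage; here the detail layout is assumed to be free text followed by exactly four numeric quarter values, and any non-blank line that does not fit goes to the rejects list. The names parse_detail and read_with_rejects are hypothetical.

```python
def parse_detail(line):
    """Return (description, values) if the line fits the assumed detail layout, else None."""
    tokens = line.split()
    if len(tokens) < 5:  # need at least one text token plus four numbers
        return None
    try:
        values = [float(t) for t in tokens[-4:]]
    except ValueError:
        return None
    return " ".join(tokens[:-4]), values


def read_with_rejects(lines):
    """Split lines into parsed detail rows (main output) and raw rejected lines."""
    details, rejects = [], []
    for line in lines:
        if not line.strip():
            continue  # assumption: blank separator lines are simply skipped
        parsed = parse_detail(line)
        if parsed is None:
            rejects.append(line)  # header lines such as "1Q05 2Q05 3Q05 4Q05" end up here
        else:
            details.append(parsed)
    return details, rejects
```

Only detail rows reach the main output; the header lines can be inspected separately, or ignored if they are not needed.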
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.