Lookup from delimited sequential file
Posted: Mon Mar 06, 2006 7:14 pm
by narasimha
What are the different approaches that can be implemented to read a sequential file of the format
Code:
123,1020,1578
2344,5423,4353,2311
412
3213,4231
look up the values in a hashed file and replace them with the corresponding keys
(I already have a hashed file for this):
123 => 1
1020 => 24
1578 => 32
2344 => 52
5423 => 82
4353 => 75
2311 => 48
412 => 12
3213 => 60
4231 => 71
and insert into another table in the format shown below
Code:
Group Detail Key
1 1 1
1 24 2
1 32 3
2 52 4
2 82 5
2 75 6
2 48 7
3 12 8
4 60 9
4 71 10
Some points:
Each number corresponds to a key in the Detail column.
Each number in the Group column identifies one unique set of numbers (one row) from the sequential file.
Don't worry about the key-generation part of it.
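For reference, the desired transformation can be sketched in a small awk script. The hashed-file lookup is mocked here as an in-script associative array, and the key is a simple running counter; none of this is the actual DataStage job, just a sketch of the logic:

```shell
#!/bin/sh
# Sketch: read comma-delimited rows, replace each value via a lookup
# (standing in for the hashed file), and emit Group, Detail, Key rows.
awk -F, '
BEGIN {
    # Stand-in for the hashed-file lookup (value -> key).
    split("123=1 1020=24 1578=32 2344=52 5423=82 4353=75 2311=48 412=12 3213=60 4231=71", pairs, " ")
    for (i in pairs) { split(pairs[i], kv, "="); map[kv[1]] = kv[2] }
}
{
    # NR is the group number; each field becomes one detail row.
    for (f = 1; f <= NF; f++)
        print NR, map[$f], ++key    # Group, Detail, Key
}' <<'EOD'
123,1020,1578
2344,5423,4353,2311
412
3213,4231
EOD
```

Run against the sample data, this prints the ten `Group Detail Key` rows shown in the target table above.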
Posted: Mon Mar 06, 2006 10:03 pm
by I_Server_Whale
Hi Narasimha,
I have been working on your issue. Based on the example you gave, I simulated a job design to read a variable-width file.
Since your example contains a maximum of 4 columns, this sequential file has four columns (c1, c2, c3, c4) in its metadata. By setting the 'Incomplete Column' option to 'Replace' for all 4 columns, it was possible to read the variable-length file.
For reference, you can download this PDF for the sequential file settings.
Let me know if it worked out for you,
Thanks,
Naveen.
Posted: Mon Mar 06, 2006 10:27 pm
by narasimha
Naveen, thanks for the effort.
Before we get too far: the number of columns is not restricted to a maximum of 4; it could be more.
Posted: Mon Mar 06, 2006 10:35 pm
by I_Server_Whale
Yes! I understand that. You should have the
Code:
no. of columns in sequential file stage = the no. of columns in the longest record
I assume you know the maximum number of columns that a record can have.
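If the maximum is not known up front, a quick awk one-liner can report the widest record in the file (sample data is piped in here; substitute your real file name):

```shell
# Count the maximum number of comma-delimited fields across all records.
printf '123,1020,1578\n2344,5423,4353,2311\n412\n3213,4231\n' |
awk -F, 'NF > max { max = NF } END { print max }'
# prints: 4
```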
Let me know,
Thanks,
Naveen.
Posted: Mon Mar 06, 2006 10:53 pm
by narasimha
Unfortunately, it is not known as of now.
This group of numbers is formed dynamically.
I could assume some number, say 20. Does that help?
But I would prefer something dynamic.
Posted: Mon Mar 06, 2006 10:59 pm
by I_Server_Whale
It depends on your file. If you think that a record in the source file can never have more than 20 columns, then you should be good to go. :wink:
Do let me know if you were able to finish your process successfully.
Thanks,
Naveen.
Posted: Mon Mar 06, 2006 11:49 pm
by ameyvaidya
I was going through the plug-in documentation of the Row Splitter stage and found that it is capable of the desired splitting of multi-value columns into separate rows. I could not try this (no access to DataStage), but the way to go would be
Code:
Src---------->Transformer------->RowSplitter------->
In the Transformer, replace all commas with Char(10), the newline character.
On the Row Splitter General page, enable Multiple Lines. This option determines whether the Row Splitter deconcatenates the input string into separate output rows, or outputs each input string as a single output row. Select Multiple Lines to have the rows deconcatenated; by default it is not selected.
Also set Line Termination to Unix.
Finally, this is an untested theory.
Let me know if it works.
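The comma-to-newline substitution this design performs can be previewed outside DataStage with a one-line tr command (sample data is piped in here in place of the real source file):

```shell
# Replace every comma with a newline, turning each delimited row into
# a single column of values, one per line.
printf '123,1020,1578\n412\n' | tr ',' '\n'
# prints 123, 1020, 1578, 412, each on its own line
```

Note that this alone loses the grouping: every value lands on its own line, but nothing records which source row it came from.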
Posted: Mon Mar 06, 2006 11:49 pm
by djm
An alternative approach may be to preprocess the file using UNIX commands, deriving a commonly formatted record per line in the file. The target would be to have each row in the file with two columns; one is the "group by" and the other is the value. It will then not matter how many columns you have in the file.
Assuming that the field separator is a comma, an off-the-cuff solution would be something like:
Code:
awk -F, '
{
    # Prefix every continuation value with a newline and the record
    # number, so each value ends up on its own "group,value" line.
    gsub (/,/ , ( "\n" NR "," ) );
    print NR "," $0;
}
' yourfile > newfile
This, when fed your sample data ...
Code:
awk -F, '
{
gsub (/,/ , ( "\n" NR "," ) );
print NR "," $0;
}' << EOD
123,1020,1578
2344,5423,4353,2311
412
3213,4231
EOD
... yields ...
Code:
1,123
1,1020
1,1578
2,2344
2,5423
2,4353
2,2311
3,412
4,3213
4,4231
... which I'm presuming is straightforward enough for you to deal with.
If you want to understand the awk script, the starting point is the UNIX command "man awk" (unless ray requests I expand further! :D ).
Hope this helps.
David
Posted: Tue Mar 07, 2006 7:58 am
by ray.wurlod
Search the forum to find a fast method for performing the pivot on the comma-delimited field using an intermediate sequential file. Follow this with a Transformer stage to do the lookup. Voila!
Posted: Tue Mar 07, 2006 7:36 pm
by narasimha
David, your script works like magic.
But I need to look up the tables to get the keys and then populate the group IDs.
Vaidya, your untested theory works: you get a nice single column of data from a row of delimited values.
But I guess my need is something like what is described below.
Ray, your approach looks the best, but I need a reverse vertical-pivot functionality, i.e. look into single rows and form multiple rows.
I guess I need to build a routine to implement this logic, unless you have a better approach.
Detailed description of the job
Based on this table
Dim_Metro_Grp
(I get my Metro_Grp_keys from here)
Code:
Metro_Grp_Dsply Metro_Grp_key Metro_Grp_Num_Metro
-------------------------------------------------------------------------------------------------
1030,126,432 1 3
327,529 2 2
135 3 1
158,132,126 4 3
126,158,132,130 5 4
I have to populate this table
Metro_Grp_Dtl
(a combination of the Metro_Grp_keys and Metro_Keys)
Code:
Metro_Grp_Dtl_Key Metro_Grp_key Metro_Key
-------------------------------------------------------------------------------------------------
1 1 1
2 1 2
3 1 3
4 2 4
5 2 5
6 3 7
7 4 6
by looking up the Metro_Keys from
Dim_Metro
Code:
Metro_id Metro_Nm Metro_Key
-------------------------------------------------------------------------------------------------
1030 abc 1
126 asd 2
432 dfg 3
327 hjd 4
592 moi 5
158 oih 6
135 rfg 7
The end result is to populate the
Metro_Grp_Dtl table.
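The whole job described above can be sketched end to end in awk. Everything here is an assumption for illustration: the two tables are taken as comma-delimited extracts (`dim_metro.csv` with `Metro_id,Metro_Key`; `dim_metro_grp.csv` with `Metro_Grp_key` followed by the exploded `Metro_Grp_Dsply` values), the sample files are created inline, and the `Metro_Grp_Dtl_Key` is a simple running counter rather than a real surrogate key:

```shell
#!/bin/sh
# Create sample extracts (hypothetical layouts, data from the thread).
printf '1030,1\n126,2\n432,3\n327,4\n592,5\n158,6\n135,7\n' > dim_metro.csv
printf '1,1030,126,432\n3,135\n' > dim_metro_grp.csv

awk -F, '
NR == FNR { metro_key[$1] = $2; next }   # first file: Dim_Metro lookup
{
    grp = $1                             # Metro_Grp_key
    for (f = 2; f <= NF; f++)            # remaining fields: the display list
        print ++dtl, grp, metro_key[$f]  # Metro_Grp_Dtl_Key, Metro_Grp_key, Metro_Key
}' dim_metro.csv dim_metro_grp.csv
# prints:
# 1 1 1
# 2 1 2
# 3 1 3
# 4 3 7
```

The `NR == FNR` idiom loads the first file into the lookup array before the second file is processed, which plays the same role as the hashed-file lookup in the DataStage design.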