Help!
Moderators: chulett, rschirm, roy
Hi All,
Your help on this is very much appreciated!
Question:
How can a flat file with 256 columns retrieve, for each of its columns, a value from another column in a table?
e.g.
File A:-
basic layer_number r_lpg max
---------------------------------------------------------
AIX 1 709 9
PIG 4 89 1
TVG 3 101 7
Table B:-
parameter avail
----------------------------------
basic Y
layer_number Y
r_lpg N
max Y
The column names in file A are values in parameter in table B.
How to retrieve avail in table B for each column in file A?
Can DS handle this kind of task?
Can a routine solve this?
I have tried to use the following statement to look up the 2nd field in a hash file:-
TRANS([DICT]filename, record.ID, field#, control.code)
but what to use for record.ID is unknown to me.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Welcome aboard! :D
A couple of questions. I am assuming (from the TRANS function) that your lookup table is actually a hashed file.
First is, can you post your code delimited by Code tags? (actually, I will do that here).
Second is, on what basis do you want to perform this lookup? On the column name? This is a really tricky thing to be able to do in DataStage, though not impossible. However, you will need to hard-code the column names ("basic", "layer_number" and so on) as either reference key expressions or the second argument to the TRANS function.
Code: Select all
File A:-
basic layer_number r_lpg max
---------------------------------------------------------
AIX 1 709 9
PIG 4 89 1
TVG 3 101 7
Table B:-
parameter avail
----------------------------------
basic Y
layer_number Y
r_lpg N
max Y
Please clarify your requirement.
It is often a useful exercise to describe, in natural language, what you want to happen to the data; it makes the process of translating that into an ETL design much easier, acting as a mind-focussing exercise.
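To make the constant-key lookup concrete, here is a minimal Python sketch of the idea (Python stands in for DataStage here; the dict below simulates Table B loaded into a hashed file, and the column names come from the example above):

```python
# Simulated hashed file built from Table B: parameter -> avail.
# In the real job this would be a Hashed File stage reference
# (or a TRANS() call in a BASIC routine).
avail = {
    "basic": "Y",
    "layer_number": "Y",
    "r_lpg": "N",
    "max": "Y",
}

# One row of File A, keyed by column name.
row = {"basic": "AIX", "layer_number": "1", "r_lpg": "709", "max": "9"}

# The reference key for each lookup is the *column name* -- a
# hard-coded constant per lookup -- not a value from the row.
flags = {col: avail.get(col, "N") for col in row}
print(flags)  # → {'basic': 'Y', 'layer_number': 'Y', 'r_lpg': 'N', 'max': 'Y'}
```

The point is that each of the 256 lookups uses a constant key, so no shared key column between File A and Table B is needed.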
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Hi Ray,
Thanks for your response! :D
Actually, I was asked to come up with a design to compare the data in a file (containing 256 columns) with a reference table. However, the file and the table have no reference key linking them to each other (refer to File A and Table B posted earlier).
The column names in File A appear as values in Table B's parameter column.
Only after retrieving avail ('Y' or 'N') for each column in File A can I proceed to the next step, the data comparison.
Instead of writing 256 jobs to look up avail in Table B, could a routine (used in a Transformer) solve this problem?
You can perform all the lookups in one job.
The reference key expression is a constant in each case, for example "basic", "layer_number" and so on.
You need a separate lookup for each input column - don't try to do them all in one Transformer stage; you aren't allowed that many links in any case!
For each lookup, the constant is the column name from File A (a parameter value in Table B), if I read your design correctly.
For efficiency, if your business rules permit, you might be able to abandon processing of a row as soon as "N" is encountered. Achieve this with output constraints in Transformer stages, and don't do too many lookups in any one Transformer stage.
However...
It seems to me that, in your design, you will return
- "Y" for any value in "TableA"."basic" (that is, in every row)
- "Y" for any value in "TableA"."layer_number"
- "N" for any value in "TableA"."r_lpg"
- "Y" for any value in "TableA"."max"
- and so on
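A sketch of that chained design, using the example data from earlier in the thread (Python as illustration only; each loop iteration plays the role of one Transformer stage, and the early return plays the role of an output constraint):

```python
# Table B as a lookup dict (stand-in for the hashed file).
avail = {"basic": "Y", "layer_number": "Y", "r_lpg": "N", "max": "Y"}

def process(row, lookups_per_stage=2):
    """Do a few lookups per 'stage'; abandon the row at the first
    "N", as an output constraint in a Transformer stage would."""
    cols = list(row)
    for i in range(0, len(cols), lookups_per_stage):
        for col in cols[i:i + lookups_per_stage]:
            if avail.get(col, "N") == "N":
                return None          # constraint rejects the row
            row[col + "_avail"] = "Y"
    return row

rows = [{"basic": "AIX", "layer_number": "1", "r_lpg": "709", "max": "9"}]
surviving = [r for r in rows if process(r) is not None]
print(surviving)  # → [] because r_lpg's avail is "N"
```

With 256 columns and, say, 4 lookups per stage, the same loop structure corresponds to 64 chained Transformer stages.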
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Yes, you got it right!
The contents of Table B change every day.
e.g. basic's avail might be 'Y' today but 'N' tomorrow.
However, there are over 200 columns to look up in Table B. In this case, how can I minimize the number of Transformer stages?
Is it feasible to use a routine in this case?
Thanks!
The job design would look something like a long string of Transformer stages, each doing N lookups. For example, you might have 64 Transformer stages each doing 4 lookups. I think that would be a nice balance for 256 columns (your original specification). Each Transformer stage adds four new Char(1) columns (containing "Y" or "N") to the stream of data being processed.
The problem with using a Routine for an "all at once" approach is that your design could not really separate the columns; a Routine is limited to fewer than 256 arguments, so you would have to read entire lines from the file, pass each line to the routine, parse it and perform the reads of the hashed file there, then pass back the original line plus the additional "Y" or "N" characters, before parsing it out into individual columns in a following Transformer stage. So, yes, you could do it, but you would be avoiding DataStage's graphical paradigm.
You didn't reveal the business rules for "further processing". Some of the nice things about the multiple Transformer approach are:
- you can abandon processing a row as soon as an "N" is returned (if that's appropriate)
- you can perform other transformations, such as handling NULL
- you can implement pipeline parallelism, using row buffering or explicit IPC stages
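The "whole line through one routine" alternative described above might look like this in outline (a Python sketch, not DataStage BASIC; the pipe delimiter and the header list are assumptions for illustration):

```python
# Table B as a lookup dict (stand-in for the hashed file).
avail = {"basic": "Y", "layer_number": "Y", "r_lpg": "N", "max": "Y"}
HEADER = ["basic", "layer_number", "r_lpg", "max"]  # assumed File A layout

def routine(line, delim="|"):
    """One whole line in, one whole line out -- which is how the
    fewer-than-256-arguments limit is worked around. Parse the
    line, look up each column's flag, append the flags, and let
    a following Transformer stage re-parse the result."""
    values = line.split(delim)
    flags = [avail.get(col, "N") for col in HEADER[:len(values)]]
    return delim.join(values + flags)

print(routine("AIX|1|709|9"))  # → AIX|1|709|9|Y|Y|N|Y
```

This works, but as noted above it hides all the per-column logic inside one opaque routine instead of visible job links.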
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Hi Ray,
Thanks for your advice.
Question:-
1. I found a built-in routine called UtilityHashLookup. Do you think it will serve the purpose better?
2. Is it feasible to use DS to convert columns to rows, for instance from 256 columns to 256 rows?
*ACTUAL TASK*
Purpose:
-----------
To check the input file for discrepancies against the standard parameters in Reference Table 2
Logic:
------
1. Get the Region from Reference Table 1 for the input file, using LRD as the reference key.
2. Get the VA from Reference Table 2 for each column in the input file.
3. Determine whether to retrieve values from Max_<Region> or Def_<Region> for each column:
if VA = "Y", the following lookup is Max_<Region>
if VA = "N", the following lookup is Def_<Region>
4. Then compare the values in the input file with the values from Reference Table 2.
For instance:-
=========
column :- bsic
Reference Table 2: VA = 'Y', so next lookup is Max_<Region>
1st row:-
LRD = MMKK, bsic = 0
Reference Table 1: Region = 'South'
-> so the next column to check is Max_South, with value = 1
Result: 0 and 1 do not match
Code: Select all
Input File:-
=======
LRD bsic layer_number pwrc
-------------------------------------------------
MMKK 0 10 9
JKKK 1 2 4
ANNN 3 6 8
Reference Table 1:-
=============
LRD Region
-----------------------
MMKK South
JKKK East
Reference Table 2:-
=============
Parameter Def_South Def_East Max_South Max_East VA
--------------------------------------------------------------------------------
bsic 0 to 2 0 to 4 1 1 Y
layer_number 0,2,4 0,2,4 12 12 N
pwrc 0 to 255 0 to 255 255 255 Y
Output 1:-
LRD bsic bsic_d layer_number layer_number_d
-------------------------------------------------------------------------
MMKK 0 Not Match 10 Not Match
JKKK 1 Match 2 Match
ANNN 3 Not Match 6 Not Match
Actual Output Format Required:-
Only LRD with columns not match
LRD ColumnName
-----------------------------
MMKK bsic
MMKK layer_number
JJKK bsic
JJKK layer_number
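Putting the four steps together, here is one possible reading of the rules as a Python sketch (the "a to b" range, comma-list, and exact-match semantics are my assumptions from the sample data, as is treating an LRD missing from Reference Table 1 as all-mismatch):

```python
ref1 = {"MMKK": "South", "JKKK": "East"}        # LRD -> Region
ref2 = {                                        # Parameter -> rule row
    "bsic":         {"Def_South": "0 to 2",   "Def_East": "0 to 4",
                     "Max_South": "1",   "Max_East": "1",   "VA": "Y"},
    "layer_number": {"Def_South": "0,2,4",    "Def_East": "0,2,4",
                     "Max_South": "12",  "Max_East": "12",  "VA": "N"},
    "pwrc":         {"Def_South": "0 to 255", "Def_East": "0 to 255",
                     "Max_South": "255", "Max_East": "255", "VA": "Y"},
}

def matches(value, spec):
    """Assumed semantics: 'a to b' is an inclusive range, 'a,b,c'
    is a list of allowed values, anything else is a single value."""
    if " to " in spec:
        lo, hi = (int(x) for x in spec.split(" to "))
        return lo <= int(value) <= hi
    if "," in spec:
        return value in spec.split(",")
    return value == spec

def discrepancies(row):
    region = ref1.get(row["LRD"])                          # step 1
    if region is None:                                     # assumed rule
        return [(row["LRD"], col) for col in row if col != "LRD"]
    out = []
    for col, value in row.items():
        if col == "LRD":
            continue
        rule = ref2[col]
        branch = "Max_" if rule["VA"] == "Y" else "Def_"   # steps 2-3
        if not matches(value, rule[branch + region]):      # step 4
            out.append((row["LRD"], col))                  # output format
    return out

print(discrepancies({"LRD": "MMKK", "bsic": "0",
                     "layer_number": "10", "pwrc": "9"}))
# → [('MMKK', 'bsic'), ('MMKK', 'layer_number'), ('MMKK', 'pwrc')]
```

Under these assumed rules MMKK's pwrc (9 vs Max_South 255) also fails, which the truncated Output 1 above did not show; the returned (LRD, ColumnName) pairs match the required output format.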
UtilityHashLookup uses a hashed file on disk. It cannot take advantage of memory caching in the same way that the Hashed File stage can. So your job would be far slower. I have not had time for detailed analysis of your new design yet.
Converting columns to rows (which is a horizontal pivot) is performed by the Pivot stage. Read the Pivot Stage manual (pivot.pdf) which is in the Packages folder on your DataStage CD. It's very easy to use.
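What a columns-to-rows pivot does can be shown in a few lines (a Python illustration of the effect, not Pivot stage syntax; the column names come from the input file above):

```python
row = {"LRD": "MMKK", "bsic": "0", "layer_number": "10", "pwrc": "9"}

# The key column (LRD) repeats on every output row; each remaining
# column becomes one (Parameter, Value) row -- so a 256-column input
# row fans out into one output row per pivoted column.
pivoted = [
    {"LRD": row["LRD"], "Parameter": col, "Value": val}
    for col, val in row.items()
    if col != "LRD"
]
print(len(pivoted))  # → 3
```

After such a pivot, Parameter becomes an ordinary key column, so a single lookup against Table B replaces the 256 per-column lookups.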
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Pivot stage was introduced in version 4.2, if my memory serves me aright. It's certainly in 5.2, so definitely in 6.x which you're using. You may need to install it using the DataStage Package Installer if it was not initially installed.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.