Help!

Post questions here relating to DataStage Server Edition, covering such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

ysweec
Participant
Posts: 31
Joined: Thu Jun 17, 2004 9:01 pm

Help!

Post by ysweec »

Hi All,

Your help on this is very much appreciated!

Question:
How can a flat file with 256 columns retrieve, for each of its columns, a value from another column in a table?

e.g.
File A:-
basic layer_number r_lpg max
---------------------------------------------------------
AIX 1 709 9
PIG 4 89 1
TVG 3 101 7

Table B:-
parameter avail
----------------------------------
basic Y
layer_number Y
r_lpg N
max Y

The column names in file A are values in parameter in table B.
How can I retrieve avail in table B for each column in file A?

Can DS handle this kind of task?
Can a routine solve this?

I have tried to use the following statement to look up the 2nd field in a hashed file:-
TRANS([DICT]filename, record.ID, field#, control.code)
but the record.ID is unknown to me.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard! :D

A couple of questions. I am assuming (from the TRANS function) that your lookup table is actually a hashed file.

First is, can you post your code delimited by Code tags? (actually, I will do that here).

Code: Select all

File A:- 
basic layer_number r_lpg max 
--------------------------------------------------------- 
AIX    1       709       9 
PIG    4        89       1 
TVG    3       101       7 

Table B:- 
parameter avail 
---------------------------------- 
basic              Y 
layer_number       Y 
r_lpg              N 
max                Y 
Second is, on what basis do you want to perform this lookup? On the column name? This is a really tricky thing to be able to do in DataStage, though not impossible. However, you will need to hard-code the column names ("basic", "layer_number" and so on) as either reference key expressions or the second argument to the TRANS function.
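For illustration, a minimal sketch of that hard-coding in DS BASIC, assuming the lookup is a hashed file named HASH.TABLEB keyed on parameter name with avail in field 1 (the file and variable names are hypothetical). The record.ID from your original question is simply the constant column name:

Code: Select all

* One TRANS call per input column, with the column name hard-coded
* as the record ID; the "X" control code returns an empty string
* when the key is not found in the hashed file.
AvailBasic = TRANS("HASH.TABLEB", "basic", 1, "X")
AvailLayer = TRANS("HASH.TABLEB", "layer_number", 1, "X")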

Please clarify your requirement.

It is often a useful exercise to describe, in natural language, what you want to happen to the data; it makes the process of translating that into an ETL design so much easier, as it acts as a mind-focussing exercise.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ysweec
Participant
Posts: 31
Joined: Thu Jun 17, 2004 9:01 pm

Post by ysweec »

Hi ray,

Thanks for your response! :D
Actually I was asked to come up with a design to compare data in a file (containing 256 columns) with a reference table. However, the file and table have no reference key to link them (refer to File A & Table B posted earlier).
The column names in file A are values in table B's parameter column.
Only after getting the avail ('Y' or 'N') for each column in file A can we proceed to the next step, the data comparison.
Instead of writing 256 jobs to look up avail in Table B, could a routine (to be used in a Transformer) solve this problem?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You can perform all the lookups in one job.

The reference key expression is a constant in each case, for example "basic", "layer_number" and so on.

You need a separate lookup for each input column - don't try to do them all in one Transformer stage; you aren't allowed that many links in any case!

For each lookup, the constant is the column name in Table B, if I read your design correctly.

For efficiency, if your business rules permit, you might be able to abandon processing of a row as soon as "N" is encountered. Achieve this with output constraints in Transformer stages, and don't do too many lookups in any one Transformer stage.
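As a sketch of that early abandonment (link and column names are hypothetical), the output-link constraint of a Transformer that has just performed four lookups might look like this:

Code: Select all

* Pass the row to the next Transformer stage only while no lookup
* has returned "N"; any other row is dropped here.
Lkp1.AVAIL <> "N" And Lkp2.AVAIL <> "N" And Lkp3.AVAIL <> "N" And Lkp4.AVAIL <> "N"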

However...

It seems to me that, in your design, you will return
  • "Y" for any value in "TableA"."basic" (that is, in every row)
  • "Y" for any value in "TableA"."layer_number"
  • "N" for any value in "TableA"."r_lpg"
  • "Y" for any value in "TableA"."max"
  • and so on
Why, then, would you bother with a lookup, unless the contents of TableB change over time? Have I misinterpreted your requirements? Or have you?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ysweec
Participant
Posts: 31
Joined: Thu Jun 17, 2004 9:01 pm

Post by ysweec »

Yes, you got it right!
The contents of TableB change every day.
E.g., basic's avail might be 'Y' today but 'N' tomorrow.

However, there are over 200 columns to look up in Table B. In this case, how can I minimise the use of Transformer stages?
Is it feasible to use a routine in this case?
Thanks!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

The job design would look something like a long string of Transformer stages, each doing N lookups. For example, you might have 64 Transformer stages each doing 4 lookups. I think that would be a nice balance for 256 columns (your original specification). Each Transformer stage adds four new Char(1) columns (containing "Y" or "N") to the stream of data being processed.

The problem with using a Routine for an "all at once" approach is that your design could not really separate the columns; a Routine is limited to fewer than 256 arguments. So you would have to read entire lines from the file, pass each line to the routine, parse it and perform the reads of the hashed file there, then pass back the original line plus the additional "Y" or "N" characters, before parsing it out into individual columns in a following Transformer stage. So, yes, you could do it, but you would be avoiding DataStage's graphical paradigm.
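To make that trade-off concrete, here is a rough sketch of such a routine body, assuming DataStage supplies the FUNCTION wrapper and the Ans return variable, that Arg1 is the whole comma-delimited input line, and that HASH.TABLEB is a hashed file keyed on parameter name with avail in field 1 (all names hypothetical; the column list is truncated to four):

Code: Select all

* Append one avail flag per column to the incoming line.
* A production routine would cache the file variable in COMMON
* instead of re-opening the hashed file for every row.
      ColNames = "basic,layer_number,r_lpg,max"   ;* hard-coded column names
      Ans = Arg1
      OPEN "HASH.TABLEB" TO F.TableB THEN
         NumCols = DCOUNT(ColNames, ",")
         FOR I = 1 TO NumCols
            READ Rec FROM F.TableB, FIELD(ColNames, ",", I) THEN
               Ans := "," : Rec<1>                 ;* avail is field 1
            END ELSE
               Ans := ",N"                         ;* missing parameter: treat as "N"
            END
         NEXT I
      END ELSE
         Ans = Arg1 : ",OPEN_FAILED"
      END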

You didn't reveal the business rules for "further processing". Some of the nice things about the multiple-Transformer approach are:
  • you can abandon processing a row as soon as an "N" is returned (if that's appropriate)
  • you can perform other transformations, such as handling NULL
  • you can implement pipeline parallelism, using row buffering or explicit IPC stages
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ysweec
Participant
Posts: 31
Joined: Thu Jun 17, 2004 9:01 pm

Post by ysweec »

Hi Ray,

Thanks for your advice.
Question:-
1. I found a built-in routine called UtilityHashLookup. Do you think it would serve the purpose better?
2. Is it feasible to use DS to convert columns to rows, for instance from 256 columns to 256 rows?


*ACTUAL TASK*
Purpose:
-----------
To check the input file for discrepancies against the standard parameters in Reference Table 2

Code: Select all

Input File:-
===========
LRD     bsic   layer_number   pwrc
------------------------------------
MMKK    0      10             9
JKKK    1      2              4
ANNN    3      6              8

Reference Table 1:-
===================
LRD     Region
---------------
MMKK    South
JKKK    East

Reference Table 2:-
===================
Parameter      Def_South   Def_East   Max_South   Max_East   VA
-----------------------------------------------------------------
bsic           0 to 2      0 to 4     1           1          Y
layer_number   0,2,4       0,2,4      12          12         N
pwrc           0 to 255    0 to 255   255         255        Y


Output 1:-
LRD     bsic   bsic_d      layer_number   layer_number_d
----------------------------------------------------------
MMKK    0      Not Match   10             Not Match
JKKK    1      Match       2              Match
ANNN    3      Not Match   6              Not Match

Actual Output Format Required:-
Only LRDs whose columns do not match
LRD     ColumnName
-------------------
MMKK    bsic
MMKK    layer_number
JJKK    bsic
JJKK    layer_number
Logic:
------
1. Get the Region from Reference Table 1 for the input file, using LRD as the reference key.
2. Get the VA from Reference Table 2 for each column in the input file.
3. Determine whether to retrieve values from Max_<Region> or Def_<Region> for each column:
if VA = "Y", the following lookup is Max_<Region>;
if VA = "N", the following lookup is Def_<Region>.
4. Then compare the values in the input file with the values from Reference Table 2 (a sketch follows the example below).

For instance:-
=========
Column: bsic
Reference Table 2: VA = 'Y', so the next lookup is Max_<Region>

1st row:-
LRD = MMKK, bsic = 0
Reference Table 1: Region = 'South',
-> so the next column to check is Max_South, with value = 1
Result: 0 and 1 do not match
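For what it's worth, here is a sketch of how steps 3 and 4 might be expressed in a Transformer, assuming reference links named LkpRegion (Reference Table 1) and LkpParam (Reference Table 2, keyed on the hard-coded parameter name), a stage variable svLimit, and the simple equality test used in the example above. All link and column names are hypothetical, and list or range values such as "0,2,4" and "0 to 2" would need extra parsing:

Code: Select all

* Stage variable svLimit: choose the comparison value per VA and Region
If LkpParam.VA = "Y" Then (If LkpRegion.Region = "South" Then LkpParam.Max_South Else LkpParam.Max_East) Else (If LkpRegion.Region = "South" Then LkpParam.Def_South Else LkpParam.Def_East)

* Derivation of output column bsic_d
If In.bsic = svLimit Then "Match" Else "Not Match"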
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

UtilityHashLookup uses a hashed file on disk. It cannot take advantage of memory caching in the same way that the Hashed File stage can. So your job would be far slower. I have not had time for detailed analysis of your new design yet.

Converting columns to rows (which is a horizontal pivot) is performed by the Pivot stage. Read the Pivot Stage manual (pivot.pdf) which is in the Packages folder on your DataStage CD. It's very easy to use.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ysweec
Participant
Posts: 31
Joined: Thu Jun 17, 2004 9:01 pm

Post by ysweec »

Ray,

Thanks!
The Pivot stage can only be found in DS v7.0 and above, right?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Pivot stage was introduced in version 4.2, if my memory serves me aright. It's certainly in 5.2, so definitely in the 6.x you're using. You may need to install it using the DataStage Package Installer if it was not initially installed.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.