multi record processing

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
mahi_mahi
Charter Member
Charter Member
Posts: 45
Joined: Mon Aug 01, 2005 10:02 am

multi record processing

Post by mahi_mahi »

Hi All,

I am new to datastage please help for my requirement!!

My requirement is like this

Ex:
I have a input file say emp file where i have some 10 employeerecords
and i have another input file say manager file where i have 5 records

i have to process each record on 1 input file(emp) using a set of records on another file(manager). The set of records on the second file are not 'linked' directly to any value on the first file - i.e. you cannot use a 'lookup' key field.

so while process i should read emp file and for each record
i should read all the records from manager file..to process
my logic...

should be something like this

open emp file
do until emp-end-of-file
read emp record
open mgr-file
do unitl end-of-mgr-file or flag=tru

i will be setting flag...... based on my logic
end-do-for mgr-file
close mgr-file
check flag and write record
end-do-for-emp-file
close emp-file

please tell me how can i achieve this in datastage.....and link to manulas also help me.....please give an idea....

Mahi
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Welcome. Please look at your DS installation CD and run thru the tutorial exercises. Your example indicates you are trying to learn DS from the very beginning. The tutorial is the best method, as it has pictures and exercises that take you thru the fundamentals of building hash lookups, references, and other such constructs that your example requires. That's the fastest method for learning DS on your own.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

In DataStage you draw a picture of what you want to happen.

Code: Select all

               lookup
                 |
                 |
                 V
source --->  transformer  ---> target
                 |
                 |
                 V
               rejects
Each word represents a "stage". Then you edit each stage to fill in the details, for example which file is the source, what its format is, and so on.

Your lookup will probably need to be an ODBC stage or UV stage, as these are the only ones that support multi-row returns from a non-key lookup.

In particular, you do NOT create Open, ReadNext, ReadByKey, Write and Close methods; these are inherent to appropriately-selected stage types. You don't code the main loop either; DataStage looks after this for you.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rleishman
Premium Member
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

I wouldn't be so quick to dismiss this one, it's actually an interesting question.

What Mahi is trying to do is (in database terms) a CROSS JOIN, or cartesian join. There's a little more to it than that, since he wants the ability to filter out some of the result rows, but worst case scenario would be [# output rows] = [# rows file1] x [# rows file2].

Up front, I cannot think of a good way of doing this. Ray's UV stage is probably a goer, but I don't know how you could use it in the context of a lookup. Even if it returns multiple rows for a lookup, how does the transformer deal with them? Do you actually get a cross join? I thought that transformers had a pretty strict one-in-one-out policy.

None of what I have written above is any help to Mahi. What I'm getting at is that - whilst not an expert - I've done the course and written lots of jobs yet I don't know how I would go about a CROSS JOIN in DS without using a database. Perhaps that's the solution: load the two files into individual hash files, and then use a UV stage to CROSS JOIN the two hash files, treating them like database tables.

Code: Select all

Job #1

seq file (emp) ----> hash file


Job #2

seq file (mgr) ----> hash file


Job #3

UV Stage ----> Transformer ----> Target
Ross Leishman
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

For a beginner who doesn't know DS it sure is an esoteric example using the proverbial emp table. :(


If the result set is to be:

1. All employees without a manager
2. All managers without an employee
3. An employee per manager, allowing for multiple managers per employee

If using tables to do this work, you're talking about a UNION ALL of one SQL statement using an outer join driving employee against manager on the cartesian product producing column and another SQL statement of managers without their cartesian product producing column value found in the employee table.

The DS solution requires the cartesian product. About the only way to do that is using the UV/ODBC stage. There's no trick I know of to produce this uniquely relational result.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
mahi_mahi
Charter Member
Charter Member
Posts: 45
Joined: Mon Aug 01, 2005 10:02 am

Mutil record procerssing

Post by mahi_mahi »

Sorry All,
I have confused some of you by referring emp mgr files.

I have two files and there is no direct relation btw them but i need to
processing each record on 1-input file using a set(all) of records on another file 2-input file. The set of records on the second file are not 'linked' directly to any value on the first file - i.e. you cannot use a 'lookup' key field.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

So, what you're asking is for all possible combinations, also known as a Cartesian product.
This is achieved simply by naming both tables in your SELECT statement and not providing a join condition. Some databases may require you to enable the use of cross joins, because if table 1 contains 100 rows and table 2 contains 200 rows, the result set will contains 20000 rows.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply