multi record processing

mahi_mahi · Post by **mahi_mahi** » Mon Oct 17, 2005 1:49 pm

Hi All,

I am new to datastage please help for my requirement!!

My requirement is like this

Ex:
I have a input file say emp file where i have some 10 employeerecords
and i have another input file say manager file where i have 5 records

i have to process each record on 1 input file(emp) using a set of records on another file(manager). The set of records on the second file are not 'linked' directly to any value on the first file - i.e. you cannot use a 'lookup' key field.

so while process i should read emp file and for each record
i should read all the records from manager file..to process
my logic...

should be something like this

open emp file
do until emp-end-of-file
read emp record
open mgr-file
do unitl end-of-mgr-file or flag=tru

i will be setting flag...... based on my logic
end-do-for mgr-file
close mgr-file
check flag and write record
end-do-for-emp-file
close emp-file

please tell me how can i achieve this in datastage.....and link to manulas also help me.....please give an idea....

Mahi

kcbland · Post by **kcbland** » Mon Oct 17, 2005 2:50 pm

Welcome. Please look at your DS installation CD and run thru the tutorial exercises. Your example indicates you are trying to learn DS from the very beginning. The tutorial is the best method, as it has pictures and exercises that take you thru the fundamentals of building hash lookups, references, and other such constructs that your example requires. That's the fastest method for learning DS on your own.

ray.wurlod · Post by **ray.wurlod** » Mon Oct 17, 2005 4:30 pm

In DataStage you draw a picture of what you want to happen.

Code: Select all

               lookup
                 |
                 |
                 V
source --->  transformer  ---> target
                 |
                 |
                 V
               rejects

Each word represents a "stage". Then you edit each stage to fill in the details, for example which file is the source, what its format is, and so on.

Your lookup will probably need to be an ODBC stage or UV stage, as these are the only ones that support multi-row returns from a non-key lookup.

In particular, you do NOT create Open, ReadNext, ReadByKey, Write and Close methods; these are inherent to appropriately-selected stage types. You don't code the main loop either; DataStage looks after this for you.

rleishman · Post by **rleishman** » Mon Oct 17, 2005 6:45 pm

I wouldn't be so quick to dismiss this one, it's actually an interesting question.

What Mahi is trying to do is (in database terms) a CROSS JOIN, or cartesian join. There's a little more to it than that, since he wants the ability to filter out some of the result rows, but worst case scenario would be [# output rows] = [# rows file1] x [# rows file2].

Up front, I cannot think of a good way of doing this. Ray's UV stage is probably a goer, but I don't know how you could use it in the context of a lookup. Even if it returns multiple rows for a lookup, how does the transformer deal with them? Do you actually get a cross join? I thought that transformers had a pretty strict one-in-one-out policy.

None of what I have written above is any help to Mahi. What I'm getting at is that - whilst not an expert - I've done the course and written lots of jobs yet I don't know how I would go about a CROSS JOIN in DS without using a database. Perhaps that's the solution: load the two files into individual hash files, and then use a UV stage to CROSS JOIN the two hash files, treating them like database tables.

Code: Select all

Job #1

seq file (emp) ----> hash file


Job #2

seq file (mgr) ----> hash file


Job #3

UV Stage ----> Transformer ----> Target

kcbland · Post by **kcbland** » Mon Oct 17, 2005 7:44 pm

For a beginner who doesn't know DS it sure is an esoteric example using the proverbial emp table.

If the result set is to be:

1. All employees without a manager
2. All managers without an employee
3. An employee per manager, allowing for multiple managers per employee

If using tables to do this work, you're talking about a UNION ALL of one SQL statement using an outer join driving employee against manager on the cartesian product producing column and another SQL statement of managers without their cartesian product producing column value found in the employee table.

The DS solution requires the cartesian product. About the only way to do that is using the UV/ODBC stage. There's no trick I know of to produce this uniquely relational result.

mahi_mahi · Post by **mahi_mahi** » Tue Oct 18, 2005 6:25 am

Sorry All,
I have confused some of you by referring emp mgr files.

I have two files and there is no direct relation btw them but i need to
processing each record on 1-input file using a set(all) of records on another file 2-input file. The set of records on the second file are not 'linked' directly to any value on the first file - i.e. you cannot use a 'lookup' key field.

ray.wurlod · Post by **ray.wurlod** » Tue Oct 18, 2005 4:15 pm

So, what you're asking is for all possible combinations, also known as a Cartesian product.
This is achieved simply by naming both tables in your SELECT statement and not providing a join condition. Some databases may require you to enable the use of cross joins, because if table 1 contains 100 rows and table 2 contains 200 rows, the result set will contains 20000 rows.

DSXchange

multi record processing

multi record processing

Mutil record procerssing