Deriving different Levels

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Havoc
Participant
Posts: 110
Joined: Fri Nov 24, 2006 8:26 am

Deriving different Levels

Post by Havoc »

Okay I have a scenario which i'm having some difficulty implementing. I have a table in the DBMS with the following structure:-

Parent Dept Child Dept
-----------------------
1 2
1 3
2 4
2 5
3 6
4 7
5 8
5 9

Now, i have to create a sequential file which has the following structure(lets call it Table B)

Dept Level
----------------------

Number , will be all the unique numbers (Parent and Child) and the Level is derived like this:-

Level 1 - 1
Level 2 - 2,3
Level 3 - 4,5,6
Level 4 - 7,8,9

So the Level column in Table B should hold the Level Number for the corresponding Num.

Can someone please tell me how I can accomplish this? I'm facing some difficulty trying to implement this and trying my best to avoid writing a procedure.

Thanks, for taking your time off to read this humongous question :)
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Sort the data, use Stage Variables and collect the data in there concatenating the pertinent columns, reset the stage variables on key column changes, pass to an Aggregator and use the LAST derivation for the concatentated data, group by the key columns.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

You will be able to find implented code here if you search, of exactly what Ken is saying.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Havoc
Participant
Posts: 110
Joined: Fri Nov 24, 2006 8:26 am

Post by Havoc »

kcbland wrote:Sort the data, use Stage Variables and collect the data in there concatenating the pertinent columns, reset the stage variables on key column changes, pass to an Aggregator and use the LAST derivation for the concatentated data, group by the key columns.
Thanks for taking the time off to read the question Ken, but I dont think you've quite got the problem.

Basically, i have to load a table which has all the departments and based on the parent->child relationship between them .. I have to assign different levels. Let's say.. Parent Dept 1 has two child depts 2 and 3.

Then Dept 1 will be at Level 1
and
Dept 2 and Dept 3 will be at Level 2

If Dept2 and Dept3 have child depts then ALL those depts will belong to Level 3.

So the resultant table has two columns with department and level number.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

The sample input, is that your source. If it is then the output that you have shown is do-able as Ken suggested. It has been done time and again here. Search for it.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

A piece of friendly advice, represent your problem with actual rows and columns. Show exact source data and exact expected data. Your example shows a comma delimited string in the expected results. If this is not what you're asked, then show your example more clearly.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kenstorm
Premium Member
Premium Member
Posts: 40
Joined: Mon Oct 16, 2006 2:33 pm

Post by kenstorm »

This can be done using a routine and nested FOR NEXT loops.

Have a parent loop and and a child loop nested inside it. You could sort the data before looping it but it is not necessary. I did something like this that was more complex so I know it is do-able.

This can get you started.
POPULATE TWO ARRAYS:
* Use a transformer to create an array of the parent child data and pass it to the routine.
i.e. - 1 ,2|1, 3|2, 4|2 ,5|3 ,6|4, 7|5 ,8|5, 9
* Find your top parent. Assign it to a variable like varTopParent. You may be able to sort your data before passing it to the routine so the top parent is the first one in.
* Count your pairs. There are 8 pairs in your example.
* Dimension the array 2 columns wide and 8 rows deep.
* Put all the parent depts into a dimensioned array using a FOR NEXT loop.
Keep a unique parent count as it populates. Should be 5.
Use the Field function to read the input.
* Simultaneously put all the children into a dynamic array in the prementioned loop. Keep a unique child count as it populates. Should be 8.
FIND THOSE LEVELS
* Set up two FOR NEXT loops using the previous counts.
* For ChldIndex = 1 to ChildCnt
For PrntIndex = 1 to ParentCnt
Locate ChildArray in ParentArray
* Add level searching code here
End
Next
Next
* Format your output. Ans = ...

* Note: As far as I know, Dimensioned arrays don't have the Locate or find funtions that allow you to easily search the array "Perl" style that is why you use dymanic arrays.

1 ,2|1, 3|2, 4|2 ,5|3 ,6|4, 7|5 ,8|5, 9

Level 1 - 1
Level 2 - 2,3
Level 3 - 4,5,6
Level 4 - 7,8,9

Do you have any experience with routines?

The looping comments are incomplete. Sorry, I have to run. We can continue this later if you like.
Ken
Regards - Ken Breetz

Keep me in the circle of trust
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Nice try, but BASIC is not recommended for parallel jobs. Also, how well does the "solution" handle an arbitrary number of levels?

Yes, it CAN be done in BASIC, but that's not the point. A strategy needs to be found whereby an abitrary number of recursive queries (against the target table, as populated thus far) can be performed to find the ultimate path to level 1, and thus the appropriate level. This approach also assumes sorted input.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Havoc
Participant
Posts: 110
Joined: Fri Nov 24, 2006 8:26 am

Sample Input and Expected Output

Post by Havoc »

kcbland wrote:A piece of friendly advice, represent your problem with actual rows and columns. Show exact source data and exact expected data. Your example shows a comma delimited string in the expected results. If this is not what you're asked, then show your example more clearly.
Okay, here's the sample input and expected sample output I created in two tables in Oracle.

INPUT

select * from Department;

PARENTDEPT CHILDDEPT
---------- ----------
1 2
1 3
2 4
2 5
3 6
3 7
4 8
4 9

8 rows selected.


EXPECTED OUTPUT


select * from DepartmentLevels;

DEPT DEPTLEVEL
---------- ----------
1 1
2 2
3 2
4 3
5 3
6 3
7 3
8 4
9 4

9 rows selected.

NOTE: There are spaces, I dont know, but when posting I think the spaces are getting trimmed. Each set of two numbers is one row in the table.

I hope this gives a clearer picture for the problem. I'm unable to follow Ken's solution though, can you please elaborate on it?

kenstorm, thanks for the routine logic :) but , I'm trying my best to avoid using it in the job.

I did try searching through the forum but couldn't find anything related to the problem in hand. Can somebody give me some suggestions on wat keywords i should use to improve my search?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

You're talking about creating a hierarchy and assigning a level. Because there's an indeterminate number of levels, you must be able to run an indeterminate number of passes thru the hierarchy until you don't have any non-placed data.

The easiest method is to decide if you have a fixed number of levels or unlimited. If you have a fixed number of levels, you can write a job design with a fixed number of flows, where each flow looks for a fixed level, exhausting data and putting the remnants into the input for the next flow, which does the exact same logic. Otherwise, you use one job with the flow and call the job repeatedly until you run out of data.

The alterntative is a custom program to do what you need. If you have Oracle, you could look at some of the capabilities it has for doing this very thing.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
kenstorm
Premium Member
Premium Member
Posts: 40
Joined: Mon Oct 16, 2006 2:33 pm

Post by kenstorm »

To Havoc: Yes routines can be "messy" but I have done this style of coding in a server job with unknown levels. If you run out of options, let me know, I can help you. Unfortunately I don't have time to code it. I am moving my home next week and starting a new job. Too much to do :(

To Ray: Why do the job in a PX? Even though Havoc posted this issue in the PX section, maybe he's just wants a solution? I Donno. For such a small data set, a PX job may be overkill. They start up slower than server jobs. Thanks for the advice on not using routines in PX jobs. I've gotten used to coding around the limitations of Datastage so routines are one of my first stops. I only presented an approach, not a "solution". My comments were just for starting. Also, the handling of arbitrary number of levels is do-able in a routine but I'm sure you know that. It just comes down to setting indexes from performing counts of the different parents and children departments and nesting loops. Yes, the pre-sort is imporant to start, that's a requirement. I hope this doesn't get misinterpreted, I am only trying to help.

To Ken Bland: I've seen your work at Disney and HSN. Wow! You really get around.

Ken B.

Sorry, no wise saying here :)
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

kenstorm wrote:To Ken Bland: I've seen your work at Disney and HSN. Wow! You really get around.
After 8 years consulting w/DataStage, you get around.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Havoc
Participant
Posts: 110
Joined: Fri Nov 24, 2006 8:26 am

Post by Havoc »

kcbland wrote:Sort the data, use Stage Variables and collect the data in there concatenating the pertinent columns, reset the stage variables on key column changes, pass to an Aggregator and use the LAST derivation for the concatentated data, group by the key columns.
Okay, I'm still trying to implement this suggestion from Ken but am finding difficulty to do so :( . Ken, can you please elaborate a little more? So u are suggesting a job flow which looks like

Oracle Stage -> Sort -> Transformer -> Aggregator -> Oracle stage ?

I didnt quite the get part abt concatenating the pertinent columns and resetting the stage variables.

To kenstorm: Ken, thanks a lot for your help. I am trying my level best to avoid using routines. But if nothing works out, I'll definitely give implementing your suggestion a shot.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

My first reply addressed how to achieve results according to your first example. After you stated a second example, I changed my answer to be inline with the new example.

You're going to have to write something that uses a recursive style of processing, because you're going to have to do repetitive passes thru the source data diminishing the source data until it's depleted and placed on the appropriate levels. You can do this with a job ran multiple times in a loop, or with a custom-written function or program. Ignore my first post.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Kirtikumar
Participant
Posts: 437
Joined: Fri Oct 15, 2004 6:13 am
Location: Pune, India

Post by Kirtikumar »

If your total number of levels are fixed then try this:

First job create a lookup with ChildDept as key.

Second job read the source file and do lookup of ParDept from source with ChildDept of lookup and get the ParDept col from lookup. Write two columns to output file, first is ChildDept from source and ParentDept from lookup.
This file will give you ChildDept and its second level parent.

Run the second job again on and use output file generated from first run as input. The output generated from this will give you third level parent.

Run the second job n times and you will have files, each containing parent at different levels for the ChildDept. Join all the files based on ChildDept and get the parent for different level. The number of parents will give the level number of each record.

For implementing this, second job should have two job params - first as input filename and second as output filename.
During the first run, it should take original file as input and generated say file1 as output. During next run, use generated file1 as input and file2 as output file and so on...
Then join original file and all file1, file2...filen.
Regards,
S. Kirtikumar.
Post Reply