How to traverse thru a hierarchy of records

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
muascdev
Charter Member
Charter Member
Posts: 51
Joined: Tue Oct 10, 2006 5:48 pm

How to traverse thru a hierarchy of records

Post by muascdev »

My project involves reading a record which is parent and each parent record has a set of children records. its basically hierarchy. when I read each children, I need to check previoussly read children records and check for certain rules, and finally process the record. there will be a mximum of 10 children records for each root node. is it possible to do this in datastage without using a stored procedure. can this be written as BASIC routine or is there way of doing this. appreciate all the help. :cry:
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Welcome aboard. :D

Yes, it can be written in DataStage BASIC.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
muascdev
Charter Member
Charter Member
Posts: 51
Joined: Tue Oct 10, 2006 5:48 pm

Post by muascdev »

ray.wurlod wrote:Welcome aboard. :D

Yes, it can be written in DataStage BASIC.

how do I write though .. I am a beginner .. can you guide me pl
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Why not sort the child records by parent, then process the child records and reference the parent record for each child, use stage variables to accumulate intermediate results, and send the final result out to an aggregator stage using the LAST derivation for all columns except the key columns? You basically treat every child row as if it was the last row for a parent, that way the aggregator will only output the final row.

I don't think you have a hierachy if you're describing a parent row with up to 10 children. A hierarchy is a grand parent who has children, who in turn are parents to their children, and those children have children, etc. You have generations to process.

Please clarify, as the solution I gave you here is appropriate for non-hierarchies. Hierarchies require recursive processing, descending thru the generations. This either involves fixed passes within a complicated single job design, or, job control that runs the same job over and over until it has processed the data thru.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

... or a recursive routine using BCI functions. It's Windows, so there are not ODBC licensing issues. It can be done, I've done it: a scheduler in which all the dependencies were in an Oracle table, nested up to seven levels deep (but potentially more).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
muascdev
Charter Member
Charter Member
Posts: 51
Joined: Tue Oct 10, 2006 5:48 pm

Post by muascdev »

kcbland wrote:Why not sort the child records by parent, then process the child records and reference the parent record for each child, use stage variables to accumulate intermediate results, and send the final result out to an aggregator stage using the LAST derivation for all columns except the key columns? You basically treat every child row as if it was the last row for a parent, that way the aggregator will only output the final row.

I don't think you have a hierachy if you're describing a parent row with up to 10 children. A hierarchy is a grand parent who has children, who in turn are parents to their children, and those children have children, etc. You have generations to process.

Please clarify, as the solution I gave you here is appropriate for non-hierarchies. Hierarchies require recursive processing, descending thru the generations. This either involves fixed passes within a complicated single job design, or, job control that runs the same job over and over until it has processed the data thru.
yes its a hierarchy , when i said 10 children for a parent .. I meant parent is the root node, and for each subsequent children, previous children is a parent. sorry for the confusion.
muascdev
Charter Member
Charter Member
Posts: 51
Joined: Tue Oct 10, 2006 5:48 pm

Post by muascdev »

ray.wurlod wrote:... or a recursive routine using BCI functions. It's Windows, so there are not ODBC licensing issues. It can be done, I've done it: a scheduler in which all the dependencies were in an Oracle table, nested up to seven levels deep (but potentially more).
what is a BCI function.. can you give me an example pl
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

The easiest method is going to be a DS Function that opens and reads a hashed file. You would want to stage all of the data into a hashed file. Then, read your parent rows and for each row use a Function to derive your required results. That Function can lookup a child row in the reference hashed file, perform whatever logic is necessary, and if required call itself to continue descending the generations in the hierarchy.

Or, in an easier way, since you have a maximum number of generations, you could do it in a loop internal to the function and read thru the generations creating an array of the data for the generations. Something like:

SEQ-->XFM/filter to ultimate parents/generation function -->SEQ

For the generation function, the pseudo-code could be something like:

Code: Select all

DIM GENERATIONS(10)
NoChildren = @FALSE
NumberOfGenerations = 0
For PTR=1 TO 10 Until NoChildren
   Read parent from hashed file using ChildKey then
      GENERATIONS(PTR)=parent
      ChildKey=parent<xx>  ;*  xx is the field where the child key is located
      NumberOfGenerations += 1
   End Else
      NoChildren=@TRUE
   END   
Next PTR
Now you have each generation in order in a static array GENERATIONS in their chronological order. You can do whatever logic you need in the function to derive your answers. Since the rows are now records, you can traverse using simple logic.

One note on this type of static array of records. Each entry in the array is a dynamic array unto itself. So, if attribute 3 is of interest in each record, you would use notation like GENERATION(x)<3>, where x is the particular generation you're looking at.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

BCI functions mimic the ODBC API. Search the forum for SQLConnect or SQLFetch (or even BCI).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply