Separately Group Rows then Re-group them in Exact Order

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
KFajardo
Participant
Posts: 17
Joined: Mon May 11, 2015 7:22 pm
Location: Philippines

Separately Group Rows then Re-group them in Exact Order

Post by KFajardo »

Hi guys, I am new to Datastage and I encountered a problem that's maybe a simple one.

I have a flat file source that looks like the one below:

Type 1|XXX|XXX|1234
Type 2|X|XXXXXXXXX|2015-02-01
Type 3|XXXX|XXXXXXX|2015-02-01
Type 4|XXXXXXX|XX
Type 6|X|XXX|X|XX
Type 1|XXX|XXX|567
Type 2|X|XXXXXXXXX|2015-02-01
Type 3|XXXX|XXXXXXX|2015-02-01
Type 4|XXXXXXX|XX
Type 6|X|XXX|X|XX
Type 1|XXX|XXX|1234
Type 2|X|XXXXXXXXX|2015-02-01
Type 3|XXXX|XXXXXXX|2015-02-01
Type 4|XXXXXXX|XX
Type 6|X|XXX|X|XX
Type 1|XXX|XXX|8911
Type 2|X|XXXXXXXXX|2015-02-01
Type 3|XXXX|XXXXXXX|2015-02-01
Type 4|XXXXXXX|XX
Type 6|X|XXX|X|XX

* Type 1~6 Rows is refers to 1 instance/project.
BUT the only traceability of the project is only located at Type 1's
* So my problem is, if I separate all rows, I will not know from which project does each OTHER TYPEs belong to.

*I want to perform a transformation on Type 1 that's why I want to separate them. But there will be a bigger problem when doing so. That's why I'm thinking if there's a way that I can separate each project and group them so that the OTHER TYPES wo'nt get lost.

Hoping for the help.

Thank you,
-Kit
ShaneMuir
Premium Member
Premium Member
Posts: 508
Joined: Tue Jun 15, 2004 5:00 am
Location: London

Post by ShaneMuir »

Welcome.

If the data is as simple as below - you could use a stage variable to geneate a surrogate key value each time a new Type1 is received and then apply that to each other type until such time as the next Type1 is received. That surrogate key could be a number, or the "project" name itself.

So read the flat file in sequentially (to ensure that all the project information ends up on the same node) and then in a transformer stage set a stage variable to identify when the incoming record is a type1.
When that field is a type 1 set your surrogate key, then apply that surrogate key as a new output column to each input row. Each time a new type1 arrives update your surrogate key value.

Hope this helps.
KFajardo
Participant
Posts: 17
Joined: Mon May 11, 2015 7:22 pm
Location: Philippines

Post by KFajardo »

Thank you, Shanemuir.

I think you answered the question. But I don't have an idea on how to implement it. Can you shed some light on giving an example or a formula so that I can visualize on how the stage variable will work. Because I only know simple stage variables for now and I don't know to implement a stage variable that will toke effect until another TYPE1 will occur and then get a substring of the next Project name.

Again, thank you. You are really helping me on this.

-keith
KFajardo
Participant
Posts: 17
Joined: Mon May 11, 2015 7:22 pm
Location: Philippines

Post by KFajardo »

Will i use a Loop here? I have searched the net on my problem and looping seems to be a possible solution. Can you help me on this?

Hoping for your generous help.

-keith
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I don't think there's any need for a loop. Create the stage variable as, say, type Integer with an initial value of 0. Then, each time the value "Type 1" occurs, increment it.

Code: Select all

svGroupNumber  <--  If InLink.Col1 = "Type 1" Then svGroupNumber + 1 Else svGroupNumber
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
KFajardo
Participant
Posts: 17
Joined: Mon May 11, 2015 7:22 pm
Location: Philippines

Post by KFajardo »

but i need the project# (that is in each type 1)
to be iterated for each row.
example:

input
Type 1|xx|project01
Type 2|xxxxx
Type 3|xxxxxx
Type 1|xx|project02
Type 2|xxxxx
Type 3|xxxxxx


it must be:
project01|Type 1|xx|project01
project01|Type 2|xxxxx
project01|Type 3|xxxxxx
project02|Type 1|xx|project02
project02|Type 2|xxxxx
project02|Type 3|xxxxxx
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Code: Select all

svProject <--  If InLink.Col1 = "Type 1" Then InLink.Project Else svProject
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
KFajardo
Participant
Posts: 17
Joined: Mon May 11, 2015 7:22 pm
Location: Philippines

Post by KFajardo »

This is only a single column singe it is from a flat file. I want to populate the row with the "project" that is located on every type1-line only. So, i want to check on what project does every Other Types (type2, type3, etc...) belongs to.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Yes? Use the value of the svProject stage variable, which contains the project from the most recently-encountered Type 1 record.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
KFajardo
Participant
Posts: 17
Joined: Mon May 11, 2015 7:22 pm
Location: Philippines

Post by KFajardo »

Ah! Now I get it! svProect's value is set into a new value everytime the it encounters a "Type1"! Thank you so much ray.wurlod!
I can now distinguish every project separately.

With all my heart, thanks again!

Keith
Post Reply