Represent multiple records into one with conditions

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
gateleys
Premium Member
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Represent multiple records into one with conditions

Post by gateleys »

I have a file which maintains data of StudentID and Grade(A,B,C,D or F).

Code: Select all

StudentID Grade
-----------  ------
111        B
222        A
222        C
111        B
111        A
Note: Duplicate rows are permitted.
Now, I need to transform the file into -

Code: Select all

StudentID Grade
-----------  ------
111        B B A
222        A C
Essentially, get one record for each student and concatenate all his grades in the same record.

I have tried to tackle the problem by sorting my data, then using stage variables to keep record of previous values. Use RowProcCompareWithPreviousVal() function, then pass it through Aggregator, GROUP BY StudentID and return the Last(Grade). But it seems to fail when the number of records are more than two. Please suggest.
kalpna
Premium Member
Premium Member
Posts: 78
Joined: Thu Feb 02, 2006 3:56 am

Post by kalpna »

u can do it these 2 ways..
First:
1) sort ur data on student id
2) take 4 stage variables in transformer in the order of
Curr = StudentID
temp = if Curr = Prev then 1 else 0
Prev = Curr
Grade = if temp = 1 then Grade:StudentID else StudentID
Assign this Grade value to Grade
3)Take an aggregator. group by StudentID take the Last Grade. Or u can directly write to hash file.

Second:
U can use Hash file with StudentID as key and take Grade as a multi valued column. then u can use Feild marks as seperators.

Cheers
kalpna
gateleys
Premium Member
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

Thanks Kalpana,
I did it using a much simpler way. Sorted input based on StudentID. I declared a stage variable 'stgPrevValue' whose derivation was-

Code: Select all

If(RowProcCompareWithPreviousValue(InLink.StudentID)) Then stgPrevValue:InLink.Grade Else InLink.Grade
Then voila!! All I needed to do next was to group by StudentID and get Last row for Grade.

Thanks.
gateleys
Thibal
Participant
Posts: 56
Joined: Tue Mar 30, 2004 3:43 am
Location: Nanterre, France

Post by Thibal »

You can also use the same Hashed files for reference and in target (I hope this sentence looks like english :?)

Image
-> Hash_ref and Hash correspond to the same Hashed File

In your transformer, when a line is looked-up : write.GRADE = look.GRADE : " " : read.GRADE
Otherwise : write.GRADE = read.GRADE

Image

At the start, the Hashed file must exist, use Validate to create an empty Hashed file. (In every case, you must empty your hashed file before writing).
sjhouse
Premium Member
Premium Member
Posts: 43
Joined: Tue Nov 02, 2004 12:11 pm
Location: Minneapolis, MN

Post by sjhouse »

I do this regularly using hash files and stage variables (sv).

sv1-NewRec; Derivation1-If STUDENT_ID = LastId Then 0 Else 1
sv2-Grade; Derivation2-If NewRec = 1 Then STUDENT_GRADE Else Grade : @VM : STUDENT_GRADE
sv3-LastId; Derivation3-STUDENT_ID

Write the results to a hash file (or other database file) with STUDENT_ID as the key field. The resulting record would be the following:


KeyFld - STUDENT; Derivation - STUDENT_ID
Field 1 - GRADE; Derivation - Grade

With STUDENT as the key field, each record will be overwritten until you get to the last record for that student. Also, you can use any delimiter you would like within the field. When the hash file is populated, you can read from it and write it to the desired location.

Stephen
acavarra
Premium Member
Premium Member
Posts: 10
Joined: Wed Sep 16, 2009 10:12 am

Post by acavarra »

Once I have the hashed file populated and read it as a reference lookup, how do I take a record out of the hash file like...

StudentID Grade
1 A@vmB@vmC
2 D@vmD

and populate a single row with multiple columns:

StudentID Grade1 Grade2 Grade3
1 A B C
2 D D
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

If you are using "value marks" between the fields you should (from what I recall) be able to "unnest" that field during the select of that data and automatically get all values in separate columns. Having it in a lookup may complicate that, I don't know off the top of my head and have no way to test it.

Otherwise, simply take the combined field from the lookup and use the Field() function to parse out the three values, one to each output column.
-craig

"You can never have too many knives" -- Logan Nine Fingers
acavarra
Premium Member
Premium Member
Posts: 10
Joined: Wed Sep 16, 2009 10:12 am

Post by acavarra »

I tried the FIELD function, but the @VM was not recognized as the delimiter so it did not parse the data.

The entire source column was written to the first target column.

Any ideas?

Thanks.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not sure why @VM would not work as the delimiter. Why not use a different delimiter then, something 'simple' like a pipe for instance?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply