Page 1 of 1

Represent multiple records into one with conditions

Posted: Tue Feb 07, 2006 9:16 am
by gateleys
I have a file which maintains data of StudentID and Grade(A,B,C,D or F).

Code: Select all

StudentID Grade
-----------  ------
111        B
222        A
222        C
111        B
111        A
Note: Duplicate rows are permitted.
Now, I need to transform the file into -

Code: Select all

StudentID Grade
-----------  ------
111        B B A
222        A C
Essentially, get one record for each student and concatenate all his grades in the same record.

I have tried to tackle the problem by sorting my data, then using stage variables to keep record of previous values. Use RowProcCompareWithPreviousVal() function, then pass it through Aggregator, GROUP BY StudentID and return the Last(Grade). But it seems to fail when the number of records are more than two. Please suggest.

Posted: Tue Feb 07, 2006 10:07 am
by kalpna
u can do it these 2 ways..
First:
1) sort ur data on student id
2) take 4 stage variables in transformer in the order of
Curr = StudentID
temp = if Curr = Prev then 1 else 0
Prev = Curr
Grade = if temp = 1 then Grade:StudentID else StudentID
Assign this Grade value to Grade
3)Take an aggregator. group by StudentID take the Last Grade. Or u can directly write to hash file.

Second:
U can use Hash file with StudentID as key and take Grade as a multi valued column. then u can use Feild marks as seperators.

Cheers
kalpna

Posted: Tue Feb 07, 2006 10:21 am
by gateleys
Thanks Kalpana,
I did it using a much simpler way. Sorted input based on StudentID. I declared a stage variable 'stgPrevValue' whose derivation was-

Code: Select all

If(RowProcCompareWithPreviousValue(InLink.StudentID)) Then stgPrevValue:InLink.Grade Else InLink.Grade
Then voila!! All I needed to do next was to group by StudentID and get Last row for Grade.

Thanks.
gateleys

Posted: Tue Feb 07, 2006 10:22 am
by Thibal
You can also use the same Hashed files for reference and in target (I hope this sentence looks like english :?)

Image
-> Hash_ref and Hash correspond to the same Hashed File

In your transformer, when a line is looked-up : write.GRADE = look.GRADE : " " : read.GRADE
Otherwise : write.GRADE = read.GRADE

Image

At the start, the Hashed file must exist, use Validate to create an empty Hashed file. (In every case, you must empty your hashed file before writing).

Posted: Tue Feb 07, 2006 2:28 pm
by sjhouse
I do this regularly using hash files and stage variables (sv).

sv1-NewRec; Derivation1-If STUDENT_ID = LastId Then 0 Else 1
sv2-Grade; Derivation2-If NewRec = 1 Then STUDENT_GRADE Else Grade : @VM : STUDENT_GRADE
sv3-LastId; Derivation3-STUDENT_ID

Write the results to a hash file (or other database file) with STUDENT_ID as the key field. The resulting record would be the following:


KeyFld - STUDENT; Derivation - STUDENT_ID
Field 1 - GRADE; Derivation - Grade

With STUDENT as the key field, each record will be overwritten until you get to the last record for that student. Also, you can use any delimiter you would like within the field. When the hash file is populated, you can read from it and write it to the desired location.

Stephen

Posted: Fri Aug 20, 2010 7:24 am
by acavarra
Once I have the hashed file populated and read it as a reference lookup, how do I take a record out of the hash file like...

StudentID Grade
1 A@vmB@vmC
2 D@vmD

and populate a single row with multiple columns:

StudentID Grade1 Grade2 Grade3
1 A B C
2 D D

Posted: Fri Aug 20, 2010 7:29 am
by chulett
If you are using "value marks" between the fields you should (from what I recall) be able to "unnest" that field during the select of that data and automatically get all values in separate columns. Having it in a lookup may complicate that, I don't know off the top of my head and have no way to test it.

Otherwise, simply take the combined field from the lookup and use the Field() function to parse out the three values, one to each output column.

Posted: Fri Aug 20, 2010 8:45 am
by acavarra
I tried the FIELD function, but the @VM was not recognized as the delimiter so it did not parse the data.

The entire source column was written to the first target column.

Any ideas?

Thanks.

Posted: Fri Aug 20, 2010 8:50 am
by chulett
Not sure why @VM would not work as the delimiter. Why not use a different delimiter then, something 'simple' like a pipe for instance?