I have data from disparate sources (sequential files and database tables) and need to get frequency counts from each source, grouped by the same keys.
Very simplified example:
seq_file --> aggregator --> output_link_1
db_table --> select count(*) from db_table group by key1, key2 --> output_link_2
etc. ... 5 more data sources, mix of file types
output_link_1: Key1, Key2, Count1
output_link_2: Key1, Key2, Count2
Desired output is:
Key1, Key2, Count1, Count2
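To make the desired result concrete, here is a rough Python sketch of the merge logic only, not of how it would be built in the job. The function name `merge_counts`, the sample rows, and the choice to fill a missing count with 0 are all assumptions for illustration.

```python
from collections import defaultdict


def merge_counts(*sources):
    """Full outer join of several {(key1, key2): count} mappings on the key pair.

    Assumption: a key missing from a source gets a count of 0 for that source.
    """
    merged = defaultdict(lambda: [0] * len(sources))
    for i, counts in enumerate(sources):
        for key, count in counts.items():
            merged[key][i] = count
    # One row per key pair: Key1, Key2, Count1, Count2, ...
    return [(k1, k2, *row) for (k1, k2), row in sorted(merged.items())]


# Made-up sample rows standing in for the two output links
output_link_1 = {("A", "X"): 3, ("B", "Y"): 1}
output_link_2 = {("A", "X"): 5, ("C", "Z"): 2}

rows = merge_counts(output_link_1, output_link_2)
for row in rows:
    print(row)
```

Each input is keyed on (Key1, Key2), so the same approach would extend to the other five sources by passing them as additional arguments.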
Is there a way to merge the rows in memory, without having to write to one or more intermediate seq files, hashed files, or db tables, then coding the merge "manually"?
Thanks,
Scott