MongoDB aggregation comparison: group(), $group and MapReduce

The similar names cause some confusion, but the group() command is a separate feature with a different implementation from the $group pipeline operator in the Aggregation Framework. The group() command, the Aggregation Framework, and MapReduce are all aggregation features of MongoDB. There is some overlap in features, but I’ll attempt to explain the differences and …

What are the _SUCCESS and part-r-00000 files in Hadoop

See http://www.cloudera.com/blog/2010/08/what%E2%80%99s-new-in-apache-hadoop-0-21/ On the successful completion of a job, the MapReduce runtime creates a _SUCCESS file in the output directory. This may be useful for applications that need to see if a result set is complete just by inspecting HDFS. (MAPREDUCE-947) This would typically be used by job scheduling systems (such as Oozie), to denote …
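
The completeness check described above can be sketched in a few lines. This is only an illustration under stated assumptions: it treats the job output directory as a path on the local filesystem (for example a mounted or copied-down directory) rather than talking to HDFS directly, and the function names `job_output_is_complete` and `reducer_part_files` are hypothetical helpers, not part of any Hadoop API.

```python
from pathlib import Path

# Marker file the MapReduce runtime writes on successful job completion.
SUCCESS_MARKER = "_SUCCESS"


def job_output_is_complete(output_dir):
    """Return True if output_dir contains the _SUCCESS marker file.

    Assumption: output_dir is reachable as a local path; a real scheduler
    would usually query HDFS instead.
    """
    return (Path(output_dir) / SUCCESS_MARKER).is_file()


def reducer_part_files(output_dir):
    """Return the names of reducer output files (part-r-00000, ...), sorted."""
    return sorted(p.name for p in Path(output_dir).glob("part-r-*"))
```

A downstream step could call `job_output_is_complete(path)` first and read the files returned by `reducer_part_files(path)` only when it returns True, so that a half-written result set is never consumed.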

Explode the Array of Struct in Hive

You need to explode only once (in conjunction with LATERAL VIEW). After exploding you can use a new column (called prod_and_ts in my example) which will be of struct type. Then, you can resolve the product_id and timestamps members of this new struct column to retrieve the desired result. SELECT user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps …
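
To make the row-level effect of a single explode concrete, here is a small pure-Python emulation. It does not run Hive; it only mimics the semantics under stated assumptions. The array column name `products`, the sample rows, and the helper `lateral_view_explode` are hypothetical; `user_id`, `prod_and_ts`, `product_id`, and `timestamps` are the names used in the excerpt above.

```python
def lateral_view_explode(rows, array_column, alias):
    """Emit one output row per element of row[array_column].

    Each output row keeps the original columns and adds the element under
    `alias`. Rows whose array is empty produce no output, which matches
    LATERAL VIEW without the OUTER keyword.
    """
    exploded = []
    for row in rows:
        for element in row[array_column]:
            new_row = dict(row)
            new_row[alias] = element  # the struct-typed column
            exploded.append(new_row)
    return exploded


# Hypothetical sample data: each user has an array of structs.
rows = [
    {
        "user_id": 1,
        "products": [
            {"product_id": "a", "timestamps": "t1"},
            {"product_id": "b", "timestamps": "t2"},
        ],
    },
    {"user_id": 2, "products": []},
]

# Resolve the struct members after exploding once.
result = [
    {
        "user_id": r["user_id"],
        "product_id": r["prod_and_ts"]["product_id"],
        "timestamps": r["prod_and_ts"]["timestamps"],
    }
    for r in lateral_view_explode(rows, "products", "prod_and_ts")
]
```

With this sample data, `result` holds two rows, both for user 1; user 2 disappears because its array is empty.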