hadoop.mapred vs hadoop.mapreduce?

They are separate because the two packages represent two different APIs: org.apache.hadoop.mapred is the older API and org.apache.hadoop.mapreduce is the newer one. The new API was introduced to let programmers write MapReduce jobs in a more convenient, easier and more sophisticated way. You might find this presentation useful; it discusses the differences in detail. …

Is it better to use the mapred or the mapreduce package to create a Hadoop Job?

Functionality-wise there is not much difference between the old (o.a.h.mapred) and the new (o.a.h.mapreduce) API. The only significant difference is that the old API pushes records to the mapper/reducer, while the new API supports both push and pull mechanisms. You can get more information about the pull mechanism here. Also, the old API has …
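To make the push/pull distinction concrete, here is a minimal plain-Java sketch. It does not use Hadoop: `SimpleContext`, `PushMapper`, `PairingMapper` and `PushPullDemo` are hypothetical stand-ins. They are loosely modeled on the new API's `Mapper.run(Context)`, whose default implementation loops over `context.nextKeyValue()` and calls `map()` once per record (push); overriding `run()` lets a mapper pull records itself, here to pair up consecutive lines.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a task context (NOT the real Hadoop class):
// it hands out one input record at a time and collects emitted output.
class SimpleContext {
    private final Iterator<Map.Entry<Long, String>> input;
    private Map.Entry<Long, String> current;
    final List<String> output = new ArrayList<>();

    SimpleContext(List<Map.Entry<Long, String>> records) {
        this.input = records.iterator();
    }

    // Advances to the next record; returns false when input is exhausted.
    boolean nextKeyValue() {
        if (!input.hasNext()) return false;
        current = input.next();
        return true;
    }

    Long getCurrentKey() { return current.getKey(); }
    String getCurrentValue() { return current.getValue(); }
    void write(String out) { output.add(out); }
}

// Push style: the driver loop in run() calls map() once per record,
// so map() never decides when the next record is read.
class PushMapper {
    void map(Long key, String value, SimpleContext ctx) {
        ctx.write(value.toUpperCase());
    }

    void run(SimpleContext ctx) {
        while (ctx.nextKeyValue()) {
            map(ctx.getCurrentKey(), ctx.getCurrentValue(), ctx);
        }
    }
}

// Pull style: run() is overridden and the mapper pulls records itself,
// joining each pair of consecutive values into one output record.
class PairingMapper extends PushMapper {
    @Override
    void run(SimpleContext ctx) {
        while (ctx.nextKeyValue()) {
            String first = ctx.getCurrentValue();
            String second = ctx.nextKeyValue() ? ctx.getCurrentValue() : "";
            ctx.write(first + "|" + second);
        }
    }
}

class PushPullDemo {
    public static void main(String[] args) {
        List<Map.Entry<Long, String>> records = List.of(
                Map.entry(0L, "a"), Map.entry(2L, "b"), Map.entry(4L, "c"));

        SimpleContext push = new SimpleContext(records);
        new PushMapper().run(push);
        System.out.println(push.output); // [A, B, C]

        SimpleContext pull = new SimpleContext(records);
        new PairingMapper().run(pull);
        System.out.println(pull.output); // [a|b, c|]
    }
}
```

The pairing logic is only possible because the mapper controls when the next record is fetched; with a pure push `map()` it would have to buffer state between calls instead.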

Simple Java Map/Reduce framework

Have you checked out Akka? Although Akka is really a distributed, actor-model-based concurrency framework, you can implement a lot of things simply, with little code. It makes it very easy to divide work into pieces, and it automatically takes full advantage of a multi-core machine, as well as being able to use …
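Akka is a third-party library, so the sketch below deliberately does not use it. It is a plain-JDK (java.util.concurrent) word count, assuming an in-memory list of lines, that shows the same divide-and-combine idea: split the input into chunks, run one map task per chunk on a thread pool sized to the number of cores, then reduce the partial counts into one result. `LocalWordCount` and its methods are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LocalWordCount {
    // Map phase: count lower-cased, whitespace-separated words in one chunk.
    static Map<String, Integer> mapChunk(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Splits the input into chunks, runs mapChunk on each in a thread pool
    // sized to the available cores, then merges (reduces) the partial maps.
    static Map<String, Integer> count(List<String> lines, int chunkSize) {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Integer>>> partials = new ArrayList<>();
            for (int i = 0; i < lines.size(); i += chunkSize) {
                List<String> chunk = lines.subList(i, Math.min(i + chunkSize, lines.size()));
                partials.add(pool.submit(() -> mapChunk(chunk)));
            }
            Map<String, Integer> total = new HashMap<>();
            for (Future<Map<String, Integer>> partial : partials) {
                partial.get().forEach((word, c) -> total.merge(word, c, Integer::sum));
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while counting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a map task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick fox", "the lazy dog", "The end");
        System.out.println(count(lines, 1)); // HashMap order is unspecified
    }
}
```

For this input the result should contain `the=3` and a count of 1 for every other word; the printed order depends on `HashMap` iteration and may vary.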

Is gzip format supported in Spark?

From the Spark Scala Programming guide’s section on “Hadoop Datasets”: Spark can create distributed datasets from any file stored in the Hadoop distributed file system (HDFS) or other storage systems supported by Hadoop (including your local file system, Amazon S3, Hypertable, HBase, etc). Spark supports text files, SequenceFiles, and any other Hadoop InputFormat. Support for …
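As far as I know, Spark's `textFile()` reads through Hadoop's `TextInputFormat`, which picks a decompression codec from the file extension (`GzipCodec` for `.gz`), so gzipped text files can be read directly. The sketch below is not Spark code: it is a plain-JDK illustration, with hypothetical helper names (`GzipLines`, `writeGzip`, `readGzip`), of what that decompression amounts to for one file, using `java.util.zip.GZIPOutputStream` and `GZIPInputStream`. One practical consequence, worth checking against the Spark docs for your version: a gzip stream can only be decompressed from its beginning, so Hadoop cannot split a single `.gz` file, and each such file ends up as one partition.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipLines {
    // Writes newline-terminated UTF-8 lines to a gzip-compressed file.
    static void writeGzip(Path file, List<String> lines) throws IOException {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(file))) {
            for (String line : lines) {
                out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            }
        }
    }

    // Reads a gzip text file line by line. Decompression must start at
    // byte 0 of the stream; a reader cannot jump into the middle of it.
    static List<String> readGzip(Path file) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file)), StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sample", ".gz");
        writeGzip(tmp, List.of("first line", "second line"));
        System.out.println(readGzip(tmp)); // [first line, second line]
        Files.delete(tmp);
    }
}
```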

Where does the Hadoop MapReduce framework send my System.out.print() statements (stdout)?

Actually, the console's stdout only shows the System.out.println() output of the non-MapReduce classes (such as your driver class). The System.out.println() output from the map and reduce phases can be found in the task logs. An easy way to reach them is through the JobTracker web UI: http://localhost:50030/jobtracker.jsp -> click the completed job -> click the map or reduce task -> click the task number -> task logs -> stdout logs. Hope this helps.

Is the MongoDB aggregation framework faster than map/reduce?

Every test I have personally run (including with your own data) shows the aggregation framework being several times faster than map/reduce, and usually an order of magnitude faster. Just taking 1/10th of the data you posted (but, rather than clearing the OS cache, warming the cache first, because I want to measure performance of …