Hadoop | Getting started
Modules
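HDFS | Hadoop Distributed File System: Hadoop’s storage layer, which splits files into blocks and replicates them across the nodes of a cluster (or keeps them on a single machine in a local setup) |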
MapReduce | Hadoop’s parallel processing framework: a map step transforms input records on many nodes at once, and a reduce step aggregates the results. This is the true “engine,” or time saver, in Hadoop |
Hive | Hadoop’s SQL query layer: it runs SQL-like (HiveQL) queries over data stored in Hadoop, playing a role comparable to Microsoft Query Analyzer for SQL Server |
Pig | Dataflow scripting tool (scripts are written in Pig Latin), similar to a batch job or a simple ETL processor |
Flume | Collects, aggregates, and moves log file data into Hadoop |
Ambari | Web-based admin tool for provisioning, managing, and monitoring Hadoop clusters |
Cassandra | Highly available, scalable, multi-master NoSQL database platform. It is a wide-column store rather than an RDBMS |
Mahout | Machine learning library: it runs complex calculations, algorithmic processing, and statistical/stochastic operations at scale, using an R-like Scala DSL on top of engines such as Spark. In short, it does serious math! |
Spark | Programmable compute engine for ETL, machine learning, stream processing, and graph computation |
ZooKeeper | Coordination service for distributed processes (configuration, naming, and synchronization) |
Oozie | Workflow scheduler for managing Hadoop jobs |
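
To make the MapReduce entry in the table above more concrete, here is a minimal sketch of the map, shuffle, and reduce steps as a word count in plain Python. It runs on one machine and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. In a real cluster the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence (single process, illustration only)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

The time saving comes from the fact that each call to `map_phase` depends only on its own line and each call to `reduce_phase` depends only on its own key, so the work can be spread across many machines.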
Links
Apache
https://de.hortonworks.com/apache/ranger/