Hadoop | Getting started

Modules and ecosystem projects

HDFS: Hadoop Distributed File System. It stores large files as replicated blocks spread across the cluster's nodes, and can also run on a single machine for testing
MapReduce: Hadoop's programming model and execution framework for highly parallel batch processing. A map step turns input records into key/value pairs, and a reduce step aggregates all values that share a key. This is the true "engine" or time saver in Hadoop
Hive: Hadoop's data warehouse layer, queried with HiveQL, a SQL-like language. In day-to-day use it plays a role similar to a SQL query tool such as Microsoft Query Analyzer
Pig: Dataflow scripting tool (Pig Latin), similar to a batch job or a simple ETL processor
Flume: Service for collecting, aggregating, and moving log file data into Hadoop
Ambari: Web-based admin tool for provisioning, managing, and monitoring a Hadoop cluster
Cassandra: Highly available, scalable, masterless (peer-to-peer) NoSQL database. It is a wide-column store, not an RDBMS
Mahout: Machine learning library. It does complex calculations, algorithmic processing, and statistical/stochastic operations at scale, using an R-like Scala DSL on top of engines such as Spark…it does serious math!
Spark: Programmatic compute engine for ETL, machine learning, stream processing, and graph computation
ZooKeeper: Coordination service (configuration, naming, synchronization) for all your distributed processing
Oozie: Workflow scheduler for managing Hadoop jobs
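To make the MapReduce entry concrete, here is a minimal local sketch of the classic word-count job in plain Python. It only simulates the three phases (map, shuffle/group by key, reduce) inside a single process and does not use any Hadoop API; the function names map_phase, shuffle, reduce_phase, and word_count are hypothetical. On a real cluster, Hadoop runs many map and reduce tasks in parallel across nodes and performs the shuffle itself.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every whitespace-separated token, lowercased
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all emitted values by key (Hadoop does this between map and reduce)
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: aggregate every value that shares the same key
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Hypothetical driver wiring the three phases together in one process
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
```

Running it prints {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}. Because each map call only looks at one line and each reduce call only looks at one key, both phases can be split across many machines, which is where the time savings come from.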

Links

Apache

https://sentry.apache.org/

https://de.hortonworks.com/apache/ranger/

https://mahout.apache.org/

https://pig.apache.org/

https://zookeeper.apache.org/

https://oozie.apache.org/

Miscellaneous

http://ercoppa.github.io/HadoopInternals/