Hadoop | Getting started

HDFS: Hadoop Distributed File System, the storage layer; it can run on a single machine or be spread across a cluster, depending on your setup
MapReduce: Hadoop's programming model for highly parallel processing. A map step processes input splits in parallel, and a reduce step aggregates the results…this is the true "engine" or time saver in Hadoop
Hive: Hadoop's SQL layer; HiveQL queries run over data stored in HDFS, and its query interface is roughly comparable to Microsoft Query Analyzer
Pig: Dataflow scripting tool (Pig Latin), similar to a batch job or a simple ETL processor
Flume: Collector and aggregator of log file data, moving it into stores such as HDFS
Ambari: Web-based admin tool for provisioning, managing, and monitoring Hadoop clusters
Cassandra: Highly available, scalable, multi-master NoSQL (wide-column) database…not an RDBMS, though its CQL query language looks SQL-like
Mahout: Machine learning library. In plain terms, it does complex calculations, algorithmic processing, and statistical/stochastic operations, with recent versions running on engines such as Spark…it does serious math!
Spark: General-purpose cluster compute engine, programmable in Scala, Java, Python, and R, supporting ETL, machine learning, stream processing, and graph computation
ZooKeeper: Coordination service for distributed applications (configuration, naming, synchronization)
Oozie: Workflow scheduler for Hadoop jobs
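To make the MapReduce idea concrete, here is the classic word-count example. This is a minimal, single-process Python sketch of the programming model only (map, then shuffle/group by key, then reduce). It does not use Hadoop's Java API, and the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every whitespace-separated, lowercased word
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group every value emitted for the same key
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer: aggregate all grouped values for one key
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a real cluster each map call would run in parallel on its own input
    # split; in this sketch everything runs sequentially in one process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(docs))
    # → {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

The time savings in Hadoop come from the fact that the map and reduce steps are independent per split and per key, so the framework can spread them over many machines. The sketch only shows the data flow, not that distribution.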

Apache

https://sentry.apache.org/

https://de.hortonworks.com/apache/ranger/

https://mahout.apache.org/

https://pig.apache.org/

https://zookeeper.apache.org/

https://oozie.apache.org/

Miscellaneous

http://ercoppa.github.io/HadoopInternals/
