Hadoop Overview

Who Should Attend?

Anyone involved in Big Data and databases

Duration: 5 Days

This course helps participants become Hadoop and Spark experts by learning core Big Data technologies and gaining hands-on knowledge of Hadoop and Spark, along with ecosystem components such as HDFS, MapReduce, Sqoop, core Spark, Spark RDDs, Apache Spark SQL, and Spark Streaming. The course will benefit the following professionals:

  • Software developers, project managers and architects
  • BI, ETL and Data Warehousing Professionals
  • Mainframe and testing professionals
  • Business analysts and analytics professionals
  • DBAs and DB professionals
  • Professionals willing to learn data science techniques
  • Graduates aiming to build a career in Big Data

Course Objectives

  • Understand the fundamental concepts of Apache Hadoop
  • Identify the core components in Apache Hadoop
  • Manage a Hadoop cluster and its components
  • Install and configure the components of a Hadoop cluster
  • Understand the concept of Hadoop security

Course Outline

  1. The Case for Apache Hadoop
    • Why Hadoop?
    • Fundamental Concepts
    • Core Hadoop Components
  2. Hadoop Cluster Installation
    • Rationale for a Cluster Management Solution
    • Cloudera Manager Features
    • Cloudera Manager Installation
    • Hadoop (CDH) Installation
  3. The Hadoop Distributed File System (HDFS)
    • HDFS Features
    • Writing and Reading Files
    • NameNode Memory Considerations
    • Overview of HDFS Security
    • Web UIs for HDFS
    • Using the Hadoop File Shell
  4. MapReduce and Spark on YARN
    • The Role of Computational Frameworks
    • YARN: The Cluster Resource Manager
    • MapReduce Concepts
    • Apache Spark Concepts
    • Running Computational Frameworks on YARN
    • Exploring YARN Applications Through the Web UIs and the Shell
    • YARN Application Logs
  5. Hadoop Configuration and Daemon Logs
    • Cloudera Manager Constructs for Managing Configurations
    • Locating Configurations and Applying Configuration Changes
    • Managing Role Instances and Adding Services
    • Configuring the HDFS Service
    • Configuring Hadoop Daemon Logs
    • Configuring the YARN Service
  6. Getting Data Into HDFS
    • Ingesting Data from External Sources with Flume
    • Ingesting Data from Relational Databases with Sqoop
    • REST Interfaces
    • Introduction to Kafka & Use Cases
    • Best Practices for Importing Data
  7. Planning Your Hadoop Cluster
    • General Planning Considerations
    • Choosing the Right Hardware
    • Virtualization Options
    • Network Considerations
    • Configuring Nodes
  8. Installing and Configuring Hive, Impala & Pig
    • Hive
    • Impala
    • Pig
  9. Hadoop Clients Including Hue
    • What are Hadoop Clients?
    • Installing and Configuring Hadoop Clients
    • Installing and Configuring Hue
    • Hue Authentication and Authorization
  10. Hadoop Security
    • Why Hadoop Security is Important
    • Hadoop’s Security System Concepts
    • What Kerberos is and How it Works
    • Securing a Hadoop Cluster with Kerberos
    • Other Security Concepts
  11. Managing Resources
    • Configuring cgroups with Static Service Pools
    • The Fair Scheduler
    • Configuring Dynamic Resource Pools
    • YARN Memory and CPU Settings
    • Impala Query Scheduling
  12. Introduction to Apache HBase
    • HBase Overview & Use Cases

Register Now

Send us your details if you are interested in joining this course.