Introduction to Hadoop and Radoop

Duration: 3 Days
Training Dates
  • 20 – 21 July 2020 (Bangkok)
  • 2 – 3 November 2020 (KL)
Course Objectives
  • Understand and explore the Hadoop infrastructure and ecosystem
  • Explore the Hadoop core components HDFS and YARN
  • Use relational data stores with Hadoop
  • Understand large-scale data processing frameworks
  • Connect to a Hadoop cluster using RapidMiner Radoop
  • Integrate tasks and analyses into RapidMiner processes

Part A: Introduction to Hadoop

  1. Introduction to Big Data and Hadoop
    • Hadoop Overview and History
    • Exploring the Hadoop Ecosystem
    • Introduction to Cloudera Manager and CDH
  2. Introduction to Hadoop Core Components: HDFS and YARN
    • HDFS: Hadoop Distributed File System
      • Understanding HDFS Architecture and Components
      • Hands-on: basic HDFS shell commands
    • YARN Resource Management
      • Understanding YARN Architecture and Components
    • Exploring Hadoop file formats (which file format is better?)
  3. Using relational data stores with Hadoop
    • Introduction to Hive (Hive architecture and how it works)
    • Overview of Hive-supported file formats and Hive partitioning
    • Hands-on: HiveQL through the Hue interface and the Hive client
    • Integrating MySQL with Hive using Sqoop
    • Use Sqoop to import data from MySQL to HDFS/Hive
    • Use Sqoop to export data from Hadoop to MySQL
  4. Large-Scale Data Processing Frameworks
    • Introduction to Hadoop MapReduce
      • What it is and how it works
      • Practical MapReduce example
    • Introduction to Apache Spark
      • Overview of Spark Core Concepts and Architecture
        • Spark Clusters and the Resource Management System
        • Spark Application
        • Spark Driver and Executor
      • Introduction to Spark data structures: the Resilient Distributed Dataset (RDD) and RDD operations
      • Hands-on: simple Spark jobs

Part B: Introduction to Radoop

  1. Introduction to Radoop
    • Hadoop Integration with RapidMiner: Radoop
    • Introduction to the Radoop GUI
    • Connecting to a Hadoop Cluster
  2. Data Exploration
    • Browsing Tables
    • Viewing Statistics and High-Level Information
  3. Data Extraction and Loading
    • Formulation of Queries
    • Pushing Data into Hadoop
    • Clustering
  4. Integration of In-Cluster Analyses into RapidMiner Processes
    • Modeling Algorithms
    • Natural Aggregation
    • In-Memory Training, In-Hadoop Scoring
  5. Beyond Natural Aggregation
    • Chunking
    • Voting
    • In-Hadoop Modelling
    • Clustering

Register Now

Submit your details if you are interested in joining this course.