Talend Big Data Basics

Who Should Attend?

Anyone who wants to complete Talend Data Integration tasks using Talend Studio to interact with big data systems.

Duration: 2 Days
Training Date
  • 15 – 16 June 2020 (KL)
  • 8 – 9 October 2020 (KL)

This course teaches you to complete Talend Data Integration using Talend Studio to interact with Big Data Systems. After completing it, you will be able to:

  • Create cluster metadata manually, from configuration files, or automatically
  • Create HDFS and Hive metadata
  • Connect to your cluster to use HDFS, HBase, Hive, Pig, Sqoop, and MapReduce
  • Read data from and write it to HDFS and HBase
  • Read tables from and write them to HDFS with Sqoop
  • Process tables stored in HDFS with Hive
  • Process data stored in HDFS with Pig
  • Process data stored in HDFS with Big Data batch Jobs
Course Outline

  1. Big Data in Context
    • Concepts
  2. Basic Concepts
    • Opening a project
    • Monitoring the Hadoop cluster
    • Creating cluster metadata manually
    • Creating cluster metadata from Hadoop configuration files
    • Creating cluster metadata using a wizard
  3. Reading and Writing Data in HDFS
    • Storing a file in HDFS
    • Storing multiple files in HDFS
    • Reading data from HDFS
    • Storing sparse datasets with HBase
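In Talend Studio these operations are built from graphical components, but they correspond to plain HDFS filesystem calls. As a rough sketch (the file paths are illustrative assumptions, and running the commands would require a real cluster with the `hdfs` binary on the PATH), the equivalent `hdfs dfs` commands can be assembled like this:

```python
# Sketch: build the `hdfs dfs` CLI commands that correspond to storing
# a file in HDFS and reading it back. Paths are illustrative; executing
# them requires a live Hadoop cluster.

def hdfs_put(local_path: str, hdfs_path: str) -> list[str]:
    """Command to copy a local file into HDFS."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_path]

def hdfs_cat(hdfs_path: str) -> list[str]:
    """Command to read a file back from HDFS."""
    return ["hdfs", "dfs", "-cat", hdfs_path]

cmd = hdfs_put("customers.csv", "/user/student/customers.csv")
# On a real cluster: subprocess.run(cmd, check=True)
```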
  4. Working with Tables
    • Importing tables with Sqoop
    • Creating tables with Hive
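The two table topics above boil down to a Sqoop import command and a Hive DDL statement. The sketch below builds both as strings; the JDBC URL, table name, and column list are illustrative assumptions, not values from the course:

```python
# Sketch: the shell command a Sqoop table import corresponds to, plus a
# Hive DDL that creates an external table over the imported files.
# Connection string, columns, and paths are illustrative.

def sqoop_import_cmd(jdbc_url: str, table: str, target_dir: str) -> list[str]:
    return ["sqoop", "import",
            "--connect", jdbc_url,
            "--table", table,
            "--target-dir", target_dir]

def hive_create_external(table: str, location: str) -> str:
    """External Hive table over delimited files already in HDFS."""
    return (f"CREATE EXTERNAL TABLE {table} (id INT, name STRING) "
            "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
            f"LOCATION '{location}'")
```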
  5. Processing Data and Tables in HDFS
    • Processing Hive tables with Jobs
    • Profiling Hive tables (optional)
    • Processing data with Pig
    • Processing data with a Big Data batch Job
    • Migrating a standard Job to a batch Job
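A typical Pig dataflow in this module groups records by a key and counts them (in Pig Latin: `GROUP logs BY page; FOREACH ... GENERATE group, COUNT(logs)`). The records and field names below are illustrative, but the plain-Python sketch computes the same result the Pig Job would:

```python
from collections import Counter

# Sketch: what a Pig GROUP BY / COUNT over data in HDFS computes,
# expressed in plain Python. The records are illustrative.
records = [
    {"page": "/home", "user": "a"},
    {"page": "/home", "user": "b"},
    {"page": "/cart", "user": "a"},
]

# Pig equivalent:
#   grouped = GROUP records BY page;
#   counts  = FOREACH grouped GENERATE group, COUNT(records);
counts = Counter(r["page"] for r in records)
```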
  6. Clickstream Use Case
    • Clickstream use case: resource management with YARN
    • Setting up a development environment
    • Loading data files onto HDFS
    • Enriching logs
    • Computing statistics
    • Understanding MapReduce Jobs
    • Using Talend Studio to configure a resource request to YARN
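The clickstream statistics step follows the classic MapReduce shape: map each log line to a key/value pair, shuffle (sort and group by key), then reduce each group. The log format below is an illustrative assumption; the sketch only shows the control flow a MapReduce Job executes in a distributed way:

```python
from itertools import groupby
from operator import itemgetter

# Sketch: the clickstream page-count as a tiny single-process
# MapReduce. The log line format (user, date, page) is illustrative.
logs = [
    "u1 2020-06-15 /home",
    "u2 2020-06-15 /home",
    "u1 2020-06-15 /checkout",
]

def map_phase(lines):
    """Map: emit (page, 1) for every visit in the log."""
    for line in lines:
        user, date, page = line.split()
        yield (page, 1)

def reduce_phase(pairs):
    """Shuffle (sort + group by key), then reduce by summing counts."""
    shuffled = sorted(pairs, key=itemgetter(0))
    return {page: sum(n for _, n in group)
            for page, group in groupby(shuffled, key=itemgetter(0))}
```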
  7. HBase Reads and Writes
    • How HBase Writes Data
    • How HBase Reads Data
    • Block Caches for Reading
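On reads, HBase consults its block cache before fetching HFile blocks from HDFS, and the cache evicts roughly least-recently-used blocks when full. The sketch below models that behaviour with a small LRU structure; capacity and block IDs are illustrative, not HBase internals:

```python
from collections import OrderedDict

# Sketch: an LRU cache approximating how the HBase block cache serves
# reads. A miss would trigger an HFile read from HDFS in the real system.

class BlockCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def get(self, block_id):
        if block_id not in self.blocks:
            return None                      # miss: go to HFile on HDFS
        self.blocks.move_to_end(block_id)    # mark block recently used
        return self.blocks[block_id]

    def put(self, block_id, data):
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used
```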
  8. HBase Performance Tuning
    • Column Family Considerations
    • Schema Design Considerations
    • Configuring for Caching
    • Dealing with Time Series and Sequential Data
    • Pre-Splitting Regions
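Sequential row keys such as timestamps funnel all writes into one region; a common remedy, paired with pre-splitting, is to salt keys with a stable hash bucket prefix. The bucket count of 4 below is an illustrative choice matching a hypothetical four-way pre-split:

```python
import zlib

# Sketch: salting sequential (time-series) row keys so writes spread
# across pre-split regions instead of hotspotting one region.
N_BUCKETS = 4  # illustrative; would match the number of pre-split regions

def salted_key(row_key: str) -> str:
    """Prefix a sequential key with a deterministic hash bucket."""
    bucket = zlib.crc32(row_key.encode()) % N_BUCKETS
    return f"{bucket:02d}-{row_key}"
```

Readers must then fan out a scan across all bucket prefixes, which is the usual trade-off of salting.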
  9. HBase Administration and Cluster Management
    • HBase Daemons
    • ZooKeeper Considerations
    • HBase High Availability
    • Using the HBase Balancer
    • Fixing Tables with hbck
    • HBase Security
  10. HBase Replication and Backup
    • HBase Replication
    • HBase Backup
    • MapReduce and HBase Clusters
  11. Using Hive and Impala with HBase
    • Using Hive and Impala with HBase
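Hive (and Impala, via the Hive metastore) queries HBase tables through the standard `HBaseStorageHandler`. The sketch below builds the registration DDL; the table names and column mapping are illustrative assumptions:

```python
# Sketch: the Hive DDL that exposes an existing HBase table to Hive
# and Impala. Table names and the cf:val mapping are illustrative; the
# storage handler class is the standard Hive one.

def hive_over_hbase(hive_table: str, hbase_table: str) -> str:
    return (
        f"CREATE EXTERNAL TABLE {hive_table} (key STRING, value STRING) "
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' "
        "WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:val') "
        f"TBLPROPERTIES ('hbase.table.name' = '{hbase_table}')"
    )
```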

Appendix A: Accessing Data with Python and Thrift

  • Thrift Usage
  • Working with Tables
  • Getting and Putting Data
  • Scanning Data
  • Deleting Data
  • Counters
  • Filters
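One common way to drive these Thrift operations from Python is the `happybase` client. The snippet below shows the put/get/scan/delete call shapes against an in-memory stand-in table (so it runs without a cluster); the host name and table contents are assumptions:

```python
# Sketch of the put / get / scan / delete call shapes from this
# appendix, in the style of the happybase client (a Python HBase
# client that speaks Thrift). FakeTable is a stand-in so the example
# runs without a live cluster.

class FakeTable:
    """In-memory stand-in mimicking happybase.Table's basic methods."""
    def __init__(self):
        self.rows = {}
    def put(self, row, data):
        self.rows.setdefault(row, {}).update(data)
    def row(self, row):
        return self.rows.get(row, {})
    def scan(self, row_prefix=None):
        for key in sorted(self.rows):
            if row_prefix is None or key.startswith(row_prefix):
                yield key, self.rows[key]
    def delete(self, row):
        self.rows.pop(row, None)

# Against a real cluster this would instead be:
#   import happybase
#   table = happybase.Connection("hbase-host").table("users")
table = FakeTable()
table.put(b"row1", {b"cf:name": b"Ada"})     # Putting data
name = table.row(b"row1")[b"cf:name"]        # Getting data
found = list(table.scan(row_prefix=b"row"))  # Scanning data
table.delete(b"row1")                        # Deleting data
```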

Appendix B: OpenTSDB

Register Now

Drop us your details if you are interested in joining this course.