Talend Big Data Basics

Who Should Attend?


Duration: 2 Days
Training Dates
  • 15 – 16 June 2020 (KL)
  • 8 – 9 October 2020 (KL)

Prerequisite: completion of Talend Data Integration. This course teaches you to use Talend Studio to interact with Big Data systems. By the end of it, you will be able to:

  • Create cluster metadata manually, from configuration files, or automatically
  • Create HDFS and Hive metadata
  • Connect to your cluster to use HDFS, HBase, Hive, Pig, Sqoop, and MapReduce
  • Read data from and write it to HDFS (HBase)
  • Read tables from and write them to HDFS (Sqoop)
  • Process tables stored in HDFS with Hive
  • Process data stored in HDFS with Pig
  • Process data stored in HDFS with Big Data batch Jobs
Course Outline

  1. Big Data in Context
    • Concepts
  2. Basic Concepts
    • Opening a project
    • Monitoring the Hadoop cluster
    • Creating cluster metadata manually
    • Creating cluster metadata from Hadoop configuration files
    • Creating cluster metadata using a wizard
  3. Reading and Writing Data in HDFS
    • Storing a file in HDFS
    • Storing multiple files in HDFS
    • Reading data from HDFS
    • Storing sparse datasets with HBase
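HBase suits sparse datasets because only the cells that actually exist occupy storage; there is no NULL padding as in a fixed relational schema. A pure-Python picture of that storage model (not the real HBase client API, and the rows and columns are made up for illustration):

```python
# Sketch of HBase-style sparse storage: cells are keyed by
# (row, "columnfamily:qualifier"), and absent cells cost nothing.

def put(table, row, column, value):
    table.setdefault(row, {})[column] = value

def get(table, row, column, default=None):
    return table.get(row, {}).get(column, default)

table = {}
# A sparse dataset: each user has only a few of many possible attributes.
put(table, "user1", "info:email", "u1@example.com")
put(table, "user2", "info:phone", "555-0100")

print(get(table, "user1", "info:email"))   # u1@example.com
print(get(table, "user1", "info:phone"))   # None -- the cell simply isn't stored
```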
  4. Working with Tables
    • Importing tables with Sqoop
    • Creating tables with Hive
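Creating a Hive table over data already sitting in HDFS typically means an external table, which leaves the files in place and overlays a schema. A small sketch that assembles such a HiveQL statement; the table name, columns, and HDFS path are hypothetical examples, not values from the course:

```python
# Build a HiveQL CREATE EXTERNAL TABLE statement for delimited files in HDFS.

def hive_external_table_ddl(name, columns, location, delimiter=","):
    cols = ",\n  ".join(f"{col} {typ}" for col, typ in columns)
    return (
        f"CREATE EXTERNAL TABLE {name} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        f"LOCATION '{location}';"
    )

ddl = hive_external_table_ddl(
    "customers",                          # hypothetical table name
    [("id", "INT"), ("name", "STRING")],  # hypothetical schema
    "/user/student/customers",            # hypothetical HDFS directory
)
print(ddl)
```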
  5. Processing Data and Tables in HDFS
    • Processing Hive tables with Jobs
    • Profiling Hive tables (optional)
    • Processing data with Pig
    • Processing data with a Big Data batch Job
    • Migrating a standard Job to a batch Job
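Much of the Pig-style processing in this section boils down to grouping records and aggregating each group. A Pig Latin pipeline like `grouped = GROUP logs BY page; counts = FOREACH grouped GENERATE group, COUNT(logs);` can be pictured in plain Python (sample records invented for illustration):

```python
# Group-then-aggregate, the dataflow behind GROUP ... BY / FOREACH ... GENERATE.
from collections import defaultdict

logs = [("home", "u1"), ("about", "u2"), ("home", "u3")]  # (page, user)

grouped = defaultdict(list)
for page, user in logs:
    grouped[page].append(user)          # GROUP logs BY page

counts = {page: len(users)              # FOREACH ... GENERATE group, COUNT(...)
          for page, users in grouped.items()}
print(counts)  # {'home': 2, 'about': 1}
```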
  6. Clickstream Use Case
    • Clickstream use case: resource management with YARN
    • Setting up a development environment
    • Loading data files onto HDFS
    • Enriching logs
    • Computing statistics
    • Understanding MapReduce Jobs
    • Using Talend Studio to configure a resource request to YARN
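The MapReduce Jobs in this use case all follow the same map, shuffle, reduce flow: map emits key/value pairs, the framework sorts and groups them by key, and reduce combines each group. Simulated in plain Python on a few clickstream-style lines (the data is made up):

```python
# The map -> shuffle -> reduce flow behind a MapReduce Job, in miniature.
from itertools import groupby
from operator import itemgetter

lines = ["u1 /home", "u2 /products", "u1 /checkout"]

# Map: emit (key, value) pairs -- one count per page view, keyed by user.
mapped = []
for line in lines:
    user, _url = line.split()
    mapped.append((user, 1))

# Shuffle: sort and group by key (the framework does this between phases).
mapped.sort(key=itemgetter(0))

# Reduce: combine each key's values into a single result.
clicks = {user: sum(v for _, v in pairs)
          for user, pairs in groupby(mapped, key=itemgetter(0))}
print(clicks)  # {'u1': 2, 'u2': 1}
```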
  7. HBase Reads and Writes
    • How HBase Writes Data
    • How HBase Reads Data
    • Block Caches for Reading
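The block cache keeps recently read blocks in memory so repeated reads avoid HDFS; when it fills, the least recently used blocks are evicted. The eviction idea can be sketched as a tiny LRU cache (HBase's real BlockCache is considerably more elaborate, with multiple priority tiers):

```python
# Minimal LRU cache illustrating block-cache eviction.
from collections import OrderedDict

class LRUBlockCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def get(self, block_id):
        if block_id not in self.blocks:
            return None                        # miss -> caller reads from HDFS
        self.blocks.move_to_end(block_id)      # mark as recently used
        return self.blocks[block_id]

    def put(self, block_id, data):
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict least recently used

cache = LRUBlockCache(2)
cache.put("blk1", b"...")
cache.put("blk2", b"...")
cache.get("blk1")          # touch blk1, so blk2 becomes least recently used
cache.put("blk3", b"...")  # evicts blk2
print(cache.get("blk2"))   # None
```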
  8. HBase Performance Tuning
    • Column Family Considerations
    • Schema Design Considerations
    • Configuring for Caching
    • Dealing with Time Series and Sequential Data
    • Pre-Splitting Regions
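Monotonically increasing row keys (timestamps, sequence IDs) concentrate all writes on one region; a common remedy is to prefix the key with a salt so writes spread across pre-split regions. A sketch of that idea, with a hypothetical key format and region count:

```python
# Salted row keys for time-series data: the salt prefix distributes
# sequential timestamps across NUM_REGIONS pre-split regions.

NUM_REGIONS = 4  # assume the table was pre-split into this many regions

def salted_key(timestamp_ms):
    salt = timestamp_ms % NUM_REGIONS   # cheap deterministic bucket; a hash
    return f"{salt}-{timestamp_ms}"     # of the key is also commonly used

keys = [salted_key(t) for t in range(1000, 1004)]
print(keys)  # ['0-1000', '1-1001', '2-1002', '3-1003']
```

The trade-off: scans over a time range must now issue one scan per salt bucket and merge the results.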
  9. HBase Administration and Cluster Management
    • HBase Daemons
    • ZooKeeper Considerations
    • HBase High Availability
    • Using the HBase Balancer
    • Fixing Tables with hbck
    • HBase Security
  10. HBase Replication and Backup
    • HBase Replication
    • HBase Backup
    • MapReduce and HBase Clusters
  11. Using Hive and Impala with HBase
    • Using Hive and Impala with HBase

Appendix A: Accessing Data with Python and Thrift

  • Thrift Usage
  • Working with Tables
  • Getting and Putting Data
  • Scanning Data
  • Deleting Data
  • Counters
  • Filters
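The operations this appendix covers -- put, get, scan, delete, counters -- can be previewed against an in-memory table so the call pattern is visible without a cluster. This mirrors the shape of a Thrift-based client, not the real API of any particular library:

```python
# In-memory stand-in for an HBase table accessed over Thrift.

class InMemoryTable:
    def __init__(self):
        self.rows = {}

    def put(self, row, values):            # values: {"cf:qualifier": value}
        self.rows.setdefault(row, {}).update(values)

    def get(self, row):
        return self.rows.get(row, {})

    def scan(self, prefix=""):             # yields rows in sorted key order
        for row in sorted(self.rows):
            if row.startswith(prefix):
                yield row, self.rows[row]

    def delete(self, row):
        self.rows.pop(row, None)

    def counter_inc(self, row, column, amount=1):
        new = self.rows.get(row, {}).get(column, 0) + amount
        self.put(row, {column: new})
        return new

t = InMemoryTable()
t.put("row1", {"cf:name": "alice"})
t.counter_inc("row1", "cf:visits")
t.counter_inc("row1", "cf:visits")
print(t.get("row1"))  # {'cf:name': 'alice', 'cf:visits': 2}
```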

Appendix B: OpenTSDB

Register Now

Drop us your details if you are interested in joining this course.

