Text, Web, and Social Media Mining with RapidMiner

Who Should Attend?


Duration: 3 Days

This course is an introduction to knowledge discovery from unstructured data such as text documents, web pages, and social media content. It focuses on the necessary preprocessing steps and on widely used methods for automatic text classification, such as Naive Bayes and Support Vector Machines (SVM), as well as on text clustering. Upon completion of this course, participants will have a solid understanding of typical text mining workflows and will be able to identify techniques for processing unstructured data, apply different statistical text processing methods, and perform content classification and clustering.
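The classification step described above can be sketched outside RapidMiner. The following plain-Python example (not RapidMiner code, and not necessarily how RapidMiner's Naive Bayes operator works internally) trains a multinomial Naive Bayes classifier with add-one (Laplace) smoothing on four made-up review snippets. The `tokenize` helper, the `MultinomialNaiveBayes` class, and the training data are hypothetical, chosen only to make the idea concrete.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical simple tokenizer: lowercase, keep runs of letters only.
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNaiveBayes:
    """Minimal multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, used for the prior
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        self.n_docs = len(texts)
        return self

    def log_posterior(self, text, c):
        # log P(c) + sum of log P(token | c), smoothed with add-one counts.
        score = math.log(self.doc_counts[c] / self.n_docs)
        denom = self.total_words[c] + len(self.vocab)
        for token in tokenize(text):
            if token in self.vocab:  # skip words never seen during training
                score += math.log((self.word_counts[c][token] + 1) / denom)
        return score

    def predict(self, text):
        # Pick the class with the highest (unnormalized) log posterior.
        return max(self.classes, key=lambda c: self.log_posterior(text, c))


# Hypothetical toy training data for a sentiment-style task.
train_texts = [
    "great product, works perfectly",
    "excellent quality and great value",
    "terrible, broke after one day",
    "awful support and poor quality",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = MultinomialNaiveBayes().fit(train_texts, train_labels)
print(model.predict("great value, excellent"))  # positive
print(model.predict("poor and terrible"))       # negative
```

Working in log space avoids numeric underflow when a document contains many tokens; skipping words never seen during training is one common choice among several.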

Learning Objectives

  • Identify techniques for processing unstructured data
  • Transform textual data into a structured format
  • Apply different statistical text processing methods
  • Perform text classification and text clustering
  • Work on popular tasks such as sentiment analysis and opinion mining

Course Outline

  1. Overview
    • Business Scenario
    • Analytics Taxonomy & Hierarchy
    • CRISP-DM & Data Mining in the Enterprise
  2. Basic Usage
    • User Interface
    • Creating and Managing RapidMiner Repositories
    • Operators and Processes
    • Storing Data, Processes, and Result Sets
  3. EDA: Exploratory Data Analysis
    • Loading Data
    • Quick Summary Statistics
  4. Data Preparation
    • Basic Data ETL (Extract, Transform & Load)
  5. Predictive Modeling Algorithms
    • k-Nearest Neighbour (k-NN)
  6. Model Construction and Evaluation
    • Machine Learning Theory: Bias, Variance, Overfitting & Underfitting
    • Split Validation and Cross-Validation
    • Applying Models
    • Evaluation Methods & Performance Criteria
  7. Loading of Texts
    • Loading from Flat Files
    • Loading from Data Sets
    • Loading from Web Sources (e.g. URL Crawling, Twitter)
  8. Concepts
    • Text Processing
    • Documents
    • Tokens
  9. Visualization
    • Visualizing Documents and Tokens
    • Multidimensional Visualizations
  10. Handling Unstructured Data
    • Preprocessing of Textual Data
    • Tokenizing
    • Stemming
    • Filtering of Tokens
    • Term Frequencies
    • Document Frequencies
    • TF-IDF
  11. Advanced Modeling
    • Support Vector Machines
    • Naïve Bayes
    • Text Clustering
  12. Web Mining
    • Crawling the Web
    • Extracting Information from Web Sites
    • Transforming Web Sites to
    • Retrieving Structured Web Data
    • Data ETL and Preprocessing for Web-Sourced Data
    • Enriching Data via Web Services
    • Using Third-Party Web Mining
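The preprocessing steps listed under Handling Unstructured Data form a chain that turns raw text into numeric document vectors. A minimal plain-Python sketch of that chain (tokenizing, filtering of tokens, term frequencies, document frequencies, and TF-IDF), followed by the cosine similarity commonly used to compare such vectors in k-NN and text clustering, might look like this. It is an illustration under stated assumptions, not RapidMiner's implementation: the stopword list, the minimum token length, and the raw TF × log(N / DF) weighting are hypothetical choices, RapidMiner's own TF-IDF normalization may differ, and stemming is omitted for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}


def tokenize(text):
    # Tokenizing: lowercase and keep runs of letters as tokens.
    return re.findall(r"[a-z]+", text.lower())


def filter_tokens(tokens, min_length=3):
    # Filtering: drop stopwords and tokens shorter than min_length.
    return [t for t in tokens if t not in STOPWORDS and len(t) >= min_length]


def tf_idf(documents):
    """Return one {term: weight} dict per document, weighting TF * log(N / DF)."""
    token_lists = [filter_tokens(tokenize(d)) for d in documents]
    term_freqs = [Counter(tokens) for tokens in token_lists]  # term frequencies
    doc_freq = Counter()  # document frequencies: in how many documents each term occurs
    for tf in term_freqs:
        doc_freq.update(tf.keys())
    n = len(documents)
    return [
        {term: count * math.log(n / doc_freq[term]) for term, count in tf.items()}
        for tf in term_freqs
    ]


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical example documents.
docs = [
    "The battery life of this phone is great",
    "Great phone, the camera is great too",
    "The hotel room was clean and the staff friendly",
]
vectors = tf_idf(docs)
print(cosine(vectors[0], vectors[1]))  # positive: both mention "great" and "phone"
print(cosine(vectors[0], vectors[2]))  # 0.0: no shared terms
```

A term that occurs in every document receives weight zero, since log(N / N) = 0; this is how TF-IDF downweights words that carry little discriminative information.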

Register Now

Leave us your details if you are interested in joining this course.