Python for Big Data Analytics

Target Audience

Data analysts who are interested in using Python

Duration: 5 Days

The Python for Big Data Analytics qualification is intended for individuals who want to extend their analytics competencies within their organization using open-source technology. It covers applying machine learning best practices to moderately sophisticated analytics processes, and to the data preparation and modelling tasks needed for predictive analytics with Python.

Python for Big Data Analytics competence is critical for key staff who know the ins and outs of your business (its strategy, customers, pain points and tech stack), who work closely with the data science team, and who are deeply involved in the requirements input, design, development, delivery and eventual use of predictive digital initiatives. These staff include data consumers, digital initiative decision makers, business analysts, and operational line managers and staff.

The Python for Big Data Analytics examination is intended to assess whether a candidate understands and can implement a predictive solution using machine learning with Python.

After the course, you should be able to:

  • Create data models for analytics functions across multiple structured and unstructured data sources
  • Apply machine learning best practices for predictive analytics
  • Perform simple to moderate predictive analytics
  • Apply machine learning with scikit-learn and Python

Machine Learning

Module 1: Recall
▪ The goal of machine learning (ML)
▪ Definition and classification of artificial intelligence (AI)
▪ Weak AI
▪ Strong AI
▪ AI vs ML

Module 2: Recall
▪ Supervised Learning / Unsupervised Learning
▪ Theory & application of ML
▪ Classification
▪ Regression
▪ Clustering
▪ Dimensionality reduction

Module 3: Recall
▪ Critical steps in ML – CRISP-DM
▪ Business Understanding

Module 4: Data Understanding
▪ Data Preparation
▪ Cleansing
▪ Missing values
▪ Formatting
▪ Standardization
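As an illustration of two of the preparation steps listed above, the sketch below fills missing values with the column mean and then standardizes the column to z-scores, using only the Python standard library. Mean imputation and z-score standardization are assumed here as representative techniques; the function names and sample data are hypothetical. In practice, pandas `fillna` and scikit-learn's `StandardScaler` (which also uses the population standard deviation) cover the same ground.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]

ages = [25, None, 35, 45, None]   # hypothetical column with two missing values
clean = impute_mean(ages)          # [25, 35, 35, 45, 35]
z_scores = standardize(clean)      # mean 0, standard deviation 1
```

Imputing before standardizing matters here: `mean` and `pstdev` cannot operate on `None`, so the column must be complete first.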

Module 5: Feature Engineering
▪ Objectives
▪ Structured data
▪ Unstructured data
▪ Data normalization

Module 6: Extract Information from Unstructured Data
▪ Convert unstructured data to quantified metrics
▪ Best practices
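One common way to convert unstructured text into quantified metrics is a bag-of-words representation: count how often each term occurs and lay those counts out as a fixed-length vector over a shared vocabulary. The sketch below assumes simple lowercase word tokenization with a regular expression, and the sample reviews are hypothetical. Library tools such as scikit-learn's `CountVectorizer` add options like stop-word removal and n-grams.

```python
import re
from collections import Counter

def term_counts(document):
    """Lowercase the text, split it into word tokens and count each token."""
    tokens = re.findall(r"[a-z0-9']+", document.lower())
    return Counter(tokens)

def to_vector(counts, vocabulary):
    """Project term counts onto a fixed vocabulary: one number per term."""
    return [counts[term] for term in vocabulary]  # Counter returns 0 for absent terms

reviews = [
    "Great product, great price.",
    "Poor quality. Would not buy again.",
]
counts = [term_counts(r) for r in reviews]
vocabulary = sorted(set().union(*counts))       # every term seen in any review
vectors = [to_vector(c, vocabulary) for c in counts]
```

Each review now becomes a numeric row of the same length, which is the form the modelling steps later in the course expect.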

Module 7
▪ Text
▪ Logs
▪ Image
▪ Time-series

Module 8: Training Data
▪ Preparation – training sets and evaluation sets
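Preparing training and evaluation sets usually means shuffling the records and holding a fraction back so the model is scored on data it never saw during training. The helper below is a hypothetical stdlib sketch of that idea; scikit-learn's `train_test_split` in `sklearn.model_selection` (with its `test_size` and `random_state` parameters) is the usual library route.

```python
import random

def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and split it into (training, evaluation) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed so the split is reproducible
    n_eval = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]

rows = list(range(10))  # stand-in for 10 labelled records
train_rows, eval_rows = split_rows(rows, test_fraction=0.3)
print(len(train_rows), len(eval_rows))  # 7 3
```

Shuffling a copy rather than the input keeps the caller's data untouched, and the fixed seed lets the same split be recreated when results are compared.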

Module 9: Modelling
▪ Objectives
▪ Methods
▪ Parametric
▪ Non-parametric

Module 10: ML algorithms
▪ Logistic Regression
▪ Support vector machines (SVM)
▪ K-nearest neighbors (KNN)
▪ Random forest
▪ Naïve Bayes
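Of the algorithms listed, k-nearest neighbors is the easiest to show without a library: a new point takes the majority label of its k closest training points. The sketch below assumes Euclidean distance and an unweighted vote, and uses hypothetical two-feature data; scikit-learn's `KNeighborsClassifier` offers these and other choices. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair every training point's Euclidean distance with its label, nearest first.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
y = ["low", "low", "low", "high", "high", "high"]
print(knn_predict(X, y, (1.8, 1.8)))  # low
print(knn_predict(X, y, (6.2, 6.4)))  # high
```

KNN is non-parametric in the sense used in Module 9: it stores the training data instead of fitting a fixed set of parameters, so prediction cost grows with the size of the training set.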

Module 11: Evaluation
▪ Confusion matrix
▪ Accuracy, Recall & Precision
▪ ROC curve
▪ Root mean square deviation
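The evaluation measures above can be computed directly from their definitions. The sketch below counts the four cells of a binary confusion matrix, derives accuracy, precision and recall from them, and computes root mean square deviation for numeric predictions. The labels and values are hypothetical; `sklearn.metrics` provides `confusion_matrix`, `accuracy_score`, `precision_score`, `recall_score` and `roc_curve` for real work.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1
        elif predicted == positive:
            fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall derived from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # of predicted positives, share correct
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # of actual positives, share found
    }

def rmsd(actual, predicted):
    """Root mean square deviation between numeric targets and predictions."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
print(confusion_counts(y_true, y_pred))        # (3, 2, 1, 2)
print(classification_metrics(y_true, y_pred))  # accuracy 0.625, precision 0.6, recall 0.75
print(rmsd([2.0, 4.0], [3.0, 5.0]))            # 1.0
```

The example shows why accuracy alone can mislead: the same predictions score differently on precision and recall, which weigh false positives and false negatives separately.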

Module 12: Deployment


Module 1: Recall
▪ Jupyter Notebook and Python for data analysis
▪ Python Data structure and programming
▪ Dataframes
▪ Series

Module 2
▪ Lists
▪ Strings

Module 3
▪ Tuples
▪ Dictionaries
▪ Sets

Module 4
▪ Loops

Module 5
▪ Lambda
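A lambda is an anonymous, single-expression function, most often passed inline where another function expects a callable. The sketch below uses hypothetical `(name, age)` records to show lambdas as the key for `sorted`, the predicate for `filter` and the transform for `map`; the same pattern is common in pandas, for example passing a lambda to `Series.apply`.

```python
records = [("Alice", 34), ("Bob", 27), ("Chen", 41)]  # hypothetical (name, age) pairs

# Sort by age: the lambda tells sorted() which value to compare.
by_age = sorted(records, key=lambda r: r[1])

# Keep only records aged 30 or over: the lambda returns True or False.
over_30 = list(filter(lambda r: r[1] >= 30, records))

# Pull out just the names: the lambda transforms each record.
names = list(map(lambda r: r[0], records))

print(by_age)   # [('Bob', 27), ('Alice', 34), ('Chen', 41)]
print(over_30)  # [('Alice', 34), ('Chen', 41)]
print(names)    # ['Alice', 'Bob', 'Chen']
```

For anything longer than one expression, a named `def` function is clearer and easier to test.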

Module 6
▪ Python Libraries
▪ Scikit-learn
▪ Matplotlib

Module 7
▪ Pandas
▪ Data profiling
▪ Statistical computation

Module 8
▪ Data Cleansing & Munging
▪ Transformation
▪ Missing value handling

Module 9
▪ Dataframe
▪ Reading and writing data between in-memory data structures and different file formats
▪ Data structure column insertion and deletion
▪ Reshaping and pivoting of data sets

Module 10
▪ Label-based slicing, fancy indexing, and subsetting of large data sets
▪ Data set merging and joining
▪ Data filtration
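In pandas, merging and filtering are typically written as `customers.merge(orders, on="customer_id", how="inner")` followed by a boolean mask such as `joined[joined["amount"] >= 50]`. To make the semantics of an inner join explicit, the sketch below reproduces it with plain lists of dictionaries; it is a standard-library illustration, not how pandas implements `merge`, and the tables are hypothetical.

```python
customers = [
    {"customer_id": 1, "name": "Alice"},
    {"customer_id": 2, "name": "Bob"},
    {"customer_id": 3, "name": "Chen"},
]
orders = [
    {"order_id": 101, "customer_id": 1, "amount": 250.0},
    {"order_id": 102, "customer_id": 3, "amount": 80.0},
    {"order_id": 103, "customer_id": 1, "amount": 40.0},
    {"order_id": 104, "customer_id": 4, "amount": 15.0},  # no matching customer
]

def inner_join(left, right, key):
    """Combine rows whose `key` values match in both tables (an inner join)."""
    # Index the right-hand rows by key so each lookup is a single dict access.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Emit one merged row for every left/right pair sharing the key, in left order.
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[key], [])
    ]

joined = inner_join(customers, orders, "customer_id")
large_orders = [row for row in joined if row["amount"] >= 50]  # filtration step
print([(r["name"], r["order_id"]) for r in large_orders])  # [('Alice', 101), ('Chen', 102)]
```

Bob (no orders) and order 104 (no customer) both drop out, which is what distinguishes an inner join from a left or outer join. Unlike pandas, this sketch lets right-hand columns silently overwrite same-named left-hand columns instead of adding suffixes.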

Prerequisites

  • CDPOS™ Citizen Data Scientist role certification holder
  • Computer operation skills, for example on Windows, Mac or Linux
  • Basic Internet skills and basic skill in any programming language
  • Basic SQL skill

Register Now

Drop us your entry if you are interested in joining this course.
