Demos On 18th / 19th Nov
PIMPLE SAUDAGAR :    Angular JS (10:00 AM)   |   Spring / Hibernate (10:00 AM)   |    Java (10:00 AM)   |   Selenium (11:00 AM)   |   AWS - Amazon Web Services (11:00 AM)   |   DevOps (3:00 PM)   |   BigData & Hadoop (5:00 PM)   |   SAP MM (6:00 PM)   |   Unix (5:00 PM)   |   BigData & Hadoop(19th Nov - sunday) (05:00 PM)      DECCAN :    AWS - Amazon Web Services (11:00 AM)   |   Selenium (11:30 AM)   |   DevOps (12:00 PM)   |   DevOps(19th Nov - sunday) (02:00 PM)      
ISO 9001:2015 Certified Organization  |  Pearson Exam Center

Hadoop 2.X BigData Analytics

Overview

Hadoop is an Apache open source framework written in java that allows distributed processing of large datasets across clusters of computers using simple programming models. A Hadoop frame-worked application works in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale up from single server to thousands of machines, each offering local computation and storage.

Duration : (50-60 hours)

Prerequisites : No specific programming background needed.

Training Highlights : Trainer is having total 12 years of experience and actual 3 years experience in Hadoop.This training gives student hands on experience on Hadoop technology and leads him to a successful career in Hadoop Administration Job , Development or Testing.

Course Syllabus

Hadoop 2.X BigData Analytics
  1. Java
    • Overview of Java
    • Classes and Objects
    • Classes and Objects
    • Inheritance, Aggregation, Polymorphism
    • Command line argument
    • Abstract class and Interfaces
    • String Handling
    • Exception Handling, Multithreading
    • Serialization and Advanced Topics
    • Collection Framework, GUI, JDBC
  2. Linux
    • Unix History & Over View
    • Command line file-system browsing
    • Bash/CORN Shell
    • Users Groups and Permissions
    • VI Editor
    • Introduction to Process
    • Basic Networking
    • Shell Scripting live scenarios
  3. SQL
    • Introduction to SQL, Data Definition Language (DDL)
    • Data Manipulation Language(DML)
    • Operator and Sub Query
    • Various Clauses, SQL Key Words
    • Joins, Stored Procedures, Constraints, Triggers
    • Cursors /Loops / IF Else / Try Catch, Index
    • Data Manipulation Language (Advanced)
    • Constraints, Triggers,
    • Views, Index Advanced
  1. Introduction to BigData
    • Introduction and relevance
    • Uses of Big Data analytics in various industries like Telecom, E- commerce, Finance and Insurance etc.
    • Problems with Traditional Large-Scale Systems
  2. Hadoop (Big Data) Ecosystem
    • Motivation for Hadoop
    • Different types of projects by Apache
    • Role of projects in the Hadoop Ecosystem
    • Key technology foundations required for Big Data
    • Limitations and Solutions of existing Data Analytics Architecture
    • Comparison of traditional data management systems with Big Data management systems
    • Evaluate key framework requirements for Big Data analytics
    • Hadoop Ecosystem & Hadoop 2.x core components
    • Explain the relevance of real-time data
    • Explain how to use big and real-time data as a Business planning tool
  3. Building Blocks
    • Quick tour of Java (As Hadoop is Written in Java , so it will help us to understand it better)
    • Quick tour of Linux commands ( Basic Commands to traverse the Linux OS)
    • Quick Tour of RDBMS Concepts (to use HIVE and Impala)
    • Quick hands on experience of SQL.
    • Introduction to Cloudera VM and usage instructions
  4. Hadoop Cluster Architecture – Configuration Files
    • Hadoop Master-Slave Architecture
    • The Hadoop Distributed File System – data storage
    • Explain different types of cluster setups (Fully distributed/Pseudo etc.)
    • Hadoop Cluster set up – Installation
    • Hadoop 2.x Cluster Architecture
    • A Typical enterprise cluster – Hadoop Cluster Modes
  5. Hadoop Core Components – HDFS & Map Reduce (YARN)
  6. HDFS Overview & Data storage in HDFS
    • Get the data into Hadoop from local machine (Data Loading Techniques) – vice versa
    • MapReduce Overview (Traditional way Vs. MapReduce way)
    • Concept of Mapper & Reducer
    • Understanding MapReduce program skeleton
    • Running MapReduce job in Command line/Eclipse
    • Develop MapReduce Program in JAVA
    • Develop MapReduce Program with the streaming API
    • Test and debug a MapReduce program in the design time
    • How Partitioners and Reducers Work Together
    • Writing Customer Partitioners Data Input and Output
    • Creating Custom Writable and Writable Comparable Implementations
  7. Data Integration Using Sqoop and Flume
    • Integrating Hadoop into an existing Enterprise
    • Loading Data from an RDBMS into HDFS by Using Sqoop
    • Managing Real-Time Data Using Flume
    • Accessing HDFS from Legacy Systems with FuseDFS and HttpFS
    • Introduction to Talend (community system)
    • Data loading to HDFS using Talend
  8. Data Analysis using PIG
    • Introduction to Hadoop Data Analysis Tools
    • Introduction to PIG – MapReduce Vs Pig, Pig Use Cases
    • Pig Latin Program & Execution
    • Pig Latin : Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators, Pig UDF
    • Use Pig to automate the design and implementation of MapReduce applications
    • Data Analysis using PIG
  9. Data Analysis using HIVE
    • Introduction to Hive – Hive Vs. PIG – Hive Use Cases
    • Discuss the Hive data storage principle
    • Explain the File formats and Records formats supported by the Hive environment
    • Perform operations with data in Hive
    • Hive QL: Joining Tables, Dynamic Partitioning, Custom MapReduce Scripts
    • Hive Script, Hive UDF
  10. Data Analysis Using Impala
    • Introduction to Impala & Architecture
    • How Impala executes Queries and its importance
    • Hive vs. PIG vs. Impala
    • Extending Impala with User Defined functions
    • Improving Impala performance
  11. NoSQL Database – Hbase
    • Introduction to NoSQL Databases and Hbase
    • HBase v/s RDBMS, HBase Components, HBase Architecture
    • HBase Cluster Deployment
  12. Hadoop – Other Analytics Tools
    • Introduction to role of R in Hadoop Eco-system
    • Introduction to Jasper Reports & creating reports by integrating with Hadoop
    • Role of Kafka & Avro in real projects
  13. Other Apache Projects
    • Data Model, Zookeeper Service
    • Introduction to Oozie – Analyze workflow design and management using Oozie
    • Design and implement an Oozie Workflow
    • Introduction to Storm
    • Introduction to Spark
  14. Spark
    • What is Apache Spark?
    • Using the Spark Shell
    • RDDs (Resilient Distributed Datasets)
    • Functional Programming in Spark
    • Working with RDDs in Spark
    • A Closer Look at RDDs
    • Key-Value Pair RDDs
    • MapReduce
    • Other Pair RDD Operations
  15. Final project
    • Real World Use Case Scenarios
    • Understand the implementation of Hadoop in Real World and its benefits.
    • Final project including integration various key components
    • Follow-up session: Tips and tricks for projects, certification and interviews etc

Inquire Now



Student’s Reviews about 3RI