Hadoop is an open-source Apache framework, written in Java, that enables distributed processing of large data sets across clusters of computers using simple programming models. An application built on the Hadoop framework runs in an environment that provides distributed storage and computation across those clusters. Hadoop is designed to scale from a single server to thousands of machines, each offering local computation and storage.
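To give a feel for the "simple programming models" mentioned above, here is a minimal sketch of the map-and-reduce idea as a word count in plain Java. It deliberately does not use the Hadoop API and runs on a single machine. The class name `WordCountSketch` and the methods `mapLine`, `reduce` and `countWords` are hypothetical names chosen for illustration. In a real Hadoop job the map and reduce steps would run in parallel on many machines; this sketch only shows the shape of the computation.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain Java, no Hadoop classes involved.
public class WordCountSketch {

    // "Map" step: turn one line of text into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // "Reduce" step: sum the counts emitted for each distinct word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Runs the map step on every line, then reduces all emitted pairs.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String line : lines) {
            allPairs.addAll(mapLine(line));
        }
        return reduce(allPairs);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "big data on big clusters",
                "data on many machines");
        System.out.println(countWords(lines));
    }
}
```

The programmer supplies only the per-line map logic and the per-key reduce logic. In a framework such as Hadoop, splitting the input, moving data between machines and handling failures are the framework's job.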
Duration: 50-60 hours
Prerequisites: No specific programming background needed.
Training Highlights: The trainer has 12 years of total experience, including 3 years of hands-on experience with Hadoop. This training gives students hands-on experience with Hadoop technology and prepares them for a career in Hadoop administration, development or testing.

Course Syllabus: Hadoop 2.x Big Data Analytics

Java
- Overview of Java
- Classes and Objects
- Inheritance, Aggregation, Polymorphism
- Command-line arguments
- Abstract Classes and Interfaces
- String Handling
- Exception Handling, Multithreading
- Serialization and Advanced Topics
- Collection Framework, GUI, JDBC

Linux
- Unix History & Overview
- Command-line file-system browsing
- Bash/Korn Shell
- Users, Groups and Permissions
- VI Editor
- Introduction to Processes
- Basic Networking
- Shell Scripting live scenarios

SQL
- Introduction to SQL, Data Definition Language (DDL)
- Data Manipulation Language (DML)
- Operators and Subqueries
- Various Clauses, SQL Keywords
- Joins, Stored Procedures, Constraints, Triggers
- Cursors / Loops / IF Else / Try Catch, Index
- Data Manipulation Language (Advanced)
- Constraints, Triggers, Views, Index (Advanced)

Introduction to Big Data
- Introduction and relevance
- Uses of Big Data analytics in various industries such as Telecom, E-commerce, Finance and Insurance
- Problems with Traditional Large-Scale Systems

Hadoop (Big Data) Ecosystem
- Motivation for Hadoop
- Different types of projects by Apache
- Role of projects in the Hadoop Ecosystem
- Key technology foundations required for Big Data
- Limitations and Solutions of existing Data Analytics Architecture
- Comparison of traditional data management systems with Big Data management systems
- Evaluate key framework requirements for Big Data analytics
- Hadoop Ecosystem & Hadoop 2.x core components
- Explain the relevance of real-time data
- Explain how to use big and real-time data as a business planning tool

Building Blocks
- Quick tour of Java (Hadoop is written in Java, so this helps in understanding it better)
- Quick tour of Linux commands (basic commands to traverse the Linux OS)
- Quick tour of RDBMS concepts (to use Hive and Impala)
- Quick hands-on experience of SQL
- Introduction to Cloudera VM and usage instructions

Hadoop Cluster Architecture – Configuration Files
- Hadoop Master-Slave Architecture
- The Hadoop Distributed File System – data storage
- Explain different types of cluster setups (fully distributed, pseudo-distributed etc.)
- Hadoop Cluster setup – Installation
- Hadoop 2.x Cluster Architecture
- A typical enterprise cluster – Hadoop Cluster Modes

Hadoop Core Components – HDFS & MapReduce (YARN)
- HDFS Overview & Data storage in HDFS
- Get the data into Hadoop from the local machine (Data Loading Techniques) and vice versa
- MapReduce Overview (Traditional way vs. MapReduce way)
- Concept of Mapper & Reducer
- Understanding the MapReduce program skeleton
- Running a MapReduce job from the command line/Eclipse
- Develop a MapReduce Program in Java
- Develop a MapReduce Program with the streaming API
- Test and debug a MapReduce program at design time
- How Partitioners and Reducers Work Together
- Writing Custom Partitioners
- Data Input and Output
- Creating Custom Writable and WritableComparable Implementations

Data Integration Using Sqoop and Flume
- Integrating Hadoop into an existing Enterprise
- Loading Data from an RDBMS into HDFS by Using Sqoop
- Managing Real-Time Data Using Flume
- Accessing HDFS from Legacy Systems with FuseDFS and HttpFS
- Introduction to Talend (community system)
- Data loading to HDFS using Talend

Data Analysis using Pig
- Introduction to Hadoop Data Analysis Tools
- Introduction to Pig – MapReduce vs. Pig, Pig Use Cases
- Pig Latin Program & Execution
- Pig Latin: Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators, Pig UDF
- Use Pig to automate the design and implementation of MapReduce applications

Data Analysis using Hive
- Introduction to Hive – Hive vs. Pig – Hive Use Cases
- Discuss the Hive data storage principle
- Explain the file formats and record formats supported by the Hive environment
- Perform operations with data in Hive
- HiveQL: Joining Tables, Dynamic Partitioning, Custom MapReduce Scripts
- Hive Script, Hive UDF

Data Analysis Using Impala
- Introduction to Impala & Architecture
- How Impala executes queries and its importance
- Hive vs. Pig vs. Impala
- Extending Impala with User Defined Functions
- Improving Impala performance

NoSQL Database – HBase
- Introduction to NoSQL Databases and HBase
- HBase vs. RDBMS, HBase Components, HBase Architecture
- HBase Cluster Deployment

Hadoop – Other Analytics Tools
- Introduction to the role of R in the Hadoop Ecosystem
- Introduction to Jasper Reports & creating reports by integrating with Hadoop
- Role of Kafka & Avro in real projects

Other Apache Projects
- Data Model, ZooKeeper Service
- Introduction to Oozie – Analyze workflow design and management using Oozie
- Design and implement an Oozie Workflow
- Introduction to Storm
- Introduction to Spark

Spark
- What is Apache Spark?
- Using the Spark Shell
- RDDs (Resilient Distributed Datasets)
- Functional Programming in Spark
- Working with RDDs in Spark
- A Closer Look at RDDs
- Key-Value Pair RDDs
- MapReduce
- Other Pair RDD Operations

Final Project
- Real World Use Case Scenarios
- Understand the implementation of Hadoop in the real world and its benefits
- Final project including integration of various key components
- Follow-up session: Tips and tricks for projects, certification and interviews