Spark

Techieventures Technologies delivers a training program in Apache Spark. Apache Spark is a fast, in-memory big data processing engine with built-in machine learning capabilities; for some in-memory workloads it can run up to 100 times faster than Hadoop MapReduce. It is a unified engine built around ease of use. Our training program is based on real-time projects in Apache Spark. We are one of the best institutes for Apache Spark in Bangalore.

Course Content

Learn Scala and Its Implementation

  • Why Scala
  • Scala Installation
  • Get deep insight into functioning of Scala
  • Execute pattern matching in Scala
  • Functional programming in Scala: closures, currying, expressions, anonymous functions
  • Know the concepts of classes in Scala
  • Object orientation in Scala: primary and auxiliary constructors, singleton and companion objects
  • Traits and abstract classes in Scala
  • Scala Simple Build Tool
  • Building with Maven
  • Scala Java Interoperability
  • Scala Collections
  • Mutable Collections vs Immutable Collections

Spark

  • Introduction to Spark
  • What is Apache Spark
  • Spark Installation
  • Spark Configuration
  • Spark Context
  • Resilient Distributed Dataset (RDD): Features, Partitions, Tuning, Parallelism
  • Functional Programming with Spark

Working with Spark

  • RDD Operations: Transformations and Actions
  • Types of RDDs
  • Key-Value Pair RDDs: Transformations and Actions
  • Map-Reduce and Pair RDD Operations
  • Serialization
  • Spark on the YARN Framework

Writing Spark Applications

  • Spark Applications Vs Spark Shell
  • Creating SparkContext
  • Configuring Spark Properties
  • Building and Running Spark Application
  • Logging
  • Spark Job Anatomy
  • RDD Lineage
  • Caching
  • Distributed Persistence

Spark Performance Tuning

  • Shared Variables: Broadcast Variables, Accumulators
  • Per-Partition Processing
  • Common Performance Issues

Spark File Formats API

  • Text, CSV, Sequence, Parquet, ORC
  • Compression Techniques: Snappy, Zlib, Gzip

Spark SQL

  • Introduction to Spark SQL
  • HiveContext
  • SQL Datatypes
  • DataFrames vs RDDs
  • DataFrame Operations
  • Parquet File Format with Spark: Read, Write, Partitioning, Schema Merging
  • ORC Files
  • JSON Files
  • Inferring Schema Programmatically
  • Custom Case Classes
  • Temp Tables vs Persistent Tables
  • Writing UDFs
  • Hive Support
  • JDBC Support: Examples
  • Introduction to Spark Streaming
  • Streaming Operations
  • Sliding Window Operations
  • Developing Spark Streaming Applications
  • Introduction to Machine Learning
  • System Requirements
  • Machine Learning Basics
  • Algorithms
  • Conclusion
