What is Apache Shark? Framework?

Apache Shark was a large-scale data warehouse system for Spark, designed to be the next-generation Hive. However, it quickly turned out that this goal was not enough to align it with the rest of the Spark projects, because, like Hive, it was limited to Hadoop and MapReduce-style workloads.

The Shark project has been succeeded by Apache Spark SQL, a generic runtime for processing big data with SQL. Spark SQL also has a built-in Hive context for compatibility with Hive.

Shark is well suited to in-memory computing, and to complex computations in particular.
Its key advantage is caching intermediate data in memory for faster access. Typical uses include:

  1. Real-time querying of data: querying in seconds rather than minutes using Shark.
  2. Stream processing: fraud detection and log processing in live streams for alerts, aggregates, and analysis.
  3. Sensor data processing: where data is fetched and joined from multiple sources, in-memory datasets shine because they are easy and fast to process.

We have dealt with many more use cases at several companies using Shark and Spark.

The Spark platform that Shark runs on covers the following areas of analytics:

  1. Blazing-fast cluster computing, so it can be used for distributed tasks that involve MapReduce-style processing.
  2. A framework for advanced analytics, including MLlib.
  3. A streaming analytics engine.
  4. A graph processing framework called GraphX.

Typical applications built with Shark and Spark include:

  • Risk Modelling.
  • Parallel Processing of Large Graphs.
  • Advanced Analytics.
  • Recommendation Engines.
  • Text Processing.

Shark is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive. It can run HiveQL queries much faster than Hive without any modification to the existing data or queries. Shark supports Hive's query language, metastore, serialization formats, and user-defined functions, providing seamless integration with existing Hive deployments and a familiar, more powerful option for new ones.

Shark is built on top of Spark, a data-parallel execution engine that is fast and fault-tolerant. Even when the data is on disk, Shark can be noticeably faster than Hive because of this fast execution engine. It avoids the high task-launching overhead of Hadoop MapReduce and does not require intermediate data between stages to be materialized on disk. Thanks to this engine, Shark can answer queries with sub-second latency.
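The difference between writing intermediate results to disk between stages and passing them straight to the next stage can be sketched in plain Python. This is only an illustration of the idea, not Shark or Spark code; the records and the function names `staged_on_disk` and `pipelined_in_memory` are hypothetical.

```python
import json
import os
import tempfile

# Hypothetical toy records; not taken from any real workload.
RECORDS = [{"user": f"u{i % 3}", "bytes": i * 10} for i in range(1, 7)]


def staged_on_disk(records):
    """MapReduce-style: stage 1 writes its output to disk for stage 2."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stage1.json")
        # Stage 1: filter, then materialize the intermediate result on disk.
        with open(path, "w") as f:
            json.dump([r for r in records if r["bytes"] >= 30], f)
        # Stage 2: read the intermediate result back and aggregate it.
        with open(path) as f:
            return sum(r["bytes"] for r in json.load(f))


def pipelined_in_memory(records):
    """Pipelined style: stages are chained lazily and nothing touches disk."""
    filtered = (r for r in records if r["bytes"] >= 30)  # stage 1 (lazy)
    return sum(r["bytes"] for r in filtered)             # stage 2 consumes it


disk_total = staged_on_disk(RECORDS)
memory_total = pipelined_in_memory(RECORDS)
print(disk_total, memory_total)
```

Both functions compute the same answer; the second simply skips the write and re-read of the intermediate data, which is the cost the paragraph above refers to.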

Analytical queries often touch the same subset of data again and again. Shark lets end users exploit this temporal locality by keeping their working set of data in the cluster's memory or, in database terms, by creating in-memory materialized views.
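As a rough analogy for an in-memory materialized view, the sketch below uses Python's built-in `sqlite3` module with an in-memory database standing in for the cluster's memory. It is not Shark syntax; the table names `logs` and `logs_june` and the sample rows are made up for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for the cluster's memory here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (day TEXT, url TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?)",
    [
        ("2014-05-30", "/home", 200),
        ("2014-06-02", "/home", 200),
        ("2014-06-03", "/cart", 500),
        ("2014-06-15", "/cart", 200),
    ],
)

# Materialize the working set (one month of logs) once...
conn.execute(
    "CREATE TABLE logs_june AS "
    "SELECT * FROM logs WHERE day BETWEEN '2014-06-01' AND '2014-06-30'"
)

# ...then run repeated queries against the smaller, already-filtered table.
june_rows = conn.execute("SELECT COUNT(*) FROM logs_june").fetchone()[0]
june_errors = conn.execute(
    "SELECT COUNT(*) FROM logs_june WHERE status >= 500"
).fetchone()[0]
print(june_rows, june_errors)
```

Every later query about June reads only `logs_june`, which mirrors how a cached working set saves repeated scans of the full table.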

Common data types can be cached in a columnar format, which is very efficient for storage and garbage collection yet provides maximum performance, orders of magnitude faster than reading data directly from disk.
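To see why a columnar layout is lighter on memory management, here is a minimal sketch using Python's standard `array` module. It only illustrates the layout idea and is not how Shark implements its cache; the column names and values are hypothetical.

```python
from array import array

# Hypothetical rows of (user_id, latency_ms); the values are made up.
rows = [(1, 12.5), (2, 7.0), (3, 30.25), (4, 5.5)]

# Row layout: one Python tuple per record, plus one object per field.
row_store = list(rows)

# Columnar layout: one typed, contiguous array per column.
columns = {
    "user_id": array("q", (r[0] for r in rows)),     # signed integers
    "latency_ms": array("d", (r[1] for r in rows)),  # double-precision floats
}

# A query that needs one column scans only that array.
total_from_columns = sum(columns["latency_ms"])
total_from_rows = sum(r[1] for r in row_store)

# The columnar store holds one container per column regardless of row count,
# while the row store holds one tuple per row.
print(total_from_columns, total_from_rows, len(columns), len(row_store))
```

With millions of rows, the row layout creates millions of small objects for the garbage collector to track, while the columnar layout still holds only one array per column.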

The in-memory cluster-computing framework that powers Shark is Apache Spark.

Shark was one of the first systems on Hadoop to run SQL interactively. It was built on top of a general runtime, Apache Spark.

By making use of Apache Spark, Shark could achieve the dual goals of scaling as well as traditional MapReduce jobs while being as fast as an enterprise data warehouse (EDW). Building a SQL query engine on a general runtime such as Spark unifies the batch (MapReduce-like) and interactive query models. It lets application developers and data scientists alike combine these modes for fast computing, for example by using D-Streams.
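The idea behind D-Streams (discretized streams) is to cut a live stream into small batches and run ordinary batch logic on each one. The plain-Python sketch below shows that idea only; it does not use the Spark Streaming API, and the helper names `discretize` and `count_by_status` and the sample events are assumptions made for illustration.

```python
from collections import Counter


def count_by_status(batch):
    """Ordinary batch logic: count events per status in one collection."""
    return Counter(event["status"] for event in batch)


def discretize(events, interval):
    """Group timestamped events into consecutive windows of `interval` seconds."""
    batches = {}
    for event in events:
        window_start = (event["ts"] // interval) * interval
        batches.setdefault(window_start, []).append(event)
    return [batches[start] for start in sorted(batches)]


# Hypothetical event stream; timestamps are seconds since some start point.
stream = [
    {"ts": 0, "status": "ok"},
    {"ts": 1, "status": "fraud"},
    {"ts": 2, "status": "ok"},
    {"ts": 3, "status": "ok"},
    {"ts": 5, "status": "fraud"},
]

# Streaming mode: the same batch function runs once per micro-batch.
per_batch = [count_by_status(b) for b in discretize(stream, interval=2)]

# Batch mode: the same function runs over the full history.
overall = count_by_status(stream)
print(per_batch, overall)
```

Because the same function serves both modes, the per-batch counts add up to the full-history result, which is the sense in which one runtime can serve batch and streaming workloads.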

Shark is an open source project that started in the UC Berkeley AMPLab.

The research and development was supported in part by NSF CISE Expeditions award CCF-1139158 and DARPA XData Award FA8750-12-2-0331, and by gifts from:

  • Amazon Web Services
  • Google
  • SAP
  • Blue Goji
  • Cisco
  • Clearstory Data
  • Cloudera
  • Ericsson
  • Facebook
  • General Electric
  • Hortonworks
  • Huawei
  • Intel
  • Microsoft
  • NetApp
  • Oracle
  • Quanta
  • Samsung
  • Splunk
  • VMware
  • Yahoo

To narrow the focus onto the development of Apache Spark SQL, Shark has been discontinued, as the project announced.

As an additional note, Hive has caught up: support for Spark as an execution engine inside Hive (Hive on Spark) has been in progress as of Hive 1.1.
