University Of Michigan Flint Chapter 10 Hadoop and its Core Components Paper

After reading Chapter 10 in your textbook, please provide a brief response to the following assessment questions.

In your own words and understanding after reading Chapter 10:

What is Hadoop?
Name the two main components of Hadoop and discuss the roles of those two components during system failures?

Your initial post should be at least 100 words.

***Please follow APA format***

Youtube link: youtube.com/watch?v=MfF750YVDxM

https://hadoopecosystemtable.github.io/

Chap 10: Adv. Analytics – Tech & Tools: MapReduce and Hadoop
10.1 Analytics for Unstructured Data
10.1.1 Use Cases

IBM Watson – Jeopardy playing machine
  To educate Watson, Hadoop was utilized to process data sources
    Encyclopedias, dictionaries, news wire feeds, literature, Wikipedia, etc.
LinkedIn – network of over 250 million users in 200 countries
  Hadoop is used to process daily transaction logs, examine users’ activities, feed extracted data back to production systems, restructure the data, develop and test analytic models
Yahoo! – large Hadoop deployment
  Search index creation and maintenance, Webpage content optimization, spam filters, etc.
10.1.2 MapReduce


The MapReduce paradigm breaks a large task into smaller tasks, runs the tasks in parallel, and consolidates the outputs of the individual tasks into the final output
Map
  Applies an operation to a piece of data
  Provides some intermediate output
Reduce
  Consolidates the intermediate outputs from the map steps
  Provides the final output
Each step uses key/value pairs, denoted as <key, value>, as input and output
MapReduce word count example
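
The word count example can be sketched in plain Python, without Hadoop, to show the three stages: map emits a <word, 1> pair per word, a shuffle groups the pairs by key, and reduce sums each group. The function names and the sample lines are illustrative assumptions, not part of any Hadoop API, and the loop runs sequentially where Hadoop would run map tasks in parallel.

```python
from collections import defaultdict


def map_step(line):
    """Map: emit a <word, 1> pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_step(key, values):
    """Reduce: consolidate all counts for one word into <word, total>."""
    return key, sum(values)


def word_count(lines):
    intermediate = []
    for line in lines:  # each line stands in for the input of one map task
        intermediate.extend(map_step(line))
    grouped = shuffle(intermediate)
    return dict(reduce_step(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "the fox"]  # made-up input
    print(word_count(sample))
```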
10.1.3 Apache Hadoop


MapReduce is a simple paradigm to understand but not easy to implement
Executing a MapReduce job requires
  Jobs scheduled based on the system’s workload
  Input data spread across a cluster of machines
  Map step spread across the distributed system
  Intermediate outputs collected and provided to the proper machines for the reduce step
  Final output made available to another user, another application, or another MapReduce job
Next few slides present an overview of the Hadoop environment

Hadoop Distributed File System (HDFS)

File system that distributes data across a cluster to take advantage of the parallel processing of MapReduce
HDFS uses three Java daemons (background processes)
1. NameNode – determines and tracks where various blocks of data are stored
2. DataNode – manages the data stored on each machine
3. Secondary NameNode – performs some of the NameNode tasks to reduce the load on NameNode
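
The split of responsibilities between NameNode and DataNode can be illustrated with a toy model. This is a sketch under stated assumptions, not HDFS code: `ToyNameNode`, its methods, and the round-robin placement are hypothetical (real HDFS placement is rack-aware). It does mirror two real behaviours: blocks are replicated (the HDFS default replication factor is 3), and when a DataNode fails the NameNode arranges for the lost replicas to be copied to surviving nodes.

```python
import itertools


class ToyNameNode:
    """Tracks which DataNodes hold each block; stores no file data itself."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = list(datanodes)
        self.replication = replication
        self.block_map = {}  # block id -> list of DataNode names

    def add_file(self, name, size, block_size):
        """Split a file into blocks and assign each block to several nodes."""
        n_blocks = -(-size // block_size)  # ceiling division
        ring = itertools.cycle(self.datanodes)  # assumed round-robin placement
        for i in range(n_blocks):
            holders = [next(ring) for _ in range(self.replication)]
            self.block_map[f"{name}#{i}"] = holders

    def handle_failure(self, dead):
        """Drop a failed DataNode and re-replicate its blocks elsewhere."""
        self.datanodes.remove(dead)
        for holders in self.block_map.values():
            if dead in holders:
                holders.remove(dead)
                spare = next(
                    (n for n in self.datanodes if n not in holders), None
                )
                if spare is not None:  # no spare: block stays under-replicated
                    holders.append(spare)


if __name__ == "__main__":
    nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])  # hypothetical node names
    nn.add_file("log.txt", size=300, block_size=128)
    nn.handle_failure("dn1")
    print(nn.block_map)
```

Because every block lives on several machines, losing one DataNode loses no data; the sketch only needs the block map to decide where the replacement copies go.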

Structuring a MapReduce Job in Hadoop

A typical MapReduce Java program has three classes
  Driver – provides details such as input file locations, names of mapper and reducer classes, location of reduce class output, etc.
  Mapper – provides logic to process each data block
  Reducer – reduces the data provided by the mapper
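
The three-class structure can be mirrored in a small Python sketch. In Hadoop these are Java classes and the driver configures a job object; the classes below are hypothetical stand-ins that only show how the responsibilities divide. The "year,temperature" records are made-up input.

```python
from collections import defaultdict


class Mapper:
    """Logic applied to each input record: emit <year, temperature>."""

    def map(self, record):
        year, temp = record.split(",")
        yield year, int(temp)


class Reducer:
    """Reduces all values sharing a key: keep the maximum temperature."""

    def reduce(self, key, values):
        yield key, max(values)


class Driver:
    """Holds the job details: input records, mapper and reducer classes, output."""

    def __init__(self, records, mapper_cls, reducer_cls):
        self.records = records
        self.mapper = mapper_cls()
        self.reducer = reducer_cls()
        self.output = {}

    def run(self):
        grouped = defaultdict(list)
        for record in self.records:
            for key, value in self.mapper.map(record):
                grouped[key].append(value)
        for key in sorted(grouped):  # reducers receive keys in sorted order
            for out_key, out_value in self.reducer.reduce(key, grouped[key]):
                self.output[out_key] = out_value
        return self.output


if __name__ == "__main__":
    job = Driver(["2001,31", "2001,35", "2002,28"], Mapper, Reducer)
    print(job.run())
```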
[Figure: A file stored in HDFS]
Additional Considerations in Structuring a MapReduce Job
[Figure: Shuffle and Sort]
[Figure: Using a combiner]
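
A combiner applies reduce-style aggregation to each mapper's output before it is sent across the network, so fewer intermediate pairs are shuffled. The sketch below, with hypothetical function names and made-up input splits, compares the number of pairs shuffled with and without a combiner and checks that the final counts are the same either way.

```python
from collections import Counter


def map_words(line):
    """Map: emit a <word, 1> pair per word."""
    return [(w, 1) for w in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate one mapper's output locally before the shuffle."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def reduce_counts(all_pairs):
    """Reduce: sum the (possibly pre-aggregated) counts for each word."""
    total = Counter()
    for word, count in all_pairs:
        total[word] += count
    return dict(total)


# Made-up input: each inner list is the split handled by one mapper.
splits = [["a b a", "a c"], ["b b a"]]

without, with_combiner = [], []
for split in splits:
    mapper_output = [pair for line in split for pair in map_words(line)]
    without.extend(mapper_output)
    with_combiner.extend(combine(mapper_output))

print("pairs shuffled without combiner:", len(without))
print("pairs shuffled with combiner:", len(with_combiner))
print("same result:", reduce_counts(without) == reduce_counts(with_combiner))
```

This only works because summing is associative and commutative; an operation such as a plain average could not reuse the reducer as its combiner without carrying extra state.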
[Figure: Using a custom partitioner]
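
The partitioner decides which reducer receives each intermediate key. Hadoop's default partitioner takes a hash of the key modulo the number of reducers; a custom partitioner replaces that rule. The sketch below is a Python illustration only: the function names and the first-letter rule are hypothetical, and `zlib.crc32` stands in for the key's hash code so that results are stable across runs.

```python
import zlib


def hash_partition(key, num_reducers):
    """Default-style rule: a stable hash of the key modulo the reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def first_letter_partition(key, num_reducers):
    """Custom rule (hypothetical): keys whose first character sorts at or
    before 'm' go to reducer 0, all other keys go to the last reducer."""
    return 0 if key[:1].lower() <= "m" else num_reducers - 1


def assign(keys, partitioner, num_reducers):
    """Route every key to the reducer chosen by the partitioner."""
    buckets = {r: [] for r in range(num_reducers)}
    for key in keys:
        buckets[partitioner(key, num_reducers)].append(key)
    return buckets


keys = ["apple", "mango", "pear", "zebra", "apple"]  # made-up keys
print(assign(keys, first_letter_partition, 2))
print(assign(keys, hash_partition, 2))
```

Either rule sends every occurrence of the same key to the same reducer, which is the property the reduce step depends on.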
Developing and Executing a Hadoop MapReduce Program
Common practice is to use an IDE tool such as Eclipse
The MapReduce program consists of three Java files
  Driver code, map code, and reduce code
Java code is compiled and stored in a JAR file and executed against the specified HDFS input files
Yet Another Resource Negotiator (YARN)
Hadoop continues to undergo development
An important change was to separate the MapReduce functionality from the management of running jobs and the distributed environment
This rewrite is sometimes called Yet Another Resource Negotiator (YARN)
10.2 The Hadoop Ecosystem

Tools have been developed to make Hadoop easier to use and provide additional functionality and features
  Pig – provides a high-level data-flow programming language
  Hive – provides SQL-like access
  HBase – provides real-time reads and writes
  Mahout – provides analytical tools
10.2.1 Pig

Pig consists of
  A data flow language called Pig Latin
  An environment to execute the Pig code
[Figure: Example of Pig commands]
[Table: Built-in Pig Functions]
10.2.2 Hive


The Hive language, Hive Query Language (HiveQL), resembles SQL rather than a scripting language
[Figure: Example Hive code]
10.2.3 HBase

Unlike Pig and Hive, which are intended for batch applications, HBase can provide real-time read and write access to huge datasets
  Example – Choosing a shipping address at checkout
10.2.4 Mahout



Mahout permits the application of analytical techniques within the Hadoop environment
Mahout provides Java code that implements the usual classification, clustering, and recommenders/collaborative filtering algorithms
Users can download Apache Hadoop directly from www.apache.org or use commercial packages
  For example, the Pivotal company provides Pivotal HD Enterprise
[Figure: Components of Pivotal HD Enterprise]
10.3 NoSQL


NoSQL = Not only Structured Query Language
Four major categories of NoSQL tools
  Key/value stores – contain data (the value) accessed by the key
  Document stores – good when the value of the key/value pair is a file
  Column family stores – good for sparse datasets
  Graph stores – good for items and relationships between them
    Social networks like Facebook and LinkedIn
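
The four categories differ mainly in the shape of the data they hold. The sketch below models each shape with ordinary Python dictionaries and sets; it illustrates the data models only, not the API of any real NoSQL product, and every name and value in it is made up.

```python
# Key/value store: an opaque value looked up by its key.
key_value = {"session:42": b"opaque bytes"}

# Document store: the value is a structured document whose fields can be read.
documents = {"user:1": {"name": "Ada", "addresses": [{"city": "Flint"}]}}

# Column family store: each row keeps only the columns it actually has,
# which is why this model suits sparse datasets.
column_family = {
    "row1": {"profile:name": "Ada", "profile:email": "ada@example.com"},
    "row2": {"profile:name": "Bob"},  # no email column is stored at all
}

# Graph store: items (nodes) plus the relationships (edges) between them.
graph = {"Ada": {"Bob", "Cy"}, "Bob": {"Ada"}, "Cy": {"Ada"}}


def friends_of_friends(g, person):
    """Follow relationships two hops out, excluding direct friends and self."""
    direct = g.get(person, set())
    two_hop = set()
    for friend in direct:
        two_hop |= g.get(friend, set())
    return two_hop - direct - {person}


print(documents["user:1"]["addresses"][0]["city"])
print(sorted(column_family["row2"]))
print(friends_of_friends(graph, "Bob"))
```

The friends-of-friends traversal is the kind of relationship query that motivates graph stores for social networks.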
[Table: Examples of NoSQL Data Stores]
