University of Michigan-Flint Chapter 10 Hadoop and Its Core Components Paper
After reading Chapter 10 in your textbook, please provide a brief response to the following assessment questions.
In your own words and understanding after reading Chapter 10:
What is Hadoop?
Name the two main components of Hadoop and discuss the roles of those two components during system failures.
Your initial post should be at least 100 words.
***Please follow APA format***
YouTube link: youtube.com/watch?v=MfF750YVDxM
https://hadoopecosystemtable.github.io/
Chapter 10: Advanced Analytics – Technology and Tools: MapReduce and Hadoop
10.1 Analytics for Unstructured Data
10.1.1 Use Cases
◼ IBM Watson – Jeopardy!-playing machine
  ◼ To educate Watson, Hadoop was used to process data sources such as encyclopedias, dictionaries, news wire feeds, literature, and Wikipedia
◼ LinkedIn – network of over 250 million users in 200 countries
  ◼ Hadoop is used to process daily transaction logs, examine users’ activities, feed extracted data back to production systems, restructure the data, and develop and test analytic models
◼ Yahoo! – large Hadoop deployment
  ◼ Used for search index creation and maintenance, webpage content optimization, spam filters, etc.
10.1.2 MapReduce
◼ The MapReduce paradigm breaks a large task into smaller tasks, runs the tasks in parallel, and consolidates the outputs of the individual tasks into the final output
◼ Map
  ◼ Applies an operation to a piece of data
  ◼ Provides some intermediate output
◼ Reduce
  ◼ Consolidates the intermediate outputs from the map steps
  ◼ Provides the final output
◼ Each step uses key/value pairs as its input and output
MapReduce word count example
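The word count idea can be sketched in a single Python process. This sketch assumes the input is a list of text lines. It is not Hadoop code, and the function names `map_words`, `reduce_counts`, and `word_count` are hypothetical. The map step emits a (word, 1) key/value pair for each word. The pairs are then grouped by key, and the reduce step sums each group.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit an intermediate (word, 1) key/value pair for every word."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: consolidate all intermediate counts emitted for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, group the intermediate pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for line in lines:                       # in Hadoop, map tasks run in parallel
        for word, one in map_words(line):
            grouped[word].append(one)        # collect values under their key
    return dict(reduce_counts(word, counts) for word, counts in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["the cat sat", "the dog sat down"]))
```

Running it prints `{'cat': 1, 'dog': 1, 'down': 1, 'sat': 2, 'the': 2}`.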
10.1.3 Apache Hadoop
◼ MapReduce is a simple paradigm to understand but not easy to implement
◼ Executing a MapReduce job requires:
  ◼ Scheduling jobs based on the system’s workload
  ◼ Spreading the input data across a cluster of machines
  ◼ Spreading the map step across the distributed system
  ◼ Collecting the intermediate outputs and providing them to the proper machines for the reduce step
  ◼ Making the final output available to another user, another application, or another MapReduce job
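The execution steps listed above can be simulated on one computer. In this sketch, threads stand in for the machines of a cluster, and all function names are hypothetical. The input is spread across splits and a map task runs on each split in parallel. Each intermediate pair is then delivered to the reducer chosen for its key, and the reducers' results together form the final output. Scheduling by system workload is not modeled here.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_input(records, num_splits):
    """Spread the input records across a number of hypothetical machines."""
    return [records[i::num_splits] for i in range(num_splits)]


def map_task(split):
    """Map step for one split: emit a (word, 1) pair per word."""
    return [(word, 1) for line in split for word in line.split()]


def reducer_for(key, num_reducers):
    """Pick the reducer for a key; crc32 is stable across runs, unlike hash()."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(records, num_splits=3, num_reducers=2):
    splits = split_input(records, num_splits)

    # Map step spread across the (simulated) machines: one parallel task per split.
    with ThreadPoolExecutor(max_workers=num_splits) as pool:
        map_outputs = list(pool.map(map_task, splits))

    # Collect intermediate outputs and deliver each pair to the proper reducer.
    reducer_inputs = [{} for _ in range(num_reducers)]
    for output in map_outputs:
        for key, value in output:
            reducer_inputs[reducer_for(key, num_reducers)].setdefault(key, []).append(value)

    # Each reducer consolidates its own keys; together they form the final output.
    final_output = {}
    for bucket in reducer_inputs:
        for key, values in bucket.items():
            final_output[key] = sum(values)
    return final_output
```

The final counts do not depend on how many splits or reducers are used. Only the placement of the work changes.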
◼ The next few slides present an overview of the Hadoop environment
◼ Hadoop Distributed File System (HDFS)
  ◼ A file system that distributes data across a cluster to take advantage of the parallel processing of MapReduce
  ◼ HDFS uses three Java daemons (background processes):
    1. NameNode – determines and tracks where the various blocks of data are stored
    2. DataNode – manages the data stored on each machine
    3. Secondary NameNode – performs some of the NameNode’s tasks (such as periodically checkpointing the file system metadata) to reduce the load on the NameNode
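The NameNode/DataNode split can be illustrated with a simplified Python model, offered as a sketch under stated assumptions. The class names mirror the daemons, but the code is hypothetical and not HDFS itself. It uses a tiny block size and simple round-robin replica placement, whereas real HDFS placement is rack-aware. The replication factor of 3 matches the HDFS default. The model shows how a read can still succeed when a DataNode fails: the NameNode knows the other replicas of each block. All block locations live in the `NameNode` object, so in this model losing the NameNode, rather than a DataNode, would make the files unreadable. Heartbeats and re-replication are omitted.

```python
import itertools

BLOCK_SIZE = 4    # bytes per block; tiny for illustration (HDFS defaults to 128 MB)
REPLICATION = 3   # HDFS's default replication factor


class DataNode:
    """Stores the actual block data for one machine."""

    def __init__(self):
        self.blocks = {}
        self.alive = True


class NameNode:
    """Tracks which DataNodes hold each block; stores metadata, not file data."""

    def __init__(self, datanodes):
        self.datanodes = datanodes                   # name -> DataNode
        self.files = {}                              # filename -> list of block ids
        self.block_locations = {}                    # block id -> DataNode names
        self._placement = itertools.cycle(sorted(datanodes))

    def write(self, filename, data):
        block_ids = []
        for index in range(0, len(data), BLOCK_SIZE):
            block_id = f"{filename}#{index // BLOCK_SIZE}"
            # Round-robin placement gives distinct replicas while
            # REPLICATION <= number of DataNodes.
            targets = [next(self._placement) for _ in range(REPLICATION)]
            for name in targets:
                self.datanodes[name].blocks[block_id] = data[index:index + BLOCK_SIZE]
            self.block_locations[block_id] = targets
            block_ids.append(block_id)
        self.files[filename] = block_ids

    def read(self, filename):
        pieces = []
        for block_id in self.files[filename]:
            # Use any replica whose DataNode is still alive.
            live = [n for n in self.block_locations[block_id] if self.datanodes[n].alive]
            if not live:
                raise OSError(f"all replicas of {block_id} are unavailable")
            pieces.append(self.datanodes[live[0]].blocks[block_id])
        return "".join(pieces)
```

With four DataNodes and three replicas per block, any two DataNodes can fail and every block is still readable.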
◼ Structuring a MapReduce Job in Hadoop
  ◼ A typical MapReduce Java program has three classes:
    ◼ Driver – provides details such as the input file locations, the names of the mapper and reducer classes, the location of the reduce class output, etc.
    ◼ Mapper – provides the logic to process each data block
    ◼ Reducer – reduces the data provided by the mapper
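The three-class structure can be mirrored in Python. This is only an analogy. In Hadoop's Java API, the roles correspond to subclasses of `org.apache.hadoop.mapreduce.Mapper` and `Reducer` plus a driver that configures the job. The classes below (`WordCountMapper`, `WordCountReducer`, `Driver`) are hypothetical. A plain list stands in for the input files, and a dict stands in for the output location.

```python
from collections import defaultdict


class WordCountMapper:
    """Mapper: the logic applied to each input record of a data block."""

    def map(self, key, value):
        # key: record offset (unused here); value: one line of text
        for word in value.split():
            yield word, 1


class WordCountReducer:
    """Reducer: consolidates all values the mappers emitted for one key."""

    def reduce(self, key, values):
        yield key, sum(values)


class Driver:
    """Driver: names the input, the mapper and reducer classes, and the output location."""

    def __init__(self, input_records, mapper_cls, reducer_cls, output):
        self.input_records = input_records
        self.mapper = mapper_cls()
        self.reducer = reducer_cls()
        self.output = output          # a dict standing in for the output directory

    def run(self):
        grouped = defaultdict(list)
        for offset, line in enumerate(self.input_records):
            for key, value in self.mapper.map(offset, line):
                grouped[key].append(value)
        for key in sorted(grouped):
            for out_key, out_value in self.reducer.reduce(key, grouped[key]):
                self.output[out_key] = out_value
        return self.output
```

Swapping in a different mapper or reducer class changes the job without touching the driver's control flow. This is the separation of roles the three classes provide.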
A file stored in HDFS
◼ Additional Considerations in Structuring a MapReduce Job
Shuffle and Sort
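In Hadoop, the shuffle and sort phase merges the outputs of all map tasks, sorts them by key, and hands each reducer a key together with all of its values. A minimal Python sketch of that grouping step follows; the function name is hypothetical. Because this sketch uses a stable sort, values keep their arrival order, but Hadoop itself does not guarantee any ordering of the values for a key.

```python
from itertools import groupby
from operator import itemgetter


def shuffle_and_sort(map_outputs):
    """Merge every map task's output, sort by key, and group the values per key."""
    pairs = [pair for output in map_outputs for pair in output]
    pairs.sort(key=itemgetter(0))    # sort on the key only
    return [(key, [value for _, value in group])
            for key, group in groupby(pairs, key=itemgetter(0))]
```

For example, the map outputs `[[("b", 1), ("a", 2)], [("a", 3), ("c", 4), ("b", 5)]]` become `[("a", [2, 3]), ("b", [1, 5]), ("c", [4])]`.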
Using a combiner
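A combiner applies reducer-like logic to each map task's output locally, before anything is sent across the network to the reducers. The Python sketch below uses hypothetical names. It counts how many intermediate pairs would be shuffled with and without a combiner, and checks that the final result is unchanged. Reusing the summation is valid here because addition is associative and commutative. Hadoop may run a combiner zero or more times, so a combiner must not change the final answer.

```python
from collections import Counter


def map_words(block):
    """Map step for one block: a (word, 1) pair per word."""
    return [(word, 1) for word in block.split()]


def sum_by_key(pairs):
    """Summation shared by the combiner and the reducer."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return totals


def combine(pairs):
    """Combiner: pre-aggregate one map task's output on the map side."""
    return sorted(sum_by_key(pairs).items())


def reduce_all(pairs):
    """Reducer: the same summation over pairs gathered from all map tasks."""
    return dict(sum_by_key(pairs))


def shuffled_pairs(blocks, use_combiner):
    """The intermediate pairs that would travel to the reducers."""
    outputs = [map_words(block) for block in blocks]
    if use_combiner:
        outputs = [combine(output) for output in outputs]
    return [pair for output in outputs for pair in output]
```

For the blocks `["a a a b", "a b b"]`, seven pairs are shuffled without the combiner and four with it. Both runs reduce to `{"a": 4, "b": 3}`.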
Using a custom partitioner
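A partitioner decides which reducer receives each intermediate key. Hadoop's default `HashPartitioner` uses the key's hash modulo the number of reduce tasks. The Python sketch below imitates that default, using `zlib.crc32` as a stand-in for Java's `hashCode`. It compares the default with a hypothetical custom partitioner that splits a-z into contiguous ranges, so that each reducer handles one alphabetical range of keys.

```python
import zlib


def hash_partition(key, num_reducers):
    """Default-style partitioning: stable hash of the key modulo the reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def alphabet_range_partition(key, num_reducers):
    """Hypothetical custom partitioner: split a-z into num_reducers contiguous ranges."""
    first = key[:1].lower()
    if not "a" <= first <= "z":
        return num_reducers - 1       # digits, symbols, empty keys go to the last reducer
    return (ord(first) - ord("a")) * num_reducers // 26


def partition(pairs, num_reducers, partitioner):
    """Route each intermediate (key, value) pair to the reducer the partitioner picks."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partitioner(key, num_reducers)].append((key, value))
    return buckets
```

Whatever the rule, every pair with the same key must land on the same reducer. Otherwise, one key's values would be split across reducers and reduced separately.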
◼ Developing and Executing a Hadoop MapReduce Program
  ◼ Common practice is to use an IDE such as Eclipse
  ◼ The MapReduce program consists of three Java files:
    ◼ Driver code, map code, and reduce code
  ◼ The Java code is compiled, packaged in a JAR file, and executed against the specified HDFS input files
◼ Yet Another Resource Negotiator (YARN)
  ◼ Hadoop continues to undergo development
  ◼ One important change was to separate the MapReduce functionality from the management of running jobs and the distributed environment
  ◼ This rewrite is sometimes called Yet Another Resource Negotiator (YARN)
10.2 The Hadoop Ecosystem
◼ Tools have been developed to make Hadoop easier to use and to provide additional functionality and features
  ◼ Pig – provides a high-level data-flow programming language
  ◼ Hive – provides SQL-like access
  ◼ HBase – provides real-time reads and writes
  ◼ Mahout – provides analytical tools
10.2.1 Pig
◼ Pig consists of:
  ◼ A data flow language called Pig Latin
  ◼ An environment to execute the Pig code
◼ Example of Pig commands
10.2.1 Pig – Built-in Pig Functions
10.2.2 Hive
◼ The Hive language, Hive Query Language (HiveQL), resembles SQL rather than a scripting language
◼ Example Hive code
10.2.3 HBase
◼ Unlike Pig and Hive, which are intended for batch applications, HBase can provide real-time read and write access to huge datasets
◼ Example – Choosing a shipping address at checkout
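The checkout example relies on looking up one customer's row directly by its key, rather than scanning the whole dataset in a batch job. The Python sketch below illustrates that access pattern. `InMemoryTable` and `shipping_addresses` are hypothetical and are not the HBase client API. The `family:qualifier` column names such as `addr:home` follow HBase's column naming convention, but the row key format `customer#<id>` is an assumption made only for this example.

```python
class InMemoryTable:
    """Hypothetical stand-in for an HBase-style table: row key -> {family:qualifier: value}."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, column, value):
        """Write (or overwrite) a single cell."""
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        """Read one row directly by its key, without scanning other rows."""
        return dict(self.rows.get(row_key, {}))


def shipping_addresses(table, customer_id):
    """Return the saved addresses to offer at checkout (columns in the 'addr' family)."""
    row = table.get(f"customer#{customer_id}")
    return sorted(value for column, value in row.items() if column.startswith("addr:"))
```

After `put("customer#1001", "addr:home", "12 Elm St")` and `put("customer#1001", "addr:work", "500 Main St")`, calling `shipping_addresses(table, 1001)` returns both addresses immediately. A later `put` to the same cell is visible on the next read.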
10.2.4 Mahout
◼ Mahout permits the application of analytical techniques within the Hadoop environment
◼ Mahout provides Java code that implements the usual classification, clustering, and recommender/collaborative filtering algorithms
◼ Users can download Apache Hadoop directly from www.apache.org or use commercial packages
  ◼ For example, the Pivotal company provides Pivotal HD Enterprise
◼ Components of Pivotal HD Enterprise
10.3 NoSQL
◼ NoSQL = Not only Structured Query Language
◼ Four major categories of NoSQL tools:
  ◼ Key/value stores – contain data (the value) accessed by the key
  ◼ Document stores – good when the value of the key/value pair is a file
  ◼ Column family stores – good for sparse datasets
  ◼ Graph stores – good for items and the relationships between them
    ◼ e.g., social networks like Facebook and LinkedIn
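The four categories differ mainly in how the stored value is structured and queried. The Python sketch below contrasts them using plain dictionaries as hypothetical stand-ins. It is not the API of any particular NoSQL product, and the sample data is made up for illustration.

```python
# Hypothetical in-memory stand-ins for the four NoSQL data models.

# Key/value store: an opaque value retrieved only by its key.
kv_store = {"session:42": "user=alice;cart_items=3"}

# Document store: the value is a structured document whose fields can be queried.
doc_store = {
    "order:7": {"customer": "alice", "items": [{"sku": "A1", "qty": 2}]},
    "order:8": {"customer": "bob", "items": [{"sku": "B2", "qty": 1}]},
}

# Column family store: each row keeps only the columns it actually has (sparse data).
column_store = {
    "row1": {"info:name": "alice", "info:city": "Flint"},
    "row2": {"info:name": "bob"},   # no city column is stored at all
}

# Graph store: items (nodes) and the relationships (edges) between them.
graph = {"alice": {"bob", "carol"}, "bob": {"alice"},
         "carol": {"alice", "dave"}, "dave": {"carol"}}


def orders_for(customer):
    """Document-store style query on a field inside each document."""
    return sorted(key for key, doc in doc_store.items() if doc["customer"] == customer)


def friends_of_friends(person):
    """Graph-store style traversal: people two hops away who are not direct friends."""
    direct = graph.get(person, set())
    two_hops = {fof for friend in direct for fof in graph.get(friend, set())}
    return two_hops - direct - {person}
```

The graph traversal shows why social networks suit graph stores. Here, `friends_of_friends("alice")` follows relationships to find `{"dave"}` without joining tables.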
◼ Examples of NoSQL Data Stores