Adine Deford



Hadoop Distributed Storage Management on Amazon Web Services

Hadoop and AWS Make It Easy to Scale Out to Petabytes of Data

Hadoop and AWS are enterprise-ready, distributed cloud computing technologies.

It is straightforward to add more DataNodes (i.e., storage) to a Hadoop cluster on AWS. You just need to create another AWS instance and add it to the Hadoop cluster as a new node. Hadoop then takes care of balancing storage so that file system utilization across DataNodes stays as even as possible.

Cloudera's distribution of Hadoop includes Cloudera Manager, which makes it simple to install Hadoop and add new nodes to it. The screenshot below shows an existing HDFS service with two DataNodes. We will expand HDFS by adding a third DataNode to it:

Once we click the Add button, a new host can be picked from the list of available servers (in this case, server ip-10-0-0-40):

The new server is now DataNode-3, i.e., it is part of our Hadoop cluster.

The new DataNode-3 does not contain any data yet. The Hadoop Balancer will redistribute data blocks across the nodes.

The rebalancing threshold is a configurable parameter (the default value is 10%, i.e., 10 percentage points of disk capacity). When the Balancer runs, it moves data blocks from nodes whose file system utilization diverges from the average cluster utilization by more than the threshold, until every node is back within that threshold.
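To make the threshold rule concrete, here is a minimal, self-contained simulation of it. It is a toy model, not the HDFS Balancer's actual code: the `DataNode` class, `out_of_balance()` and `rebalance()` are hypothetical names, every block is assumed to be 128 MB (0.125 GB), and real block moves also respect rack placement and bandwidth limits that are ignored here. On a real cluster the Balancer is started from the command line, e.g. `hdfs balancer -threshold 10`.

```python
from dataclasses import dataclass


@dataclass
class DataNode:
    """Hypothetical model of a DataNode: only capacity and used space matter here."""
    name: str
    capacity_gb: float
    used_gb: float

    @property
    def utilization(self) -> float:
        """Used space as a percentage of this node's capacity."""
        return 100.0 * self.used_gb / self.capacity_gb


def cluster_utilization(nodes: list[DataNode]) -> float:
    """Cluster-wide utilization in percent: total used space / total capacity."""
    return 100.0 * sum(n.used_gb for n in nodes) / sum(n.capacity_gb for n in nodes)


def out_of_balance(nodes: list[DataNode], threshold: float = 10.0) -> list[str]:
    """Names of nodes whose utilization differs from the cluster average by more than `threshold` points."""
    avg = cluster_utilization(nodes)
    return [n.name for n in nodes if abs(n.utilization - avg) > threshold]


def rebalance(nodes: list[DataNode], threshold: float = 10.0,
              block_gb: float = 0.125, max_moves: int = 1_000_000) -> int:
    """Toy balancer: move one block at a time from the most- to the least-utilized node
    until every node is within `threshold` of the average. Returns the number of moves.
    Assumes block_gb is small compared with the threshold band (otherwise moves may overshoot)."""
    moves = 0
    while out_of_balance(nodes, threshold) and moves < max_moves:
        source = max(nodes, key=lambda n: n.utilization)
        target = min(nodes, key=lambda n: n.utilization)
        source.used_gb -= block_gb
        target.used_gb += block_gb
        moves += 1
    return moves


if __name__ == "__main__":
    nodes = [
        DataNode("DataNode-1", capacity_gb=1000, used_gb=600),
        DataNode("DataNode-2", capacity_gb=1000, used_gb=600),
        DataNode("DataNode-3", capacity_gb=1000, used_gb=0),  # newly added, still empty
    ]
    print(out_of_balance(nodes))  # ['DataNode-1', 'DataNode-2', 'DataNode-3']
    moves = rebalance(nodes)
    print(moves, [round(n.utilization, 1) for n in nodes])  # 2400 [45.0, 45.0, 30.0]
```

With a 40% cluster average, the empty DataNode-3 only has to reach 30% to fall inside the 10-point band, so the Balancer stops well before the three nodes are exactly equal. A smaller threshold gives a more even result at the cost of moving more blocks.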

Hadoop also makes sure that new data blocks written to HDFS (when new files are added to Hadoop) are spread across the available DataNodes, including the new one.
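As a rough illustration of how a newly written file ends up spread across DataNodes, the sketch below splits a file into fixed-size blocks and assigns replicas to distinct nodes. It assumes the HDFS defaults of a 128 MB block size (`dfs.blocksize`, Hadoop 2.x and later) and 3 replicas (`dfs.replication`). The round-robin placement in the hypothetical `place_replicas()` function is a deliberate simplification: HDFS's default policy is rack-aware (first replica on the writer's node if it is a DataNode, second on a different rack, third on another node in that second rack).

```python
BLOCK_SIZE_MB = 128  # HDFS default dfs.blocksize (Hadoop 2.x and later)
REPLICATION = 3      # HDFS default dfs.replication


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> list[int]:
    """Sizes (MB) of the blocks a file is split into; the last block may be smaller."""
    full, rest = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full
    if rest:
        sizes.append(rest)
    return sizes


def place_replicas(num_blocks: int, datanodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Toy round-robin placement (not the real rack-aware HDFS policy): each block gets
    `replication` distinct nodes, starting one node further along for each successive block."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    placement = {}
    for b in range(num_blocks):
        start = b % len(datanodes)
        placement[b] = [datanodes[(start + i) % len(datanodes)] for i in range(replication)]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(300)
    print(blocks)  # [128, 128, 44]
    nodes = ["DataNode-1", "DataNode-2", "DataNode-3"]
    for block, replicas in place_replicas(len(blocks), nodes, replication=2).items():
        print(block, replicas)
    # 0 ['DataNode-1', 'DataNode-2']
    # 1 ['DataNode-2', 'DataNode-3']
    # 2 ['DataNode-3', 'DataNode-1']
```

The example uses a replication factor of 2 only so that the spreading is visible on a three-node cluster; with the default of 3, every block would have a replica on each of the three DataNodes.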

More Stories By Ranko Mosic

Ranko Mosic, BScEng, specializes in Big Data/Data Architecture consulting services (database/data architecture, machine learning). His clients are in the finance, retail, and telecommunications industries. Ranko welcomes inquiries about his availability for consulting engagements and can be reached at 408-757-0053.