Just another Blog

My Technical, Professional and at times Personal Blog


Cloud (Azure) Data and Storage

Great presentation by Dave Campbell on Cloud Data and Storage. I’ve highlighted what I learned below, but be sure to check out the presentation itself. Hadoop for Microsoft Azure looks promising.

We have 3 storage options in Azure: Blob, Table and SQL Azure

image

Azure Storage options:

image

SQL Storage options:

image

Everything is available at the portal:

image

Blob:

Blobs live in a hierarchical namespace exposed through a RESTful interface, so you can GET and PUT Blobs. There are two types of Blobs: Page and Block. The RESTful interface lets you build client tools/libraries against the Azure Storage Service.
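To make the GET/PUT idea concrete, here is a minimal Python sketch of how a client library might address a blob and prepare an upload. The account, container and blob names are made up for illustration, and the request is only built, never sent: a real call would also need authentication and version headers, which are left out here.

```python
from urllib.parse import quote
from urllib.request import Request


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the REST address of a blob: account -> container -> blob."""
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name)}"


def put_block_blob_request(url: str, data: bytes) -> Request:
    """Prepare (but do not send) a PUT that would upload data as a Block blob."""
    headers = {
        "x-ms-blob-type": "BlockBlob",  # "PageBlob" would select the other blob type
        "Content-Length": str(len(data)),
    }
    return Request(url, data=data, headers=headers, method="PUT")


# Hypothetical names, for illustration only.
url = blob_url("myaccount", "pictures", "holiday/beach.jpg")
request = put_block_blob_request(url, b"example bytes")
print(request.get_method(), request.full_url)
```

Reading the blob back would be a plain GET against the same URL, which is what makes it straightforward to write tools against the service in any language.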

Table Storage:

In Azure, Table storage is a service that exposes a collection of tables. Table storage is simple, REST accessible, highly scalable and cost effective.

SQL Azure:

SQL Server database technology is delivered as a service on the Windows Azure Platform. It is ideal for both simple and complex applications, is enterprise ready, and is designed to scale out elastically with demand. The advantage is that you no longer have to physically manage the servers. SQL Azure is offered as a pay-as-you-go service:

image

Clients:

SQL Azure provides both a thin client and a thick client to manage your SQL Azure database.

Thin client (browser):

image

Thick Client:

You can use SQL Server Management Studio to connect to a SQL Azure DB:

image

or connect using Visual Studio 2010:

image

The cool thing is Visual Studio lets you deploy an Azure database easily in a few steps:

Step 1: Create a Project from your local database:

image

image

Step 2: Set the Target Platform

image

Step 3: Publish to SQL Azure

image

SQL Federations lets you scale out your database:

  • Integrated database sharding that can scale to hundreds of nodes
  • Multi-tenancy via flexible repartitioning
  • Online split operations to minimize downtime
  • Automatic data discovery regardless of changes in how data is partitioned

What is Database Sharding?

Database sharding is a method of horizontal partitioning in a database or search engine. Each individual partition is referred to as a shard or database shard.
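As a rough illustration of horizontal partitioning, the sketch below routes each key to a shard by range and shows what a split does to the routing. It is a toy in-memory model written for this post, not how SQL Azure Federations is implemented; the class name, shard names and split points are all invented.

```python
from bisect import bisect_right


class RangeShardMap:
    """Toy model: maps a numeric key to a shard using sorted range boundaries."""

    def __init__(self, split_points, shard_names):
        # N split points (given in ascending order) divide the key space
        # into N + 1 contiguous ranges, one per shard.
        if len(shard_names) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = list(split_points)
        self.shard_names = list(shard_names)

    def shard_for(self, key):
        # bisect_right counts the split points <= key, which is the range index.
        return self.shard_names[bisect_right(self.split_points, key)]

    def split(self, at, new_shard):
        """Split the range containing `at`; keys >= at in that range move to new_shard."""
        if at in self.split_points:
            raise ValueError("already a split point")
        index = bisect_right(self.split_points, at)
        self.split_points.insert(index, at)
        self.shard_names.insert(index + 1, new_shard)


# Hypothetical layout: customer ids below 1000, 1000-1999, and 2000 and up.
shards = RangeShardMap([1000, 2000], ["shard_a", "shard_b", "shard_c"])
print(shards.shard_for(1700))
shards.split(1500, "shard_b2")
print(shards.shard_for(1700))
```

The point of the split is that callers keep asking `shard_for(key)` and never need to know that the layout changed underneath them.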

You must enable Federation on tables with the below SQL:

image

image

Federated Databases can be managed using the thin client:

image

image

Note the results of a stress test conducted on a federated vs. non-federated database:

image

A federated database lets you distribute the IO load and manage space.

Hadoop on Azure:

Microsoft is working with the Apache community to get a Hadoop version for Microsoft Windows. The Developer Preview for the Apache Hadoop-based Services for Windows Azure is available at https://www.hadooponazure.com/

image

Interactive mode: No tools to be installed

image

cat is the command to look at the contents of a file

image

# is the command to look at all the commands available

image

image

The query to get the word count for some of the Project Gutenberg texts is below:

image

When you run the query, the Hadoop Job Scheduler will automatically send the job (query) out to the cluster with one process for each of the above files.

from("gutenberg") – gets the data from your dataset

mapReduce("WordCount.js", "word, count:long") – runs map reduce with your Word Count function and returns word and count columns

orderBy("count DESC") – orders the results by count, descending

take(10) – reduces the results to the top 10 words
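If the pipeline is hard to picture, the same four steps can be imitated in a few lines of plain Python. This is only a single-process sketch of the idea (it does not use Hadoop and says nothing about what WordCount.js actually contains), and the function name and sample texts are made up.

```python
import re
from collections import Counter


def top_words(texts, n=10):
    """Count words across all texts and return the n most frequent."""
    counts = Counter()
    for text in texts:  # from(...): iterate over the dataset
        words = re.findall(r"[a-z']+", text.lower())  # map: emit each word
        counts.update(words)  # reduce: sum the occurrences per word
    # orderBy("count DESC") followed by take(n)
    ordered = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    return ordered[:n]


# Tiny stand-in for the Gutenberg files.
sample = ["The cat and the hat", "The end"]
for word, count in top_words(sample, n=3):
    print(word, count)
```

On a real cluster the map and reduce steps run in parallel across the files, which is the part this sketch leaves out.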

Hadoop for Azure lets you plot graphs based on the data you have:

image

image

Hadoop supports multiple languages, such as Python, Ruby and .NET.

What is Hive?

image

Hive provides a structured, SQL-like layer that sits over the Hadoop layer (unstructured data). Note the Hive query syntax below:

image

Big Data Keynote at Oracle Open World 2011


See Oracle’s solution to the “Big Data” problem:

  • Oracle NoSQL database
  • Oracle Enterprise Manager 12c (cloud)
  • Oracle Data Integrator
  • Oracle Loader for Hadoop (moves data from Hadoop to Oracle database)
  • Oracle R Enterprise (programming language built into the database) (similar to SAS)
  • Integration with OBIEE (demo)
  • Oracle Exalytics
  • Oracle Big Data Appliance


Good Presentation from Oracle Open World 2011 here

The video quality is not great, but the presentation is very insightful.


