Great presentation by Dave Campbell on Cloud Data and Storage here. I’ve highlighted what I learned below, but be sure to check out the presentation itself. Hadoop for Microsoft Azure looks promising.
We have 3 storage options in Azure: Blob, Table and SQL Azure.
Azure Storage options:
SQL Storage options:
Everything is available at the portal:
Blob storage is contained within a hierarchical namespace with a RESTful interface. You can GET and PUT blobs. There are two types of blobs: page and block. The RESTful interface allows you to build client tools and libraries against the Azure Storage service.
Azure Table storage is a service that exposes a collection of tables. Table storage is simple, REST-accessible, highly scalable and cost-effective.
SQL Server database technology is delivered as a service on the Windows Azure Platform. It suits both simple and complex applications, is enterprise-ready, and is designed to scale out elastically with demand. The advantage is that you no longer have to physically manage these servers. SQL Azure is offered as a pay-as-you-go service:
SQL Azure provides both thin-client and thick-client tools to manage your SQL Azure database.
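To make the GET/PUT idea concrete, here is a minimal Python sketch that only builds the HTTP requests without sending them. The account, container and blob names are made up for illustration, and the helper functions are my own, not part of any Azure library. The URL pattern and the x-ms-blob-type header follow the public Blob REST conventions as I understand them; a real call would also need authentication (for example a Shared Key Authorization header or a SAS token), which is left out here.

```python
from urllib.parse import quote
from urllib.request import Request


def blob_url(account, container, blob_name):
    # Assumed endpoint pattern: account -> container -> blob (the hierarchical namespace).
    return f"https://{account}.blob.core.windows.net/{container}/{quote(blob_name, safe='/')}"


def build_get(account, container, blob_name):
    # GET downloads a blob. The request is only constructed here, never sent.
    return Request(blob_url(account, container, blob_name), method="GET")


def build_put_block_blob(account, container, blob_name, data):
    # PUT uploads a blob; the x-ms-blob-type header selects block vs. page blob.
    # Authentication headers are deliberately omitted in this sketch.
    headers = {
        "x-ms-blob-type": "BlockBlob",
        "Content-Length": str(len(data)),
    }
    return Request(
        blob_url(account, container, blob_name),
        data=data,
        headers=headers,
        method="PUT",
    )


if __name__ == "__main__":
    req = build_put_block_blob("myaccount", "books", "notes.txt", b"hello")
    print(req.get_method(), req.full_url)
```

Passing such a request to urllib.request.urlopen would perform the call, provided valid credentials were added first.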
Thin client (browser):
You can use SQL Server Administrator to connect to a SQL Azure DB:
or connect using Visual Studio 2010:
The cool thing is Visual Studio lets you deploy an Azure database easily in a few steps:
Step 1: Create a Project from your local database:
Step 2: Set the Target Platform
Step 3: Publish to SQL Azure
SQL Federations lets you scale out your database:
- Integrated database sharding that can scale to hundreds of nodes
- Multi-tenancy via flexible repartitioning
- Online split operations to minimize downtime
- Automatic data discovery regardless of changes in how data is partitioned
What is Database Sharding?
Database sharding is a method of horizontal partitioning in a database or search engine. Each individual partition is referred to as a shard or database shard.
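Here is a small Python sketch of range-based sharding with an online-style split, to show what horizontal partitioning means in practice. It is an in-memory toy of my own, not how SQL Azure implements Federations; the class name, the non-negative integer keys and the dictionaries standing in for databases are all assumptions for illustration.

```python
import bisect


class RangeShardMap:
    """Toy range-partitioned store: each shard holds a contiguous key range."""

    def __init__(self):
        # boundaries[i] is the lowest key owned by shards[i]; one shard covers everything at first.
        self.boundaries = [0]
        self.shards = [{}]

    def _index(self, key):
        if key < 0:
            raise ValueError("this sketch assumes non-negative integer keys")
        # Find the last boundary that is <= key.
        return bisect.bisect_right(self.boundaries, key) - 1

    def put(self, key, row):
        self.shards[self._index(key)][key] = row

    def get(self, key):
        # Callers never name a shard; the map routes by key.
        return self.shards[self._index(key)].get(key)

    def split(self, at):
        """Split the shard containing `at` into [low, at) and [at, high)."""
        i = self._index(at)
        if self.boundaries[i] == at:
            return  # already a boundary, nothing to do
        old = self.shards[i]
        self.shards[i] = {k: v for k, v in old.items() if k < at}
        self.boundaries.insert(i + 1, at)
        self.shards.insert(i + 1, {k: v for k, v in old.items() if k >= at})


if __name__ == "__main__":
    m = RangeShardMap()
    for customer_id in (5, 150, 999):
        m.put(customer_id, {"id": customer_id})
    m.split(100)
    print(m.boundaries, [sorted(s) for s in m.shards])
```

Because lookups go through the key rather than a shard name, callers keep working after a split, which is roughly the idea behind data discovery that survives repartitioning.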
You must enable Federation on tables with the below SQL:
Federated databases can be managed using the thin client:
Note the results of a stress test conducted on a federated vs. a non-federated database.
Federated databases let you distribute the IO load and give you the ability to manage space.
Hadoop on Azure:
Microsoft is working with the Apache community to get a version of Hadoop for Microsoft Windows. The Developer Preview for the Apache Hadoop-based Services for Windows Azure is available here: https://www.hadooponazure.com/
Interactive mode: No tools to be installed
cat is the command to look at the contents of a file
# is the command to look at all the commands available
The query to get the word count for some of the Project Gutenberg texts is below:
When you run the query, the Hadoop Job Scheduler will automatically send the job (query) out to the cluster with one process for each of the above files.
from("gutenberg") – gets the data from your dataset
mapReduce("WordCount.js", "word, count:long") – runs a map reduce job using your Word Count function
orderBy("count DESC") – orders the results by count, descending
take(10) – keeps only the top 10 words
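The four steps above can be mimicked locally in plain Python to see what the pipeline computes. This is only a sketch of the same logic on one machine; it is not the WordCount.js script from the preview and does not use Hadoop. The function names and the tokenizing regex are my own assumptions.

```python
import re
from collections import Counter


def map_words(line):
    # Map step: emit (word, 1) for every word in a line of text.
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def reduce_counts(pairs):
    # Reduce step: sum the counts emitted for each word.
    totals = Counter()
    for word, n in pairs:
        totals[word] += n
    return totals


def top_words(lines, n=10):
    # from(...) -> mapReduce(...) -> orderBy("count DESC") -> take(n)
    pairs = (pair for line in lines for pair in map_words(line))
    totals = reduce_counts(pairs)
    # Sort by count descending; ties are broken alphabetically so results are stable.
    ordered = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:n]


if __name__ == "__main__":
    sample = ["the cat and the dog", "The end"]
    print(top_words(sample, 3))
```

On a real cluster the map step runs in parallel over the input files and the reduce step merges the partial counts, but the result has the same shape: a list of (word, count) pairs.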
Hadoop for Azure lets you plot graphs based on the data you have:
Hadoop supports multiple languages such as: Python, Ruby and .NET
What is Hive?
Hive provides a structured, SQL-like layer that sits over the data stored in Hadoop, which is otherwise unstructured. Note the Hive query syntax below: