Just another Blog

My Technical, Professional and at times Personal Blog

Showing posts with label Azure Data Marketplace. Show all posts

Country and Region Reference Data on Azure Data Market

Let me show you a quick way to setup Country and Region for your Data Warehouse or Data Mart. Normally, you expect to find this kind of reference data within your organization in some source system or even better in a MDCP (Master Data Control Point).

If you can’t find this data anywhere, then check out the Azure Data Market.

You can sign up for free here and browse a number of interesting datasets. One of them is the World Development Indicators. Not all the data sets out there are free, but the World Development Indicator is free.

Once you signup, you can go down to Explore the Dataset

image

You will see a query panel:

image

Check out the GetCountries dataset

image

I highly recommend you install the Excel Data Market Excel addin that will make this a breeze:

image

So here is how it looks in Excel:

image

Pass parameters, if any and you are done. The parameter I used is en or EN for english.

Here is the result:

image

There are some aggregate information in there that you can clean up. Also, you may want to clean up the (all income levels).

I am not recommending this as the best way of doing it, but if you find yourself with no reference country, region data, then give Azure Data Market a try.

Microsoft BI – Detect Categories using SQL Server 2012 Data Mining Add-In for Excel 2010

In this post I am going to look at the ‘Detect Categories’  table analysis tool using SQL Server 2012 Data Mining Add-In for Excel 2010. The ‘Detect Categories’ tool uses Microsoft’s clustering algorithm to automatically categorize rows in a table that have similar characteristics. This is the simplest way to categorize data in a table and understand its characteristics.

I am going use a data set from the Windows Azure Marketplace called ‘Places of Interests in Singapore’.

image

The file downloaded as a ‘.csv’ and contains about 26,000 rows. For this example, I am going to run the analysis on a subset of this data by creating a sample set.

image

Here are the steps to create a sample. Click on the Data Mining table and choose ‘Sample Data’

image

Take a random sampling:

image

I’ll sample ~30%:

image

Click on ‘Detect Categories’ from under ‘Table Tools’>’Analyze’ menu:

image

In the column list, I’ll choose: company_type, category, status, food_cuisine. Set the number of categories to 4 and Run the analysis. A rule of thumb I use to select columns is choose those columns that have low distinct values (low cardinality). You want to skip columns such as names, addresses, dates etc that have large distinct values.

This creates a categories report:

image

The first category created is for Shopping based on the record count, so I am going to rename ‘Category 1’ to ‘Shopping’. This updates the table below and the Chart.

image

The 2nd category is “Food and Beverage”:

image

The 3rd category is a hodgepodge of different categories so I’ll call it Other and the 4th category is called ‘Beauty and Spa’.

To refresh the chart, please Ctl-Alt-F5

image

The new Category column is also added to the table so you can now sort or filter by category:

image

How to import Azure Marketplace data into Hadoop on Azure

Here is a quick demo on how to import Azure Marketplace data into Hadoop on Azure and how to query it in Hive using the Interactive JavaScript console.

Demo provided by MicrosoftBI team

Be sure to check out how to extract data from Hadoop on Azure using Hive Excel addin http://grow-n-shine.blogspot.com/2012/02/use-excel-hive-add-in-to-access-hive-on.html

Currently, https://www.hadooponazure.com is available by invitation only.

Using Azure Data Marketplace and PowerPivot to Build Date Dimension

In this post, I describe how to quickly build a Date Dimension for your data warehouse by leveraging the Azure Data Marketplace and Rather than building the date table yourself, you can just download the date table from the Azure Data Marketplace using PowerPivot.

Start by signing up at the Azure Data Marketplace https://datamarket.azure.com/. Windows Data Azure Marketplace is envisioned to be the One-Stop shop for premium data and applications.

Next search for DateStream service.

image

The DateStream service provides all the dates from Year 1900 to 2100. You probably will not need all of the data, but that can be easily filtered out once you have all of the data downloaded. If you are not interested in extracting ~200 years worth of dates, you can apply filters at the source and extract only what you are interested in. Check out this post: http://www.bp-msbi.com/2011/10/range-queries-with-azure-datamarket-feeds/

The DateStream feed currently consists of a number of date tables:

  • BasicCalendarEnglish
  • BasicCalendarUS
  • BasicCalendarDanish
  • BasicCalendarHebrew
  • BasicCalendarGerman
  • BasicCalendarBulgarian
  • ExtendedCalendar

You’ll need to subscribe to this data set. Once subscribed, you will be able to explore this data set right inside the marketplace.

image

image

I will download a connection to Excel PowerPivot which will download a ServiceQuery.atomsvc file. You could choose to download just a CSV file if you want.

image

Doubleclick on the ServiceQuery.atomsvc file and that should open the table import wizard. Provide your account key and proceed importing the data.

image

After you import the data:

image

After I downloaded this data, I created a count measure as follows: cnt:=COUNTROWS(DateStream). The table contained 73,414 rows since I didn’t apply any source filters.

I also created an additional column in the table and called it: MonthName2 with the following DAX: =[MonthOfYear]& " - " & [MonthName]. This will help me create a date hierarchy and sort the months in the right order. I switched to the “Diagram View” to create a date hierarchy – simply right click and create a hierarchy and drag and drop the columns that form the hierarchy.

image

Next, I created a Pivot table in Excel:

image

You don’t have to create a pivot table if all you are interested is building a Date table in your database. You should already have the data in CSV or Excel that you can import into your relational database.

Google+ Followers