Let me show you a quick way to setup Country and Region for your Data Warehouse or Data Mart. Normally, you expect to find this kind of reference data within your organization in some source system or even better in a MDCP (Master Data Control Point).
If you can’t find this data anywhere, then check out the Azure Data Market.
You can sign up for free here and browse a number of interesting datasets. One of them is the World Development Indicators. Not all the data sets out there are free, but the World Development Indicator is free.
Once you signup, you can go down to Explore the Dataset
You will see a query panel:
Check out the GetCountries dataset
I highly recommend you install the Excel Data Market Excel addin that will make this a breeze:
So here is how it looks in Excel:
Pass parameters, if any and you are done. The parameter I used is en or EN for english.
Here is the result:
There are some aggregate information in there that you can clean up. Also, you may want to clean up the (all income levels).
I am not recommending this as the best way of doing it, but if you find yourself with no reference country, region data, then give Azure Data Market a try.
In this post I am going to look at the ‘Detect Categories’ table analysis tool using SQL Server 2012 Data Mining Add-In for Excel 2010. The ‘Detect Categories’ tool uses Microsoft’s clustering algorithm to automatically categorize rows in a table that have similar characteristics. This is the simplest way to categorize data in a table and understand its characteristics.
I am going use a data set from the Windows Azure Marketplace called ‘Places of Interests in Singapore’.
The file downloaded as a ‘.csv’ and contains about 26,000 rows. For this example, I am going to run the analysis on a subset of this data by creating a sample set.
Here are the steps to create a sample. Click on the Data Mining table and choose ‘Sample Data’
Take a random sampling:
I’ll sample ~30%:
Click on ‘Detect Categories’ from under ‘Table Tools’>’Analyze’ menu:
In the column list, I’ll choose: company_type, category, status, food_cuisine. Set the number of categories to 4 and Run the analysis. A rule of thumb I use to select columns is choose those columns that have low distinct values (low cardinality). You want to skip columns such as names, addresses, dates etc that have large distinct values.
This creates a categories report:
The first category created is for Shopping based on the record count, so I am going to rename ‘Category 1’ to ‘Shopping’. This updates the table below and the Chart.
The 2nd category is “Food and Beverage”:
The 3rd category is a hodgepodge of different categories so I’ll call it Other and the 4th category is called ‘Beauty and Spa’.
To refresh the chart, please Ctl-Alt-F5
The new Category column is also added to the table so you can now sort or filter by category:
Be sure to check out how to extract data from Hadoop on Azure using Hive Excel addin http://grow-n-shine.blogspot.com/2012/02/use-excel-hive-add-in-to-access-hive-on.html
Currently, https://www.hadooponazure.com is available by invitation only.
In this post, I describe how to quickly build a Date Dimension for your data warehouse by leveraging the Azure Data Marketplace and Excel PowerPivot. Rather than building the date table yourself, you can just download the date table from the Azure Data Marketplace using PowerPivot.
Start by signing up at the Azure Data Marketplace https://datamarket.azure.com/. Windows Data Azure Marketplace is envisioned to be the One-Stop shop for premium data and applications.
Next search for DateStream service.
The DateStream service provides all the dates from Year 1900 to 2100. You probably will not need all of the data, but that can be easily filtered out once you have all of the data downloaded. If you are not interested in extracting ~200 years worth of dates, you can apply filters at the source and extract only what you are interested in. Check out this post: http://www.bp-msbi.com/2011/10/range-queries-with-azure-datamarket-feeds/
The DateStream feed currently consists of a number of date tables:
You’ll need to subscribe to this data set. Once subscribed, you will be able to explore this data set right inside the marketplace.
I will download a connection to Excel PowerPivot which will download a ServiceQuery.atomsvc file. You could choose to download just a CSV file if you want.
Doubleclick on the ServiceQuery.atomsvc file and that should open the table import wizard. Provide your account key and proceed importing the data.
After you import the data:
After I downloaded this data, I created a count measure as follows: cnt:=COUNTROWS(DateStream). The table contained 73,414 rows since I didn’t apply any source filters.
I also created an additional column in the table and called it: MonthName2 with the following DAX: =[MonthOfYear]& " - " & [MonthName]. This will help me create a date hierarchy and sort the months in the right order. I switched to the “Diagram View” to create a date hierarchy – simply right click and create a hierarchy and drag and drop the columns that form the hierarchy.
Next, I created a Pivot table in Excel:
You don’t have to create a pivot table if all you are interested is building a Date table in your database. You should already have the data in CSV or Excel that you can import into your relational database.