Showing posts with label Data Mining. Show all posts
Showing posts with label Data Mining. Show all posts

Monday, April 2, 2012

Microsoft BI – Detect Categories using SQL Server 2012 Data Mining Add-In for Excel 2010

In this post I am going to look at the ‘Detect Categories’  table analysis tool using SQL Server 2012 Data Mining Add-In for Excel 2010. The ‘Detect Categories’ tool uses Microsoft’s clustering algorithm to automatically categorize rows in a table that have similar characteristics. This is the simplest way to categorize data in a table and understand its characteristics.

I am going use a data set from the Windows Azure Marketplace called ‘Places of Interests in Singapore’.

image

The file downloaded as a ‘.csv’ and contains about 26,000 rows. For this example, I am going to run the analysis on a subset of this data by creating a sample set.

image

Here are the steps to create a sample. Click on the Data Mining table and choose ‘Sample Data’

image

Take a random sampling:

image

I’ll sample ~30%:

image

Click on ‘Detect Categories’ from under ‘Table Tools’>’Analyze’ menu:

image

In the column list, I’ll choose: company_type, category, status, food_cuisine. Set the number of categories to 4 and Run the analysis. A rule of thumb I use to select columns is choose those columns that have low distinct values (low cardinality). You want to skip columns such as names, addresses, dates etc that have large distinct values.

This creates a categories report:

image

The first category created is for Shopping based on the record count, so I am going to rename ‘Category 1’ to ‘Shopping’. This updates the table below and the Chart.

image

The 2nd category is “Food and Beverage”:

image

The 3rd category is a hodgepodge of different categories so I’ll call it Other and the 4th category is called ‘Beauty and Spa’.

To refresh the chart, please Ctl-Alt-F5

image

The new Category column is also added to the table so you can now sort or filter by category:

image

Saturday, March 31, 2012

Microsoft BI – Data Mining – Analyze Key Influencers

In this post, I am going to look at how Analyze Key Influencers works within the Data Mining 2012 Addin for Excel 2010. Analyze Key Influencers uses Microsoft’s Clustering Data Mining algorithm to analyze a column that contains the desired outcome of results. The algorithm looks at the patterns in the data to determine which of the factors had the strongest influence on that outcome.

For example you could answer:

What are the common characteristics  of the customers who are our top purchasers?

What are those key influencing/decision-making factors?

What’s really influencing purchasing? Is it the price? Is it the quality of products? Is it service?

I will use the sample data that came with the Data Mining 2012 Addin. This file contains data sets suitable for experimenting with the Data Mining Add-Ins. The data is extracted from the AdventureWorks sample database.

image

Here is a snapshot of the raw data:

image

Note the last column “Purchased Bike” is what we’re going to analyze. This column only contains “Yes” and “No” values.

Under Table Tools, select “Analyze Key Influencers”

image

This will launch the wizard and choose “Purchased Bike” column.

image

Click on Choose Columns to be used for analysis otherwise the analysis will be run on all columns.

image

I will choose: Marital Status, Gender, Cars and Region and Run the analysis. If you streamline your analysis, we can improve the performance and accuracy of this report.

image

A new report called “Key Influencers Reports for ‘Purchased Bike’” is created. The report graphically displays the probability of the relationship. The results are listed only for the most influential columns and columns that did not influence the outcome is excluded from the report. In our example Gender is excluded even though I had chosen it for analysis.

image

The result above tells us:

The most influential factor that favored buying a bike is:

  1. customers did not have Cars
  2. followed by those customers who were single
  3. followed by those who had 1 car
  4. followed by those who lived in the Pacific

The most influential factor that did NOT favor buying a bike is:

  1. customers had 2 cars
  2. followed by those customers who were married
  3. followed by those who had 1 car
  4. followed by those who lived in the North America

In more complex examples, we could potentially analyze the outcome of a column that had many values.

ShareThis

Meebo

Labels

2012 (1) Adobe Reader (1) Analysis Services (4) Analytic Functions (1) APEX Web Service (1) Apple (1) Aptana (1) Architecture (1) ASP.NET MVC3 (2) Azure (3) Azure Data Market (1) Azure Data Marketplace (4) Best Practices Analyzer (1) Best States in America (US) for Business–CNBC–Excel 2013 Web App Interactive (1) BI for .NET (1) Big Data (2) Bluefish (1) Boise (1) Butternut (1) C# (3) C# Introduction (1) Chow (1) Chrome (1) Chrome Remote Desktop (1) Chromium (1) Cliplets (1) Cloud 9 IDE (1) Cloudon MS Office iPad app (1) Code (1) CodeAcademy (1) CoffeeCup (1) Command (1) communicator (1) Community Edition (1) compiz (1) Contracts (1) Country Region Reference Data (1) Creating presentations using deck.js (1) Crescent (1) Data Explorer (2) Data Mining (2) Data Modeling (1) Data Quality (1) Data Quality Service (1) DataTables (1) Date (1) DAX (1) deck.js (1) Decrypt Files (1) Demo (1) Denali (6) Detect Categories (1) Dropbox (1) Eclipse IDE for Java EE Developers (1) Editor (1) Enter password for keyring 'default' to unlock on Ubuntu (1) Entity Framework 4.0 (1) error (1) ERwin (3) excel (1) Excel 2010 (2) Excel 2013 (1) Excel Services (1) Filezila (1) Format (1) Free (5) free ebooks (2) Free training (2) Fusion Tables (1) Getaround.com (1) Google (2) Google Cloud Connect for Microsoft Office (1) Google Currents (1) Google Docs (1) Google Graph Calculator (1) Hadoop (2) Hadoop for Microsoft Windows (1) Highlight (1) Hive (2) How to embed a Google Book Preview in Blogger? (1) How to embed word (1) HP Envy Spectre (1) HP Z1 Workstation (1) HTML (1) HTML-Kit (1) HTML5 (1) HTML5 animation using Mugeda tool (1) HTML5 Video (1) iBooks (1) Improvement (1) Inspiration (1) Install (1) Install Oracle SH schema (1) Installation (1) Installing (1) Installing Android Apps on Windows using bluestacks (1) Installing EntityFrameWork (1) Integration Services (1) iOS5 (1) iPad (2) iPhone (1) Issue (1) JavaScript (7) jQuery (5) Kinect (1) Knockout.js (1) Komodo Edit (1) Komodo IDE (1) Kompozer (1) Language Change (1) Learn SQL Server 2012 (1) Learn Windows Azure (1) Lego Building Blocks (1) Lifebrowser (1) LightSwitch (2) LINQ Oracle (1) Managing Project (1) mds (1) Menus using CSS3 (1) Microsoft (2) Microsoft BI (6) Microsoft Research (1) Microsoft's Future Vision on Productivity (1) Model first (1) mongodb (1) Motivation (2) Movie (1) MS Office (1) MSDN (1) NASA (1) nautilus-open-terminal (1) Netbeans (1) Netflix (2) Notepad++ (1) NuGet Package Manager (1) OBIEE (4) OData (2) ODP.NET (1) Office (2) Office 2013 (1) Office 2013 Quick Start Guides (1) Office Diagnostics (1) Office Tips and Tricks (1) Office Touch Guide (1) Office365 (2) Omnibox (1) Open ssh (1) ORA Errors (1) ORA-12514 (1) Oracle (1) Oracle 11g Express Edition Options Not Included (1) Oracle 11g Invisible Indexes (1) Oracle 11g New Features (1) Oracle 11g Read Only tables (1) Oracle APEX (2) Oracle Cloud (1) Oracle Data Pump (1) Oracle Express (1) Oracle Express 11g (2) oracle java 7 (1) Oracle Learning Library (1) Oracle OData (1) Oracle Open World 2011 (1) Oracle Pivot and Unpivot (1) Oracle Prebuilt VM Appliances (1) Oracle SQL Developer (1) Oracle Virtual Columns (1) Oracle XML DB (1) Outlook 2010 (3) Performance (1) PerformancePoint Services (2) Personal (1) Phonegap (1) Picture (1) PluralSight (1) Power View (2) PowerPivot (14) powerpoint in a site or a blog (1) PowerShell (1) Prettyprint (1) prezi (1) Project Juneau (1) Python (1) Rainbow (1) Recipe (1) Recorder (1) Rent (1) Reporting Services (2) Resources (1) RIA Services (1) Ruby (1) Samsung Galaxy S III (ATT) with Android 4.1 Jelly Bean (2) Screen Capture (1) Search (2) Security (1) Self Service BI (4) Server Error (1) Seth Godin (1) Setup (1) SharePoint (1) SharePoint 2010 (9) SharePoint 2010 Training (1) SharePoint BI Performance Point (3) Shell Scripting (1) Shutter (1) Silverlight (1) Skydrive (1) SOA (2) Social Network (1) Soup (1) SP2-0750 (1) Spotify (1) SQL Azure (1) SQL Server 2008 R2 (1) SQL Server 2008 R2 Reporting Services (1) SQL Server 2012 (5) sql server denali data quality services (1) SQL Server Virtual Labs (2) SQL*Plus error (1) Squash (1) SSAS Tabular (1) Storing and Retrieving XML data in Oracle 11g (1) Tabular (1) TechEd 2012 (1) TED (1) Terminator (1) Texmaker (1) Thunderstorm (1) TimeTo365.com (1) Tip (1) Tips (1) Tips and Tricks (1) Training (1) Ubuntu (1) Ubuntu 11.04 (1) Ubuntu Oracle (1) unrar (1) Vertica (2) Vertica VM appliance (1) video.js (1) VirtualBox (1) Visio 2010 crashes (1) Visio Services (2) Visual Studio 11 Developer Preview Training Kit (1) Visual Studio 2010 (1) Visual Studio 2010 and .NET Framework 4 Training Kit (1) Visual Studio Express 2012 (1) VLC (1) VM (1) WCF (1) Web Stack (1) WebMatrix (1) Webucator (1) What is new in SQL Server 2012 Analysis Services (1) Whole Foods Boise ID (1) Why won’t Microsoft Lync 2010 start (1) Windows 2008R2 SP1 (1) Windows 7 (1) Windows 8 Charms (1) Windows 8 Consumer Preview Demo (1) Windows 8 Contracts (1) Windows 8 Metro Apps (3) Windows 8 Touch Language (1) Windows 8 Video Driver issue/problem (1) Windows App Store (1) windows azure (3) Windows Phone 7 (1) Windows Phone Development (1) Wine (1) Word 2010 (1) Word 2013 (1) Word PDF Reflow (1) Work (1) zip (1) ZoomIt v4.2 (1)

Disclaimer

This blog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion. My blog comes with no guarantees, and the content might contain errors. Expect to find a repeat of information that you can find in other blogs and sites. This is mainly for my future reference, It is my way of documenting things. I give due credit to contents, images, information sourced from product demos and other external sources wherever possible.