Showing posts with label sql server denali data quality services. Show all posts
Showing posts with label sql server denali data quality services. Show all posts

Tuesday, June 7, 2011

Data Quality Services (DQS) in Denali

Great introductory session on Data Quality Service by Elad Ziklik in Tech-ed’11. I have highlighted the key learning from this presentation.

Data Quality represents the degree to which the data is suitable for business usage. Data Quality is built through People, Technology and Processes.

With Data Quality Services, both IT and Biz (Data Stewards) work together to create a knowledge driven solution. Data Quality Services relies on this knowledge to clean the data.

Key Concepts

Knowledge Driven – It is based on Data Quality Knowledge Base (DQKB)

Semantics – Data Domains captures the Semantics of your Data

Knowledge Discovery – Acquires additional knowledge the more you use it

Open and Extendible – Supports use of user-generated knowledge and IP by 3rd party reference data  providers

Easy to use – Designed for ease of use and increased productivity

SQL Server Data Quality Services UI:

image

Domain Management: Create domains for your data and assign data quality rules.

image

Account ID length is between 3 and 9.

image

Account ID cannot contain characters.

image

Domain Values: What are the different values that are expected for this attribute? The DQS tool lets you Map, Discover and Manage the Domain values.

image

Map Domain values:

image

Discover Domain Values:

image

Manage Domain Values: Note DQS identified that the domain “Revenue Type” contains a potential issue: Food & Beverage, Food & Beverages.

image

Save the acquired knowledge:

image

After the knowledge is acquired, it is time to clean the data. Build a new Data Quality Project and use the Knowledge base we just created.

image

Map and Cleanse the data:

image

image

Review Corrections:

image

See the corrected records inline and approve or reject the corrections manually. Nice! Smile

image

DQS can suggest corrections:

image

Once the data is corrected, you can export the data to a database or a csv file:

image

Compare the reports pre & post running Data Quality Services:

Pre-DQS:

image

Post-DQS: clean data

image

DQS Architecture:

DQ Server is now part of the SQL Server Denali.You can use the DQ client to cleanse your data or you the new SSIS DQ component to clean your data at the time of ETL. In future we should see Excel and SharePoint support Data Cleansing capabilities added to it.

image

Use 3rd Party Data using a Reference Data Services to build the Knowledge needed to cleanse the data.

image

image

Map 3rd party schema with our schema:

image

Create a new DQS Project and create the mappings:

image

this time the data is sent to the Azure Data Market:

image

DQS comes back with suggested values:

image

SSIS in SQL Server Denali now comes with a DQS transformation which enables data corrections to be run in a batch mode:

image

Create Matching Policies to identify duplicates and create consolidated views:

image

Match the Artists based on the song. If the songs are similar, the artists are probably the same.

image

See the results:

image

See the details on why a Match was suggested:

image

DQS shows you the stats:

image

DQS Creates a Knowledge Base when you create the Matching Policy and run the project once which is saved and used whenever new data comes in.

Note the DQS now knows that B. Spears and Britney Spears is one and the same artist.

image

ShareThis

Meebo

Labels

2012 (1) Adobe Reader (1) Analysis Services (4) Analytic Functions (1) APEX Web Service (1) Apple (1) Aptana (1) Architecture (1) ASP.NET MVC3 (2) Azure (3) Azure Data Market (1) Azure Data Marketplace (4) Best Practices Analyzer (1) Best States in America (US) for Business–CNBC–Excel 2013 Web App Interactive (1) BI for .NET (1) Big Data (2) Bluefish (1) Boise (1) Butternut (1) C# (3) C# Introduction (1) Chow (1) Chrome (1) Chrome Remote Desktop (1) Chromium (1) Cliplets (1) Cloud 9 IDE (1) Cloudon MS Office iPad app (1) Code (1) CodeAcademy (1) CoffeeCup (1) Command (1) communicator (1) Community Edition (1) compiz (1) Contracts (1) Country Region Reference Data (1) Creating presentations using deck.js (1) Crescent (1) Data Explorer (2) Data Mining (2) Data Modeling (1) Data Quality (1) Data Quality Service (1) DataTables (1) Date (1) DAX (1) deck.js (1) Decrypt Files (1) Demo (1) Denali (6) Detect Categories (1) Dropbox (1) Eclipse IDE for Java EE Developers (1) Editor (1) Enter password for keyring 'default' to unlock on Ubuntu (1) Entity Framework 4.0 (1) error (1) ERwin (3) excel (1) Excel 2010 (2) Excel 2013 (1) Excel Services (1) Filezila (1) Format (1) Free (5) free ebooks (2) Free training (2) Fusion Tables (1) Getaround.com (1) Google (2) Google Cloud Connect for Microsoft Office (1) Google Currents (1) Google Docs (1) Google Graph Calculator (1) Hadoop (2) Hadoop for Microsoft Windows (1) Highlight (1) Hive (2) How to embed a Google Book Preview in Blogger? (1) How to embed word (1) HP Envy Spectre (1) HP Z1 Workstation (1) HTML (1) HTML-Kit (1) HTML5 (1) HTML5 animation using Mugeda tool (1) HTML5 Video (1) iBooks (1) Improvement (1) Inspiration (1) Install (1) Install Oracle SH schema (1) Installation (1) Installing (1) Installing Android Apps on Windows using bluestacks (1) Installing EntityFrameWork (1) Integration Services (1) iOS5 (1) iPad (2) iPhone (1) Issue (1) JavaScript (7) jQuery (5) Kinect (1) Knockout.js (1) Komodo Edit (1) Komodo IDE (1) Kompozer (1) Language Change (1) Learn SQL Server 2012 (1) Learn Windows Azure (1) Lego Building Blocks (1) Lifebrowser (1) LightSwitch (2) LINQ Oracle (1) Managing Project (1) mds (1) Menus using CSS3 (1) Microsoft (2) Microsoft BI (6) Microsoft Research (1) Microsoft's Future Vision on Productivity (1) Model first (1) mongodb (1) Motivation (2) Movie (1) MS Office (1) MSDN (1) NASA (1) nautilus-open-terminal (1) Netbeans (1) Netflix (2) Notepad++ (1) NuGet Package Manager (1) OBIEE (4) OData (2) ODP.NET (1) Office (2) Office 2013 (1) Office 2013 Quick Start Guides (1) Office Diagnostics (1) Office Tips and Tricks (1) Office Touch Guide (1) Office365 (2) Omnibox (1) Open ssh (1) ORA Errors (1) ORA-12514 (1) Oracle (1) Oracle 11g Express Edition Options Not Included (1) Oracle 11g Invisible Indexes (1) Oracle 11g New Features (1) Oracle 11g Read Only tables (1) Oracle APEX (2) Oracle Cloud (1) Oracle Data Pump (1) Oracle Express (1) Oracle Express 11g (2) oracle java 7 (1) Oracle Learning Library (1) Oracle OData (1) Oracle Open World 2011 (1) Oracle Pivot and Unpivot (1) Oracle Prebuilt VM Appliances (1) Oracle SQL Developer (1) Oracle Virtual Columns (1) Oracle XML DB (1) Outlook 2010 (3) Performance (1) PerformancePoint Services (2) Personal (1) Phonegap (1) Picture (1) PluralSight (1) Power View (2) PowerPivot (14) powerpoint in a site or a blog (1) PowerShell (1) Prettyprint (1) prezi (1) Project Juneau (1) Python (1) Rainbow (1) Recipe (1) Recorder (1) Rent (1) Reporting Services (2) Resources (1) RIA Services (1) Ruby (1) Samsung Galaxy S III (ATT) with Android 4.1 Jelly Bean (2) Screen Capture (1) Search (2) Security (1) Self Service BI (4) Server Error (1) Seth Godin (1) Setup (1) SharePoint (1) SharePoint 2010 (9) SharePoint 2010 Training (1) SharePoint BI Performance Point (3) Shell Scripting (1) Shutter (1) Silverlight (1) Skydrive (1) SOA (2) Social Network (1) Soup (1) SP2-0750 (1) Spotify (1) SQL Azure (1) SQL Server 2008 R2 (1) SQL Server 2008 R2 Reporting Services (1) SQL Server 2012 (5) sql server denali data quality services (1) SQL Server Virtual Labs (2) SQL*Plus error (1) Squash (1) SSAS Tabular (1) Storing and Retrieving XML data in Oracle 11g (1) Tabular (1) TechEd 2012 (1) TED (1) Terminator (1) Texmaker (1) Thunderstorm (1) TimeTo365.com (1) Tip (1) Tips (1) Tips and Tricks (1) Training (1) Ubuntu (1) Ubuntu 11.04 (1) Ubuntu Oracle (1) unrar (1) Vertica (2) Vertica VM appliance (1) video.js (1) VirtualBox (1) Visio 2010 crashes (1) Visio Services (2) Visual Studio 11 Developer Preview Training Kit (1) Visual Studio 2010 (1) Visual Studio 2010 and .NET Framework 4 Training Kit (1) Visual Studio Express 2012 (1) VLC (1) VM (1) WCF (1) Web Stack (1) WebMatrix (1) Webucator (1) What is new in SQL Server 2012 Analysis Services (1) Whole Foods Boise ID (1) Why won’t Microsoft Lync 2010 start (1) Windows 2008R2 SP1 (1) Windows 7 (1) Windows 8 Charms (1) Windows 8 Consumer Preview Demo (1) Windows 8 Contracts (1) Windows 8 Metro Apps (3) Windows 8 Touch Language (1) Windows 8 Video Driver issue/problem (1) Windows App Store (1) windows azure (3) Windows Phone 7 (1) Windows Phone Development (1) Wine (1) Word 2010 (1) Word 2013 (1) Word PDF Reflow (1) Work (1) zip (1) ZoomIt v4.2 (1)

Disclaimer

This blog does not represent the thoughts, intentions, plans or strategies of my employer. It is solely my opinion. My blog comes with no guarantees, and the content might contain errors. Expect to find a repeat of information that you can find in other blogs and sites. This is mainly for my future reference, It is my way of documenting things. I give due credit to contents, images, information sourced from product demos and other external sources wherever possible.