Showing posts with label sql server denali data quality services. Show all posts
Showing posts with label sql server denali data quality services. Show all posts

Tuesday, June 7, 2011

Data Quality Services (DQS) in Denali

Great introductory session on Data Quality Service by Elad Ziklik in Tech-ed’11. I have highlighted the key learning from this presentation.

Data Quality represents the degree to which the data is suitable for business usage. Data Quality is built through People, Technology and Processes.

With Data Quality Services, both IT and Biz (Data Stewards) work together to create a knowledge driven solution. Data Quality Services relies on this knowledge to clean the data.

Key Concepts

Knowledge Driven – It is based on Data Quality Knowledge Base (DQKB)

Semantics – Data Domains captures the Semantics of your Data

Knowledge Discovery – Acquires additional knowledge the more you use it

Open and Extendible – Supports use of user-generated knowledge and IP by 3rd party reference data  providers

Easy to use – Designed for ease of use and increased productivity

SQL Server Data Quality Services UI:

image

Domain Management: Create domains for your data and assign data quality rules.

image

Account ID length is between 3 and 9.

image

Account ID cannot contain characters.

image

Domain Values: What are the different values that are expected for this attribute? The DQS tool lets you Map, Discover and Manage the Domain values.

image

Map Domain values:

image

Discover Domain Values:

image

Manage Domain Values: Note DQS identified that the domain “Revenue Type” contains a potential issue: Food & Beverage, Food & Beverages.

image

Save the acquired knowledge:

image

After the knowledge is acquired, it is time to clean the data. Build a new Data Quality Project and use the Knowledge base we just created.

image

Map and Cleanse the data:

image

image

Review Corrections:

image

See the corrected records inline and approve or reject the corrections manually. Nice! Smile

image

DQS can suggest corrections:

image

Once the data is corrected, you can export the data to a database or a csv file:

image

Compare the reports pre & post running Data Quality Services:

Pre-DQS:

image

Post-DQS: clean data

image

DQS Architecture:

DQ Server is now part of the SQL Server Denali.You can use the DQ client to cleanse your data or you the new SSIS DQ component to clean your data at the time of ETL. In future we should see Excel and SharePoint support Data Cleansing capabilities added to it.

image

Use 3rd Party Data using a Reference Data Services to build the Knowledge needed to cleanse the data.

image

image

Map 3rd party schema with our schema:

image

Create a new DQS Project and create the mappings:

image

this time the data is sent to the Azure Data Market:

image

DQS comes back with suggested values:

image

SSIS in SQL Server Denali now comes with a DQS transformation which enables data corrections to be run in a batch mode:

image

Create Matching Policies to identify duplicates and create consolidated views:

image

Match the Artists based on the song. If the songs are similar, the artists are probably the same.

image

See the results:

image

See the details on why a Match was suggested:

image

DQS shows you the stats:

image

DQS Creates a Knowledge Base when you create the Matching Policy and run the project once which is saved and used whenever new data comes in.

Note the DQS now knows that B. Spears and Britney Spears is one and the same artist.

image

Udemy