# Friday, June 04, 2010

If so then my team would like to know as we are looking for another business intelligence consultant.

Full details on our careers web site: SQL / BI Consultant (722749).

You can either post your details directly on the site or send me your CV at [jsnape at microsoft.com] and I’ll enter it for you (disclosure: if you send it via me I will get a bounty when you are hired).

Good luck.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Friday, June 04, 2010 12:18:29 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Saturday, February 06, 2010

image

Asking for a running sum in a report is quite a common thing but this week I was asked to create a running sum for a particular customer against number of facts. What I mean here is to create a graph of count vs amount (sort of like a Pareto except in transaction order). So something that looks like graph above.

This is a well rehearsed subject in MDX. You can either use recursion:

With Member [Measures].[Running Sum]
As
    [Internet Sales Order Details].[Sales Order Number].PrevMember
    +
    [Measures].[Internet Gross Profit]

Member [Measures].[Running Count]
As
    [Internet Sales Order Details].[Sales Order Number].PrevMember
    +
    1

Select {
    [Measures].[Running Count],
    [Measures].[Internet Gross Profit],
    [Measures].[Running Sum]
} On 0, Non Empty {
    [Internet Sales Order Details].[Sales Order Number].Children
} On 1
From [Adventure Works]

Where (
    [Customer].[Customer].[Brian Watson]
)

Or, iteration (thanks to Chris Webb for some help on relative performance) which should perform better, especially on AS2008.

With Member [Measures].[Running Sum]
As
    Sum(
        {Null : [Internet Sales Order Details].[Sales Order Number].CurrentMember},
        [Measures].[Internet Gross Profit]
    )
Member [Measures].[Running Count]
As
    Count(
        {Null : [Internet Sales Order Details].[Sales Order Number].CurrentMember}
    )

Select {
    [Measures].[Running Count],
    [Measures].[Internet Gross Profit],
    [Measures].[Running Sum]
} On 0, Non Empty {
    [Internet Sales Order Details].[Sales Order Number].Children
} On 1
From [Adventure Works]

Where (
    [Customer].[Customer].[Brian Watson]
)

[However, on my x64 laptop the second version takes much longer to execute YMMV.]

This is OK for AdventureWorks but my real degenerate dimension has many millions of members and this just doesn’t scale. I contemplated using Reporting Services RunningValue() function but as far as I can tell you can’t use it to generate a category axis.

I needed a way of generating the running count for the x-axis in a way that uses Analysis Services’ excellent aggregation ability.

imageThe solution I ended up with is to create an artificial hierarchy and bucket transactions. That way I can create an attribute relation for aggregation and, importantly, control the number of cells in the iteration.

The next problem was how to assign values to this bucket – some customers had only a few transactions yet others had millions. They all needed to be spread over a fixed set of buckets.

The answer lies in a SQL Server RANK() function:

update dw.Sales
set TradeBucket = x.TradeBucket
from (
    select TradeKey,
    rank() over(partition by CustomerKey order by t.TradeKey asc) /
    case
        when (select COUNT(*) from dw.Sales where CustomerKey = t.CustomerKey) < 1000 then 1
        when (select COUNT(*) from dw.Sales where CustomerKey = t.CustomerKey) < 10000 then 10
        when (select COUNT(*) from dw.Sales where CustomerKey = t.CustomerKey) < 100000 then 100
        when (select COUNT(*) from dw.Sales where CustomerKey = t.CustomerKey) < 1000000 then 1000
        else 10000
    end as TradeBucket
    from dw.Sales
) x
where dw.Sales.TradeKey = x.TradeKey

Effectively, were are generating an incrementing number on a per customer basis and then dividing that number to compress the range. This is surprisingly fast to execute.

Once everything is processed, my new MDX looks like:

With Member [Measures].[Running Sum]
As    
    Sum(
        {Null : [Internet Sales Order Details].[Trade Bucket].CurrentMember},
        [Measures].[Internet Gross Profit]   
    )

Member [Measures].[Running Count]
As
    Sum(
        {Null : [Internet Sales Order Details].[Trade Bucket].CurrentMember},
        [Measures].[Sales Count]   
    )

Select {
    [Measures].[Running Count],
    [Measures].[Internet Gross Profit],
    [Measures].[Running Sum]
} On 0, Non Empty {
    [Internet Sales Order Details].[Trade Bucket].Children
} On 1
From [Adventure Works]

Where (
    [Customer].[Customer].[Brian Watson]
)

It works on aggregated data; there are still around 1000 points which is just fine on the graph and it executes in around 3 seconds. So all good?

Well, for now yes but I can see a problem looming – every time I do an import I update every fact row and fully reprocess the cube. That isn’t going to scale long term. I will probably have to implement some sort of bucket partition strategy.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, February 06, 2010 1:47:11 PM (GMT Standard Time, UTC+00:00)  #    Comments [0] Trackback
# Saturday, November 07, 2009

Let your mind wander

I’ve been an Application Development Consultant since I joined Microsoft in March 2006. Over the 3+ years I’ve had a lot of fun on this team and particularly enjoyed the shear breadth of skills and knowledge required to tackle whatever the customer requires. My CV is now a recruiter’s dream with all the keywords I can check off.

You may have noticed from the posts to this blog over the last year or so that I’ve been spending more time specialising in both development process and our business intelligence products, Analysis Services and PerformancePoint Server. I’ve certainly been finding more and more interesting work in this area; enough to start looking for new challenges.

To that end I’m pleased to say that this week I have accepted a role with the Microsoft Business Solutions team as a Business Intelligence Consultant. The team roughly comprises of SharePoint, Dynamics, BI and other related functions such as architecture and test skills.

There are lots of challenges ahead – in particular I believe to be successful the “business” part of BI must come before the technology. Also I’m really looking forward to getting into data analysis, visualisation and seeing how customer businesses make use of the information I’ll be able to give them.

So watch this space…

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, November 07, 2009 1:58:09 PM (GMT Standard Time, UTC+00:00)  #    Comments [1] Trackback
# Sunday, September 13, 2009
Sunset Smile
Sunset Smile, uploaded to Flickr by James Snape.

This is the first in a series I took at Hengistbury Head this weekend. This one is looking back to Bournemouth. Of the set, it's my least favourite because the sun was still too high in the sky and it's blown out a little.

I got to use my new Lee filters for the first time which has helped the shot and I can see why other's photos taken with ND grads have this nice pink tint to them.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Sunday, September 13, 2009 3:42:34 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Tuesday, June 30, 2009

ssisdataflowsample Whilst doing some design work today for a customer project I realised there are a set of principals I try and adhere to when creating SQL Server Integration Services packages. The list is no doubt incomplete but this is what I have so far.

Minimise IO

This is a general data processing principal. Usually disk and, to a lesser extent, network performance determine the overall processing speed. Reducing the amount of IO in a solution will therefore increase performance.

Solutions that consist of multiple read-process-write steps should be redesigned into a single read-process-process-process-write step.

Prefer Sequential IO to Random IO

Disks perform at their best when sequentially reading or writing large chunks of data. Random IO (and poor performance) manifests when procedural style programming occurs - signs to look out for are SQL statements modifying/returning only few rows but being executed repeatedly.

Watch out for hidden random IO - for example, if you are reading from one table and writing to another in a sequential manor then disk access will still be random if both tables are stored on the same spindles.

Avoid data flow components that pool data

Data flow components work on batches of data called buffers. In most instances buffers are modified in place and passed down stream. Some components, such as "Sort" cannot process data like this and effectively hang on to buffers until the entire data stream is in memory (or spooled to disk in low memory situations). This increased memory pressure will affect performance.

Sometimes SQL is the better solution

Whilst the SSIS data flow has lots of useful and flexible components, it is sometimes more efficient to perform the equivalent processing in a SQL batch. SQL Server is extremely good at sorting, grouping and data manipulation (insert, update, delete) so it is unlikely you will match it for raw performance on a single read-process-write step.

SSIS does not handle hierarchical data well

Integration Services is a tabular data processing system. Buffers are tabular and the components and associated APIs are tabular. Consequently it is difficult to process hierarchical data such as the contents of an XML document. There is an XML source component but it's output is a collection of tabular data streams that need to joined to make sense.

Execute SSIS close to where you wish to write your data

Reading data is relatively easy and possible from a wide variety of locations. Writing data, on the other hand, can involve complex locking and other issues which are difficult to optimise on a network protocol. In particular when writing data to a local SQL Server instance, SSIS automatically used the Shared Memory transport for direct inter-process transfer.

Don't mess with the data flow metadata at runtime

It's very difficult to do this anyway but worth mentioning that SSIS gets it's stellar performance from being able to setup a data flow at runtime safe in the knowledge that buffers are of a fixed format and component dependencies will not change.

The only time this is acceptable is when you need to build a custom data flow programmatically. You should use the SSIS API's and not attempt to write the package XML directly.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, June 30, 2009 7:23:10 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Tuesday, June 23, 2009
Plaza Mayor World
Plaza Mayor World, uploaded to Flickr by James Snape.

Canon 400D, Canon EF-S 10-22mm f/3.5-4.5 lens - 1/125 sec, f/9, ISO 100

This is effectively a panorama shot wrapped around to look like a mini planet. I almost have it perfected but one of the things you must do is ensure there are no objects in the top or bottom of frame when you shoot them. The lamp post and tower just managed to fall in the zone when wrapped.

It was taken in Madrid two weeks ago when Claire and I celebrated our one year anniversary.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, June 23, 2009 3:39:48 PM (GMT Daylight Time, UTC+01:00)  #    Comments [2] Trackback
# Wednesday, June 10, 2009

There are a lot of new features coming in the next release of Visual Studio and related products. They are easily discoverable but I thought it would be helpful to bring as many videos and screen casts as I could find into a single list for easy consumption.

A word of warning though, I can’t be held responsible for the feelings brought on by the realisation that you can’t use any of this stuff in a supported way right now. Just try to relax and think how much better, faster and easier your job will be when it is finally released.

General

Microsoft Visual Studio Team System: A Lap Around VSTS 2010

In the spirit of an agile sprint, see how to use the next version of Visual Studio Team System to manage user stories and re-factor existing architecture. Learn how to diagnose real production problems, debug in-production virtual labs, capture test data to eliminate the no-repro bugs, transparently plan, monitor, and adapt software projects.

Team Foundation Server 2010: Cool New Features

Dive deep into the next version of Team Foundation Server (TFS), and learn how TFS has factored its learning's about usability, industrial scale, geographic distribution, manageability, and development process into the next version of the product. See a demonstration of build automation, policy checks, parallel development, new project planning and tracking features, such as agile planning, end to end traceability, reporting, and dashboards, administration and ops --all designed to improve transparency and velocity for teams from size 5 through 50,000.

A first look at Visual Studio Team System Web Access 2010

Visual Studio Team System Web Access has become an increasingly popular way for people to access Team Foundation Server. In this interview we meet Hakan Eskici who demonstrates some of the upcoming features his team is working on for Visual Studio Team System Web Access 2010.

Project Management

Agile Planning Templates in Visual Studio Team System 2010

Stephanie Saad shows us a quick demonstration of how Visual Studio Team System 2010 will enable teams to be more agile. In this demonstration she shows the new Agile planning worksheet for Excel which can be used to easily balance resources, manage your backlog, and generate ad hoc reports.

Enterprise Project Management with Visual Studio Team System 2010

Ameya Bhatawdekar, a program manager for Team Foundation Server, took a few minutes to take us through the end-to-end storyboards for how Team Foundation Server 2010 will integrate with Microsoft Project Server to enable true enterprise-wide collaboration. Note that this is not a demo of working software (yet), but it's the next best thing - a detailed storyboard walkthrough of mocked-up screenshots.

Requirements

Requirements Management and Traceability with Visual Studio Team System 2010

How can you ensure that a requirement has been sufficiently tested? How do you track the work that goes into a specific feature? How much work is left to do before a feature is completed, and how does that feature relate to bigger scenarios or user stories?
Siddharth Bhatia, a senior group program manager for Visual Studio Team System, takes us through an end-to-end example of how Visual Studio Team System 2010 will help an organization manage their requirements throughout the lifecycle of a software project.

Architecture

Architecture without Big Design Up Front

Microsoft Visual Studio Team System (VSTS), code-name "Rosario" Architecture Edition, introduces new UML designers, use cases, activity diagrams, sequence diagrams that can visualize existing code, layering to enforce dependency rules, and physical designers to visualize, analyze, and refactor your software. See how VSTS extends UML logical views into physical views of your code. Learn how to create relationships from these views to work items and project metrics, how to extend these designers, and how to programmatically transform models into patterns for other domains and disciplines.

"Bottom-up" Design with Visual Studio Team System 2010 Architect

Suhail Dutta, a program manager on the Visual Studio Team System Architect team, gives us a demonstration of the "bottom-up" design approach which will be possible with the Visual Studio Team System 2010 Architect product.
With "bottom-up" design, you can quickly reverse engineer an existing code base to construct models and examine relationships between pieces of code. Suhail also shows off some of the new UML designers coming in Visual Studio Team System 2010.

"Top-down" design with Visual Studio Team System 2010

"Top-down" design is an approach that the Visual Studio Architect team is enabling with their upcoming release, Visual Studio Team System 2010. In this "humanized screencast" we asked Mark Groves, senior program manager, to show us a demonstration of the new UML designers the team is building and how this can be applied to a "top-down" approach when building software.

Development

Agile Development with Microsoft Visual Studio

Visual Studio has built-in tool support for agile practices such as Scrum, XP, and others. The next version adds practices like test-driven development, continuous integration, and single product backlog. See how these can be applied at scale and across geographies.

Web Development and Deployment with Visual Studio 2010

Welcome back to another Visual Studio 2010 and .NET Framework 4.0 Week video. In this latest installment, we catch up with Vishal Joshi, Senior Program Manager on the Web Development Tools team.  In this video, Vishal shows us what is being done in Visual Studio 2010 around web development and deployment. Covered are topics like JQuery support, HTML code snippets, better Intellisense, and a whole slew of new features around web deployment.

An early look at Team Foundation Build 2010 with Jim Lamb

In addition to being one of the nicest guys I know, Jim Lamb also knows a thing or two about build automation. Jim is the program manager responsible for the Team Build capability of Team Foundation Server. Team Build was one of the biggest areas of improvement for Team Foundation Server 2008, but that hasn't stopped the team from doing even more landmark improvments in Team Foundation Server 2010.
Jim shows off how Team Build 2010 will take advantage of Windows Workflow, build agent pooling, distributed asynchronized builds, and two new types of build called "buddy builds" and gated check-ins.

Branching and Merging Visualization with Team Foundation Server 2010

Is your source control branching out of control? How much time have you wasted trying to discover which branches your code changes have been merged into? What are the code-level differences between your main, test, and production branches? Branch visualizations to the rescue!

Test

New Web Test Debugging Features in Visual Studio Team System 2010

In this video Ed Glas shows off new Web test debugging features in Visual Studio Team System 2010, including Search in playback, view recording log, jump to Web test, and Add Extraction Rule from Playback.

10-4 Episode 18: Functional UI Testing

In this episode of 10-4 we look at a new type of test coming in Visual Studio Team System 2010 known as the coded UI test. Coded UI tests can be created to automatically navigate through your application's UI, which in turn can be used to verify that the paths your users might take through your application are working properly. You can also add validation logic along the way to verify the properties of objects within the UI. Much like unit tests can quickly surface regressions on a method or function level, coded UI tests can bring the same level of rapid automated testing capabilities to the UI layer.

UI Automation Testing with Visual Studio 2010

Just playing with some of the new Testing features in Visual Studio 2010 and thought people might be interested in the new interface for Camano and a new feature for CodedUI Tests...pulling the automations strips directly out of TFS!

Lab Management coming to Visual Studio Team System 2010

Today at TechEd Barcelona, Jason Zander announced that Visual Studio Team System 2010 will feature a brand new Lab Management capability to help organizations raise the bar on software quality. Lab Management will integrate with the rest of the Visual Studio to help testers more easily test a variety of configurations in a virtual lab environment, and help developers more easily repro bugs by delivering snapshots of those virtualized environments after bugs are discovered. I had a chance to sit down with Ram Cherala and Vinod Malhotra to get an in-depth look at how this will work.

Microsoft Visual Studio Team System: Leveraging Virtualization to Improve Code Quality with Team Lab

Would you like to test fixes in a production-like environment before checking them in to source control? The Visual Studio Team System (code name "Rosario") release of Team Lab improves productivity and quality while reducing the cost of building and testing world class products. Learn how Team Lab provides a fast and easy way to create a test environment and tear it down, target specific test environments, and take snapshots of an environment for easy deployment.

Microsoft Visual Studio Team System: Software Diagnostics and Quality for Services

In this session we present processes and tools from the upcoming Visual Studio Team System code name "Rosario" release and Microsoft Research and show how we deliver on quality, scalability, and experience goals for the new class of applications that demand rich UI, service consumption, and frequent release.

Manual Testing with Visual Studio Team System 2010

Naysawn Naderi takes us through manual testing in Visual Studio Team System 2010. Naysawn shows off how the manual testing capabilities allow not only for better authoring and execution of manual tests, but can also be a tool to help automate portions of manual tests as well. Finally, Naysawn shows how to turn a manual test into a coded test which can then be fully automated.

Historical Debugger and Test Impact Analysis in Visual Studio Team System 2010

Are you tired of constantly setting breakpoints to hone in on a pesky bug? How would you like to be able to step "back in time" through your debugger? The Historical Debugger in Visual Studio Team System 2010 promises to revolutionize your debugging experience. Habib Heydarian takes us through a demonstration of just a few of its capabilities.
But wait... there's more! Habib also shows us the new Test Impact Analysis feature his team is working on. With Test Impact Analysis it's possible to determine which of your tests will be... well... impacted by the code changes you're making! Not only does this mean that your unit test suite can run more quickly, but it can also lead to better testing and fewer bugs in software projects.

Automated User Interface (UI) Testing with Microsoft Visual Studio Team System 2010

Come hear about the new Visual Studio Team System 2010 tools and APIs for helping test a broad range of UIs that can consist of Winforms, AJAX, and Windows Presentation Foundation. See how to use Team System 2010 to ensure UI and application quality.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Wednesday, June 10, 2009 10:17:52 AM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Friday, June 05, 2009

RJR Consulting design thumbnail Over the past few weeks I’ve been working on a web site for my cousin. This was a ground up redesign of his company web to replace the old SEO unfriendly Flash site.

Normally as the developer I only get someone else's completed design; I just do the implementation but I wanted to see if any of my recent photography and image editing skills were transferrable.

The site was first designed in Illustrator/Photoshop with a fair number of round trips to agree the final design. Implementation was done in Expression Web using HTML, CSS and some JQuery for interactivity. There are no <tables> anywhere on the site; it is a pure CSS layout. I also used Expression Web "Dynamic Web Templates" for the master page layout as the server where it is currently hosted does not support .NET.

Some things I've learned from doing this:

  • JQuery makes life easy and Glimmer makes it even easier
  • IE8 standards support is fantastic but way to many still use IE6/7 to use it
  • You need to have an idea of what is possible in HTML before letting loose in Photoshop
  • background-image is your friend

As it is all pretty static at the moment there are plans to move to a site that supports .NET so I can add some more features, in particular a blog and news feed. I would like to try this in ASP.MVC to get some experience there.

So go to RJR Consulting and have a look around (especially if you need telemarketing, telesales or similar services).

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Friday, June 05, 2009 7:01:19 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Sunday, May 24, 2009
I'm not a morning bee
I'm not a morning bee, uploaded to Flickr by James Snape.

Canon 400D, Canon EF 100mm f/2.8 Macro lens - 1/2 sec, f/11, ISO 200

I took a couple of macro shots a few weeks ago but all the insect images went in the bin as they were out of focus or blurred. This bee was still in the process of warming up for the day so didn't move about much.

My only issue with the image is that the conversion to Jpeg hasn't gone that well - in particular it looks over sharpened and the colour of the catch light has gone quite blue compared to the original Photoshop image. I may end up reprocessing it to see if I can fix the flaws.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Sunday, May 24, 2009 12:28:08 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Saturday, April 25, 2009
Talbot Heath Fire
Talbot Heath Fire, uploaded to Flickr by James Snape.

Canon 400D, Canon EF 24-105mm f/4L IS lens - 1/250 sec, f/5.0, ISO 100

We had a rather large fire on the heath behind our house last week. It Bournemouth Echo article as it's the second time in a month there was a fire here. Unfortunately there is not much left now which is a shame because it's a rare bird breeding ground.

The Watcher, Plate 3I should have taken my long lens; this shot was about as close as I could get due to a police line around the fire.

I don't think they liked the fact I was taking photos though as it got my name and address in their notebook.

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Saturday, April 25, 2009 10:58:54 AM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Tuesday, April 21, 2009

If you are a developer and tasked with making sure the corporate rollout of Internet Explorer 8 goes without issues, you will find the following links of use.

The main page of interest to you is the IE Developer Centre where you will find all the information and links collected together.

Also:

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Tuesday, April 21, 2009 3:11:37 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
# Thursday, April 16, 2009

Every data warehouse needs a date dimension and at some point it needs to be populated. Most use some sort of a SQL script that loops though the dates and add rows to the destination table but this is pretty slow to execute. You might even try cross joining a year, month and day temporary tables to produce a set based solution but don’t forget to filter out the illegal days.

I prefer to fill my date tables by generating the correct stream of values from a SQL Server Integration Services script source component. This has a number of benefits:

  • It executes very quickly
  • The data can be bulk loaded
  • CultureInfo supplies the correct translations of day and month names
  • It is easy to add custom columns such as fiscal years and quarters

I haven’t wrapped this in a pre-compiled component as it is so easy to do in script from. Also, I haven’t got around to generalizing the fiscal date offsets for different companies so they usually have to be custom coded.

Script Component Type Dialog

First drop a “Script Component” onto your Data Flow.

Select “Source” as the Script Component Type and click OK.

Then double-click the newly added component to edit the properties.

Note that you need to add the correct output columns before adding the script or else it won’t compile.


Script Source Outputs

I’ve renamed the output here to “Dates” to help further down the Data Flow.

Click the “Add Column” button to add new columns as show here. Note that I’ve also changed the data type of each column to match my source table. It required casts in script but it’s easier than conversions in the data pipeline.

Finally go back to the script part of the dialog and click the “Edit Script” button to launch Visual Studio for Applications.

In the resulting window, add your code to generate the date stream to the CreateNewOutputRows() function.
The general form is of:

var output = this.DatesBuffer;  // Get the output buffer

while (/*loop though your dates*?)
{

output.AddRow();

// Set the various column values e.g.
output.CalendarYear = date.Year

// Increment the date
date = date.AddDays(1);
}

The full script is in the attached sample package where I’ve also added a script destination that does nothing with the data. Attach a data viewer to see what output is generated.

Date Data

From here you can manipulate the data, and pipe it to your dimension table from within the pipeline.

DateSourceSample.zip (27.08 KB)

This posting is provided "AS IS" with no warranties, and confers no rights.
posted on Thursday, April 16, 2009 7:52:34 PM (GMT Daylight Time, UTC+01:00)  #    Comments [0] Trackback
DateSourceSample.zip (27.08 KB)