I recently had a deployment issue where a push of new code caused a worker role to continually restart – everything worked locally, but the thing just wouldn’t stay up in the cloud.
A cleaner event log for diagnosing role-start issues
The ability to remote into an instance is invaluable for diagnosing this sort of thing, especially when your instance is falling down before it even runs your start-up code. In my case, the Application event log was filling up with error entries at a rate of four a second, all tied back to the Windows Azure Caching Client installer. That didn’t make any sense – the thing hadn’t changed for months. With so many log entries it was hard to tell what was happening.
However, the Windows Azure event log under Applications and Services Logs was much more helpful. It seemed that the role was restarting due to a version conflict of the New Relic monitoring agent – nothing to do with the Caching Client installer. Perhaps the Caching Client installer was being kicked off each time the role started, and each time the role terminated it killed that child process, producing the flood of Application log entries?
Regardless – it set me down the right path of fixing the dodgy reference and redeploying – making the instance stable at the same time.
I recently wanted to install the Windows Azure SDK 2.0 on my development laptop, but the usually-pretty-good Web Platform Installer was having none of it:
Well… that’s unhelpful
Luckily WPI gives us a log file during installation that might help. Taking a quick shufty, we find where the problem is pretty quickly:
DownloadManager Information: 0 : Starting EXE command for product 'Windows Azure Emulator - 2.0'. Commandline is: '[removed for brevity]\WindowsAzureEmulator-x64.exe /quiet /norestart /msicl RUNDSINIT=1 /log C:\Users\<Username>\AppData\Local\Temp;C:\Qt\Qt5.0.2\5.0.2\msvc2012_64\bin\WindowsAzureEmulator_2_0.txt'. Process Id: 5264
DownloadManager Information: 0 : Install exit code for product 'Windows Azure Emulator - 2.0' is '1622'
So – the process is exiting with code 1622 (the Windows Installer error for failing to open the installation log file), but hold on – what the hell’s that log path? It starts out looking like a reference to the temp directory, but then the Qt install path gets involved? Let’s look at what %TEMP%’s set to:
As they say – there’s your problem. Remove the Qt path from the %TEMP% environment variable and we’re off.
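If you want to catch this before an installer trips over it, a quick check along the following lines would do. It is only a sketch: the function name and the rules it applies (no semicolons, must be an existing directory) are my own assumptions about what a sane %TEMP% looks like, not anything WPI itself checks.

```python
import os

def temp_problems(value):
    """Return a list of reasons why `value` is a suspicious TEMP/TMP setting."""
    if not value:
        return ["is empty or unset"]
    problems = []
    if ";" in value:
        # A semicolon means something has appended a second path, PATH-style.
        problems.append("contains ';' so it looks like a list of paths, not one directory")
    elif not os.path.isdir(value):
        problems.append("is not an existing directory")
    return problems

if __name__ == "__main__":
    for name in ("TEMP", "TMP"):
        value = os.environ.get(name, "")
        for problem in temp_problems(value):
            print(f"{name}={value!r}: {problem}")
```

Run on a machine in the state described above, it would flag the value as a list of paths rather than a single directory.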
While I’m not sure if I’m going to re-run the Whisky Fringe Tasting Tracker from last year, I saw heatmap.js for the first time the other day and thought it’d be fun to make a Mansfield Traquair heatmap showing dram-sampling by stand. Here’s the result:
The 675 samplings recorded by www.wf2012.co.uk over the 2012 Whisky Fringe
Not bad for a first attempt. That’s 675 samplings tracked by stand – of course, some stands had appreciably more drams to sample than others but there were definite hotspots. Given that we have opinion data too, we can also plot the hotspots of most-liked drams:
Positive opinions recorded at each stand during the 2012 Whisky Fringe – broadly similar but with some interesting detail
If I do run it again this year it’d be great to get heatmap.js combined with the above floorplan image and Pusher for some real-time updates…
Sometimes you come across relatively simple operations that work locally but fail in the cloud – notably, anything that involves image manipulation using the System.Drawing namespace is unlikely to work.
When faced with this sort of thing, WPF immediately springs to mind and luckily this post from Dr WPF gives a sample that works both locally and in the cloud – jackpot!
New caching doesn’t seem to be a productivity improvement
I’ve experienced the above plenty of times – hit F5, wait for your projects to build, then sit back and enjoy anywhere from a minute to five minutes of peaceful reflection while the Azure emulator gets up to speed.
After some playing with procmon I discovered that the vast majority of activity in those interminable minutes could be put into three broad buckets:
- Logging to DFAgent.log, sometimes tens of times a second (something I’m seriously considering symlinking to NUL, or at the very least a ramdisk)
- Various things to do with installing the Windows Azure Cache Emulator
- Lots of communications with ‘something cloudy’ at 168.63.0.*
Troubles seemed to start with the install of the 1.8 SDK, but in fact appear now to be linked to whether or not ‘Enable Caching’ is turned on in the cloud role properties:
Some timings from my machine on a particularly large solution, from ‘Starting emulator’ to a usable deployment:
- Enable Caching ‘On’: ~3 minutes average
- Enable Caching ‘Off’: ~15 seconds average
This is a bugger though, as you can’t have cloud-configuration-specific settings for your caching – it’s either globally on or globally off. In addition, without caching enabled anything that attempts to use it will fail with an exception (fine, if those classes fall back to a null cache implementation). So – for my purposes, my local development environment setup changes from:
- Caching enabled
- Use Azure caching for session state management
to a new cloud project specifically for development with:
- Caching disabled
- Use the Session State service for session state management (configured in web.config) and web.config transforms to convert to Azure Caching session state management for deployment
And my F5 experience goes from three to five minutes down to 15-20 seconds. Not ideal, as I’ve taken my development environment another step further away from my deployment environment, but from a productivity point of view it’s a no-brainer.
I have yet to diagnose why installing the Windows Azure Cloud Cache components, which happens on every development run, is so costly – more investigation required…
The Windows Azure SDK’s pretty good, but implementation decisions in sections of it can make extending or monitoring what it’s up to tricky. Recently I wanted just to track the number of reads and writes that were taking place on a set of tables per request to identify opportunities for caching.
Given I was using a repository pattern, I was able to proxy the CloudTable object upon which queries are executed. While the proxy doesn’t inherit from CloudTable (not least because it’s sealed), with only a couple of lines replaced, all the code that called methods on a CloudTable instance was calling the equivalent methods on my proxy instead. It’s not perfect, but now that it’s in place I can do a number of fun things like track table storage operation counts, or implement access control per table.
To save anyone else having to write it, it’s presented below:
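As a rough illustration of the pattern only (in Python rather than the C# of the actual proxy, and with a made-up table class), a counting proxy looks something like this. The method names in READS and WRITES and the FakeTable class are hypothetical, chosen just to make the sketch runnable; they are not the CloudTable API.

```python
from collections import Counter

class CountingTableProxy:
    """Wraps a table object and counts calls to its read and write methods."""

    # Hypothetical method names; substitute whatever your table client exposes.
    READS = {"query", "retrieve"}
    WRITES = {"insert", "replace", "delete"}

    def __init__(self, table, counts=None):
        self._table = table
        # Pass in a shared Counter to total operations across several tables.
        self.counts = counts if counts is not None else Counter()

    def __getattr__(self, name):
        # Only invoked for attributes not found on the proxy itself,
        # so everything else is forwarded to the wrapped table.
        target = getattr(self._table, name)
        if name in self.READS or name in self.WRITES:
            kind = "reads" if name in self.READS else "writes"

            def counted(*args, **kwargs):
                self.counts[kind] += 1
                return target(*args, **kwargs)

            return counted
        return target


class FakeTable:
    """Stand-in for a real table client, purely for illustration."""

    def __init__(self):
        self.rows = {}

    def insert(self, key, value):
        self.rows[key] = value

    def retrieve(self, key):
        return self.rows.get(key)
```

Callers use the proxy exactly as they would the table (`proxy.insert(...)`, `proxy.retrieve(...)`), and `proxy.counts` can be inspected at the end of a request. Access control per table would slot into the same `__getattr__` hook.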
While trying to diagnose a time-zone issue in an Azure application, I lazily changed my system clock to be an hour earlier instead of actually changing my system timezone. I then found I couldn’t start my app in the emulator:
CacheInstaller.exe has stopped working? Check your system time
If your system clock is out by more than about 10 minutes and you’re trying to use the Azure cache for development purposes then you’re going to have some trouble.
Remember – having your system time right doesn’t just mean setting it up to look right on the clock in the taskbar, you also need to be in the right timezone so that calculations to and from UTC are performed correctly.
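The difference between winding the clock back and changing the time zone is easy to see with a small sketch. The 10-minute tolerance is just the rough figure from my observation above, not a documented limit, and the reference time is a made-up value standing in for a trusted time source.

```python
from datetime import datetime, timedelta, timezone

# Rough tolerance from the "about 10 minutes" observation; an assumption.
TOLERANCE = timedelta(minutes=10)

def within_tolerance(machine_time, reference_time, tolerance=TOLERANCE):
    """True if the machine's time is close enough to the reference.

    Both arguments must be timezone-aware; subtraction compares them in UTC.
    """
    return abs(machine_time - reference_time) <= tolerance

# A trusted reference time (hypothetical value for the example).
reference = datetime(2013, 3, 1, 12, 0, tzinfo=timezone.utc)

# Case 1: wall clock wound back an hour, time zone left at UTC+0.
# The machine now believes UTC is 11:00, a full hour out.
wound_back = datetime(2013, 3, 1, 11, 0, tzinfo=timezone.utc)

# Case 2: time zone changed to UTC-1 instead. The taskbar also shows 11:00,
# but converting back to UTC still gives 12:00.
zone_changed = datetime(2013, 3, 1, 11, 0, tzinfo=timezone(timedelta(hours=-1)))

print(within_tolerance(wound_back, reference))
print(within_tolerance(zone_changed, reference))
```

Both machines show the same time in the taskbar, but only the second one still agrees with the rest of the world about UTC.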
Having lost a day to this, I post in the hope someone else gets their projects fixed sooner. Caveat: No idea which step fixed the issue.
- Large solution (~40 projects) containing multiple Azure services, all targeting Azure SDK 1.7
- Another team member upgrades to Azure SDK 1.8
- Now can’t deploy a working build to Azure as once up, site 500s with Could not load file or assembly ‘Microsoft.WindowsAzure.ServiceRuntime’ or one of its dependencies
- Everything still works just peachy locally, though
- Right-click… Properties on a Microsoft.WindowsAzure.ServiceRuntime reference in any project, find that the version mentioned is the SDK 1.7 one when we’re expecting the SDK 1.8 one
- Even though the .csproj file specifically says the reference is the 1.8 version, with a hint-path pointing the right way
- Where is VS2012 pulling the 1.7 version from?
- Run build with diagnostic-level logging to find the 1.7 assembly being used in the build process even though it’s not part of the .csproj file
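With ~40 projects, checking each reference by hand gets old quickly. A sketch like the one below lists what every .csproj declares for the assembly. It only confirms what the project files say (which in my case was already right), so treat it as a first sanity check rather than a fix. The function names are my own, and it assumes the old-style MSBuild project format that VS2012 writes.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespace used by VS2012-era (non-SDK-style) .csproj files.
MSBUILD_NS = {"msb": "http://schemas.microsoft.com/developer/msbuild/2003"}
ASSEMBLY = "Microsoft.WindowsAzure.ServiceRuntime"

def references_in(project_root, assembly_name=ASSEMBLY):
    """Return (Include attribute, HintPath text) pairs for matching <Reference> items."""
    found = []
    for ref in project_root.iterfind(".//msb:Reference", MSBUILD_NS):
        include = ref.get("Include", "")
        # Include looks like "Name" or "Name, Version=..., Culture=..., ..."
        if include.split(",")[0].strip() == assembly_name:
            hint = ref.findtext("msb:HintPath", default="", namespaces=MSBUILD_NS)
            found.append((include, hint))
    return found

def scan(source_dir):
    """Print every matching reference in every .csproj under source_dir."""
    for path in sorted(Path(source_dir).rglob("*.csproj")):
        for include, hint in references_in(ET.parse(path).getroot()):
            print(f"{path}: {include} (HintPath: {hint or 'none'})")
```

Calling `scan(".")` from the root of the source tree prints one line per reference, which makes a stray version or a missing hint-path easy to spot.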
Solution for me
- Close Visual Studio
- Find C:\Users\USERNAME\AppData\Local\Microsoft\VisualStudio\11.0\Designer\ShadowCache folder, delete all contents
- Uninstall all Azure SDKs lower than 1.8, and any other associated libraries etc using Add/Remove Programs
- Remove the registry value for the 1.7 assembly under key HKLM\SOFTWARE\Classes\Installer\Assemblies\Global (the value name starts Microsoft.WindowsAzure.ServiceRuntime,version=…)
- Use gacutil /u to uninstall the 1.7 assembly from the GAC
- Restart machine
- Delete entire source tree from local disk and pull down again from source control
- Clean solution
- Rebuild all
Error: Cannot obtain Metadata from http://example.com/api/service.svc?wsdl
- WCF Service with SOAP, JSON and XML endpoints where the SOAP endpoint is a basicHttpBinding rather than a wsHttpBinding
- All service endpoints behind SSL
- SecuritySwitch installed and rewrite rules enabled to force all requests through to SSL where available
- AspNetCompatibilityEnabled = true, MultipleSiteBindingsEnabled = true
- <useRequestHeadersForMetadataAddress> element added to service behaviour with default ports http -> 80 and https -> 443
- Trying to add a service reference in Visual Studio yields a number of errors:
HTTP GET Error
The document at the url https://example.com/api/Service.svc?wsdl
was not recognized as a known document type.
The error message from each known type may help you fix the problem:
- Report from 'XML Schema' is 'The document format is not recognized
(the content type is 'text/html; charset=UTF-8').'.
- Report from 'https://example.com/api/Service.svc?wsdl' is
'The document format is not recognized (the content type is
'text/html; charset=UTF-8').'.
- Report from 'DISCO Document' is 'Discovery document at the URL
https://example.com/api/Service.svc?disco could not be found.'.
- The document format is not recognized.
- Report from 'WSDL Document' is 'The document format is
not recognized (the content type is 'text/html; charset=UTF-8').'.
Bugger. However, notice that the initial request URL is http, and the WSDL endpoint is https. Trying to navigate manually to the ?wsdl URL failed and just re-rendered the ‘You have created a service’ welcome page.
In my case the solution was to set httpsGetEnabled="true" (in addition to httpGetEnabled="true") on the serviceMetadata element of the service behaviour in question.
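For next time, here is a sketch of a check for that setting. The sample config shows roughly where the attribute lives. The behaviour name and the checking function are made up for illustration, and a real web.config will have a good deal more in it.

```python
import xml.etree.ElementTree as ET

# Cut-down, hypothetical web.config showing only the relevant elements.
SAMPLE_CONFIG = """
<configuration>
  <system.serviceModel>
    <behaviors>
      <serviceBehaviors>
        <behavior name="ExampleBehaviour">
          <serviceMetadata httpGetEnabled="true" httpsGetEnabled="true" />
        </behavior>
      </serviceBehaviors>
    </behaviors>
  </system.serviceModel>
</configuration>
"""

def behaviours_missing_https_get(config_xml):
    """Names of service behaviours whose serviceMetadata lacks httpsGetEnabled="true"."""
    root = ET.fromstring(config_xml)
    missing = []
    for behaviour in root.iterfind(".//serviceBehaviors/behavior"):
        metadata = behaviour.find("serviceMetadata")
        if metadata is None:
            continue  # no metadata publishing configured on this behaviour
        if metadata.get("httpsGetEnabled", "false").lower() != "true":
            missing.append(behaviour.get("name", ""))
    return missing

print(behaviours_missing_https_get(SAMPLE_CONFIG))
```

An empty list means every behaviour that publishes metadata will also serve it over https; any names listed are the ones to fix.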
Hopefully Google’ll find this for me the next time I do it!
Windows Live Mesh is a remote desktop and file sync affair that works through firewalls – you install the client on the machines you want to communicate with and can then connect to them so long as they’re turned on; so far, so PCAnywhere. That it’s free and works well has made it pretty useful in my day-to-day life, so I was surprised that on rebuilding my current PC and downloading the latest Windows Live Essentials (2012 version) the installer made no mention of Live Mesh – it’d disappeared.
Microsoft says that most of what you can do with Live Mesh you can now do with SkyDrive, but that’s pretty disingenuous. The examples they give of ‘how to do this on SkyDrive’ omit the biggest reason a lot of people used it in the first place – remote desktop. There’s just no equivalent in 2012.
Luckily you can still get the Live Essentials 2011 installer, which does still contain it, from the Microsoft website, but you’ll never be able to upgrade to anything in the 2012 package without Live Mesh being automatically removed.
Thing is – if they just dropped support for Live Mesh (to the point of actively removing it from your machine when you upgrade), how long will Microsoft realistically keep that 2011 installer going?