It's part of an effort to move towards something called "combined engineering" and was started as a pilot in Bing. The basic idea is fewer testers, more devs. Devs take on more responsibility for testing, while at the same time getting more details about usage from customers in the wild... My understanding is that it's like A/B testing features, something Bing does a lot with "micro-flighting".