Most Conditional Access teams already do some kind of review.
Someone checks the scope. Someone reads the grant controls. Maybe there is a CAB. Maybe the policies are exported and compared in Git. Maybe there are naming standards, baseline templates, and a rough idea of what "good" looks like.
That work is useful.
It is also only the first layer.
Once a Conditional Access estate gets even moderately complex, the real question is no longer just:
- are the policy objects configured sensibly?
It becomes:
- what will the engine actually do for real sign-in paths?
- what changed in the effective outcome after this edit?
- which users, apps, devices, and flows are now affected?
- where is the blast radius if we got the design wrong?
Static review cannot answer those questions by itself.
Defined scenario testing gets you much closer. It is a real improvement over policy inspection and one of the best habits a CA team can adopt.
But even that is still sampled coverage.
If your goal is to predict hidden interactions and change impact across a mature Conditional Access estate, the stronger approach is exhaustive deterministic simulation of the effective decision space: not just a few named scenarios, but the whole set of relevant runtime combinations your tenant can actually produce.
That is the distinction this post is about.
There are three different jobs here
A lot of confusion disappears once you stop treating all CA validation as one thing.
There are at least three separate jobs:
- object and configuration validation
- finite scenario testing
- exhaustive simulation and impact analysis
They are related, but they do not answer the same question.
If a team only has the first layer, it is mostly checking intent.
If it has the first two layers, it is testing representative behaviour.
If it has all three, it can start reasoning about regression, coverage gaps, and change impact with much more confidence.
Object validation is useful, but it stops at the policy boundary
Static review works well for things it can actually see.
It can catch errors like:
- the wrong group was targeted
- an exclusion is too broad
- "all cloud apps" was used where a narrower scope was safer
- "require one of the selected controls" was chosen where "require all" was intended
- a break-glass exclusion was forgotten
- a policy is disabled, duplicated, or obviously overlapping
That is valuable. It is the CA equivalent of linting, config review, or schema validation.
But it still stops at the object boundary.
It tells you that the policy definitions look plausible. It does not reliably tell you what the engine will decide when a real sign-in arrives with a real combination of:
- user or role
- app and dependency path
- client type
- device state
- location context
- session state
- other applicable policies
That is the gap.
Conditional Access outcomes are produced by policy plus runtime context. Reviewing the policies alone does not give you a full execution model.
Finite scenario testing is a real step up
Once teams accept that CA is contextual, the obvious next step is to test scenarios rather than just review objects.
That means describing a concrete sign-in case and asking what the effective outcome should be.
A decent CA test case usually needs at least:
- who the actor is
- which resource is being accessed
- which client or auth path is used
- what device state is assumed
- what location context is assumed
- whether the session is fresh or existing
- what result is expected
That is already much better than saying "we tested SharePoint" or "the policy looked right."
A scenario like this is meaningful:
- A standard user in Finance, on a non-compliant Windows device, from an untrusted location, signing into SharePoint Online in a browser with no prior session, should be prompted for MFA and then denied because a compliant device is required.
That is testable. It has a runtime shape.
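A scenario with that runtime shape can be written down as a structured test case. Here is a minimal sketch in Python; the field names, values, and the expected-outcome label are illustrative assumptions, not any particular tool's schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CAScenario:
    """One fully specified sign-in case with an expected effective outcome."""
    actor: str         # who is signing in (user or role category)
    resource: str      # which resource is being accessed
    client: str        # client or auth path, e.g. "browser", "desktop"
    device_state: str  # e.g. "compliant", "non-compliant", "unmanaged"
    location: str      # e.g. "trusted-office", "untrusted"
    session: str       # "fresh" or "existing"
    expected: str      # expected effective result label (illustrative)

# The Finance example above, expressed as a test case.
finance_case = CAScenario(
    actor="finance-standard-user",
    resource="SharePoint Online",
    client="browser",
    device_state="non-compliant",
    location="untrusted",
    session="fresh",
    expected="mfa-then-block",  # compliant device is required, so access is denied
)
```

The value of writing it this way is that every dimension is explicit: a case that omits, say, device state is visibly underspecified rather than silently assumed.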
This is also where Microsoft’s own tooling points. The Entra What If tool and Microsoft Graph conditionalAccess/evaluate API both exist because administrators need to evaluate policy applicability against sign-in properties, not just stare at policy JSON and hope they have inferred the result correctly.
So yes: finite scenario testing is absolutely better than static review.
It is closer to how engineers test real systems, and it catches classes of failure that object inspection alone will miss.
Finite scenarios still leave a lot unknown
Defined scenario packs are stronger than static review, but they still have an unavoidable limitation.
They are sampled coverage.
No matter how carefully you choose the cases, you are still testing a finite set of named situations out of a much larger decision space.
That leaves several common blind spots:
- untested combinations across user type, device state, client path, and location
- hidden service dependencies
- broad policies whose impact changes when a new app or group enters scope
- session and token differences between fresh sign-in and existing access
- interactions across multiple policies that were not represented in the scenario pack
- low-frequency but high-impact paths, such as admin recovery or monthly finance workflows
This is one reason CA testing often looks stronger on paper than it is operationally. The scenario list may be well written, but it still only samples the estate.
A single browser path for one user on one device is not coverage. Five or ten carefully defined scenarios are better, but they still do not tell you how a policy change reshapes the full effective access surface.
That is where the next layer becomes important.
Exhaustive deterministic simulation is a different category of capability
By exhaustive deterministic simulation, I do not mean random test generation or vague "AI" guessing.
I mean systematically evaluating the effective Conditional Access decision space represented by the tenant model you care about.
In practice, that means taking the relevant dimensions of runtime behaviour, such as:
- identity categories
- app and dependency targets
- client paths
- device states
- network contexts
- session states
- applicable policy sets
and evaluating the combinations that are meaningful in the modeled estate.
The output is not just a handful of pass/fail scenario results. It is a coverage map and a decision diff.
You can ask questions like:
- which combinations moved from allow to prompt?
- which moved from prompt to block?
- which admin paths are now stricter than before?
- which unmanaged-device flows became newly accessible?
- which app dependencies pull an unexpected policy into scope?
That is a stronger capability than running a finite scenario pack, because it is aimed at the shape of the whole effective decision surface rather than a chosen sample from it.
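To make the idea concrete, here is a minimal sketch of enumerating a modeled decision space and diffing outcomes between two policy states. The dimension values and the toy `before`/`after` evaluators are invented for illustration; a real engine would evaluate the tenant's actual policy set:

```python
from itertools import product

# Hypothetical dimension values; a real model derives these from the tenant.
DIMENSIONS = {
    "identity": ["standard-user", "admin"],
    "device":   ["compliant", "non-compliant"],
    "client":   ["browser", "desktop"],
    "location": ["trusted", "untrusted"],
}

def decision_space():
    """Yield every combination of the modeled runtime dimensions."""
    keys = list(DIMENSIONS)
    for values in product(*(DIMENSIONS[k] for k in keys)):
        yield dict(zip(keys, values))

def diff_outcomes(evaluate_before, evaluate_after):
    """Group every combination by how its effective outcome changed."""
    deltas = {}
    for combo in decision_space():
        b, a = evaluate_before(combo), evaluate_after(combo)
        if b != a:
            deltas.setdefault((b, a), []).append(combo)
    return deltas

# Toy evaluators standing in for real policy evaluation: the "after" state
# extends a compliant-device requirement from browser flows to all clients.
def before(c):
    return "block" if c["device"] == "non-compliant" and c["client"] == "browser" else "allow"

def after(c):
    return "block" if c["device"] == "non-compliant" else "allow"

changed = diff_outcomes(before, after)
```

In this toy model the diff shows exactly the kind of answer described above: every newly blocked combination is a non-compliant desktop path, including ones nobody hand-picked as a test case.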
A simple way to picture it: a finite scenario pack samples a handful of points from the decision space, while exhaustive simulation walks the whole grid of meaningful combinations and compares the outcome at every point between the before and after states.
The key idea is not "simulate more things" in the abstract.
It is that exhaustive simulation answers a different operational question:
- not just "do these named cases pass?"
- but "what changed across the effective access space, including the cases we did not think to hand-pick?"
That is what makes it better suited to impact analysis.
Why this difference shows up so sharply in Conditional Access
This distinction matters more in CA than in a lot of simpler policy systems because CA has several properties that make interaction effects easy to miss:
- applicable policies accumulate
- app and service dependencies can pull adjacent policies into scope
- device and compliance signals change outside the CA blade
- group and role scope drifts even when no policy object changes
- session state changes what users actually feel
- the same business service can present several materially different auth paths
That is why so many CA incidents sound like this:
- nobody touched the policy, but access changed
- the browser path worked, but the desktop client broke
- report-only looked quiet, then enforcement caused a spike
- the visible app looked fine, but the dependency path was not
Those are not just failures of discipline. They are failures of coverage.
Static review never had a chance of catching them. A finite test pack might catch them if the right scenario happened to be included. Exhaustive simulation is stronger because it is built to expose those interactions systematically.
Microsoft’s own tooling shows both the need and the limit
Microsoft’s built-in tooling is useful evidence here.
The What If tool and Graph evaluation API make it explicit that CA decisions depend on supplied sign-in properties and that accurate evaluation requires as much context as possible. That is already a strong argument against relying on static review alone.
But Microsoft’s tooling also shows the limit of point-in-time scenario evaluation.
The What If tool is excellent for:
- investigating a specific sign-in shape
- checking which policies apply
- understanding why a case evaluates the way it does
- troubleshooting edge cases without waiting for live traffic
It is not, by itself, a full regression framework or an exhaustive impact-analysis engine.
Microsoft’s own documentation also notes an important limitation: the What If tool does not account for Conditional Access service dependencies. That is exactly the kind of hidden interaction that sampled evaluation can miss and that broader simulation work needs to surface.
So the lesson is not that Microsoft’s tooling is weak. It is that point-in-time evaluation and exhaustive impact analysis are different categories of tool.
Where Maester fits, and where it does not
This is also the fairest way to compare tools like Maester.
Maester is a PowerShell and Pester-based framework for testing and monitoring Microsoft 365 security configuration. It is clearly useful for:
- configuration compliance checks
- security-as-code workflows
- ready-made and custom Pester tests
- CI/CD-driven monitoring
- regression against defined expectations
- ongoing tenant guardrails
That is real value.
In many teams, adopting something like Maester would be a substantial improvement over ad hoc review, one-off screenshots, and tribal knowledge.
It also has a Conditional Access What-If story, which makes sense for defined scenario checks and change validation.
But that is still not the same thing as exhaustive runtime-decision simulation.
The underlying difference is straightforward:
- config validation asks whether objects and settings meet defined expectations
- finite scenario testing asks whether selected sign-in cases evaluate as expected
- exhaustive simulation asks how the effective decision space behaves across the modeled estate, and what changes between two states
Maester is well aligned to the first category and can support parts of the second. It is not really aimed at the third in the stronger sense used here.
That is not a criticism of Maester. It is just a category boundary.
If your goal is compliance, guardrails, and repeatable checks against known policy conditions, Maester is a sensible tool.
If your goal is predicting Conditional Access blast radius and hidden interactions across the effective runtime decision space, you need something broader than a finite set of authored tests.
Why exhaustive simulation is better for blast radius analysis
Blast radius is the practical reason this distinction stops being academic.
Suppose you change one CA policy and want to know the impact before enforcement.
Static review can tell you what you edited.
A finite scenario pack can tell you whether your chosen test cases still pass.
Exhaustive simulation can tell you something much closer to what an engineer actually wants to know:
- which classes of user and device are newly blocked
- which flows now require MFA where they did not before
- which previously blocked paths became allowed
- which apps became indirectly affected through dependencies
- which admin or recovery paths are now at risk
- where the delta is narrow and intentional versus broad and surprising
That is a better fit for CA because so many failures are caused by interaction effects rather than obvious syntax errors.
The strongest workflow is not "pick one tool and pretend it does everything."
It is to stack the layers in order:
- validate the objects
- run defined scenario checks
- simulate the broader effective decision space and diff the result
- use report-only or limited rollout to compare prediction with observed traffic
- enforce once the unknowns are acceptably small
That gives each layer a clear job.
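That ordering can be sketched as a simple gate that stops at the first failing layer. The function, argument names, and threshold here are hypothetical, standing in for whatever each layer's tooling actually reports:

```python
def change_gate(objects_ok, scenario_results, simulation_delta, max_changed=25):
    """Decide the next step for a CA change, checking the layers in order.

    objects_ok:        bool from object/configuration validation
    scenario_results:  dict of scenario name -> pass/fail from the defined pack
    simulation_delta:  list of changed combinations from exhaustive simulation
    max_changed:       illustrative threshold for an acceptable blast radius
    """
    if not objects_ok:
        return "fix-objects"
    if not all(scenario_results.values()):
        return "fix-scenarios"
    if len(simulation_delta) > max_changed:
        return "review-blast-radius"
    # Only now move to report-only and compare prediction with observed traffic.
    return "proceed-to-report-only"
```

The point of the ordering is cheapness: object checks are fast and catch obvious errors, scenario packs catch known regressions, and the simulation diff is only interpreted once the earlier layers are clean.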
A concrete example
Imagine a tenant with:
- baseline MFA for all users
- compliant-device requirements for selected Microsoft 365 services
- location-based exceptions for a trusted office egress
- stricter controls for admins
- a few legacy exclusions that have never been cleaned up properly
Now change one thing: a broad productivity policy is extended to include another app set.
Three validation layers produce three different kinds of confidence.
Object validation says
- the policy syntax is valid
- the assignments look intentional
- the exclusion list is unchanged
- the grant controls are consistent
Useful, but shallow.
Finite scenario testing says
- the standard browser case still works from a managed device
- unmanaged browser access is still blocked
- one admin sign-in still requires the expected control
- one Teams case still passes
Better.
Exhaustive simulation says
- native desktop flows touching SharePoint-backed resources now inherit a stricter requirement
- a subset of existing-session paths now re-evaluate into prompt instead of allow
- one admin recovery path moves from prompt to block because of policy accumulation
- contractor access from an untrusted network is affected even though it was not in the authored test pack
- the total changed surface is small in browser flows but wide in mobile and desktop combinations
That is the difference between "the tests we wrote passed" and "we understand the impact of this change."
Where Altitude Labs fits
This is the problem space Altitude Labs is aiming at.
The point is not that static review is useless or that finite scenario testing is bad. Both are worth doing, and many teams still need to get better at them first.
The point is that Conditional Access eventually needs a stronger engineering model than object review plus a small hand-written test pack.
Altitude Labs is better understood in that context: not as a generic compliance checker, but as an approach centred on deterministic simulation, change diffing, regression coverage, and impact analysis across the effective CA decision space.
That is a narrower and more defensible claim than saying it is simply "better" than tools like Maester.
For compliance guardrails and policy-as-code checks, Maester is useful.
For understanding how Conditional Access decisions shift across a tenant when runtime combinations and policy interactions are the real problem, exhaustive deterministic simulation is the stronger tool class.
What comes next
Once you accept that Conditional Access needs more than static review, the next question is not just how to test one change.
It is how to make the whole thing repeatable.
That means baselines, regression suites, expected-outcome diffs, and eventually cross-tenant modelling.
That is where the series goes next.
Because the real step up is not from policy review to one-off troubleshooting. It is from careful administration to an actual testing discipline.
Next in the series: Regression Testing for Identity Policy: Baselines, Change Impact, and MSP-Scale Conditional Access
Sources
- Microsoft Learn: The Conditional Access What If tool
- Microsoft Graph: conditionalAccess/evaluate (What If evaluation)
- Maester: Project repository
- Maester: Project site