I'm trying to figure out a situation that I encountered last week where, in an attempt to temporarily fix a plan regression during an incident, I forced an execution plan using Query Store, but found it didn't work as expected. I have read Andrew Kelly's article on how forced doesn't always mean forced and think this may be what I'm seeing, but I'm hoping for some more insight into it, as in my case the plans are more different than I'd expect. I've also had a look over the Plan Forcing Limitations section of the Query store documentation on plan forcing, but I don't think any of the restrictions here apply.
The view in query store is shown below - around 8:30 a new plan with a higher cost showed up (2.14178 vs 0.894238), and I forced the plan which was in use previously. As can be seen in the graph, the new plan is shown against the metrics for the query from this point on, despite me having forced the old plan:
Looking at sys.query_store_plan, I can see that the old plan is showing as forced and has nothing against it suggesting any failures in forcing it:
The strange thing here is that when I later looked in the plan cache, neither of these plans were in use, though the plan in use did have the same query plan hash value as plan 13449 shown above, despite being quite different. The plan that was actually in use had a much higher estimated cost at 72.6743.
I took the compiled values from each of these plans and ran the query for the three of them to get an idea of what the actual metrics looked like for the plans and the values are quite different from the estimates. Notably, the plan I took from the cache produced a ~400 MB memory grant due to it using a different index and having to sort the data, compared to an 8 MB memory grant on the other 2 plans, with no sort. The estimated costs of the three plans were a bit closer, but the 2 from query store had much higher estimated costs than were in the plans shown in query store.
This is the query this happened with:
( @id_group int, @create_date datetime, @rows_per_page int, @page_number int ) SELECT * FROM ts_customer WHERE id_group = @id_group AND enabled = 'Y' AND (create_date >= @create_date) ORDER BY full_name, id_customer desc offset ((@page_number - 1) * @rows_per_page) rows fetch next @rows_per_page rows only
Could anyone help me understand why the plan wasn't forced, and why the one in use was so much more expensive?
I know that
select * will be causing the large memory grant as it is a wide table, but I'm mainly trying to understand the behaviour seen here with query store, and to a lesser extent, why on the plan that was used instead of the forced plan it used a different index and had to do the sort.
The database this happened with is running on Azure SQL DB, compatibility level 140, and using the legacy cardinality estimator.
I've attached all 3 estimated plans, as well as the plan that was generated when I ran the query with each set of compiled values. These can be accessed here.