By t.enix


2019-01-11 12:53:27 8 Comments

I have a problem with Azure SQL cross-database queries, it looks like if we join data across databases the join is not distributed as input to the external database.

Currently I tried to implement a simple example and therefore I used this starting point: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-elastic-query-getting-started-vertical.

As preperation I created 199999 CustomerInformation entries and 200 OrderInformation entries.

Following query works as expected the CustomerInformation Table returns just one row:

SELECT o.OrderId, c.CustomerID, c.CustomerName
  FROM OrderInformation o
  JOIN CustomerInformation c ON o.CustomerID = c.CustomerID
 WHERE c.CustomerID = 2

Expectd Query plan

But any query with a join that does not specifiy the CustomerID returns all the rows from the Customers Database.

SELECT o.OrderId, c.CustomerID, c.CustomerName
  FROM OrderInformation o
  JOIN CustomerInformation c ON o.CustomerID = c.CustomerID
 WHERE o.OrderId = 1870

external database returns too many rows

Therefore the queries are really slow because the servers transfer to many rows. Do I miss some setting or is it not possible to separate this data with the suggested queries.

Thanks for your help.

2 comments

@Randi Vertongen 2019-01-11 20:23:55

Why

The issue is that the OrderId seek on the OrderInformation table cannot be pushed down to the remote query. If the seek was on the Customerid column of the OrderInformation table, the filtering could happen. A way around this, is by getting the filtering to happen earlier, by using max / min seek predicates with variables. There could be other ways around this.

Creating the test data

Customerinformation (remote db)

CREATE TABLE dbo.CustomerInformation (CustomerID int identity(1,1) , CustomerName nvarchar(255));

DECLARE @i int = 1;
WHILE @i < 20000
BEGIN
INSERT INTO CustomerInformation(CustomerName)
VALUES ('Frodo');

SET @i += 1;
END

On the database that will use the external table

CREATE TABLE OrderInformation(OrderId Int identity(1,1), CustomerId int,Ordervalue int);


SET NOCOUNT ON;
DECLARE @i int = 1;
WHILE @i < 200 
BEGIN
INSERT INTO OrderInformation(CustomerId,Ordervalue)
Values (@I,111);

set @i += 1;
END

CREATE EXTERNAL TABLE  [dbo].CustomerInformation(
    CustomerID [int],
CustomerName nvarchar(255))
    WITH  
(  
    DATA_SOURCE = Stackoverflow  
); 

Test a query that matches the OP's example

SELECT o.OrderId, c.CustomerID, c.CustomerName
  FROM OrderInformation o
  JOIN CustomerInformation c ON o.CustomerID = c.CustomerID
 WHERE o.OrderId = 155

enter image description here

Testing with a CTE

; WITH CTE AS
 (
    SELECT o.OrderId,o.CustomerID 
    FROM OrderInformation o
    WHERE o.OrderId = 155
)
  SELECT o.OrderId, c.CustomerID, c.CustomerName
  FROM cte o
  JOIN CustomerInformation c ON o.CustomerID = c.CustomerID 
  WHERE o.OrderId = 155;

Nope, same result

Temp table maybe?

    SELECT o.OrderId,o.CustomerID 
    INTO #temp
    FROM OrderInformation o
    WHERE o.OrderId = 155;

  SELECT o.OrderId, c.CustomerID, c.CustomerName
  FROM #temp o
  JOIN CustomerInformation c 
  ON o.CustomerID = c.CustomerID ;

  DROP TABLE #temp;

No dice there

Variables?

 DECLARE @customerid int;

 DECLARE @orderid  int;

SELECT @orderid=o.OrderId, @customerid = o.CustomerID  
FROM OrderInformation o
WHERE o.OrderId = 155

SELECT @orderid, c.CustomerID, c.CustomerName
FROM CustomerInformation c 
WHERE c.CustomerID = @customerid;

Variables work, but it is only possible for a single ID.

enter image description here enter image description here

This article shows that it used to be that the only way we can push down predicates to the external table is by using variables.

Source


To me it looks like, unless we change the query to use o.Customerid , or use a variable, we cannot push down the predicate For example change the query filter to where o.customerid = 155:

  SELECT o.OrderId, c.CustomerID, c.CustomerName
  FROM CustomerInformation c 
  INNER JOIN OrderInformation o 
  ON c.CustomerId = o.CustomerId 
  WHERE o.CustomerId = 155;

enter image description here

This works, and pushes down the Customerid filter, even if we are filtering on o.customerID. So to me it seems like we can only push down the predicate where the join is based on, if it is defined as single values.

As a result, this also pushes down the predicate:

     SELECT o.OrderId,c.CustomerID, c.CustomerName
     FROM CustomerInformation c 
     JOIN OrderInformation o on o.CustomerId = c.CustomerID
     WHERE o.customerID > 1 and o.Customerid < 10;

enter image description here

Which means that we could use something like this to find all the values that match (too many maybe)

     DECLARE @MAX INT, @MIN INT

     SELECT @MAX= max(o.Customerid) , @MIN = MIN(o.Customerid)
     from 
     OrderInformation o 
     WHERE o.OrderId = 155

     SELECT o.OrderId,c.CustomerID, c.CustomerName
     FROM CustomerInformation c 
     where  c.customerID >= @MIN and c.Customerid <= @MAX;

And afterwards, put that in a CTE, since we might get too many values in the range that we are seeking on.

 DECLARE @MAX INT, @MIN INT;

 SELECT @MAX= max(o.Customerid) , @MIN = MIN(o.Customerid)
 FROM
 OrderInformation o 
 WHERE o.OrderId = 155

 ;WITH CTE as(
 SELECT c.CustomerID, c.CustomerName
 FROM CustomerInformation c 
 WHERE c.customerID >= @MIN 
 AND c.Customerid <= @MAX
 )

 SELECT o.OrderId, c.CustomerID, c.CustomerName
 FROM CTE c
 INNER JOIN  OrderInformation o
 on o.customerID = c.CustomerID
 where o.Orderid = 155;

Which does less reads on the remote query, Similar to this example.

enter image description here

Paste the plan of the final result:

https://www.brentozar.com/pastetheplan/?id=By_WZ_qzE


Other tests that did not work

Subquery

SELECT o2.OrderId, c.CustomerID, c.CustomerName
FROM ( SELECT o2.CustomerID,o2.OrderId 
       FROM OrderInformation o2  
       WHERE o2.OrderId =  155) AS o2
  JOIN CustomerInformation c ON o2.CustomerID = c.CustomerID ;

Filter in join instead of where clause?

  SELECT o.OrderId, c.CustomerID, c.CustomerName
  FROM CustomerInformation c 
  INNER JOIN OrderInformation o ON c.CustomerId = o.CustomerId and o.OrderId = 155;

Still all 19999 rows read in the above two examples

@mgrollins 2019-01-11 19:37:19

According to the elastic query overview,

Elastic queries can now push SQL parameters to the remote databases for execution.

Try including the order.CustomerID in your WHERE clause, to force the CustomerID to be sent with the query to the remote db.

WHERE o.OrderId = 1870 AND c.CustomerID = o.CustomerID

from Microsoft's Azure SQL Database elastic query overview (preview)

Otherwise, I saw a similar issue once when working with regular linked tables for an on-premise MS SQL server, and it turned out to be permission issue on the remote host. No ability of the reading server to read remote indexes was causing all rows to be returned like you are seeing.

Related Questions

Sponsored Content

1 Answered Questions

0 Answered Questions

Azure SQL Query Analysis

1 Answered Questions

[SOLVED] Azure SQL Server External Data Sources

1 Answered Questions

[SOLVED] Cannot create SP in Azure SQL Database

2 Answered Questions

[SOLVED] Linked Server alternative in Sql Azure database

2 Answered Questions

[SOLVED] User can't right-click SQL Azure database in SSMS

0 Answered Questions

Stage database in SQL Azure

1 Answered Questions

[SOLVED] Azure SQL Databases hit cpu limits too quickly

1 Answered Questions

Sponsored Content