By super9


2009-08-03 10:15:10 8 Comments

Just wondering if any of you guys use Count(1) over Count(*) and if there is a noticeable difference in performance or if this is just a legacy habit that has been brought forward from days gone past?

(The specific database is SQL Server 2005.)

12 comments

@Graeme 2017-09-22 16:51:25

COUNT(1) is not substantially different from COUNT(*), if at all. As to the question of COUNTing NULLable COLUMNs, this can be straightforward to demo the differences between COUNT(*) and COUNT(<some col>)--

USE tempdb;
GO

IF OBJECT_ID( N'dbo.Blitzen', N'U') IS NOT NULL DROP TABLE dbo.Blitzen;
GO

CREATE TABLE dbo.Blitzen (ID INT NULL, Somelala CHAR(1) NULL);

INSERT dbo.Blitzen SELECT 1, 'A';
INSERT dbo.Blitzen SELECT NULL, NULL;
INSERT dbo.Blitzen SELECT NULL, 'A';
INSERT dbo.Blitzen SELECT 1, NULL;

SELECT COUNT(*), COUNT(1), COUNT(ID), COUNT(Somelala) FROM dbo.Blitzen;
GO

DROP TABLE dbo.Blitzen;
GO

@Conor Cunningham MSFT 2019-01-15 03:05:48

I work on the SQL Server team and I can hopefully clarify a few points in this thread (I had not seen it previously, so I am sorry the engineering team has not done so previously).

First, there is no semantic difference between select count(1) from table vs. select count(*) from table. They return the same results in all cases (and it is a bug if not). As noted in the other answers, select count(column) from table is semantically different and does not always return the same results as count(*).

Second, with respect to performance, there are two aspects that would matter in SQL Server (and SQL Azure): compilation-time work and execution-time work. The Compilation time work is a trivially small amount of extra work in the current implementation. There is an expansion of the * to all columns in some cases followed by a reduction back to 1 column being output due to how some of the internal operations work in binding and optimization. I doubt it would show up in any measurable test, and it would likely get lost in the noise of all the other things that happen under the covers (such as auto-stats, xevent sessions, query store overhead, triggers, etc.). It is maybe a few thousand extra CPU instructions. So, count(1) does a tiny bit less work during compilation (which will usually happen once and the plan is cached across multiple subsequent executions). For execution time, assuming the plans are the same there should be no measurable difference. (One of the earlier examples shows a difference - it is most likely due to other factors on the machine if the plan is the same).

As to how the plan can potentially be different. These are extremely unlikely to happen, but it is potentially possible in the architecture of the current optimizer. SQL Server's optimizer works as a search program (think: computer program playing chess searching through various alternatives for different parts of the query and costing out the alternatives to find the cheapest plan in reasonable time). This search has a few limits on how it operates to keep query compilation finishing in reasonable time. For queries beyond the most trivial, there are phases of the search and they deal with tranches of queries based on how costly the optimizer thinks the query is to potentially execute. There are 3 main search phases, and each phase can run more aggressive(expensive) heuristics trying to find a cheaper plan than any prior solution. Ultimately, there is a decision process at the end of each phase that tries to determine whether it should return the plan it found so far or should it keep searching. This process uses the total time taken so far vs. the estimated cost of the best plan found so far. So, on different machines with different speeds of CPUs it is possible (albeit rare) to get different plans due to timing out in an earlier phase with a plan vs. continuing into the next search phase. There are also a few similar scenarios related to timing out of the last phase and potentially running out of memory on very, very expensive queries that consume all the memory on the machine (not usually a problem on 64-bit but it was a larger concern back on 32-bit servers). Ultimately, if you get a different plan the performance at runtime would differ. I don't think it is remotely likely that the difference in compilation time would EVER lead to any of these conditions happening.

Net-net: Please use whichever of the two you want as none of this matters in any practical form. (There are far, far larger factors that impact performance in SQL beyond this topic, honestly).

I hope this helps. I did write a book chapter about how the optimizer works but I don't know if its appropriate to post it here (as I get tiny royalties from it still I believe). So, instead of posting that I'll post a link to a talk I gave at SQLBits in the UK about how the optimizer works at a high level so you can see the different main phases of the search in a bit more detail if you want to learn about that. Here's the video link: https://sqlbits.com/Sessions/Event6/inside_the_sql_server_query_optimizer

@Dherik 2018-05-16 12:54:01

There is an article showing that the COUNT(1) on Oracle is just an alias to COUNT(*), with a proof about that.

I will quote some parts:

There is a part of the database software that is called “The Optimizer”, which is defined in the official documentation as “Built-in database software that determines the most efficient way to execute a SQL statement“.

One of the components of the optimizer is called “the transformer”, whose role is to determine whether it is advantageous to rewrite the original SQL statement into a semantically equivalent SQL statement that could be more efficient.

Would you like to see what the optimizer does when you write a query using COUNT(1)?

With a user with ALTER SESSION privilege, you can put a tracefile_identifier, enable the optimizer tracing and run the COUNT(1) select, like: SELECT /* test-1 */ COUNT(1) FROM employees;.

After that, you need to localize the trace files, what can be done with SELECT VALUE FROM V$DIAG_INFO WHERE NAME = 'Diag Trace';. Later on the file, you will find:

SELECT COUNT(*) “COUNT(1)” FROM “COURSE”.”EMPLOYEES” “EMPLOYEES”

As you can see, it's just an alias for COUNT(*).

Another important comment: the COUNT(*) was really faster two decades ago on Oracle, before Oracle 7.3:

Count(1) has been rewritten in count(*) since 7.3 because Oracle like to Auto-tune mythic statements. In earlier Oracle7, oracle had to evaluate (1) for each row, as a function, before DETERMINISTIC and NON-DETERMINISTIC exist.

So two decades ago, count(*) was faster

For another databases as Sql Server, it should be researched individually for each one.

I know that this question is specific for Sql Server, but the other questions on SO about the same subject, without mention the database, was closed and marked as duplicated from this answer.

@Tony Andrews 2009-08-03 10:45:17

Clearly, COUNT(*) and COUNT(1) will always return the same result. Therefore, if one were slower than the other it would effectively be due to an optimiser bug. Since both forms are used very frequently in queries, it would make no sense for a DBMS to allow such a bug to remain unfixed. Hence you will find that the performance of both forms is (probably) identical in all major SQL DBMSs.

@Benubird 2012-01-11 23:44:02

+1, because that's such a beautiful example of a fallacious argument. Looks like you're affirming the consequent, begging the question, and is that an appeal to belief? I think it is! There might even be a false dilemma in there somewhere... That is some very nice phrasing! :)

@Tony Andrews 2013-05-14 11:05:21

I read and re-read @Benubird's comment and still can't decide whether it is complimentary or insulting! I'll opt for complimentary...

@Thorsten Kettner 2014-05-05 10:32:48

I wouldn't consider it a bug if count(1) were slower than count(*). If you ask the dbms to generate 1s and count those that are not null, then yes, it boils down to be the record count, but you cannot expect the dbms to detect every nonsense you write and circumvent it for you.

@Tony Andrews 2014-05-05 11:00:59

Well, an optimiser is meant to optimise and for a count there are just 2 cases to consider: expression that may be null, expression that will never be null: count(1) falls into the latter so there is no need for the DBMS to "generate" 1s to answer the question. (BTW I would never use anything but count(*), just for aesthetic reasons.)

@Eyal Z. 2010-02-04 02:27:28

SET STATISTICS TIME ON

select count(1) from MyTable (nolock) -- table containing 1 million records. 

SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 36 ms.

select count(*) from MyTable (nolock) -- table containing 1 million records. 

SQL Server Execution Times:
CPU time = 46 ms, elapsed time = 37 ms.

I've ran this hundreds of times, clearing cache every time.. The results vary from time to time as server load varies, but almost always count(*) has higher cpu time.

@Jeff Atwood 2010-02-04 06:15:33

I can't reproduce this. count(*) and count(1) return results within a few ms of each other, even when counting a table with 4.5 million rows, in my SQL 2008 instance.

@JosephDoggie 2018-03-05 20:52:43

Sometimes, in some systems, the statement ran first always runs faster ... have you randomized the order in which they are run?

@RBT 2018-03-05 23:50:11

@JosephDoggie one should always restart SQL Server service before running every query while taking such measurements/statistics. When you've just started SQL Server service then every run becomes totally independent and hece the order of query should not matter. On the other hand, if you don't restart the SQL Server service and engine does some sort of caching of execution plans then the query being run later should run faster not the first one.

@Conor Cunningham MSFT 2019-05-18 22:09:07

The execution times need to look at the exact query plans when doing comparisons. If they are different (say, hash aggregate vs. sort + stream aggregate), then the results are not comparable. So, I urge caution drawing conclusions here without more data.

@RBT 2016-08-20 03:54:41

I ran a quick test on SQL Server 2012 on an 8 GB RAM hyper-v box. You can see the results for yourself. I was not running any other windowed application apart from SQL Server Management Studio while running these tests.

My table schema:

CREATE TABLE [dbo].[employee](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_employee] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

Total number of records in Employee table: 178090131 (~ 178 million rows)

First Query:

Set Statistics Time On
Go    
Select Count(*) From Employee
Go    
Set Statistics Time Off
Go

Result of First Query:

 SQL Server parse and compile time: 
 CPU time = 0 ms, elapsed time = 35 ms.

 (1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 10766 ms,  elapsed time = 70265 ms.
 SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

Second Query:

    Set Statistics Time On
    Go    
    Select Count(1) From Employee
    Go    
    Set Statistics Time Off
    Go

Result of Second Query:

 SQL Server parse and compile time: 
   CPU time = 14 ms, elapsed time = 14 ms.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 11031 ms,  elapsed time = 70182 ms.
 SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

You can notice there is a difference of 83 (= 70265 - 70182) milliseconds which can easily be attributed to exact system condition at the time queries are run. Also I did a single run, so this difference will become more accurate if I do several runs and do some averaging. If for such a huge data-set the difference is coming less than 100 milliseconds, then we can easily conclude that the two queries do not have any performance difference exhibited by the SQL Server Engine.

Note : RAM hits close to 100% usage in both the runs. I restarted SQL Server service before starting both the runs.

@Nakul Chaudhary 2009-08-03 13:10:13

COUNT(*) and COUNT(1) are same in case of result and performance.

@gbn 2009-08-03 10:36:52

There is no difference.

Reason:

Books on-line says "COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )"

"1" is a non-null expression: so it's the same as COUNT(*). The optimizer recognizes it for what it is: trivial.

The same as EXISTS (SELECT * ... or EXISTS (SELECT 1 ...

Example:

SELECT COUNT(1) FROM dbo.tab800krows
SELECT COUNT(1),FKID FROM dbo.tab800krows GROUP BY FKID

SELECT COUNT(*) FROM dbo.tab800krows
SELECT COUNT(*),FKID FROM dbo.tab800krows GROUP BY FKID

Same IO, same plan, the works

Edit, Aug 2011

Similar question on DBA.SE.

Edit, Dec 2011

COUNT(*) is mentioned specifically in ANSI-92 (look for "Scalar expressions 125")

Case:

a) If COUNT(*) is specified, then the result is the cardinality of T.

That is, the ANSI standard recognizes it as bleeding obvious what you mean. COUNT(1) has been optimized out by RDBMS vendors because of this superstition. Otherwise it would be evaluated as per ANSI

b) Otherwise, let TX be the single-column table that is the result of applying the <value expression> to each row of T and eliminating null values. If one or more null values are eliminated, then a completion condition is raised: warning-

@Quassnoi 2009-08-03 10:34:38

In SQL Server, these statements yield the same plans.

Contrary to the popular opinion, in Oracle they do too.

SYS_GUID() in Oracle is quite computation intensive function.

In my test database, t_even is a table with 1,000,000 rows

This query:

SELECT  COUNT(SYS_GUID())
FROM    t_even

runs for 48 seconds, since the function needs to evaluate each SYS_GUID() returned to make sure it's not a NULL.

However, this query:

SELECT  COUNT(*)
FROM    (
        SELECT  SYS_GUID()
        FROM    t_even
        )

runs for but 2 seconds, since it doen't even try to evaluate SYS_GUID() (despite * being argument to COUNT(*))

@asgs 2018-10-28 20:09:07

it should evaluate SYS_GUID() at least (I mean, exactly) once for the sub-query to return result, right?

@Quassnoi 2018-10-28 20:58:40

@asgs: why do you think so? How does COUNT(*) depend on the values of SYS_GUID?

@asgs 2018-10-29 20:16:29

now that you ask, I'm not sure. I thought for COUNT(*) to run, it needs a table, so the sub-query should act like one. Otherwise, I don't see a way for COUNT(*) to return a meaningful value

@Quassnoi 2018-10-29 20:22:40

@asgs: it does need a rowset, but it does not need to evaluate its tuples

@asgs 2018-10-29 20:49:21

wouldn't the sub-query need to be executed in order to come up with a rowset? looks like I'm missing something basic

@Quassnoi 2018-10-29 21:00:14

@asgs: not necessarily, no. Most languages would optimize a && b and not bother to evaluate b if a is false. The same principle applies here.

@asgs 2018-10-30 19:04:18

I totally understand the operator evaluation optimization in the programming languages. Yet, I still don't see how this applies here without evaluating the SYS_GUID() function at all

@Quassnoi 2018-10-30 20:04:04

@asgs: assuming you know what does the map method do, do you see how these two expressions: t_even.map(() => sys_guid()).length and t_even.length would always return the same value? Oracle's optimizer is smart enough to see it to and optimize the map part out.

@asgs 2018-10-31 17:58:41

OK. IIUC, the way the length attribute doesn't depend on what the map method returns but only on what the underlying collection consists of, the same way COUNT(*) doesn't NEED to depend on SYS_GUID()'s value to determine the row count as it can fetch the row count of the underlying table independently.

@Quassnoi 2018-10-31 18:29:25

@asgs exactly. Just a minor correction: length does not quite depend on what the collection consists of, just on the number of its elements. If this number is stored in the collection's metadata (this is not the case for Oracle or most other modern RDBMS but is the case for old MySQL's storage engine, MyISAM), then COUNT(*) would just need to take the value from the metadata.

@asgs 2018-10-31 19:51:11

Right, I meant the same thing. Thank you very much for taking the time to educate me! Appreciate the patience

@Thorsten Kettner 2014-05-05 10:43:08

As this question comes up again and again, here is one more answer. I hope to add something for beginners wondering about "best practice" here.

SELECT COUNT(*) FROM something counts records which is an easy task.

SELECT COUNT(1) FROM something retrieves a 1 per record and than counts the 1s that are not null, which is essentially counting records, only more complicated.

Having said this: Good dbms notice that the second statement will result in the same count as the first statement and re-interprete it accordingly, as not to do unnecessary work. So usually both statements will result in the same execution plan and take the same amount of time.

However from the point of readability you should use the first statement. You want to count records, so count records, not expressions. Use COUNT(expression) only when you want to count non-null occurences of something.

@onedaywhen 2011-05-13 10:36:40

In the SQL-92 Standard, COUNT(*) specifically means "the cardinality of the table expression" (could be a base table, `VIEW, derived table, CTE, etc).

I guess the idea was that COUNT(*) is easy to parse. Using any other expression requires the parser to ensure it doesn't reference any columns (COUNT('a') where a is a literal and COUNT(a) where a is a column can yield different results).

In the same vein, COUNT(*) can be easily picked out by a human coder familiar with the SQL Standards, a useful skill when working with more than one vendor's SQL offering.

Also, in the special case SELECT COUNT(*) FROM MyPersistedTable;, the thinking is the DBMS is likely to hold statistics for the cardinality of the table.

Therefore, because COUNT(1) and COUNT(*) are semantically equivalent, I use COUNT(*).

@gbn 2011-08-19 16:43:39

SQL-92 text linked from my answer on DBA.SE: dba.stackexchange.com/questions/2511/…

@Richard 2009-08-03 10:17:05

I would expect the optimiser to ensure there is no real difference outside weird edge cases.

As with anything, the only real way to tell is to measure your specific cases.

That said, I've always used COUNT(*).

@David Manheim 2012-06-12 15:58:23

Per the accepted answer, this is not true for MS SQL - there is actually no difference between the two .

Related Questions

Sponsored Content

29 Answered Questions

47 Answered Questions

37 Answered Questions

5 Answered Questions

[SOLVED] Multiple Indexes vs Multi-Column Indexes

40 Answered Questions

[SOLVED] How to return only the Date from a SQL Server DateTime datatype

10 Answered Questions

[SOLVED] Improve INSERT-per-second performance of SQLite?

28 Answered Questions

[SOLVED] How can I prevent SQL injection in PHP?

33 Answered Questions

[SOLVED] How do I UPDATE from a SELECT in SQL Server?

12 Answered Questions

[SOLVED] Should you choose the MONEY or DECIMAL(x,y) datatypes in SQL Server?

  • 2009-02-24 17:54:38
  • Wayne Molina
  • 322088 View
  • 407 Score
  • 12 Answer
  • Tags:   sql-server types

4 Answered Questions

[SOLVED] Inserting multiple rows in a single SQL query?

Sponsored Content