I ran the same query, which joins two tables A and B, first when B had no index at all, and then after adding a clustered primary key index on the ID column used in the join. Surprisingly, the elapsed time was actually lower when there was no index.
No index result:
Table 'Table_A'. Scan count 5, logical reads 1764, ... Table 'Table_B'. Scan count 10, logical reads 16976, ... SQL Server Execution Times: CPU time = 859 ms, elapsed time = 269 ms.
After adding a PK clustered index to Table_B:
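(For reference, the index was presumably created with something like the following sketch; the constraint name is an assumption, as the question doesn't show the actual DDL.)

```sql
-- Hypothetical reconstruction of the change described above:
-- a clustered primary key on the join column of Table_B.
ALTER TABLE dbo.Table_B
ADD CONSTRAINT PK_Table_B PRIMARY KEY CLUSTERED (ID);
```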
Table 'Table_A'. Scan count 1, logical reads 445, ... Table 'Table_B'. Scan count 2, logical reads 17026, ... SQL Server Execution Times: CPU time = 1046 ms, elapsed time = 1053 ms.
Even though the IO numbers are higher when Table_B had no index, how could that query still be faster? And how can a table with an index be slower than one without? Is there a logical explanation?
Looking at the ratio of elapsed time to CPU time for both queries, we can see that the “no index” query benefited from parallelism – CPU time was about 3 times greater than elapsed time.
After adding the clustered index to Table_B, you got a serial version of that execution plan (CPU time and elapsed time were about equal). Or maybe an entirely different serial execution plan, but the fact that Table_B was scanned twice as often as Table_A in both queries makes me think the general plan shape could be the same.
With the clustered index in place, perhaps the optimizer reasoned that it could take advantage of the sorted nature of the clustered index, and thus didn’t need to incur the overhead of parallelism.
In this case, that seems like a good decision by the optimizer overall: the second query used fewer CPU cores and read fewer data pages, at the cost of a small increase in elapsed time.
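One way to test the parallelism hypothesis (a sketch only; the column names are placeholders, since the actual query isn't shown) is to force the no-index version of the query to run serially and compare its times with the indexed run:

```sql
-- Force a serial plan so CPU time and elapsed time can be compared
-- on equal footing with the post-index (serial) run.
SET STATISTICS TIME, IO ON;

SELECT A.SomeColumn, B.SomeColumn   -- placeholder columns
FROM dbo.Table_A AS A
JOIN dbo.Table_B AS B
    ON B.ID = A.ID                  -- assumed join predicate on ID
OPTION (MAXDOP 1);                  -- disable parallelism for this statement
```

If the serial no-index run's elapsed time jumps to roughly its CPU time, that supports the idea that parallelism was what made the heap version finish faster.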
If you edit your question to include more details (actual execution plan, table definitions, sample data), then someone might be able to provide an answer more specific to your situation.
Josh Darnell has already covered the optimizer's decision and the parallelism details, so I would just like to add one point: we should mostly rely on logical reads, not time, for measuring performance. Elapsed time varies a lot with factors such as server load, network, and disk (you can verify this by running the same query several times and comparing durations), whereas logical reads stay consistent across runs. The exceptions are when a maintenance job has run, statistics have been updated, or an index has changed.
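A minimal way to capture those numbers yourself (table names follow the question; the query itself is a placeholder) is:

```sql
-- Report per-table scan counts and logical reads, plus CPU/elapsed time,
-- for each statement that follows. Re-running the same query should give
-- fluctuating times but stable logical reads.
SET STATISTICS IO, TIME ON;

SELECT COUNT(*)
FROM dbo.Table_A AS A
JOIN dbo.Table_B AS B
    ON B.ID = A.ID;

SET STATISTICS IO, TIME OFF;
```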
Having said that, note that the difference in elapsed time between the first and second query is minimal, whereas the scan counts are five times lower with the index (5 to 1 and 10 to 2), and Table_A's logical reads dropped from 1764 to 445. I know the absolute numbers are small here, but at a larger scale the data the engine has to bring into memory to satisfy the query will be far less with the clustered index than against the heap.
This is the reason it is best practice to have a clustered index in place in almost all scenarios, except for something like a staging database where only inserts take place and select queries are seldom run.