I am getting more rows then I expect returned by my query.
I believe it has something to do with my join statements.
There are multiple tables that have different information in them.
Person holds the primary information about the person but not the address, or phone, or email. This is because the original designer wanted the table to be able to hold multiple phone numbers and emails and addresses.
SELECT (person.FirstName + ' ' + person.LastName) as FullName ,ISNULL(Person.isClient, '') ,ISNULL(Person.UDF1, '') ,ISNULL(Address.City, '') ,ISNULL(Address.state, '') ,PersonAddress.Person ,PersonAddress.Address ,ISNULL(Phone.PhoneNumber, 'N/A') ,Email.Email ,Person.Website FROM Person left join PersonAddress on Person.ID = PersonAddress.Person left join Address on PersonAddress.Address = Address.ID left join PersonPhone on Person.ID = PersonPhone.Person left join Phone on PersonPhone.Person = Phone.ID left join Email with (nolock) on Person.ID = Email.Person WHERE ( isclient = 'prospect' or isclient = 'client' ) and Address is not null and name like '%Mike%' ORDER BY isClient asc;
For this example, I get 6 rows of “Mike Worths”. 3 of the copies have one email and three have another email.
For “Mike Pamstein” I get two duplicate rows with the same email.
I need the results to only contain a single unique row for each person.
I want to drop the second email.
Presumably, you want to see a single entry for each unique person/address/email/website combination. If so, try this:
SELECT (person.FirstName + ' ' + person.LastName) as FullName , ISNULL(Person.isClient, '') , ISNULL(Person.UDF1, '') , ISNULL([Address].City, '') , ISNULL([Address].[state], '') , PersonAddress.Person , PersonAddress.[Address] , ISNULL(Phone.PhoneNumber, 'N/A') , Email.Email , Person.Website FROM dbo.Person LEFT JOIN dbo.PersonAddress ON Person.ID = PersonAddress.Person LEFT JOIN dbo.[Address] ON PersonAddress.[Address] = [Address].ID LEFT JOIN dbo.PersonPhone ON Person.ID = PersonPhone.Person LEFT JOIN dbo.Phone ON PersonPhone.Person = Phone.ID LEFT JOIN dbo.Email WITH (NOLOCK) ON Person.ID = Email.Person WHERE ( isclient = 'prospect' or isclient = 'client' ) and [Address] is not null and name like '%Mike%' GROUP BY (person.FirstName + ' ' + person.LastName) , ISNULL(Person.isClient, '') , ISNULL(Person.UDF1, '') , ISNULL([Address].City, '') , ISNULL([Address].state, '') , PersonAddress.Person , PersonAddress.[Address] , ISNULL(Phone.PhoneNumber, 'N/A') , Email.Email , Person.Website ORDER BY isClient asc;
GROUP BY clause at the end ensures only a single row is returned for each unique combination of columns in the
GROUP BY clause. This should prevent duplicate rows being displayed in your results.
A couple of things to note:
Always use the schema qualifier on the
FROM Personshould be
FROM dbo.Person-> this eliminates any confusion if you introduce new schemas in the future, and prevents the query optimizer from having to look for the default schema for your user.
For maintainability in the future, you probably want to name columns the same regardless of which table they are in. So for instance, instead of the
IDcolumn in the
Peopletable being named
ID, and it being named
Addresstable, I’d name it
PersonIDin both tables. This prevents confusion (read bugs) in joins such as
dbo.Person LEFT JOIN dbo.Address ON Person.ID = Address.Person.
Instead of naming tables like
Person, they should be named after the collection of items they contain, in plural. So,
Addresses. This eliminates confusion -> does the
Addresstable actually contain a single address or multiple addresses?
WITH (NOLOCK)should be avoided like the plague it is unless you fully understand the consequences of reading rows that have been modified by other transactions but not yet committed. From MSDN:
Transactions running at the READ UNCOMMITTED level do not issue shared locks to prevent other transactions from modifying data read by the current transaction. READ UNCOMMITTED transactions are also not blocked by exclusive locks that would prevent the current transaction from reading rows that have been modified but not committed by other transactions. When this option is set, it is possible to read uncommitted modifications, which are called dirty reads. Values in the data can be changed and rows can appear or disappear in the data set before the end of the transaction. This option has the same effect as setting NOLOCK on all tables in all SELECT statements in a transaction. This is the least restrictive of the isolation levels.
In SQL Server, you can also minimize locking contention while protecting transactions from dirty reads of uncommitted data modifications using either:
The READ COMMITTED isolation level with the READ_COMMITTED_SNAPSHOT database option set to ON.
The SNAPSHOT isolation level.
Could you use subqueries to return one record back in your joins like so?
... FROM dbo.Person LEFT JOIN (SELECT MAX(AddressID) AS AddressID, Person FROM dbo.PersonAddress GROUP BY Person) PersonAddress ON Person.ID = PersonAddress.Person LEFT JOIN dbo.[Address] ON PersonAddress.[Address] = [Address].ID
In this case, I’m just using MAX to force it to one Person, but you could use other logic to get it down it down to one record per person and eliminate duplicates that way.