SQL Server Tricks and Tips from a Self-Proclaimed Data Nerd
Using COLLATE to FIND differences in field value Case:
So.... How many times have you run into an issue where you have Customer-entered values in the database (or maybe even Staff-entered. Shhh. Don't tell anyone.) and for whatever reason, the values entered into the field in question have the wrong case?
Or perhaps you're running into a slightly more critical issue where the system actually behaves differently if a value being looked for is capitalized versus not.
Regardless of what's happening or why, if data of this nature needs fixing in the database, the proximate question posed to the "fixer" is going to be, "How to do I actually FIND the values in need of fixing?"
One means of doing this (and please note, there are many, many possible roads to take here) is using the COLLATE function.
For example, a standard (and very, very basic) SQL query to look at your existing User Status values would look like this:
SELECT DISTINCT Status
FROM Users
The results of this query will look something like this:
facstaff
grad
GRADUATE
misc
None
STAFF
test
udg
UNDERGRADUATE
But the problem here is that, to SQL Server, unless specifically looked for, case is ignored.
So the value "staff" and the other value "STAFF" (and for that matter, even "Staff") will be seen as identical by the above query, and, in the case above, it displays in the query result set as "STAFF".
But are all of the Users table Staff values being returned there actually in all caps? We have no way of knowing unless we query specifically for that case.
We can do that using the COLLATE function. So if we use this slightly more complicated query instead of the original:
SELECT DISTINCT Status COLLATE sql_latin1_general_cp1_cs_as
FROM Users
, we instead get results that look like this:
Staff
STAFF
grad
None
facstaff
misc
test
udg
Graduate
GRADUATE
UNDERGRADUATE
Notice that "staff" now appears in the list twice, first as "Staff", and then again as "STAFF". That tells us that we have two different case variations of that value in the table. And please note: Had "staff" or "staFF" also existed in there, those two case-specific values would also be listed in the results set.
We also see that "graduate" exists in the list twice, once as "Graduate" and then again as "GRADUATE".
So, problems found. We now know what might potentially be in need of fixing. How do we then fix them?
Well, to paraphrase a favorite cooking show host, that's another article, but hopefully this will give you an introduction of sorts to the COLLATE function.
Until next time.....
- Kevin
"Apologies, Captain. I seem to have reached an odd functional impasse. I am, uh ... stuck."
--DATA, Star Trek: The Next Generation, "The Last Outpost"
ADDITIONAL RESOURCES:
https://docs.microsoft.com/en-us/sql/t-sql/statements/collations
http://stackoverflow.com/questions/5039211/what-does-collate-sql-latin1-general-cp1-ci-as-do
Comments
Thanks! This is very helpful. Will try it myself. Our statuses are kind of a mess and I'd love to clean them up, but the only way to do so is manually. Looking forward to your next post!
Please sign in to leave a comment.