Opinion: Case Sensitivity is a Pox on Computing

Case sensitivity in comparisons is an aspect of computing that I’m surprised is still so widespread. I can’t say it more clearly than this:

It’s a pox on computing and needs to be eradicated.

(https://www.phrases.org.uk/bulletin_board/50/messages/583.html)

I’ve recently been working at a site at present where a new case-sensitive SQL Server system is being implemented. I cannot begin to describe what a poor idea I think this is.

In the end, all that a case sensitive system allows you to do is to have:

  • Multiple identifiers exist
  • They exist in the same scope
  • The names of the identifiers differ only by case

You’d have a hard time convincing me that that would ever be a good idea.

At least not for a system used by typical humans. No sensible person is ever going to be comfortable with “John Smith” being a different name to “john smith”. And similarly, do you really want a single database table with a CustomerID column, a CustomerId column, and a customerID column?

Well I certainly don’t.

In the new system that I mentioned, there are columns ending in ID, Id, etc. They haven’t even been consistent in the naming of their case-sensitive objects.

And yes, I hear the “but what about private variables vs properties in languages like C#?” complaint:

  • Age is the object’s property
  • age is where the object stores the value assigned to the property

Surely we can come up with a better naming convention than that. I’ve lost count of how many times I’ve seen people using the property when they meant the private variable or vice versa. It’s just not sensible.

Now before I hear complaints that case matters, be clear that I’m not talking about case preservation; that’s an entirely different thing. Yes, if I defined a column as CustomerName, I don’t care if I query it by customername, Customername, etc. I want it coming back as CustomerName ie: however I defined it. Case preservation is a virtue; it’s case sensitivity that I see as an almost always painful and unnecessary thing.

Worse, if you’ve ever tested an application against a case-sensitive server, you’ll understand the challenges involved. It’s hard to get case-sensitive code correct.

I gather that SQL Server Management Studio has a current bug that arises when you remove and re-add a database from an availability group on a case-sensitive server. Why? It appears that they aliased a table with A in one place, and used the alias as a in another place. It’s really nonsense to have a situation where that matters but it highlights the other big issue. It makes for fragile applications.

Image by Michał Parzuchowski

Do you really want to be the one who’s testing all your applications and 3rd party utilities to find out if they’ve tested case-sensitivity properly? Do the tools that you use have a better testing regime than SSMS? I’ll bet that most don’t. And what that means is that you get to spend your life wading through obscure tooling issues.

No thanks. Life is too short.

SQL: Do I Still Need to Run DBCC CHECKDB?

In short: YES

(In contradiction to Betteridge's Law of Headlines)

Every now and then, customers ask me if they really need to run DBCC CHECKDB. There was even a question that came up about this on a private mailing list full of people who really should already understand why. There is no mystery here.

DBCC CHECKDB checks the database for physical readability (are the pages intact and can they be read from the I/O system). This makes people wonder that if the I/O subsystem is already doing this, why does SQL Server need to do this?

Importantly, DBCC CHECKDB checks for logical consistency as well.

That’s incredibly important.

SQL Server is a fabulous database but it has no way of coping with internal structures that are messed up. For example, links on pages to other pages, but now pointing to the wrong pages.

Fortunately, because SQL Server is such a good database, having to deal with logical corruption caused by the database engine messing up is a truly rare event. That could happen though if there was a serious bug in SQL Server.

But even if SQL Server is perfect, it could be led astray. A good example would be a multi-path I/O system. Imagine this scenario:

  • The database has two paths to the I/O subsystem (multi-path I/O)
  • It writes a page out one path
  • It reads the same page from the other path
  • The I/O subsystem returns an older version of the page from its cache, not the one that was just written

That sort of thing could easily lead to internal corruption of the database. So could issues with your memory subsystem.

Fortunately, the SQL Server team again have your back. It is designed to help to detect some of these issues too. It probably can’t fix them but it might help to find them. For example, in the multi-path I/O problem mentioned above, SQL Server keeps a list of checksums of recently-written pages. If it reads one of those pages and it gets a page with a valid checksum but a different checksum to the one that it recently wrote, it lets you know that something went very wrong.

Even cleverness like this though isn’t going to help if the culprit is the storage subsystem having a cache bug, and so on, and so on.

Finally, DBCC CHECKDB is so important because it lets you find issues quickly. Otherwise, the logical problem would get much, much worse and be difficult, if not impossible to correct. You don’t want to be the DBA who has a large number of backups but every one of them has the same corrupted data.

DevOps: Infrastructure as Code – What about code quality and management?

For many years now, it has been important to script the configuration and deployment of systems, particularly virtual machines. Infrastructure as Code is now a common requirement but as the required configuration has become more complex, scripting in language like PowerShell has become more difficult.

It’s all very well to write code to add say a network adapter, but how do you check the current state of the machine?

  • Did that adapter already exist?
  • Is something else using the IP address?
  • How do you write a script to a cater for all the situations?

This leads to ever-more complex code and this is where my concerns start. Writing code for creating infrastructure needs the same discipline that writing any other code does. This includes code quality, coding conventions, error handling, source code control and versioning. Yet, who is writing this code?

Image by Wes Hicks

What I commonly see is that someone who’s come from a network admin background or similar, and who was able to hack together a few batch jobs successfully, suddenly is creating complex script for infrastructure creation. It often reminds me of cable infrastructure in 3rd world countries that sort-of gets the job done, most of the time.

And that’s just more than a bit scary for organizations that depend upon that code.

Image by Janko Ferlič

Regularly I see PowerShell scripts that wouldn’t pass newbie developer 101 tests. They are potentially highly unreliable. I think that organizations need to become much more involved in the management of this type of code, and start to exert appropriate controls on how it’s developed.

Organizations that depend on code the way it's often currently written, are likely to find things going very wrong, just when they can least afford them to do so.

Image by Sergi Viladesau

One shining light in this area is the appearance of better and better configuration management tools.. Rather than telling the operating system how to configure something step by step (ie: imperatively), you create a template of how you’d like the machine to be configured (ie: declaratively) and tell the system to “make itself look like that”.

For PowerShell, that’s Desired State Configuration (DSC).

DSC takes a JSON-based template that describes the outcome, not the process to achieve the outcome. And a bonus is that because it’s outcome driven, it’s also idempotent ie: you can run it again and again and still get the same outcome.

Image from MSDN

If you haven’t explored DSC, I’d encourage you to do so. Either way, it’s time to start to take control over the management of your infrastructure-related code.

If you'd like to learn more about Infrastructure as Code in general, Microsoft have published this course as part of the Microsoft Professional Program for DevOps. You can take the course for free (more details here: https://www.edx.org/course/infrastructure-code-microsoft-devops200-2x-0) , or pay to take it if you want to pursue the certification.

Shortcut: Using the Clipboard Ring in SSMS

Two key combinations used by SQL Server T-SQL developers every day are Ctrl-C and Ctrl-V for copy and paste.

But many users of SQL Server Management Studio (SSMS) don’t realize that it has a clipboard ring and can deal with several objects in the clipboard at the same time.

Let’s see an example.

In this screen shot, I’ve opened a query window with the source code of the AnalyzeTableColumns procedure from SDU_Tools.

I might want to work with the parameters for that procedure, so I double-click and hit Ctrl-C for each of the parameter names.

I’ve then opened a new query window where I want to work. If I hit Ctrl-V, I just get the last value that I copied, as expected:

However, instead of using Ctrl-V, if I use Ctrl-Shift-V, I see the same value, but if I continue to hit Ctrl-Shift-V, I see the previous clipboard entries, one at a time. I can then use one or more of the other parameter values that I copied before:

This is one of those things that once you get used to it, you’ll wonder how you ever worked without it.

 

 

 

 

SDU Tools: Find columns that shouldn’t have time

Ever since I started working with SQL Server back in 1992, the #1 requested feature that I kept hearing about was a separate date data type. SQL Server had a smalldatetime and a datetime but these included both date and time within the same type.

Having date and time often led to odd bugs or performance issues, where people didn't realize that times were included and tried to work with the values as dates.

In SQL Server 2005, the date data type was meant to be the poster-child for the new SQL CLR integration and there was a shiny new date data type built using it. But when we (the MVPs) looked at it, we all disliked it immensely. The main problem was that it didn't work with all the existing date-related functions. So the date data type was removed from the product before it shipped.

In SQL Server 2008, we got new intrinsic (standard built-in) data types for date and time (and a few interesting other ones), and they worked with the date-related functions. This was what was needed.

However, there's a lot of code out there that was built before this data type existed, or was simply built with the wrong data type.

As part of our free SDU Tools collection, we have provided two ways to find columns that might be worth changing:

ListPotentialDateColumns looks at the names of the columns. If the names suggest that it's a date, but the data type is one that includes time, it'll flag the column as a potential. You can see it being used here:

In each case, the column name suggests that it's a date, but the data type says otherwise. That runs quite quickly as it's just looking at the column names and data types.

Harder to find are columns that are declared with a datetime, smalldatetime, datetime2, etc. yet don't contain any time values. To make these earlier to locate, we've built another tool.

ListPotentialDateColumnsByValue looks through all the values in a column. If the column only contains dates, it reports it:

Now that's obviously a bit slower as it has too look through all the data. After running this (for example), we know that the ValidFrom column in Sales.BuyingGroups only contains dates, without any time component. It's likely that it should have been a date column instead.

You can see both of these in action here:

You'll find more information on our free SDU Tools here:

http://sqldownunder.com/sdu-tools

 

Opinion: Avoid Unneces Abbrevs

Many database developers (and other developers) seem to regard the endless use of abbreviations as some badge of honor. Don't be one of these people.

Avoid abbreviations almost all the time.

I've written before about my dislike for the EOMONTH T-SQL statement. Given the same version introduced names like DATETIMEOFFSETFROMPARTS, surely we didn't have to save 3 characters and could have had ENDOFMONTH. (I heard it was named this way to match the Excel function but matching something from another language that was created a long time ago isn't the right answer here).

I'd also note that I'd rather have seen it as END_OF_MONTH. It's a real pity that T-SQL has names like DATETIMEFROMPARTS but also names like NODE_FROM_PARTS all introduced in recent years. Again, I think there's a need for more cranky old dudes in the development team.

Now we can't change the language directly, but we can make better decisions when using the language.

For example, when using the DATEADD function, we have two choices for parameters. One is a set of (sometimes fairly obscure) abbreviated values. The other is a set of actual English words.

Please choose to use words not abbreviations.

Don't write this code like this:

DATEADD(d, 1, @StartDate)

DATEADD(dd, 1, @StartDate)

DATEADD(y, 1, @StartDate)  

DATEADD(ww, 12, @StartDate)

Are you confident about what each of those means? Instead write this:

DATEADD(day, 1, @StartDate)

DATEADD(day, 1, @StartDate)

DATEADD(dayofyear, 1, @StartDate)

DATEADD(week, 12, @StartDate)

The same applies to column names. Don't write Mbr when you could have written Member.

 

SQL: Newbie Mistake #1: Using float instead of decimal

When I’m looking at a database schema for the first time, there are a number of tell-tale signs that give me the hint that the developers really haven’t done much work with SQL Server before. They’ve made a “newbie” mistake.

One of those is the extensive use of the float data type.

Most times that I see this, the developers have come from a C or Java background and they assume that something that needs a decimal point in it, needs to be float. There are some situations where float makes sense, but 99% of the time what they should have used was decimal.

float is used to store approximate values, not exact values. It has a precision from 1 to 53 digits.

real is similar but is an IEEE standard floating point value, equivalent to float(24).

Neither should be used for storing monetary values. Because the values cannot be stored precisely, people who use float end up with values that don’t match, columns of values that don’t quite add up, or totals that are a cent out, etc. They spend their lives trying to round values to fix the issue, and usually don’t get it right.

Image by Olga DeLawrence

Here’s an example. What values should this code print?

You’d expect the values 0.0, 0.1, 0.2 and so on up to 10.0. But that’s not what you get. The query would continue until the maximum value for the data type was exceeded (a long time). If you stop the query, you’ll see odd values:

Worse, note that our stop value of 10 is actually shown, but it didn’t stop:

The problem is that while the value 10 can be stored accurately in float, the value 0.1 can’t be. In decimal, we have recurring fractions. 1/3 is 0.33333 recurring. We can’t write it precisely in decimal. In binary though, 0.1 has the same issue.

So even though we had a test of WHILE @Value <> 10.0, the value never exactly equalled 10.0. So why does it show 10 in the Messages tab? That’s because SQL Server Management Studio (SSMS) rounds the values that it prints. It’s not showing us the actual value.

We could fix this by substracting @Value from 10 and taking the absolute value of the result, then comparing it to a small increment. But who wants to write code like that?

In SQL Server, decimal, numeric, money, and smallmoney are the data types with decimal places that store values precisely. numeric is basically a synonym for decimal. money and smallmoney are old Sybase data types that have fixed scale, and have a funky relationship with currency symbols when converting strings. I generally don’t use those. (There are some arguments for them in gigantic data warehouses where their smaller storage size might help but with row compression, the reasons for that are quickly disappearing). It’s one of the problems with backwards compatibility in SQL Server.

You can’t blame people for using a data type called money for storing amounts of money. But it’s generally not the right answer.

Let’s now look at the query from before if we change to decimal:

When executed, it stops exactly as expected:

Decimal (and numeric) require a precision and a scale. These should be chosen appropriately to store the values that you need. You need to keep rounding in mind when you calculate decimal values.

As I mentioned earlier, there are places where float and/or real make sense, but they are typically scientific calculations, not business calculations.

 

 

 

DevOps: Load Tests Need to be Part of Your Regular Deployments

One of the principles of DevOps is that you should be able to deploy regularly, with smaller low-risk changes. This addresses a key problem that I see in many sites:

  • Each code release is large
  • The code is difficult to test
  • Code merges are painful
  • Deployments take a long time
  • Deployments often fail
  • Deployments are considered to be high-risk procedures

The end result of this situation is that deployments are avoided and then a snowball effect occurs where:

  • The code is even larger
  • Code merges are even more painful
  • The deployments take even longer
  • The deployments are even more likely to fail
  • Deployments are considered to be even higher-risk procedures

It’s critical to work out how to break this cycle of pain. Smaller changes that are deployed more regularly is often the desired outcome.

But while regular deployments are helpful, so many teams leave performance (and specifically load testing) until way too late.

One of the challenges of my work is that my first involvement with many of these sites is when things are already in a tragic state and the regrets and blaming start:

Image by Tom Pumford

The worst I’ve ever seen was a start-up team that was building a new software as a service app. They needed 1000 concurrent users to have a viable business. 5000 concurrent users would be a wonderful business. But after building it for 4 years, and with less than 2 weeks before they needed to show it to the shareholders, they couldn’t get it past 9 concurrent users.

Another team was building an appointment application for a government department. They had built the entire logic on layers and layers of untestable triggers, and some triggers performed a very large number of actions. They could never get the application out of User Acceptance Testing (UAT). What was disappointing is that at that stage (long after it should have already been delivered to the customer), all they were hoping for was to find the minimal enhancement that would just get them over the line for delivery.

That’s really sad for the customer involved. If they don’t accept the app, they have a business issue. If they do accept it, they’ve bought a lemon.

Image by Charisse Kenion

Ongoing, they’re going to hate that app.

Load testing needs to be part of the routine development cycle of any app, and not just left to the end. Even if you haven’t made large changes to the app that you’re building, no doubt you have dependencies on other code or services, and any of those dependencies could have regressed.

 

Shortcut: Using Snippets in SSMS to Improve the Drop Database Statement

In an earlier post, I showed how to create a DROP DATABASE template in SQL Server Management Studio (SSMS). At the time, I mentioned that a template wasn't the best option because a command like this is normally inserted into a script; it's not the whole script.

That's where snippets shine. Let's create a snippet for it.

First let's open Code Snippets Manager (Tools > Code Snippets Manager):

You'll see the existing snippet folders. I've clicked Add, then created a new folder called GL_Snippets.

Next I've created a file called DropDatabase.snippet on my desktop folder:

I then opened that file using SSMS:

Note that SSMS has a perfectly good XML editor. I've then used it to create the snippet file and saved it:

Back in Code Snippets Manager, I've clicked Import:

I located the file to import, deselected the My Code Snippets folder, and selected the GL_Snippets folder, then clicked Finish:

Now let's try the new snippet. I opened a new query window, and right-clicked in the empty space. We can then see the option for Insert Snippet (note we could have used Ctrl+K, Ctrl+X):

Then select our new DropDatabase snippet:

The snippet will then appear, with the literal parameter highlighted, ready for replacement:

I typed SomeDatabase and then clicked elsewhere, and it's all magically updated:

I hope you find this really useful for any code that you find yourself typing all the time.

SDU Tools: Listing all the Data Types Used in a SQL Server Database

 

When I first start to work with a database that I haven't seen before, one of the things that I check immediately is the type of data being stored. The list of existing data types often gives me interesting insights into the design of the database, and how it interacts with application code.

The list of data types also shows me details of the consistency (or lack of) in the design.

In the image above, you can see the data types that I used when creating the WideWorldImporters database samples for Microsoft. This listing was created by our ListAllDataTypesInUse stored procedure in our free SDU Tools suite.

For another example of it's use, let's take a look at the AdventureWorks sample database, and see what it shows us:

The parameters to the procedure are the name of the database to check, and a list of schemas, tables, and/or columns to limit the search to. For each one, you can provide the word ALL, or a comma-delimited list of values. For example, @SchemasToList = 'HumanResources,Sales' would limit the output to tables in the HumanResources and Sales schemas.

In this case, you can already see a few things. First is that user-defined data types have been used (note AccountNumber in row 1, Flag in rows 9, Name in row 14, and NameStyle in row 15).

Decimal values have a precision of both 8 and 9, and scales of 2 and 4. That's often interesting to see how decimal values are stored, and how rounding is being used in the application.

I'm not a fan of the money data type (it's an old Sybase type), but note that it is used, as is numeric, along with the decimal values. That would have me  a bit curious.

Obviously, I'd be more concerned if I found text, ntext, or image data types, as they are deprecated.

I tend to also rarely use nchar, so I'd be a bit interested in what those values are for.

This part of the list let's me get an idea how string values are stored. It also lets me start to find issues like email addresses or product descriptions that are stored in different size columns in different parts of the database.

And finally I get to notice some more user-defined types, plus smallmoney (likely a red flag for me), some GUID values, and some xml.

You can see it in action here:

For more information on our free SDU Tools, please visit:

http://sqldownunder.com/sdu-tools