Opinion: Constant churn breaks community learning for software applications

A current trend that I can't say I love is constant churn within software applications. I have no interest in going back to the days when we got a new version of SQL Server or Power BI, etc. every few years.

It's also not a case of "Who moved my cheese?"

In fact, I thrive on change. However, I've now become really concerned about how anyone:

  • Learns to use a complex application
  • Remembers how to use a complex application when they don't use it daily

I first really struck this issue with Azure. If I was teaching a class that used Azure, I could check every single lab on Sunday night, then Monday morning, the students would find it had all changed. That's OK for an experienced person, but not OK for a learner.

I love the rate of change for Power BI. We're endlessly getting wonderful new things. But I have to say that every class that I teach on this is like a new experience. I've got another one this coming Tuesday. I used to look forward to them but now I have a major hesitation every time, as I wonder what parts of the labs will have broken.

This is now an ongoing challenge for all software of this type, though. I helped create some labs for VSTS very recently, and when I look at the product now, it barely resembles the one that I built the labs on.

Is it better? Probably yes.

But even though it might have been a few months ago, it feels like just the other week. And yet, not only has the UI changed; entire concepts have been added or removed, and the order in which things are done has changed substantially.

I don't know the answer to this, but the current rate of churn is a substantial issue.

I gather the plan with the DevOps guys is to put a set of labs on GitHub, and let people who are doing the labs point out the issues day by day as they strike them. Again, for experienced users that might work. But for newcomers, I really wonder what they'll think.

Will they realize the app must have changed and that it's all different, or will they just think the product is too hard to use? Either way, they'll be very frustrated.

Image by JeShoots

And while initial learning of the product is one thing, I'm worried about it longer-term. A product like VSTS lets you set up automation, and you hope you won't need to change it constantly. But if, every time you go to make a change, you're struggling to use the product like you're a newbie again, that's a problem.

Finally, I'm really concerned about ongoing support.

The vast majority of support for software applications today comes from community resources like blogs, webcasts, etc.

Will they continue to be created at the same pace if the authors know they'll be irrelevant or wrong within a very short time? How will end-users learn to do things when none of the online examples they find still work?

I wish I knew the answer to this.

SQL: Design – Entity Attribute Value Tables (Part 2) – Pros and Cons

In an earlier post, I discussed the design of EAV (Entity Attribute Value) tables, and looked at why they get used. I'd like to spend a few moments now looking at the pros and cons of these designs.

Let's use the same table as the last time as an example:
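
Roughly, a table like this (a minimal sketch; the object and column names here are assumed for illustration):

```sql
-- A sketch of an EAV table like the one from the earlier post (names assumed)
CREATE TABLE dbo.PersonAttributes
(
    PersonAttributeID bigint IDENTITY(1,1) PRIMARY KEY,
    PersonID int NOT NULL,                 -- the entity
    AttributeName nvarchar(50) NOT NULL,   -- the attribute
    AttributeValue nvarchar(max) NULL      -- the value: everything becomes a string
);
```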

Pros

The main positive that's typically described is that the schema is "flexible". By this, the developers usually mean "I don't have to change the database schema (or worse, have someone else change it) when my needs change".

I don't buy this argument. Databases do a great job of managing metadata. Why would your app do it any better? I'll bet there's another reason (perhaps political) why this argument is being put forward in most cases.

Another argument is that this type of design handles sparse data gracefully. When we designed the WideWorldImporters database for Microsoft, we had a few edible products. The vast majority of the products were not edible. So the question is where a use-by date for an edible item should be stored. Should every product have that column?

Sparse columns in SQL Server deal with that quite well, but in SQL Server 2016, we also wanted to showcase some of the new JSON-based functionality, so we ended up putting this info into a JSON-based column. I would have been just as happy with a sparse column for this though. At least that would have allowed us good control over the data type.

Cons

I see many problems with these tables.

The most obvious problem is that a large number of joins is required to get all the related data for an entity, if you ever need to process it within the database. Many of these applications, though, just read and write the entire entity each time and aren't affected by this. Those applications tend to be at the small end of the market and aren't too concerned about performance.
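
To see why, consider what reassembling a single entity from the sketch table above looks like. It's one self-join per attribute:

```sql
-- One self-join per attribute, just to get an entity back into row shape
SELECT gn.PersonID,
       gn.AttributeValue AS GivenName,
       hc.AttributeValue AS HairColor,
       lg.AttributeValue AS LoyaltyGrade
FROM dbo.PersonAttributes AS gn
INNER JOIN dbo.PersonAttributes AS hc
    ON hc.PersonID = gn.PersonID
   AND hc.AttributeName = N'HairColor'
INNER JOIN dbo.PersonAttributes AS lg
    ON lg.PersonID = gn.PersonID
   AND lg.AttributeName = N'LoyaltyGrade'
WHERE gn.AttributeName = N'GivenName';
```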

A second key issue is data typing. In the example above, what data type is the AttributeValue column? It's most likely some sort of string. If you're one of the cool kids, it might be a sql_variant instead. In this case, let's assume it's a string of some type. How can we then provide any sort of guarantee about what's stored in the column? The LoyaltyGrade is clearly an integer in this example, but there's nothing to stop us putting Gold in that field instead.

Ah, but "the app does that" I'm told. That might be true but apps have bugs, as do ETL processes that put data in those fields. That's also very small-system thinking as most organizations will have a large number of applications that want to access this same data, and they won't even be built on the same technology stack, so lots of luck forcing all the access through your app or services layer. Ever tried to connect something like Excel to a service?

What I often then eventually see is typed columns within the table:

  • StringValue
  • IntValue
  • DateValue
  • DecimalValue

and so on. Then we have to consider what it means if a value happens to be in more than one of these. (Ah yes, the app won't allow that).
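
Concretely, the table tends to drift toward something like this (again a sketch, names assumed):

```sql
-- The "typed EAV" variant: one nullable column per data type
CREATE TABLE dbo.PersonAttributes
(
    PersonAttributeID bigint IDENTITY(1,1) PRIMARY KEY,
    PersonID int NOT NULL,
    AttributeName nvarchar(50) NOT NULL,
    StringValue nvarchar(max) NULL,
    IntValue int NULL,
    DateValue date NULL,
    DecimalValue decimal(18, 2) NULL
    -- nothing here stops two of these being non-NULL in the same row
);
```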

Once again though, what you're doing is building a database within the database, just to avoid adding or dropping columns.

Yet another problem is the complete lack of referential integrity. You have no way (or at least no elegant way) to ensure that values lie in a particular domain. Is it OK for Brunette to be entered rather than Brown for the hair color? If not, how do you enforce that? (I know, I know, the app does that).

These are just a few of the issues, and there are many more.

Before I finish for today, I do want to mention yet another design that I saw recently. Instead of one table to rule them all, with an EAV-based non-design, what this database contained was a table for each data type. Yes, there was a string table, an int table, a date table, etc.

It should be obvious that such a design is even nastier.


DevOps: Declarative is where we want to be

If you have ever tried to write scripts to configure operating systems, you'll know just how hard that is to get correct.

For example, if you need to ensure that your virtual machine has two network adapters, and they need to have a specific set of IP addresses, how do you do that?

Image by Markus Spiske

The traditional approach for PowerShell (and similar tools) was to try to write a step-by-step script to configure the network adapters the way you want. But where do you start? Do you write a script to check for any existing adapters and loop through them to try to remove them? Do you try to add the ones you want, and then remove the others?

You'll quickly realize that you get into very, very complex coding, because you can't be sure what your starting point is. Everything has to stay on the path that you prescribed.

And worse, what happens if you run this script more than once?

That's the problem with imperative code. You are telling the operating system the steps required for configuration.

We don't want to be doing this.

With a declarative approach, we tell the system how we want it to end up ie: the desired state, and let it worry about how to get configured that way. This is what we want to be doing instead of writing imperative code.

PowerShell offers DSC (Desired State Configuration), where you describe, in a declarative configuration document (written in PowerShell syntax and compiled to a MOF file), the way you want the configuration to end up. A bonus of this approach is that it's idempotent ie: no matter how many times you run it, you end up with the same outcome.

Wherever possible, it's important to be doing declarative configuration, not imperative configuration. In later posts, I'll talk more about how.

Shortcut: Setting Environment Font in SQL Server Management Studio (SSMS)

I've been very lucky over the years because I haven't needed to wear glasses. Every now and then I've got some because I thought they might help with reading, but then I've found them more inconvenient than helpful. I have one eye long-sighted and the other short-sighted. That's been a really useful thing in day-to-day life.

However, where this comes unstuck is on modern laptops. There seems to be a current trend towards pushing more and more pixels into the same size laptop screens, but the applications aren't helping to deal with that.

Image by Kevin

I'm sorry, but even a full 1080p screen (1920x1080) is crazy small on a 13 inch laptop, pretty bad on a 14 inch one, and only OK on a 15 inch one.

One of the updates to Windows 8 tried to fix things by automagically scaling everything. I hated that with a passion. I'd move things from one screen to another and they would radically change size because, even though the screens had the same resolution, they had different DPI (dots per inch) settings and Windows decided to auto-scale for me. So I found the option to kill that.

In Windows 10, SQL Server Management Studio is pretty usable for me. I can set fonts or use zoom as needed. The one thing that used to be a problem on small laptops though was the text used in areas like Object Explorer.

For some versions now, you've been able to fix that. Notice that on my screen, the title bar is quite small but the Object Explorer text is much more readable:

I've found that many people don't realize that this can be changed now. In Tools>Options>Fonts and Colors, you choose Environment:

(Notice that the text in this tool window is also larger and more readable)

What you'll find, though, is that you can't just directly change the size. You need to change the font first. In this case, I chose Consolas. Then, you can set the size.

I hope that helps someone else avoid glasses for just a little longer. A bonus is that it makes presentation screens look better for the audience as well.


SDU Tools: Reseed Sequences in T-SQL

Prior to SQL Server 2012: IDENTITY Columns

When we needed to automatically number rows in SQL Server prior to SQL Server 2012, one option was to use IDENTITY columns.

(Apologies to purists who think these were the spawn of the devil but real people did use them).

One of the challenges with IDENTITY columns was when you wanted to move data from table to table but needed to retain the same value from the IDENTITY column. To do that, we used SET IDENTITY_INSERT <tablename> ON; That was fine, but an issue arose: if the next IDENTITY value was going to be 245001 and you had just inserted a row with 245001, what would happen when you turned IDENTITY_INSERT back off and tried to insert another row?

The first thing to understand is that IDENTITY columns aren't unique by default. If, however, you had used them as a unique value with either a UNIQUE constraint or a PRIMARY KEY, it was the constraint that made them unique, not the fact that they were IDENTITY columns.

So let's assume that you really didn't want two rows with the value 245001. How did we fix it? The tool that helped was DBCC CHECKIDENT. It allowed you to check the current identity value but, more importantly, if you had just inserted a bunch of rows, you could use it to reseed the IDENTITY value up past the last value that you inserted (or that was already there).
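
As a refresher, the pattern looked something like this (the table and values here are made up for illustration):

```sql
-- Move a row in while keeping its original IDENTITY value
SET IDENTITY_INSERT dbo.Customers ON;

INSERT dbo.Customers (CustomerID, CustomerName)
VALUES (245001, N'Tailspin Toys');

SET IDENTITY_INSERT dbo.Customers OFF;

-- Check where the IDENTITY value currently sits
DBCC CHECKIDENT (N'dbo.Customers', NORESEED);

-- Reseed past the highest value present, so the next row gets 245002
DBCC CHECKIDENT (N'dbo.Customers', RESEED, 245001);
```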

SQL Server 2012 and Sequences

OK, so now SQL Server 2012 came into the picture. We can use sequences instead of IDENTITY columns, and we usually prefer to do so for two reasons:

  • IDENTITY columns have weird behaviors and don't work everywhere. For example, you can't set IDENTITY_INSERT ON for a table on a linked server.
  • Sequences can be used across multiple tables. For example, if I want to have a single sequence called BookingID but used in the HotelBookings table, the CarBookings table, and the FlightBookings table, that works just fine (a sketch follows).
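
Here's a sketch of that sharing pattern (the sequence and table names come from the example above; the remaining columns are assumed):

```sql
CREATE SEQUENCE dbo.BookingID AS int START WITH 1;
GO

-- Both tables draw their keys from the same number range
CREATE TABLE dbo.HotelBookings
(
    BookingID int NOT NULL
        CONSTRAINT DF_HotelBookings_BookingID DEFAULT (NEXT VALUE FOR dbo.BookingID)
        CONSTRAINT PK_HotelBookings PRIMARY KEY,
    HotelName nvarchar(50) NOT NULL
);

CREATE TABLE dbo.CarBookings
(
    BookingID int NOT NULL
        CONSTRAINT DF_CarBookings_BookingID DEFAULT (NEXT VALUE FOR dbo.BookingID)
        CONSTRAINT PK_CarBookings PRIMARY KEY,
    CarRegistration nvarchar(20) NOT NULL
);

INSERT dbo.HotelBookings (HotelName) VALUES (N'Seaside Inn');
INSERT dbo.CarBookings (CarRegistration) VALUES (N'ABC123');  -- gets the next BookingID
```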

One of the things that's been missing with sequences, though, is the equivalent of DBCC CHECKIDENT. You can see the current sequence values by querying sys.sequences.

Well, that almost works. Unfortunately, its design is broken: it returns the same value for a sequence that's never been used as for one that's been used once. We get around that by always using a sequence once, immediately after creating it.
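
For example:

```sql
-- current_value shows the last value handed out, but for a never-used sequence
-- it shows the start value, which looks identical to "used exactly once"
SELECT name, start_value, current_value
FROM sys.sequences;
```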

Where the fun begins, though, is when you've used the next value for a sequence as a column default and you've inserted a bunch of rows: how do you automatically point the sequence value past all the values in the table? Worse, what if the sequence was used as the default for more than one column, in different tables?

One of our free SDU Tools is designed to help. ReseedSequenceBeyondTableValues does exactly this, as you can see here:

The sequence Sequences.CustomerID was used in the WideWorldImporters database, and the procedure found that the highest value used anywhere in the database was 1061, so it set 1062 as the next value to be used.
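
In effect, the reseed amounts to this, once the highest value in use is known:

```sql
-- Equivalent outcome, using the values from the example above
ALTER SEQUENCE Sequences.CustomerID RESTART WITH 1062;
```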

But I need to fix all the sequences in my database!

We've got you covered here too. An associated tool, ReseedSequences, can reseed a whole set of sequences at once. You can choose to do a list of them like this:

Or you can choose to do all the sequences in a database as shown in the main image above.

You can see them both in action here:

For more information and downloads for SDU Tools and our other resources, register to become one of our SDU Insiders. You'll find more info here:

http://sqldownunder.com/sdu-tools


Opinion: There’s a plague we need to stop

I've concluded that many software vendors (particularly large ones) don't understand how much support users of their software provide to each other, and how critical that support is.

The SQL and data communities are a good example of this. When someone has a problem and is wondering how to solve it, they don't call Microsoft or Google or Oracle (or whichever vendor) first. If they're lucky, they ask a colleague for help. But most will simply do a Google search (or, yes, a Bing search) to try to find an answer.

No matter how obscure an error message might be, if someone else has struggled with it before, at least there's a chance that on an online forum, someone will have spelled out what caused it for them.

Even cryptic values like app IDs in Windows that look like this:

{ADA41B3C-C6FD-4A08-8CC1-D6EFDE67BE7D}

can be matched to an error or an application that's causing the error.

Most of this happens without the vendor even being involved.

So one of my pet hates (which Microsoft have heard loud and clear on internal mailing lists) is applications that break this pattern.

Every time I have an error that says:

Oops. Something went wrong!

and nothing else, I want to scream. Even that's enough to get an answer sometimes. "Every time I click on XXX and drag the YYY, I get an error saying Oops. Something went wrong!" might lead to a posting that solves the issue, but it's so much tougher when there's no other info.

A plea to developers:

At the time the error occurs, even if you don't know exactly what happened, you must know something about what was expected and what actually went wrong. Tell us something, no matter how cryptic.

Another related trend is where there is an error message but it's a GUID:

C3479B1C-B1F3-4404-8F3C-AFA00AA0FEAF

And we think: "Great. We have something to work with" only to find that the GUID changes every time the error occurs and is only meaningful to the support team at the vendor organization.

Please don't do this either.

Give us something repeatable that we can use to help each other.

SQL: Design – Entity Attribute Value Tables (Part 1) – Why?

If you've been working with databases for any length of time, you will have come across implementations of Entity-Attribute-Value (EAV) data models (or non-models as some of my friends would call them).

Instead of storing the details of an entity in a standard relational table, a row is stored for each attribute of the entity.

For example, let's create a table of people:
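
Something along these lines (a minimal sketch; the names are assumed):

```sql
CREATE TABLE dbo.People
(
    PersonID int IDENTITY(1,1) PRIMARY KEY,
    GivenName nvarchar(50) NOT NULL,
    FamilyName nvarchar(50) NOT NULL,
    HairColor nvarchar(20) NULL,
    LoyaltyGrade int NULL
);

INSERT dbo.People (GivenName, FamilyName, HairColor, LoyaltyGrade)
VALUES (N'Anna', N'Jones', N'Brown', 3);
```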

When we query it, all is as expected:
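
One row per person, one column per attribute:

```sql
SELECT PersonID, GivenName, FamilyName, HairColor, LoyaltyGrade
FROM dbo.People;
```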

And for a long time, this has been how we expect to create tables.

Now, if we changed the previous code to be EAV based, we could do this:
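
Roughly like this (again a sketch, names assumed):

```sql
CREATE TABLE dbo.PersonAttributes
(
    PersonAttributeID bigint IDENTITY(1,1) PRIMARY KEY,
    PersonID int NOT NULL,
    AttributeName nvarchar(50) NOT NULL,
    AttributeValue nvarchar(max) NULL
);

INSERT dbo.PersonAttributes (PersonID, AttributeName, AttributeValue)
VALUES (1, N'GivenName', N'Anna'),
       (1, N'FamilyName', N'Jones'),
       (1, N'HairColor', N'Brown'),
       (1, N'LoyaltyGrade', N'3');   -- the integer has become just a string
```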

And when we query it, we'd see this:
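
Instead of one row per person, we get one row per attribute:

```sql
SELECT PersonID, AttributeName, AttributeValue
FROM dbo.PersonAttributes
ORDER BY PersonID, AttributeName;
```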

Clearly this is going to be much harder to work with for all operations apart from reading and writing a list of values by the entity's ID.

So the first question that arises is:

Why would anyone want to have this sort of (non) design in the first place?

There are a few reasons. Some OK, some not OK.

Reason 1: Sparse Data

EAV tables first appeared for situations where there was a large number of potential attributes (ie: columns here) and each entity (ie: row here) only had a few of them. You can see a description of this in Wikipedia here: https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model

Products like SharePoint had this issue, and that's what pushed the SQL Server team to vastly increase the number of columns per table in SQL Server 2008. Prior to that version, you could have 1024 columns per table, and most people would have considered that if you needed more than that, you were doing something wrong. SQL Server 2008 increased the number of columns per table to 30,000, and also added the concept of a SPARSE column. Unlike standard columns that would always have occupied at least one bit in a NULL bitmap, SPARSE columns only occupy space when they contain data.

To avoid issues with indexes on the sparse columns, we also got filtered indexes in that version. They were highly useful for many reasons, not just for sparse columns.
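
As a sketch of how those two features combine (the table and column names here are assumed):

```sql
-- Only a handful of products are edible, so the column is SPARSE,
-- and a filtered index covers only the rows that have a value
CREATE TABLE dbo.Products
(
    ProductID int IDENTITY(1,1) PRIMARY KEY,
    ProductName nvarchar(100) NOT NULL,
    UseByDays int SPARSE NULL
);

CREATE INDEX IX_Products_UseByDays
ON dbo.Products (UseByDays)
WHERE UseByDays IS NOT NULL;
```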

The addition of SPARSE columns allowed SQL Server to use relational tables to get around the reasons that were the basis for the requirement for EAV tables.

Reason 2: Flexibility?

If we assume that we're not working with a sparse design, what's next? I often hear this type of design referred to as "flexible". When I quiz developers on what they mean by that, it basically boils down to:

I don't have to change the database schema when I add or remove attributes, or even tables.

So they have basically created a table to rule them all.

Image by Andy Omvik

This simply isn't a good reason. It might stem from working in organizations where someone else controls the database and is either difficult to work with or won't make changes quickly enough. But that's fixing the wrong problem. It can also stem from a basic lack of design in the first place.

Reason 3: Object Repository

We only want to use the database as an object repository. All logic occurs outside the database. We just want to serialize objects in and out of the database. If you have this reason, I love you. You keep me in work, dealing with organizations with intractable performance issues. Fine for a toy application, not fine for an enterprise application.

Worse, this type of design tends to lead to information silos where the data storage is specific to the one version of one application. That's not what enterprises need. They typically have many applications that use the same data, and those applications are built on more than one technology stack.

More to Come

Now there are other reasons and I'll discuss them in the next article on this topic, when we discuss the pros and cons of this type of design, but I wanted to start with the definitions and the basic "why" question.

If you have other "why" options, please let me know in the comments.

SQL: Code for errors and be pleased when they don’t occur

I spend a lot of time in large organizations that have spent an absolute fortune on highly-available systems, yet when those systems fail over (just as they were designed to do), most of the applications in the building break.

Why?

Because the developers have assumed that nothing ever breaks and have written their code in a far too optimistic manner. Did they do their jobs?

No

Is it possible for their next layer of code to deal with, say, a server disappearing for a few seconds? Of course it is. But it's not going to happen by accident. It's even more important in a cloud-based world.

There was a question about deadlocks again recently on one of our local mailing lists. Can you deal with deadlocks?

Yes

Again though, none of this is automatic. But allowing for (and perhaps even expecting) failure is one of the differences in building enterprise level code rather than toy code.

Image by Ryan Fields

Plan for failure and be pleasantly surprised when it doesn't happen often. But don't plan for perfection or you'll be disappointed.

While it is possible to handle deadlocks within T-SQL code, I prefer to catch them in the next layer of code (let's call it client code here), as there are other types of errors that should be retried at that level anyway.

Applications should have retry logic to cope with things like:

  • Deadlock (error 1205)
  • Snapshot concurrency violations (error 3960)
  • Server disconnection (can be due to network issues, fail-over of HA-based systems, etc.)
  • Various resource issues on the server

It's important to get into the habit of assuming that a transaction that you need to apply to the DB might work, rather than assuming that it will work. Always apply it via logic like the following (a T-SQL sketch of the pattern appears after the list):

  • While we haven't applied the transaction to the server, and while the retry time/count hasn't expired, let's try to make it happen.
  • If an error occurs, depending upon the error, we might back off for a while and try again.
  • For things like deadlocks, it's good to have some sort of exponential back-off with a random component.
  • Some errors are pointless to retry (ie: a primary key violation probably isn't ever going to work)
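
Here's a minimal sketch of that shape in T-SQL (the table and statements are made up for illustration; as mentioned above, I'd normally put this logic in the client code instead):

```sql
-- A sketch only: retry loop with back-off for retryable errors
DECLARE @RetriesRemaining int = 5;
DECLARE @Succeeded bit = 0;

WHILE @Succeeded = 0 AND @RetriesRemaining > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        -- the real work goes here (dbo.Accounts is a made-up example)
        UPDATE dbo.Accounts SET Balance -= 100 WHERE AccountID = 1;
        UPDATE dbo.Accounts SET Balance += 100 WHERE AccountID = 2;

        COMMIT;
        SET @Succeeded = 1;
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK TRANSACTION;

        IF ERROR_NUMBER() IN (1205, 3960)   -- deadlock, snapshot conflict
        BEGIN
            SET @RetriesRemaining -= 1;
            WAITFOR DELAY '00:00:01';       -- real code: exponential back-off plus a random component
        END
        ELSE
        BEGIN
            THROW;                          -- not retryable (e.g. a PK violation)
        END;
    END CATCH;
END;

IF @Succeeded = 0
    RAISERROR (N'Transaction could not be applied within the retry limit.', 16, 1);
```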

Once you build it that way, things become much more robust and resilient. The user should generally be unaware of these issues, apart from a slight processing delay.

Shortcut: Reset Window Layout in SQL Server Management Studio

One of the problems with applications that have highly-configurable user interfaces (UI) is that users can end up configuring them in ways they hadn't intended, and then don't know how to get back to where they were.

I remember the first time that I was at a session with a presenter from Microsoft showing the (at the time) new personalization options in ASP.NET. You could build a website and let the user determine how the site should be laid out, to suit themselves.

Overall, I can't say that I really like working with websites like that but I can understand the potential appeal. But I can easily see how end users could get really messed up.

I remember asking the presenter if there was a simple button that put the site back the way it was initially developed and removed the user's modifications, so that a user could always just get back to square one.

He told me "ah no, there isn't an option like that".

I'm glad that @sqltoolguy's team that works on SQL Server Management Studio (SSMS) isn't part of that thinking. While SSMS is very configurable, I have seen people get really messed up with the window management in it. They end up dragging a window when they meant to drag something else, or doing some other action that changes their UI, and it sticks. Then they don't know how to "fix" it.

In SSMS, there's a wonderful option in the Window menu, that does just what's needed:

Reset Window Layout is the "get me back to where I was" menu item.

SDU Tools: Date of Easter Sunday

One of the endless challenges when working with databases (and not just SQL Server databases) is processing dates and times.

While there are some good date-handling functions in the T-SQL language, there are many more gaps in what's offered. This month, we've filled another one of those gaps.

In Australia, whether staff are religious or not, the Christian Easter is associated with a four-day weekend for most workers except those in retail ie: Friday (Good Friday), Saturday (Holy Saturday), Sunday (Easter Sunday), and Monday (Easter Monday). For many, it looks like this:

Image by Leio McLaren

If you are creating a date dimension in a data warehouse, you might need to know when holidays like this occur. I mentioned in an earlier post that not everyone can tell you the current rules for Leap Years (so we built a tool for that), but even those that can would typically struggle if you asked them how to work out when Easter occurs in each year.

Claus Tøndering has an amazing set of pages called his Calendar FAQ. I highly encourage you to read it. The background provided for how various calendars and dates work is quite amazing. For what we need for this tool though, Claus describes how to calculate the date for Easter Sunday here:

http://www.tondering.dk/claus/cal/easter.php#wheneasterlong

As part of our free SDU Tools for developers and DBAs, we've added an implementation of the calculations that he describes on that page, in a function called DateOfEasterSunday. (If you need to do this in other languages, Claus also has an algorithm section that might help, but we decided to implement the logic directly in T-SQL instead.)
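
To give a feel for the arithmetic involved, here's a sketch of the widely published anonymous Gregorian computus ("Butcher's algorithm"). To be clear, this is not the SDU Tools source code, and the function name here is made up:

```sql
CREATE FUNCTION dbo.DateOfEasterSundaySketch (@YearNumber int)
RETURNS date
AS
BEGIN
    -- Anonymous Gregorian computus; all divisions are integer divisions
    DECLARE @a int = @YearNumber % 19;
    DECLARE @b int = @YearNumber / 100;
    DECLARE @c int = @YearNumber % 100;
    DECLARE @d int = @b / 4;
    DECLARE @e int = @b % 4;
    DECLARE @f int = (@b + 8) / 25;
    DECLARE @g int = (@b - @f + 1) / 3;
    DECLARE @h int = (19 * @a + @b - @d - @g + 15) % 30;
    DECLARE @i int = @c / 4;
    DECLARE @k int = @c % 4;
    DECLARE @l int = (32 + 2 * @e + 2 * @i - @h - @k) % 7;
    DECLARE @m int = (@a + 11 * @h + 22 * @l) / 451;
    DECLARE @MonthNumber int = (@h + @l - 7 * @m + 114) / 31;
    DECLARE @DayNumber int = ((@h + @l - 7 * @m + 114) % 31) + 1;

    RETURN DATEFROMPARTS(@YearNumber, @MonthNumber, @DayNumber);
END;
GO

SELECT dbo.DateOfEasterSundaySketch(2020);  -- returns 2020-04-12
```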

You can see the tool used in the main image above, and can watch it in action here:

For more details on our free SDU Tools, or to register for them, please follow this link:

http://sqldownunder.com/sdu-tools