Channel: Udi Dahan - The Software Simplist » CQRS

Queries, Patterns, and Search – food for thought


With all the talk of CQRS, the area that doesn’t get enough treatment (in my opinion) is that of queries. Many are already beginning to understand the importance of task-based UIs and how that aligns to the underlying commands being sent, validated, and processed in the system as well as the benefits of messaging-centric infrastructure (like NServiceBus) for handling those commands reliably. When it comes to queries, though, it isn’t nearly as well understood what it means for a query to be “task based”.

Starting with CRUD

Let’s start with a traditional CRUD application and work our way out from there.

In these environments, we often see users asking us to build “excel-like” screens that allow them to view a set of data as well as sort, filter, and group that data along various axes. While we might not get this requirement right away, after some time users begin to ask us to allow them to “save” a certain “query” that they have set up, providing it some kind of name.

That, right there, is a task-based query and it is the beginning of deeper domain insight.
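To make the idea concrete, here is a minimal sketch of what a saved query could look like once it is treated as a named thing in its own right rather than an ad-hoc set of grid filters. Everything in it (the `SavedQuery` class, its fields, the sample `orders` rows) is hypothetical and only meant for illustration; a real system would persist the definition and run it against a database.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


@dataclass
class SavedQuery:
    """A filter/sort combination the user chose to keep and name (hypothetical model)."""
    name: str
    filters: Dict[str, Any]           # column -> required value
    sort_by: Optional[str] = None
    run_count: int = 0                # how often this query is actually run

    def run(self, rows: List[Row]) -> List[Row]:
        self.run_count += 1
        matches = [
            row for row in rows
            if all(row.get(col) == value for col, value in self.filters.items())
        ]
        if self.sort_by is not None:
            matches.sort(key=lambda row: row[self.sort_by])
        return matches


# Illustrative data only.
orders = [
    {"id": 1, "status": "unpaid", "days_open": 45},
    {"id": 2, "status": "paid", "days_open": 3},
    {"id": 3, "status": "unpaid", "days_open": 12},
]

unpaid = SavedQuery(name="Unpaid orders", filters={"status": "unpaid"}, sort_by="days_open")
result = unpaid.run(orders)
```

The name the user picks and the run count are the interesting parts here: the first hints at the task behind the query, the second tells you how often that task comes up.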

Pattern matching

Any time a user is repeatedly running the same query (this can be once a day or some other unit of time) there is some scenario that the business is trying to identify and is using that user as a pattern-matching engine to see if the data indicates that that scenario has occurred.
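One way to picture this, as a rough sketch only: once the scenario behind a repeated query is understood, the system can check for it directly instead of waiting for a person to run the query and eyeball the results. The `Scenario` class, the `detect` function, and the order fields below are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass
class Scenario:
    """A business situation someone used to look for by running a query by hand."""
    name: str
    matches: Callable[[Record], bool]


def detect(scenarios: List[Scenario], record: Record) -> List[str]:
    """Return the names of all scenarios that the given record satisfies."""
    return [s.name for s in scenarios if s.matches(record)]


# Hypothetical scenario: the pattern a user was hunting for every morning.
scenarios = [
    Scenario(
        name="Order stuck awaiting payment",
        matches=lambda o: o["status"] == "unpaid" and o["days_open"] > 30,
    ),
]

alerts = detect(scenarios, {"id": 1, "status": "unpaid", "days_open": 45})
no_alerts = detect(scenarios, {"id": 2, "status": "paid", "days_open": 45})
```

In this sketch `detect` would be called whenever an order changes, so the business hears about the scenario when it occurs rather than the next time someone remembers to look.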

It’s quite common for us to get a requirement to add some field (often a boolean or enum) to an entity which defaults to some value and then see that same field used in filtering other queries. These measures are sometimes instituted as a temporary stop-gap while a larger feature is being implemented, though (as the saying goes) there is nothing more permanent than a temporary solution.

Where we developers go wrong

The thing is, many developers don’t notice these sorts of things happening because we don’t actually look at the kinds of queries users are running.

One excellent technique to better understand a domain is to sit down with your users while they’re working and ask them, “what made you run that query just now?”, “why that specific set of filters?”.

What I’ve noticed over the years is that our users find very creative ways to achieve their business objectives despite the limitations of the system that they’re working with. We developers ultimately see these as requirements, but they are better interpreted as workarounds.

I’ll talk some more about how a software development organization should deal with these workarounds in a future post, but I want to focus back in on the queries for now.

Oh, and don’t get me started on caching or NoSQL, not that I think that those tools don’t provide value – they do, but they’re only relevant once you know which business problem you’re solving and why.

Not all queries are created equal

Even before bringing up the questions I described in the previous section, any time you get query-centric requirements the first question to ask is “how often will the user be running this specific query?”.

If the answer is that the specific query will be run periodically (every day, week, etc), then drill deeper to see what pattern the user will be looking for in the data. If the person you’re talking to doesn’t know how to answer that question, then go find someone who does. Every periodic query I’ve seen has some pattern behind it – and in my conversations with thousands of other developers over the years, I’ve seen that this is not just my personal experience.

But there is a case where a query does get run repeatedly without there being a pattern behind it.

I know this sounds like I’m contradicting myself, but the distinction is the word “specific” that I emphasized above.

There are certain users who behave very differently from other users – these users are often doing what I call research, i.e. the “I don’t know what I’m looking for but I’ll know it when I see it” people.

These researchers query the data in the system repeatedly, but they tend to run different queries all the time. This is the reason why traditional data warehouse type solutions don’t tend to work well for them. Data warehouses are optimized for running specific queries repeatedly.

Keeping the Single-Responsibility Principle in mind – we should not try to create a single query mechanism that will address these two very different and independently evolving needs.

And now on to Search

Search is a feature that is needed in many systems and whose complexity is greatly underestimated.

While the developer community has taken some decent strides in understanding that search needs to be treated differently from other queries, the common Lucene/Solr solutions that are applied are often overwhelmed by the size of the data set on which the business operates.

The problem is compounded by our user population being spoiled by Google – that simple little text box and voila, exactly what you’re looking for magically appears instantaneously. They don’t understand (or care) how much engineering effort went into making that “just work”.

Lucene and Solr work well when your data set isn’t too large, and then they become pretty useless as the quality of their results degrades. The thing is that many of us in IT tend to work on projects where we have an unrealistically small data set that we use to test the system and, at these volumes, it looks like our solutions work great. But if you have 20 million customers, do you think a full text search on “Smith” is going to find just the right one?

Larger data sets require a relevance engine – something that feeds off of what users do AFTER the query to influence the results of future queries. Did the user page to the next screen? That needs to be fed back in. Did they click on one of the results? That needs to be fed back in too. Did they go back to the search and do another similar search right after looking at a result – that should possibly undo the previous feedback.
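The feedback loop described above can be sketched in a few lines. This is a toy illustration under simple assumptions, not how any real search engine does it: the `RelevanceFeedback` class, its method names, and the boost and penalty values are all invented for the example, and the base scores stand in for whatever the full-text engine returns.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class RelevanceFeedback:
    CLICK_BOOST = 1.0           # assumed weight for a clicked result
    PAGED_PAST_PENALTY = 0.25   # assumed weight for results the user skipped

    def __init__(self) -> None:
        # (query, result_id) -> learned adjustment to the base score
        self.adjustments: Dict[Tuple[str, str], float] = defaultdict(float)
        self.last_click: Optional[Tuple[str, str]] = None

    def clicked(self, query: str, result_id: str) -> None:
        self.adjustments[(query, result_id)] += self.CLICK_BOOST
        self.last_click = (query, result_id)

    def paged_past(self, query: str, shown_ids: List[str]) -> None:
        # The user moved to the next page, so nothing shown was good enough.
        for result_id in shown_ids:
            self.adjustments[(query, result_id)] -= self.PAGED_PAST_PENALTY

    def searched_again(self) -> None:
        # The user went straight back to searching: undo the last click's boost.
        if self.last_click is not None:
            self.adjustments[self.last_click] -= self.CLICK_BOOST
            self.last_click = None

    def rerank(self, query: str, candidates: List[Tuple[str, float]]) -> List[str]:
        # candidates are (result_id, base_score) pairs from the text search.
        ordered = sorted(
            candidates,
            key=lambda c: c[1] + self.adjustments.get((query, c[0]), 0.0),
            reverse=True,
        )
        return [result_id for result_id, _ in ordered]


feedback = RelevanceFeedback()
candidates = [("smith-1", 0.9), ("smith-2", 0.8), ("smith-3", 0.7)]

feedback.clicked("smith", "smith-3")
after_click = feedback.rerank("smith", candidates)

feedback.searched_again()
after_retry = feedback.rerank("smith", candidates)
```

Even this toy version needs to record what happens after every query, per query and per result, which is exactly the data most systems never collect.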

And that’s just relevance for beginners.

You know what makes Google, you know, Google? It’s that they have this absolutely massive data set of what users do after the query that informs which results they return when. You probably don’t have that. That and search is/was their main business for many years – I’m betting that it’s not your main business.

You should discuss this with your stakeholders the next time they ask for search functionality in your system.

In closing

I know that the common CQRS talking points tell you to keep your queries simple, but that doesn’t mean that simple is easy.

It takes a fair bit of domain understanding to figure out what the queries in the system are supposed to be – what tasks users are trying to achieve through these queries. And even when you do reach this understanding, convincing various business stakeholders to change the design of the UI to reflect these insights is far from easy.

It often seems like the reasonable solution to give our users everything, to not limit them in any way, and then they’ll be able to do anything. What ends up happening is that our users end up drowning in a sea of data, unable to see the forest for the trees, ultimately resulting in the company not noticing important trends quickly enough (or at all) and therefore making poor business decisions.

Even if your company doesn’t believe itself to be in “Big Data” territory, I’d suggest talking with the people on the “front lines” just in case. Many of them will report feeling overwhelmed by the quantity of stuff (to use the correct scientific term) they need to deal with.

It’s not about Lucene, Solr, OData, SSRS, or any other technology.

It’s on you. Go get ’em.

