Work

  • I did some investigation into how we boost popular pages in search results. Strangely, the formula is 1 / (10 + rank), so the most popular page gets a score of 1 / 11, the second most popular page gets a score of 1 / 12, and so on. How much more popular page n is than page (n+1) doesn’t play into the scoring at all.
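
    As a rough sketch (the function name here is mine, not real search-api code), the boost depends only on a page’s position in the popularity ordering; the size of the gap in popularity between two pages never changes their boosts:

    # Hypothetical helper illustrating the formula described above.
    def popularity_boost(rank)
      1.0 / (10 + rank)
    end

    popularity_boost(1)  # => ~0.091, whether page 1 is 100x or 1.001x as popular as page 2
    popularity_boost(2)  # => ~0.083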

    We have some indication that popularity boosting isn’t quite doing what we want with ES6. The range of scores ES6 assigns to results is so different to the range of scores ES5 assigns that it’s not too surprising that these popularity boosts might need tweaking.

  • I started prototyping a way to expose the ES ranking evaluation API in search-api, so we can give search-api a list of relevance judgements and have it report how well the search query performs. This is far preferable to copying the query out of search-api and querying Elasticsearch directly.
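
    For illustration, a ranking evaluation request is roughly this shape (the index name, query, document IDs and ratings below are made up); it would be sent to the _rank_eval endpoint, which responds with an overall metric score plus per-query detail:

    # Made-up example body for Elasticsearch's _rank_eval endpoint.
    rank_eval_request = {
      requests: [
        {
          id: "example-query",
          request: { query: { match: { all_searchable_text: "register to vote" } } },
          ratings: [
            { _index: "govuk", _id: "some-document-id", rating: 3 },
            { _index: "govuk", _id: "another-document-id", rating: 0 }
          ]
        }
      ],
      # Normalised DCG over the top 10 results; precision and mean
      # reciprocal rank are also available metrics.
      metric: { dcg: { k: 10, normalize: true } }
    }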

    However, a danger with using the ranking evaluation API to guide improvements in our search query is that we’re optimising for results we know are relevant, and so might unknowingly penalise results we don’t know are relevant. If we make our relevance judgements based on the first page of results, there might be a super-relevant result on page 2, but that doesn’t make it into the data we use for scoring queries. The same sort of problem shows up in the machine learning community, and a common approach there is to have separate training and testing data, so that might be something to look at.
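
    A minimal sketch of what that split could look like, assuming judgements is just an array of (query, document, rating) records:

    # Hypothetical split: hold some judgements back for evaluation only.
    shuffled = judgements.shuffle(random: Random.new(2019))
    cutoff   = (shuffled.size * 0.8).floor
    training = shuffled[0...cutoff]  # guide the query tweaking with these...
    testing  = shuffled[cutoff..-1]  # ...and only score the final query against these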

  • Some more query tweaking (which hasn’t gone live yet, as I need to think about how best to test it):

    • I decided to try to replicate the old scoring behaviour of should queries in ES5 and below (the “coord” factor, which scaled a bool query’s score by the fraction of its clauses that matched), and came up with this delightful bit of code:

      # Wrap a list of subqueries in a function_score query that mimics the
      # coord factor: the bool/should score is multiplied by the fraction of
      # subqueries the document matches.
      def should_coord_query(queries)
        if queries.size == 1
          queries.first
        else
          {
            function_score: {
              query: { bool: { should: queries } },
              # sum the weights of the matching filters below...
              score_mode: "sum",
              # ...and multiply the bool/should score by that sum
              boost_mode: "multiply",
              # one filter per subquery, each contributing 1/n when it matches
              functions: queries.map do |q|
                {
                  filter: q,
                  weight: 1.0 / queries.size
                }
              end
            }
          }
        end
      end
      

      The good news is that it works, and seems to improve matters quite a bit. The bad news is that every result is now matched against each subquery twice: once in the bool/should clause and once again in the function filters.

    • I also spotted a “problem” in our search query. Here are the subqueries we use:

      match_phrase("title", PHRASE_MATCH_TITLE_BOOST)
      match_phrase("acronym", PHRASE_MATCH_ACRONYM_BOOST)
      match_phrase("description", PHRASE_MATCH_DESCRIPTION_BOOST)
      match_phrase("indexable_content", PHRASE_MATCH_INDEXABLE_CONTENT_BOOST)
      match_all_terms(%w(title acronym description indexable_content), MATCH_ALL_MULTI_BOOST)
      match_any_terms(%w(title acronym description indexable_content), MATCH_ANY_MULTI_BOOST)
      minimum_should_match("all_searchable_text", MATCH_MINIMUM_BOOST)
      

      We do phrase matching against some fields, and then a fuzzier “just look for all the keywords” match against a collection of fields. Seems reasonable, right?

      Actually, not really. Phrase matching is very strict. For example, “brexit business” won’t match_phrase the title of the business readiness finder, which is “Find Brexit guidance for your business”, because there are words between “brexit” and “business”. In practice those four match_phrase subqueries almost never match, so we miss out on the field-specific boost factors.
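
      To illustrate (these are hand-written queries, not the exact ones search-api builds): a phrase query needs the terms next to each other and in order, while an all-terms match only needs each term to appear somewhere in the field:

      # Hand-written illustration, not the exact queries search-api generates.

      # Needs "brexit" immediately followed by "business" in the title, so it
      # does not match "Find Brexit guidance for your business".
      { match_phrase: { title: "brexit business" } }

      # Needs every term to appear somewhere in the title, in any order, so
      # it does match.
      { match: { title: { query: "brexit business", operator: "and" } } }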

      I’ve changed those into match_all_terms and also tweaked the boost factors, and it looks much better.

    I’d like to put both of these changes out and A/B test them against the current search query, because all the searches I’ve manually done have looked pretty good. Maybe that will happen this coming week.

Miscellaneous

  • I sorted out a short-term tenancy agreement with my current landlord, so I won’t need to move until mid-December. This is good, as I no longer have to rush to find somewhere.