· comparisons · 5 min read

danielsgriffin

Examining HN Discovery Quality Using Existing Complaints

Comparing the search quality between HN search engines with publicly available complaints.

We’ve launched a new Hacker News search experience, focused on discovery: hn.trieve.ai (GitHub: Trieve API backend, frontend search interface).

Hacker News has long been a playground for search innovation, with the community often leaning in to explore new possibilities in search. Over the past six months, Nick looked back at the various search experiences and detailed his findings in a post: History of HackerNews Search: From 2007 to 2024.

We combed through HN (and user issues posted to Algolia’s repo for HN search) in search of search complaints. Over the years there have been some complaints about indexing issues, and we’re not covering those in this post. Instead, we looked for examples where people shared actual search queries. For each query, we looked for what they said or implied about their search intent and the search results they found. What have people said about search quality? What searches are not possible or not easy in the current HN search? When are folks resorting to running a site: search on a search engine like Google? Where can Trieve help make search better?

Discover well beyond exact matches in titles

Searching for “postgres clustering”

Algolia’s engine focuses on showing extremely precise keyword matches, while Trieve’s takes a more discovery-focused approach and prioritizes relevance. This query is a good example of where a relevance-focused approach is ideal.

Searching for “AT&T says criminals stole phone records of ‘nearly all’ customers in new data breach”

Precise keyword sensitivity means that having one extra word in the search causes it to fail. Trieve’s relevance-oriented fulltext search is less sensitive and will still surface good results (see the search with or without “new”). The issue does not seem to be related to Algolia’s query-length limit (the limit is 512 characters, while the query is 84).
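To make that sensitivity concrete, here is a minimal sketch contrasting a strict match that requires every query term with a score based on term overlap. It is a toy illustration, not how Algolia or Trieve are implemented; the indexed title, the function names, and the tokenization rule are all assumptions made for the example.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def strict_match(query: str, title: str) -> bool:
    """Match only if every query token appears in the title."""
    return tokens(query) <= tokens(title)

def overlap_score(query: str, title: str) -> float:
    """Fraction of query tokens found in the title (0.0 to 1.0)."""
    q = tokens(query)
    return len(q & tokens(title)) / len(q) if q else 0.0

# Hypothetical indexed title: the same headline without the word "new".
title = "AT&T says criminals stole phone records of 'nearly all' customers in data breach"
query = "AT&T says criminals stole phone records of 'nearly all' customers in new data breach"

print(strict_match(query, title))   # False: "new" is not in the title
print(overlap_score(query, title))  # close to 1.0: only one token is missing
```

Under the strict rule a single unmatched word removes the document entirely, while the overlap score only lowers its rank slightly.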

Searching with special characters

Searching for “[video]”

Algolia’s results appear to match on video, not only [video]. Trieve’s semantic search mode, and therefore also its hybrid mode, does surprisingly well on the exact match, though not perfectly. The transformer-style tokenization used for dense vectors is surprisingly good at recognizing when the [ is important. Both Algolia’s and Trieve’s tokenizers ignore the [ for the Keyword and Fulltext search types.
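A small sketch of why brackets can vanish in keyword-style search: if the tokenizer keeps only alphanumeric runs, [video] and video become the same token. The second function keeps punctuation marks as separate tokens, which is roughly what many transformer tokenizers do. Both functions are simplified assumptions for illustration and are not the tokenizers Algolia or Trieve actually use.

```python
import re

def word_tokenize(text: str) -> list[str]:
    """Toy keyword tokenizer: keep only lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())

def punct_aware_tokenize(text: str) -> list[str]:
    """Toy alternative: keep each punctuation mark as its own token."""
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())

print(word_tokenize("[video]"))         # ['video']
print(word_tokenize("video"))           # ['video']
print(punct_aware_tokenize("[video]"))  # ['[', 'video', ']']
```

With the first tokenizer there is nothing left to distinguish the two queries, so any difference in results has to come from a representation that still sees the brackets.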

Screenshots (search results for [video]): Algolia; Trieve (semantic); Algolia (quoted); Trieve (semantic, quoted).

Searching for “AT&T”

The issue seems to be with the prefix parameter in Algolia. Algolia’s results are less relevant when searching for AT&T with prefix=true. Results improve with prefix=false or when the query is in quotes. This appears to come from how the prefix parameter is set in the URL: if the user opens the Algolia HN search and starts typing, the URL params show prefix=true; if the user hits enter, the URL changes to prefix=false. Trieve’s results (for all search types) are relevant even without quotes. There may be reasons for Algolia’s default prefix setting, outside of this query, that make it better overall, but we are not aware of them.
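For readers who want to reproduce the comparison, the two URL variants can be sketched as below. Only the prefix parameter and its two values come from the observations above; the base URL and the query parameter name are assumptions and may not match what the Algolia HN frontend actually emits. Note that the & in AT&T has to be percent-encoded, otherwise it would be read as a parameter separator.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed base URL and "query" parameter name; only `prefix` is taken from the text.
BASE_URL = "https://hn.algolia.com/"

def search_url(query: str, prefix: bool) -> str:
    """Build a search URL with the prefix flag rendered as true/false."""
    params = {"query": query, "prefix": str(prefix).lower()}
    return f"{BASE_URL}?{urlencode(params)}"

typing_url = search_url("AT&T", prefix=True)     # state while typing
submitted_url = search_url("AT&T", prefix=False)  # state after pressing enter

print(typing_url)
print(submitted_url)

# Round trip: the encoded query decodes back to the original string.
print(parse_qs(urlparse(typing_url).query)["query"])  # ['AT&T']
```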

Screenshots (search results for AT&T): Algolia (prefix=true); Algolia (prefix=false); Algolia (quoted); Trieve.

Out-of-domain strings

Searching for “lootitooti”

This was actually a complaint not about Algolia, but about a demo search tool from another provider. The user praised Algolia’s performance on this query. Trieve initially struggled with it because our engine splits “lootitooti” into multiple tokens like “loot”, “lootit”, etc. instead of preserving the entire match. We solved the issue and improved precision by using Trieve’s internal BKTree to detect non-dictionary tokens and automatically quote them, so that they are required as exact matches. Trieve transforms a query of lootitooti into “lootitooti” and does similar things for entity names like “OpenAI” or “ChatGPT”.
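The general idea of using a BK-tree to spot non-dictionary tokens can be sketched as follows. This is a minimal illustration and not Trieve’s implementation: the tiny dictionary, the edit-distance threshold of 1, and the names BKTree, has_near, and auto_quote are all assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

class BKTree:
    """Minimal BK-tree keyed on edit distance."""

    def __init__(self, words):
        self.root = None  # each node is (word, {distance: child_node})
        for word in words:
            self.add(word)

    def add(self, word: str) -> None:
        if self.root is None:
            self.root = (word, {})
            return
        node = self.root
        while True:
            d = levenshtein(word, node[0])
            if d == 0:
                return  # already stored
            child = node[1].get(d)
            if child is None:
                node[1][d] = (word, {})
                return
            node = child

    def has_near(self, word: str, max_dist: int) -> bool:
        """True if any stored word is within max_dist edits of `word`."""
        stack = [self.root] if self.root else []
        while stack:
            stored, children = stack.pop()
            d = levenshtein(word, stored)
            if d <= max_dist:
                return True
            # Triangle inequality: only these child edges can hold a match.
            stack.extend(child for k, child in children.items()
                         if d - max_dist <= k <= d + max_dist)
        return False

def auto_quote(query: str, tree: BKTree, max_dist: int = 1) -> str:
    """Wrap tokens with no near dictionary neighbour in double quotes."""
    out = []
    for tok in query.split():
        known = tree.has_near(tok.lower(), max_dist)
        out.append(tok if known else f'"{tok}"')
    return " ".join(out)

# Hypothetical dictionary; a real one would hold many thousands of words.
tree = BKTree(["loot", "search", "docker", "builds", "open", "chat"])
print(auto_quote("lootitooti search", tree))  # "lootitooti" search
```

The BK-tree matters because it avoids comparing the token against every dictionary word: the triangle inequality prunes most branches on each lookup.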

Screenshots (search results for lootitooti): Algolia; Trieve.

Presque vue searches

Searching for “deterministic Docker builds”

Algolia shows no results for a story search for this query. Trieve shows the identified target link as the fourth result. This is a common discovery pattern we hope to support: the presque vue search, or the “tip of your tongue” phenomenon.

Screenshots (search results for deterministic Docker builds): Algolia (type: Story); Trieve (type: Story).

Searching for “tip of your tongue phenomenon”

Bonus! Again, the precision-focused approach of requiring “your” has downsides.

Filter on author with a hyphenated username

Searching for “"It Won’t Fail Because of Me" by:1970-01-01”

Algolia does not show any results when the query includes a by: filter for a username containing hyphens (example shown: user @1970-01-01).
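We do not know why the filter fails in Algolia’s case, but one common way this kind of bug arises is a filter pattern that only accepts word characters and therefore stops at the first hyphen. The sketch below is purely illustrative: the patterns and the extract_author helper are hypothetical and are not taken from either search engine.

```python
import re
from typing import Optional

NARROW = re.compile(r"\bby:(\w+)")   # stops at the first hyphen
WIDE = re.compile(r"\bby:([\w-]+)")  # also accepts hyphens

def extract_author(query: str, pattern: re.Pattern) -> tuple[Optional[str], str]:
    """Return (author, remaining query text) for a by: filter, if present."""
    m = pattern.search(query)
    if not m:
        return None, query.strip()
    rest = (query[:m.start()] + query[m.end():]).strip()
    return m.group(1), rest

query = '"It Won\'t Fail Because of Me" by:1970-01-01'

print(extract_author(query, NARROW))  # author is cut off at "1970"
print(extract_author(query, WIDE))    # author is the full "1970-01-01"
```

With the narrow pattern the leftover "-01-01" also leaks back into the text query, which could be enough on its own to produce zero results.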

Default sorting by relevance v. popularity metrics

Searching for “Excel”

This is from a comparison that @airstrike shared after we launched our discovery search. He preferred the results from Algolia. Algolia defaults to a popularity-based sort. Algolia also has sort-by-date, but does not have a specific relevance-focused sorting option.

Trieve offers multiple sorting options:

  • default: relevance (not tuned to extrinsic popularity metrics)
  • number of points (similar to Algolia’s “popularity”)
  • date (reverse chronological)
  • descendants (number of comments)

Nick (@skeptrune) responded with some of our internal deliberations:

We went back and forth on making points sorting default and ended up deciding against it, but maybe we should have. Our thinking was that since it’s focused on “discovery” it was worth prioritizing relevance, but I can see how it can feel the result quality isn’t as great.

If someone is looking for more of the popularity-focused results, they can start their Trieve HN Discovery searches with the sortby= parameter set to num_value (try this link).
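The effect of the four sorting options listed above can be sketched on a handful of made-up results. The Hit fields, scores, and titles below are invented for illustration and do not reflect Trieve’s data model or real HN stories; the point is only that the top result changes with the sort key.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    title: str
    score: float      # relevance score assigned by the search engine
    points: int       # HN points
    descendants: int  # number of comments
    time: int         # Unix timestamp of submission

# One key function per sorting option described in the post.
SORT_KEYS = {
    "relevance": lambda h: h.score,
    "points": lambda h: h.points,
    "date": lambda h: h.time,
    "descendants": lambda h: h.descendants,
}

def sort_hits(hits, sort_by="relevance"):
    """Return hits ordered from highest to lowest on the chosen key."""
    return sorted(hits, key=SORT_KEYS[sort_by], reverse=True)

# Made-up results for an "Excel" query.
hits = [
    Hit("Excel formula tips", 0.91, 40, 12, 1_700_000_000),
    Hit("Excel is the most used programming language", 0.62, 900, 450, 1_600_000_000),
    Hit("Show HN: an Excel alternative", 0.75, 300, 600, 1_710_000_000),
]

for key in SORT_KEYS:
    print(key, "->", sort_hits(hits, key)[0].title)
```

A highly relevant but little-upvoted story leads under the relevance sort and drops to the bottom under the points sort, which is the trade-off discussed in Nick’s reply.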

Screenshots (search results for Excel): Algolia (sorted by popularity); Trieve (sorted by relevance); Trieve (sorted by points); Trieve (sorted by descendants).