The Surprising Impact Of Medium-Size Texts On PostgreSQL Performance
tl;dr: Haki’s article delves into the intricacies of text field sizes and their impact on PostgreSQL query performance. He classifies text fields into "small", "medium", and "large", highlighting the unexpected performance implications of medium-sized texts. Through the lens of PostgreSQL's TOAST mechanism, which compresses and/or breaks up large field values, Haki demonstrates that medium texts can sometimes lead to slower queries than even larger texts. "The main problem with medium-size texts is that they make the rows very wide," affecting performance due to increased IO.
featured in #453
Some Useful, Non-Obvious Postgres Patterns
tl;dr: Several PostgreSQL patterns that might not be immediately obvious to many developers including: (1) Always define explicit ON DELETE semantics. (2) If in doubt, use ON DELETE SET NULL. (3) Mutually exclusive columns. (4) Prohibit hidden nulls in jsonb columns. (5) Declare your updated\_at columns NOT NULL to make sorting easier.
featured in #446
The Unexpected Find That Freed 20GB Of Unused Index Space
tl;dr: Haki’s team managed to free up more than 70GB of database storage without dropping any indexes or deleting data. They initially used conventional techniques like rebuilding indexes and tables to clear up space. However, a surprising discovery allowed them to free an additional ~20GB. They realized that PostgreSQL indexes NULL values, which led them to create a partial index that excludes these NULL values, thereby significantly reducing the index size. The article also delves into the concept of "bloat" in PostgreSQL tables and indexes, offering solutions like using the REINDEX command and the pg\_repack extension to manage it. Haki suggests that partial indexes are particularly useful for fields with a high percentage of NULL values.
featured in #443
Create An Advanced Search Engine With PostgreSQL
tl;dr: “The Postgres approach to full-text search offers building blocks that you can combine to create your own search engine. This is quite flexible but it also means it generally feels lower-level compared to search engines like Elasticsearch, Typesense, or Mellisearch.” The article covers: (1) The tsvector and tsquery data types. (2) The match operator @@ to check if a tsquery matches a tsvector. (3) Functions to rank each match (ts\_rank, ts\_rank\_cd). (4) The GIN index type, an inverted index to efficiently query tsvector.
featured in #430
Vectors Are In The New JSON In PostgresQL
tl;dr: “This in itself is an interesting statement, given vectors are a well-studied mathematical structure, and JSON is a data interchange format. And yet in the world of data storage and retrieval, both of these data representations have become the lingua franca of their domains and are either essential, or soon-to-be-essential, ingredients in modern application development. And if current trends continue, vectors will be as crucial as JSON is for building applications.”
featured in #427
Understanding Database Indexes In PostgreSQL
tl;dr: “This article will help you organize your knowledge and remind you about good practices. SQL is a declarative language meaning it tells the database what we want to do but not how to achieve it. The database engine decides how to pull data. We can help the query planner by using indexes.”
featured in #416
Nine Ways To Shoot Yourself In The Foot With PostgreSQL
tl;dr: (1) Keep the default value for work\_mem. (2) Push application logic into Postgres functions and procedures. (3) Use lots of triggers. (4) Use NOTIFY heavily. And more.
featured in #409
Postgres: The Graph Database You Didn't Know You Had
tl;dr: Dylan shows us how we can store and query graph data structures in Postgres, something he did at his previous job to dynamically generate work instructions on a manufacturing line. “Based on parameters given, and rules defined on each edge, we could generate the correct document by traversing a graph stored entirely in Postgres.”
featured in #402