TODO: More refined documentation with importance icons/symbols
TODO: Rewrite the application in JavaScript to support global users (if necessary)
Generic: The codebase has been refactored, with both backends now hosted in the same repository. Multiple functions have been revised to reduce performance noise, and multiple optimizations have been added to the codebase
APP: We have completed the transition from Python to JavaScript, with both backends hosted in the same repository.
APP: Add support for PostgreSQL 18
Input Changes
PY_BKE: Users must now supply the configuration they want rather than naively relying on the default input. See `./pgtuner_cli.py` for usage
CONF: Introduce the parameter `cpu_to_connection_scale_ratio` (default 5, range [2.5, 10]) to allow varying how the number of connections is scaled in the general optimization phase (a sketch follows this list)
CONF: Increase the parameter `memory_connection_to_dedicated_os_ratio` from 0.3 to 0.7
CONF: Change the default of `hash_mem_usage_level` from -6 to -5. If `hash_mem_multiplier` is 2.0, the working memory usage per connection for one arbitrary operation increases from 1.1196 to 1.1417
CONF: Revert the upper bound of `wal_segment_size` to 2 GiB instead of 128 MiB (scale from 3 to 7)
CONF: Change the default of `min_wal_size_ratio` from 0.03 to 0.025
CONF: Change the default of `mem_pool_tuning_ratio` from 0.6 to 0.45
CONF: Change the defaults of `max_wal_size_ratio` and `wal_keep_size_ratio` from 0.05 to 0.04
CONF: Remove all `*_profile` parameters except the `workload_profile` parameter.
CONF: The supported PostgreSQL version is now given as an integer instead of a string (string values, including 'latest', are no longer accepted).
CONF: Remove the parameters for the OS and DB log disks
CONF: Change the backup tool from a free-form string to the enumeration :enum:`PG_BACKUP_TOOL`
CONF: The only supported operating systems are 'linux', 'windows', 'macos', 'containerd', and 'PaaS'
CONF: Remove the SOLTP, LOG, and DW workloads from :enum:`PG_WORKLOAD`
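
To illustrate the new `cpu_to_connection_scale_ratio` parameter above: a minimal sketch, assuming a hypothetical helper name and reserved-connection count (only the default of 5 and the [2.5, 10] range come from this changelog).

```python
# Hypothetical sketch of how `cpu_to_connection_scale_ratio` could drive
# connection scaling in the general optimization phase; not the actual
# backend code.
def scale_max_connections(vcpu: int,
                          cpu_to_connection_scale_ratio: float = 5.0,
                          reserved_connections: int = 3) -> int:
    if not (2.5 <= cpu_to_connection_scale_ratio <= 10.0):
        raise ValueError('cpu_to_connection_scale_ratio must be in [2.5, 10]')
    return reserved_connections + round(vcpu * cpu_to_connection_scale_ratio)

print(scale_max_connections(8))  # 8 vCPUs with the default ratio -> 43
```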
Algorithm Changes
PY_BKE: Disable SYSCTL tuning if the OS is not Linux
GTUNE: Add a condition so that `bgwriter_lru_maxpages` adapts to the associated workload
GTUNE: Bump `archive_timeout` from 15 minutes to 30-45 minutes, as many current servers no longer use log shipping but rely on streaming and logical replication.
GTUNE: Add the configurations of `join_collapse_limit`, `from_collapse_limit`, and `plan_cache_mode`.
GTUNE: Rework the self-tuning of `parallel_tuple_cost` configuration
GTUNE: Adjust the scale ratios of `max_parallel_workers` (vcpu * 1.125 -> vcpu * 1.25 + 1) and `max_parallel_workers_per_gather` (vcpu / 3 -> vcpu / 2.5) to allow more parallel workers (a sketch follows this list)
STUNE: Reduce the computed values of `cpu_tuple_cost` and `parallel_tuple_cost`
STUNE: Simplify (reduce) the configured value of `default_statistics_target`
STUNE: Simplify (reduce) the configured value of `after_commit_delay`, making it workload-based rather than disk-based
STUNE: Reduce the upper bound of `vacuum_*_min_age` from 25% of the associated `vacuum_*_max_age` to 15%
STUNE: Stop enforcing the option `opt_wal_buffers` to be higher than `PG_PROFILE_OPTMODE.NONE`
STUNE: Halve the alignment size of `min_wal_size`, `max_wal_size`, and `wal_keep_size` (the configuration is now detailed more precisely)
STUNE: The two parameters `statement_timeout` and `lock_timeout` are reduced significantly.
STUNE: A typo in `backend_flush_after` has been corrected from 512 MiB to 512 KiB.
STUNE: Re-calibrate the algorithm for pre-allocating zero-filled WAL files (the parameter `wal_init_zero` is ignored).
STUNE: Re-calibrate the background writer behaviour by optimizing the `bgwriter_delay` and `bgwriter_lru_maxpages`
STUNE: `checkpoint_flush_after` is now 512 KiB across all disk specifications.
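
To make the parallel-worker adjustment above concrete: a minimal sketch using only the ratios from this changelog (the function name and the use of ceiling rounding are assumptions).

```python
import math

# Illustrative sketch of the adjusted parallel-worker scaling; only the
# ratios (vcpu * 1.25 + 1 and vcpu / 2.5) come from this changelog.
def tune_parallel_workers(vcpu: int) -> dict:
    return {
        'max_parallel_workers': math.ceil(vcpu * 1.25) + 1,
        'max_parallel_workers_per_gather': math.ceil(vcpu / 2.5),
    }

print(tune_parallel_workers(8))
# -> {'max_parallel_workers': 11, 'max_parallel_workers_per_gather': 4}
```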
Others
PY_BKE: Garbage collection is now triggered more aggressively, with better GC management.
PY_WEB: The request limit has been bumped from 15 requests per window to 250 requests (on DEV) and 180 (on PROD)
PY_BKE: The initialization in `./src/__init__.py` is improved with better logic. This includes changes in the `./src/utils`, `./src/static` (removed), `./src/tuner/data`, ... modules
PY_BKE: The general optimizer has been moved out of its class into standalone functions
PY_BKE: Add the method :func:`PG_TUNE_ITEM.__repr__()` and remove the argument `output_if_difference_only` from :func:`PG_TUNE_ITEM.out()` (a sketch follows this list)
PY_BKE: Enforce a warning when the database server has less than 4 GiB of RAM
CONF: Remove the :enum:`PGTUNER_SCOPE.KERNEL_BOOT` scope that optimized the boot settings for the database
CONF: Refactor the enum `PG_SIZING`
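
A minimal sketch of what the new :func:`PG_TUNE_ITEM.__repr__()` could look like; the attribute names here are assumptions, not the actual model.

```python
# Hypothetical sketch of PG_TUNE_ITEM with a __repr__; attributes are assumed.
class PG_TUNE_ITEM:
    def __init__(self, key: str, before, after):
        self.key = key        # PostgreSQL configuration name
        self.before = before  # value before tuning
        self.after = after    # value after tuning

    def __repr__(self) -> str:
        return (f'PG_TUNE_ITEM(key={self.key!r}, '
                f'before={self.before!r}, after={self.after!r})')

print(PG_TUNE_ITEM('shared_buffers', '128MB', '4GB'))
# -> PG_TUNE_ITEM(key='shared_buffers', before='128MB', after='4GB')
```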
Generic
Backend: The input now carries a `frozen` attribute to indicate whether the parameter is expected to change (a sketch follows below).
DOC (UI & Backend): Most documentation in the backend and UI has been rewritten for better understanding, helping new developers and users see which formulas are calculated internally.
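
A minimal sketch of the `frozen` input attribute mentioned above, assuming a plain dataclass model (only the `frozen` semantics come from this changelog; the field names are illustrative).

```python
from dataclasses import dataclass

# Illustrative input model; `frozen` marks a parameter the tuner must not touch.
@dataclass
class TuningInput:
    name: str
    value: float
    frozen: bool = False  # True -> keep the user-supplied value as-is

    def apply(self, new_value: float) -> float:
        if self.frozen:
            return self.value  # frozen inputs are never overwritten
        self.value = new_value
        return self.value

ratio = TuningInput('temp_buffers_ratio', 0.25, frozen=True)
print(ratio.apply(0.40))  # -> 0.25 (unchanged, because frozen)
```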
Input Changes
The default of parameter `temp_buffers_ratio` is changed from 1/3 (0.33) to 1/4 (0.25)
The upper bound of parameter `max_normal_memory_usage` is changed from 0.85 to 0.80
The parameters `mem_pool_epsilon_to_rollback` and `mem_pool_tuning_increment` are removed from user input and hard-coded to 0.0075 and 1/560 in the correction tuning phase.
The default of the parameter `mem_pool_tuning_ratio` is changed from 0.5 to 0.6
The default of the parameter `hash_mem_usage_level` is changed from -4 to -6 -> If `hash_mem_multiplier` is 2.0, the working memory usage per connection for one arbitrary operation is reduced from 1.1713 to 1.1196
The default of the parameter `mem_pool_parallel_estimate` is changed from False to True, assuming PostgreSQL can use parallel operations at any time, thereby hopefully reducing the working memory per connection for one arbitrary operation by around `vCPU+3` units of `work_mem * average_ratio`.
The upper bound of `wal_segment_size` is reduced from 2 GiB to 128 MiB. The rationale is documented directly in the code: `Increasing this value is only beneficial when (1) you have a lot of incoming WRITEs beyond 16 MiB per transaction, (2) high WAL file rotation time during the workload (usually on old kernels and old PostgreSQL versions), (3) high archive transfer load due to small files (a lot of 16 MiB files) that translates into a mix of random and sequential IOPS, and (4) a low number of allowed files in the filesystem`
Fix the underlying meaning of `min_wal_size`, `max_wal_size`, and `wal_keep_size` in our PostgreSQL understanding by introducing a new algorithm based on the WAL volume capacity, ensuring that checkpoints can run in a timely manner during burst workloads and that a reasonable number of WAL records is maintained for streaming replication.
Drop the parameter `max_wal_size_remain_upper_size`
The lower bound of parameter `autovacuum_utilization_ratio` is changed from 0.50 to 0.30
Move the parameter `num_write_transaction_per_hour_on_workload` from the advanced configuration to the basic configuration.
The default of the parameter `num_write_transaction_per_hour_on_workload` is changed from 1M (1 million) to 50K (50 thousand) -> roughly 270 attempted WRITE transactions per second down to 13.5.
Drop the parameter `repurpose_wal_buffers` as it makes zero contribution on small servers, unless you have too little RAM and a low-end HDD on the WAL partition.
Introduce the parameter `database_size_in_gib` in the basic configuration (default 10 GiB, maximum 32 TiB). This is used in the anti-wraparound tuning as a minimum boundary so that the data volume can hopefully be scanned randomly at 30% of the WRITE IOPS over the full data files (not index files). If users don't know how much data they will have (for example, a newly on-boarded application), a zero value means 60% of the used space on the data partition is assumed (a sketch of this estimate follows this list).
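
A minimal sketch of the anti-wraparound boundary estimate described above, assuming PostgreSQL's default 8 KiB page size; the helper name and exact accounting are assumptions.

```python
# Hypothetical sketch: estimate the time to randomly scan the full data files
# at 30% of the data disk's WRITE IOPS, per the anti-wraparound tuning above.
# The zero-value fallback (60% of the used data partition) comes from this
# changelog; everything else is illustrative.
PAGE_SIZE = 8 * 1024  # PostgreSQL default block size (8 KiB)

def wraparound_scan_seconds(database_size_in_gib: float, data_write_iops: float,
                            data_partition_used_gib: float) -> float:
    if database_size_in_gib == 0:
        database_size_in_gib = 0.60 * data_partition_used_gib
    num_pages = database_size_in_gib * (1024 ** 3) / PAGE_SIZE
    return num_pages / (0.30 * data_write_iops)

# Example: a 10 GiB database on a disk rated at 5000 WRITE IOPS
print(f'{wraparound_scan_seconds(10, 5000, 100):.0f} s')  # ~874 s
```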
Algorithm Changes
Add a small warning if the server is not MINI and the available RAM is lower than 4 GiB, instead of the hard-coded 2 GiB for any profile
The parameter `autovacuum_naptime` is now 15 seconds for one worker and 30 seconds for each additional worker.
The autovacuum INSERT parameters (`*_insert_threshold` and `*_insert_scale_factor`) now share the same values as the normal autovacuum parameters.
Reduce the `max_wal_size` parameter in the general tuning phase.
The parameter `archive_timeout` is 15 minutes on large systems and 30 minutes to 1 hour on small systems in the general tuning phase
The parameter `checkpoint_timeout` is 30 minutes on the MINI profile instead of 15 minutes in the general tuning phase
The parameter `wal_keep_size` defaults to 25 base WAL files (400 MiB, as on Azure)
Introduce the parameter `max_slot_wal_keep_size` (default to -1)
The parameter `default_statistics_target` has a minor change.
The parameter `checkpoint_flush_after` is set back to 256 KiB in the general tuning phase, and bumped to 512 KiB or 1 MiB if the data volume is strong.
Revise the failsafe in the anti-wraparound tuning
The parameter `archive_timeout` is scaled by an extra 10 minutes for each unit of larger WAL size (capped at 2 hours); a sketch follows this list.
The parameter `checkpoint_timeout` gains a minimum checkpoint completion time (at 70% of mixed data IOPS), depending on the type of workload (independent of the workload scale), in the correction tuning phase.
The parameter `bgwriter_lru_maxpages` is increased when the disk performance is SSD-grade or stronger.
The four `*_flush_after` parameters are added into the correction tuning phase
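
A rough sketch of the `archive_timeout` scaling noted above, assuming one "unit" is a doubling step of `wal_segment_size` above the 16 MiB default (the base timeout and rounding are assumptions; only the +10 min/unit step and the 2-hour cap come from this changelog).

```python
import math

# Hypothetical sketch: +10 minutes of archive_timeout per unit of larger WAL
# segment size, capped at 2 hours (120 minutes).
def scale_archive_timeout(base_timeout_min: int, wal_segment_size_mib: int) -> int:
    steps = int(math.log2(wal_segment_size_mib / 16))  # 16 MiB -> 0 steps
    return min(base_timeout_min + 10 * steps, 120)

print(scale_archive_timeout(30, 64))    # 2 units above 16 MiB -> 50 minutes
print(scale_archive_timeout(60, 2048))  # 7 units -> 130, capped at 120 minutes
```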
UI & Backend: Remove the `os_reserved_memory` parameter
UI: Remove RAID configuration for disk parameters
UI & Backend: Add the `vacuum_safety_level` parameter into the tuning guideline
UI & Backend: Switch the default disk performance from SSDv1 to SANv1
Backend: Update the formula for bgwriter and autovacuum
Clean up development and legacy code.
VACUUM: Add the `vacuum_failsafe_age` and `vacuum_multixact_failsafe_age` parameters into our tuning guideline. Push the two `*_freeze_table_age` parameters into the correction tuning phase. These arguments require good estimation
MEMORY: Better performance in the memory estimation phase, with the parallel estimation mode applied in the correction phase
BGWRITER: Adjust the background writer parameters to match their use case
VACUUM: Re-adjust the vacuum threshold and scale factor (see the sketch below)
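
For context on the threshold/scale-factor re-adjustment above: PostgreSQL triggers autovacuum on a table once its dead tuples exceed `autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples` (this formula is from the PostgreSQL documentation; the function below is only an illustration).

```python
# The stock PostgreSQL autovacuum trigger condition that the tuned threshold
# and scale factor feed into; the defaults shown are PostgreSQL's stock values.
def needs_autovacuum(dead_tuples: int, reltuples: int,
                     vacuum_threshold: int = 50,
                     vacuum_scale_factor: float = 0.2) -> bool:
    return dead_tuples > vacuum_threshold + vacuum_scale_factor * reltuples

# A 1M-row table: stock settings wait for ~200,050 dead tuples, while a
# lowered scale factor (e.g. 0.05) triggers vacuum much earlier.
print(needs_autovacuum(150_000, 1_000_000))                            # False
print(needs_autovacuum(150_000, 1_000_000, vacuum_scale_factor=0.05))  # True
```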
Tune up some headers and meta tags for better SEO
Apply the HTML Jinja2 template for the web page
Extend the rate limit window for more requests
Clean up development and legacy code
Better performance in the correction tuning phase, especially the memory pool increase tuning task, which has been sped up from 6 ms to 1-2 ms
Create a robots.txt file so web crawlers can index the web page
Move `_version` endpoint to the `_health` endpoint with a more dedicated health check; service uptime is reported
Refactor the rate-limit middleware: merge the global rate limit and the per-user (IP/port) rate limit into one middleware (a sketch follows)
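
A minimal sketch of such a merged limiter, framework-agnostic, with all names and limits assumed (the 180-requests figure echoes the PROD window above).

```python
import time
from collections import defaultdict, deque

# Hypothetical sketch of one middleware-style object enforcing both the
# global window and the per-client (IP/port) window, as described above.
class MergedRateLimiter:
    def __init__(self, window_s: float = 60.0,
                 global_limit: int = 1000, per_client_limit: int = 180):
        self.window_s = window_s
        self.global_limit = global_limit
        self.per_client_limit = per_client_limit
        self._global = deque()                 # timestamps of all requests
        self._per_client = defaultdict(deque)  # timestamps per client key

    def _prune(self, q, now):
        while q and now - q[0] > self.window_s:
            q.popleft()

    def allow(self, client: str) -> bool:
        now = time.monotonic()
        g, c = self._global, self._per_client[client]
        self._prune(g, now)
        self._prune(c, now)
        if len(g) >= self.global_limit or len(c) >= self.per_client_limit:
            return False  # one of the two windows is exhausted
        g.append(now)
        c.append(now)
        return True

limiter = MergedRateLimiter()
print(limiter.allow('203.0.113.7:51234'))  # True until a window fills up
```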