In my previous article I covered SQL performance tuning guidelines in detail. In this article I would like to focus on PostgreSQL performance tuning, and I will attach PDF notes of the same so that readers can follow the details easily. Performance tuning is a tedious and sometimes complex task; programmers enjoy it when they have enough time for analysis, and avoid it when they do not. In this article I would like to cover the details of PostgreSQL performance tuning in depth.
What will you find in this article?
- How to find slow running queries in PostgreSQL?
- What are the different query level checks you need to perform?
- What are the different parameters impacting PostgreSQL query performance?
How to find Slow Running Queries in PostgreSQL?
The first question programmers ask is how to find slowly performing queries in PostgreSQL. The method varies with the customer's setup. Our current customer has AppDynamics configured, and with it you can find slowly performing queries easily. But if you are not using a tool like PostgreSQL Enterprise Manager (PEM) or AppDynamics, you can use the following query at the database level.
SELECT
    pid,
    usename,
    pg_stat_activity.query_start,
    now() - pg_stat_activity.query_start AS query_time,
    query,
    state,
    wait_event_type,
    wait_event
FROM pg_stat_activity
-- add AND state = 'active' below if you want to exclude idle sessions
WHERE (now() - pg_stat_activity.query_start) > interval '2 minutes';
The above query lists the queries that have been running for more than 2 minutes in your PostgreSQL database.
You can also use the following statement, based on the pg_stat_statements extension, to sort queries by their cumulative execution time:
SELECT *
FROM
    pg_stat_statements
ORDER BY
    total_time DESC;  -- on PostgreSQL 13 and later the column is total_exec_time
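Note that pg_stat_statements is a contrib extension and is not enabled by default. If the query above returns an error, the extension probably needs to be installed first; a minimal setup sketch, assuming you can edit postgresql.conf and restart the server, looks like this:
-- in postgresql.conf (requires a server restart):
shared_preload_libraries = 'pg_stat_statements'

-- then, in the target database:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;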
These are two ways to find slowly running queries at the PostgreSQL level.
What are the different query level checks you need to perform?
In this section I would like to walk through, step by step, the different query level checks you need to perform during a PostgreSQL tuning activity.
Step 1 : Use proper indexing and collect statistics
1.1. Eliminate unnecessary sequential scans
The first step is to check that statistics gathering (ANALYZE / autovacuum) is working properly in the database. The second important check concerns indexes and whether they are actually being used. If a query is doing a sequential scan on a large table, you may need to eliminate it by creating an appropriate index on that table. To find tables with heavy sequential scan activity, and therefore potentially missing indexes, you can use the following query.
SELECT
    relname,
    seq_scan - coalesce(idx_scan, 0) AS too_much_seq,
    CASE
        WHEN seq_scan - coalesce(idx_scan, 0) > 0
        THEN 'Missing Index?'
        ELSE 'OK'
    END AS missing_index,
    pg_relation_size(relid) AS rel_size,
    seq_scan,
    idx_scan
FROM
    pg_stat_all_tables
WHERE
    schemaname = 'public'
    AND pg_relation_size(relid) > 80000
ORDER BY
    too_much_seq DESC;
The above query will not tell you exactly which columns need to be indexed, but it shows which tables deserve further analysis so you can create indexes and avoid sequential scans.
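Once a candidate table is identified, EXPLAIN ANALYZE confirms whether a sequential scan is really the problem before you add an index. A minimal sketch, assuming a hypothetical orders table filtered by customer_id:
-- confirm the plan for the slow query (table and column are hypothetical)
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 12345;

-- if the plan shows "Seq Scan on orders", an index on the filter column may help
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

-- refresh planner statistics and re-check the plan
ANALYZE orders;
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 12345;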
1.2. Remove unused indexes
Over time, most PostgreSQL environments accumulate unused indexes. Dropping indexes that are never used improves write performance, since every INSERT, UPDATE and DELETE has to maintain them, and it frees up storage as well.
You can use the following query to find unused indexes along with a ready-made statement to drop them:
SELECT
    indexrelid::regclass AS index,
    relid::regclass AS "table",
    'DROP INDEX ' || indexrelid::regclass || ';' AS drop_statement
FROM
    pg_stat_user_indexes
JOIN
    pg_index USING (indexrelid)
WHERE
    idx_scan = 0               -- never used since statistics were last reset
    AND indisunique IS false;  -- keep unique indexes, they enforce constraints
The above query lists the unused indexes together with a ready-made script to drop them. Note that idx_scan counters are per server and are cleared by statistics resets, so review the results carefully before dropping anything.
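Before dropping anything, it is also worth checking how much space each unused index occupies, since the largest ones give the biggest win. A small sketch using the same catalog views:
-- size of unused, non-unique indexes, biggest first
SELECT
    indexrelid::regclass AS index,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM
    pg_stat_user_indexes
JOIN
    pg_index USING (indexrelid)
WHERE
    idx_scan = 0
    AND indisunique IS false
ORDER BY
    pg_relation_size(indexrelid) DESC;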
1.3. Composite index checks :
If you are using composite (multi-column) indexes, make sure the column order matches how your queries filter: the columns used with equality conditions in the WHERE clause should come first.
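As an illustration, a short composite index sketch; the sales table and its columns are hypothetical, not from the original article:
-- queries filter on region with equality and on sale_date with a range,
-- so region comes first in the composite index
CREATE INDEX idx_sales_region_date ON sales (region, sale_date);

-- this index supports the following filter pattern well:
SELECT * FROM sales
WHERE region = 'EMEA'
  AND sale_date >= DATE '2023-01-01';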
1.4. Selective indexes :
Use highly selective indexes for the most commonly queried data. Indexing everything indiscriminately slows down writes and bloats the database without improving reads.
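One way to keep an index selective is a partial index that covers only the rows queries actually touch. A minimal sketch, again with hypothetical table and column names:
-- index only the rows that are queried frequently (open orders),
-- keeping the index small and selective
CREATE INDEX idx_orders_open ON orders (created_at)
WHERE status = 'open';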
Step 2 : Query writing techniques
You should follow some query writing techniques to improve query performance. Quite often a query simply needs restructuring; a couple of these rewrites are sketched after the list below.
- Avoid LIKE patterns with a leading wildcard (LIKE '%abc'), since they prevent the use of ordinary indexes.
- Avoid function calls on indexed columns in the WHERE clause; they block index usage unless a matching expression index exists.
- Avoid very large IN() lists and try to restructure such queries.
- Avoid cross joins and prefer equality conditions (inner joins) in the ON clause.
- Convert subqueries into join statements where possible.
- Avoid GROUP BY and DISTINCT when they are only compensating for duplicate rows produced by joins.
- Replace IN with EXISTS for subqueries against large tables.
- Avoid correlated subqueries wherever possible, as they usually increase query cost.
- Use common table expressions (the WITH clause) to simplify complex queries; keep in mind that before PostgreSQL 12 a CTE acts as an optimization fence.
- Try to replace row-by-row looping with set operations such as UNION ALL or INTERSECT.
- Avoid ORDER BY, DISTINCT and GROUP BY wherever they are not really needed.
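To make a couple of these rewrites concrete, here is a small sketch; the customers and orders tables are hypothetical:
-- 1) IN rewritten as EXISTS
-- before:
SELECT * FROM customers c
WHERE c.id IN (SELECT o.customer_id FROM orders o);

-- after:
SELECT * FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.id
);

-- 2) function call removed from WHERE so an index on created_at can be used
-- before:
SELECT * FROM orders
WHERE date_trunc('day', created_at) = DATE '2023-06-01';

-- after:
SELECT * FROM orders
WHERE created_at >= DATE '2023-06-01'
  AND created_at <  DATE '2023-06-02';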
What are the different parameters impacting PostgreSQL query performance?
There are many parameters that impact PostgreSQL performance. The database administrator has to check the workload and tune these parameters accordingly. The following are a few parameters to review during PostgreSQL performance tuning; a sketch of how to inspect and adjust them follows the list.
- Shared_Buffer Parameter : shared_buffers controls how much dedicated memory PostgreSQL uses for its shared buffer cache. The default (128 MB) is very low, and the DBA usually needs to increase it according to the database load; a common starting point is around 25% of system RAM. Check the current value with SHOW shared_buffers;
- Write Ahead Log (WAL) buffer : wal_buffers controls the memory used for WAL data that has not yet been written to disk. By default (-1) it is sized automatically at 1/32 of shared_buffers, capped at 16 MB; on write-heavy systems with many concurrent sessions, raising it can improve performance.
- Effective Cache Size : effective_cache_size tells the planner how much memory is available for caching data, counting both shared_buffers and the operating system cache. It does not allocate memory; it only influences plan choices, and a higher value makes the planner more willing to use index scans. Set it to roughly the memory actually available for caching.
- Work_mem : work_mem sets the memory available to each sort or hash operation before it spills to disk. If you run complex sorts, raise it, but do so carefully: a single query can use several multiples of work_mem, and many concurrent sessions can exhaust memory.
- Maintenance_Work_Mem : maintenance_work_mem is used for maintenance tasks such as VACUUM, CREATE INDEX and ALTER TABLE ADD FOREIGN KEY. It can be set much higher than work_mem, since only a few such operations run at a time.
- Synchronous_Commit : if your application values performance over durability of the very latest transactions, you can turn synchronous_commit off; a crash may then lose the most recently committed transactions, but it will not corrupt the database.
- checkpoint_timeout : the checkpoint_timeout parameter sets the time between WAL checkpoints. Setting it too low shortens crash recovery time, because less WAL has to be replayed, but it hurts performance, since every checkpoint consumes valuable I/O and CPU resources.
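A minimal sketch of checking and adjusting these settings from psql; the specific values are illustrative assumptions, not recommendations for every system:
-- inspect current values
SHOW shared_buffers;
SHOW work_mem;
SHOW effective_cache_size;

-- persist new values (written to postgresql.auto.conf)
ALTER SYSTEM SET work_mem = '64MB';
ALTER SYSTEM SET effective_cache_size = '12GB';
ALTER SYSTEM SET maintenance_work_mem = '1GB';

-- reload settings that do not need a restart
SELECT pg_reload_conf();

-- note: shared_buffers (and wal_buffers) changes require a server restart
-- ALTER SYSTEM SET shared_buffers = '4GB';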
These are some of the most important parameters to check during a PostgreSQL performance tuning activity. I hope this article is useful to you. If you like it, or if you have any issues with it, kindly comment in the comments section.