We have a Linux server (Redhat Enterprise 6) running a multi-site installation of Mediawiki. In total, we have today 120 unique wiki’s, each having its one Postgresql 9.1 database.
Suddenly the server stopped responding. In Apache webserver, the total number concurrent apache processes went up to Apache’s max-client setting in httpd.conf, and people couldn’t log in or see any wiki-pages.
In the apache log, we did not see anything special. But, when we started to check which processes were running during a full stop on the server with the Unix command
ps, we could see entries like:
# ps -ef | egrep 'DELETE|idle'
postgres 25106 2502 0 11:12 ? 00:00:00 postgres: ubwiki ubwiki 127.0.0.1(37186) idle in transaction
postgres 25108 2502 0 11:12 ? 00:00:00 postgres: ubwiki ubwiki 127.0.0.1(37189) DELETE waiting
postgres 25447 2502 0 11:13 ? 00:00:00 postgres: ubwiki ubwiki 127.0.0.1(37420) DELETE waiting
postgres 25619 2502 0 11:14 ? 00:00:00 postgres: ubwiki ubwiki 127.0.0.1(37553) DELETE waiting
A search on Internet found that also others experienced this, and the solution to our problem we found here:
We implemented a fix to our problem by adding a bash script to /etc/crontab, running each fifth minute.
The script will first check if there are more then three (3) processes with the description “idle in transaction”.
If so, then the pkill command will stop (“kill”) the oldest one.
if [ `/usr/bin/pgrep -f 'idle in transaction' | wc -l` -gt 2 ]
echo "Email message about that process was found and stopped." | mail to@someone
/usr/bin/test `/usr/bin/pgrep -f 'idle in transaction' | wc -l` -gt 2; pkill -o -f 'idle in transaction'
The script runs every fifth minute. Here is our entry in the /etc/crontab file.
*/5 * * * * root /usr/share/scripts/fix-transactions/fix-transactions.sh