Keeping Puppet Dashboard’s Database small

I just noticed that the MySQL database which stores the Puppet reports for the Dashboard has grown very large. The regular backup dumps I create had reached more than 2GB in size. So I thought it was time to check whether everything was going well and whether I could get rid of some old records. No, I did not yet have a cron job for deleting old records, as recommended by Puppet's documentation. That cron job is now in place, and I'd like to share some experiences I made while analysing the database and deleting old records manually.

The Analysis
First I wanted to find out whether it really is the Dashboard's database that is responsible for the huge dumps. So I logged in to MySQL as root and had a look:

mysql> SELECT table_schema "Data Base Name", SUM( data_length + index_length) / 1024 / 1024 "Data Base Size in MB" FROM information_schema.TABLES GROUP BY table_schema;
+----------------------+----------------------+
| Data Base Name       | Data Base Size in MB |
+----------------------+----------------------+
| dashboard_production |        2878.34375000 | 
| information_schema   |           0.00390625 | 
| mysql                |           0.52497292 | 
| puppet               |          26.93750000 | 
+----------------------+----------------------+
4 rows in set (1.30 sec)

mysql>

Obviously there is no doubt that dashboard_production alone accounts for 2.8GB of data. Digging a little deeper showed which tables hold the most records:

mysql> SELECT TABLE_NAME, table_rows, data_length, index_length,  round(((data_length + index_length) / 1024 / 1024),2) "Size in MB" FROM information_schema.TABLES WHERE table_schema = "dashboard_production";
+------------------------------+------------+-------------+--------------+------------+
| TABLE_NAME                   | table_rows | data_length | index_length | Size in MB |
+------------------------------+------------+-------------+--------------+------------+
| metrics                      |    1010174 |    51986432 |     14172160 |      63.09 | 
| node_class_memberships       |          0 |       16384 |            0 |       0.02 | 
| node_classes                 |          0 |       16384 |            0 |       0.02 | 
| node_group_class_memberships |          0 |       16384 |            0 |       0.02 | 
| node_group_edges             |          0 |       16384 |            0 |       0.02 | 
| node_group_memberships       |          0 |       16384 |            0 |       0.02 | 
| node_groups                  |          0 |       16384 |            0 |       0.02 | 
| nodes                        |         39 |       16384 |            0 |       0.02 | 
| old_reports                  |          0 |       16384 |            0 |       0.02 | 
| parameters                   |          0 |       16384 |            0 |       0.02 | 
| report_logs                  |    1112049 |   330104832 |     15220736 |     329.33 | 
| reports                      |      20866 |     2637824 |      2048000 |       4.47 | 
| resource_events              |     882791 |   107610112 |     13123584 |     115.14 | 
| resource_statuses            |    7925550 |  2369781760 |    109723648 |    2364.64 | 
| schema_migrations            |         45 |       16384 |            0 |       0.02 | 
| timeline_events              |      44091 |     3686400 |            0 |       3.52 | 
+------------------------------+------------+-------------+--------------+------------+
16 rows in set (1.07 sec)

mysql>

The table resource_statuses held almost 8 million records, producing some 2.3GB of data. So I had a look at Puppet's documentation, which says to issue a rake command to drop old records:

rake RAILS_ENV=production reports:prune upto=1 unit=mon

This command will delete all records older than one month. So I gave it a try (using a one-week cutoff instead), but:

[root@phost puppet-dashboard]# rake RAILS_ENV=production reports:prune upto=1 unit=wk
(in /usr/share/puppet-dashboard)
Deleting reports before 2011-11-24 13:37 UTC...
Deleted 66 reports.
Deleting resource_statuses records before 2011-11-24 13:37 UTC...
rake aborted!
Mysql::Error: The total number of locks exceeds the lock table size: DELETE FROM `resource_statuses` WHERE (time < '2011-11-24 13:37:32') 

(See full trace by running task with --trace)
[root@phost puppet-dashboard]# 
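This error means MySQL tried to take a row lock for every matching row and ran out of lock space in the InnoDB buffer pool. Besides raising the buffer pool, a common workaround for this class of error is to delete in fixed-size batches, so each transaction only locks a limited number of rows. A minimal sketch of that batched-delete approach, using Python with SQLite as a stand-in for MySQL (the table and column names follow the Dashboard schema, the row counts are made up for illustration):

```python
# Batched delete: instead of one huge DELETE that must lock every
# matching row at once, delete in chunks so each transaction stays small.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE resource_statuses (id INTEGER PRIMARY KEY, time TEXT)")
conn.executemany(
    "INSERT INTO resource_statuses (time) VALUES (?)",
    [("2011-10-01",)] * 5000 + [("2011-12-01",)] * 100,
)
conn.commit()

cutoff = "2011-11-24"
batch = 1000
total = 0
while True:
    # MySQL supports DELETE ... LIMIT directly; SQLite needs a subquery.
    cur = conn.execute(
        "DELETE FROM resource_statuses WHERE id IN "
        "(SELECT id FROM resource_statuses WHERE time < ? LIMIT ?)",
        (cutoff, batch),
    )
    conn.commit()
    total += cur.rowcount
    if cur.rowcount < batch:
        break

remaining = conn.execute("SELECT COUNT(*) FROM resource_statuses").fetchone()[0]
print(total, remaining)  # 5000 old rows pruned, 100 recent rows kept
```

Each loop iteration commits its own small transaction, so no single DELETE ever needs to lock millions of rows at once.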

It seemed MySQL did not have enough buffer space to lock all the records. So I increased the InnoDB buffer pool by adding the appropriate setting to my.cnf and restarted MySQL:

innodb_buffer_pool_size=1024M

Those 1024M might be more than necessary, I'm not quite sure. However, it worked: after an estimated 15 minutes the task was finished. But after checking the table sizes again, it turned out that resource_statuses still had the same size of 2.3GB. Some research in Puppet's user list turned up posts describing this problem. One piece of advice was to apply a patch to the file /usr/share/puppet-dashboard/lib/tasks/prune_reports.rake. That is the file executed by the rake command above, and it performs the actual deletion of the table records. This is the patch:

--- prune_reports.rake	2011-09-08 12:42:54.433928840 -0700
+++ prune_reports.rake-new	2011-09-08 12:44:01.928583735 -0700
@@ -52,5 +52,11 @@
     puts "Deleting reports before #{cutoff}..."
     deleted_count = Report.delete_all(['time < ?', cutoff])
     puts "Deleted #{deleted_count} reports."
+
+    # inserted to clean up resource_statuses table simultaneously
+    puts "Deleting resource_statuses records before #{cutoff}..."
+    deleted_count = ResourceStatus.delete_all(['time < ?', cutoff])
+    puts "Deleted #{deleted_count} resource_status records."

Obviously only the reports table is cleaned up by the original script, and that table only makes up some 4.5MB. The patch therefore extends the script to also prune the resource_statuses table.

After applying the patch I triggered the rake command once again, and this time the table size really had decreased:

mysql> SELECT TABLE_NAME, table_rows, data_length, index_length,  round(((data_length + index_length) / 1024 / 1024),2) "Size in MB" FROM information_schema.TABLES WHERE table_schema = "dashboard_production";
+------------------------------+------------+-------------+--------------+------------+
| TABLE_NAME                   | table_rows | data_length | index_length | Size in MB |
+------------------------------+------------+-------------+--------------+------------+
...
| report_logs                  |     971812 |   331153408 |     15220736 |     330.33 | 
| reports                      |      21109 |     2637824 |      2031616 |       4.45 | 
| resource_events              |     899663 |   107610112 |     13123584 |     115.14 | 
| resource_statuses            |    3331699 |  1037041664 |     45694976 |    1032.58 | 
...

So the solution was found. Next I wanted to have everything managed by Puppet. First I created a bash script to be executed by a cron job:

#!/bin/bash
DASHBOARD_DIR=/usr/share/puppet-dashboard

cd ${DASHBOARD_DIR}
rake RAILS_ENV=production reports:prune upto=1 unit=wk
rake RAILS_ENV=production db:raw:optimize

This script calls the rake command I had executed manually before, along with another command that optimizes the database. Although Puppet's documentation recommends optimizing the database only once a month, I have this script running every night. And, as you can see, I now delete all records older than just one week, which is perfectly fine for our system.
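The optimize step matters because InnoDB, like most storage engines, does not return disk space to the operating system when rows are deleted; the freed pages are only reclaimed by an explicit optimize/rebuild. The effect can be illustrated with SQLite, whose VACUUM plays the same role as MySQL's OPTIMIZE TABLE (the file name and row counts here are made up for the demonstration):

```python
# Deleting rows does not shrink the data file; only an explicit
# vacuum/optimize rebuilds it and returns the space to the OS.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "dashboard.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE resource_statuses (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO resource_statuses (payload) VALUES (?)",
    [("x" * 500,)] * 10000,
)
conn.commit()
size_full = os.path.getsize(path)

conn.execute("DELETE FROM resource_statuses")
conn.commit()
size_after_delete = os.path.getsize(path)  # unchanged: pages stay allocated

conn.execute("VACUUM")  # analogous to OPTIMIZE TABLE in MySQL
conn.commit()
size_after_vacuum = os.path.getsize(path)  # now the file actually shrinks

print(size_full, size_after_delete, size_after_vacuum)
```

This is exactly why pruning alone left the dump size high and the script runs both rake tasks back to back.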

The Puppet configuration which is responsible for executing the cron job and having the patched file in place is this:

  $purge_file = "purgeDashboardDatabase.sh"
  $prune_reports_name = "prune_reports.rake"
  $prune_reports_file = "/usr/share/puppet-dashboard/lib/tasks/${prune_reports_name}"

  file { "${prune_reports_file}":
    ensure  => file,
    mode    => 0644,
    source  => "puppet:///modules/mysql/${prune_reports_name}",
    replace => true,
    before  => File["${bin_dir}/${purge_file}"],
  }

  file { "${bin_dir}/${purge_file}":
    ensure  => file,
    mode    => 0755,
    source  => "puppet:///modules/mysql/${purge_file}",
    replace => true,
    require => File["${bin_dir}"],
  }

  cron { "purge-dashboard-database":
    ensure  => present,
    command => "${bin_dir}/${purge_file}",
    user    => "root",
    hour    => 2,
    minute  => 0,
    require => File["${bin_dir}/${purge_file}"],
  }

Finally
Beware that this solution applies to Puppet 2.6.6 and its Ruby libraries. The rake script might be improved in newer versions, so you may want to check whether the patch is still necessary should you run into the same problem.

Credits
Thanks to Mkyong's blog post, which provided the SQL statements for analysing the database and tables. And of course thanks to Puppet's user list, which is always a very good resource when something does not work as expected.
