collecting bind dns stats with sensu
Recently I migrated off of AWS Route53 to my own BIND servers for a few of my domains. I didn’t do this because I think I can do DNS better than the folks at Amazon. Instead, I’m looking to collect some detailed statistics about DNS usage and running my own DNS servers was the path of least resistance to reach that goal.
Prepping the DNS servers⌗
My DNS setup is fairly typical, with a master and slave DNS server, each one located on different sides of the US. The master, ns1.stevem.io
, is in New York, NY while the slave, ns2.stevem.io
is in San Francisco, CA.
First thing to do is ensure statistics are enabled for BIND. In my /etc/bind/named.conf.options
file on BOTH of my DNS servers I have:
options {
...
// enable statistics
statistics-file "named.stats";
zone-statistics yes;
...
}
statistics-channels {
inet 0.0.0.0 port 8080 allow { MY.EXTERNAL.IP.1; MY.EXTERNAL.IP.2; };
};
What this does is write a statistics file to /var/cache/bind/named.stats
as well as enable the statistics HTTP endpoint on port 8080
from two external IP addresses. In this case I’m having it bind to 0.0.0.0
so I can reach the stats page by hitting either one of the public IPs assigned to the DNS server.
After making this change I simply issued a systemctl restart bind9
and things were working as intended. Visiting http://ns1.stevem.io:8080/
in my web browser presented a formatted page of statistics, while using curl to grab that URL gave me raw XML. It’s the XML that my script parses to build out the metrics.
Sensu Metric script⌗
Now that our BIND servers are configured for statistic output on port 8080 we just need a script we can have a Sensu Client execute to begin gathering the metrics. I created a script named metric-bind9-xml.rb
with the following contents:
#! /usr/bin/env ruby
require 'rubygems' if RUBY_VERSION < '1.9.0'
require 'sensu-plugin/metric/cli'
require 'nokogiri'
require 'open-uri'
require 'socket'
require 'uri'
class MetricBind < Sensu::Plugin::Metric::CLI::Graphite
option :url, short: '-u URL'
option :scheme, short: '-s SCHEME', default: "#{Socket.gethostname}.bind"
def run
if !config[:url]
# #YELLOW
unknown 'No URL specified'
end
acquire_resource
end
def acquire_resource
doc = Nokogiri::XML(open(config[:url]))
uri = URI(config[:url])
if uri.host
config[:scheme] = "#{uri.host}.bind"
end
results = {}
# store the qtype counters (query types)
doc.xpath('//server//counters[@type="qtype"]//counter').each do |row|
results.store("counter.qtype.#{row.attr('name')}", row.text)
end
# store the nsstat counters
doc.xpath('//server//counters[@type="nsstat"]//counter').each do |row|
results.store("counter.nsstat.#{row.attr('name')}", row.text)
end
# loop through zones
doc.xpath('//views//view[@name="_default"]//zones//zone').each do |zone|
zone_name = zone.attr('name').gsub(/\./, '_')
zone.xpath('counters//counter').each do |counter|
results.store("zone.#{zone_name}.#{counter.attr('name')}", counter.text)
end
end
results.each do |k,v|
output "#{config[:scheme]}.#{k}", v
end
ok
end
end
This is a pretty simple script that uses the nokogiri
gem to parse an XML document and grab relevant data for metric usage. It will grab the general stats, as well as stats for each zone, and return Graphite-formatted metrics results.
The Sensu Check definition looks like this:
{
"checks": {
"ns1_stevem_io_bind_metrics": {
"type": "metric",
"interval": 60,
"command": "\/opt\/sensu-plugins\/plugins\/dns\/metrics-bind9-xml.rb -u http:\/\/ns1.stevem.io:8080\/",
"handlers": [
"graphite"
],
"subscribers": [
"netmon"
]
},
"ns2_stevem_io_bind_metrics": {
"type": "metric",
"interval": 60,
"command": "\/opt\/sensu-plugins\/plugins\/dns\/metrics-bind9-xml.rb -u http:\/\/ns2.stevem.io:8080\/",
"handlers": [
"graphite"
],
"subscribers": [
"netmon"
]
}
}
}
Again, pretty simple. Every 60 seconds the script is run with a single parameter, -u http://YOUR.NAME.SERVER.HERE:8080/
passed in. The result is handled by a graphite
handler, and is executed by clients subscribing to netmon
.
Sample output from a check run looks like this:
ns1.stevem.io.bind.zone.stevem_io.SitMatch 0 1484282548
ns1.stevem.io.bind.zone.stevem_io.A 4033 1484282548
ns1.stevem.io.bind.zone.stevem_io.NS 18 1484282548
ns1.stevem.io.bind.zone.stevem_io.SOA 38 1484282548
ns1.stevem.io.bind.zone.stevem_io.MX 1 1484282548
ns1.stevem.io.bind.zone.stevem_io.TXT 1 1484282548
ns1.stevem.io.bind.zone.stevem_io.AAAA 2507 1484282548
ns1.stevem.io.bind.zone.stevem_io.SRV 16 1484282548
ns1.stevem.io.bind.zone.stevem_io.A6 2 1484282548
ns1.stevem.io.bind.zone.stevem_io.ANY 3 1484282548
Visualizing with Grafana⌗
Assuming you’re feeding these metrics into a Time Series Database such as Graphite, simply build out a query in Grafana to present the data in the way you determine works best. Personally, I have a couple graphs for the general stats, then one for each zone. I use stacked lines as the total number of queries I care about is a sum of A, AAAA, and CNAME requests.
With queries looking something like this:
aliasByNode(nonNegativeDerivative(ns*.stevem.io.bind.counter.nsstat.{Requestv4,QryNXDOMAIN,QryFailure,QryAuthAns,QrySuccess}), 0, 6)
Pretty simple! Ask questions if you have ’em.