Increasing Prometheus storage retention

Edit the prometheus.service file:

    vi /etc/systemd/system/prometheus.service

Add "--storage.tsdb.retention.time=1y" below the "ExecStart=/usr/local/bin/prometheus \" line, as the last flag of the command. For one year of data retention, the config will look like this:

    [Unit]
    Description=Prometheus
    Wants=network-online.target
    After=network-online.target

    [Service]
    User=prometheus
    Group=prometheus
    Type=simple
    ExecStart=/usr/local/bin/prometheus \
        --config.file /etc/prometheus/prometheus.yml \
        --storage.tsdb.path /var/lib/prometheus/ \
        --web.console.templates=/etc/prometheus/consoles \
        --web.console.libraries=/etc/prometheus/console_libraries \
        --web.external-url=http://34.89.26.156:9090 \
        --storage.tsdb.retention.time=1y

    [Install]
    WantedBy=multi-user.target
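After the unit file is edited, systemd has to re-read it (systemctl daemon-reload) and Prometheus has to be restarted before the new flag takes effect. One way to confirm the running server picked it up is to read the flags back from the Prometheus HTTP API. The following is a minimal sketch, not part of the original answer: it assumes the server is reachable over HTTP, and the helper names `fetch_flags` and `retention_from_flags` are made up for illustration.

```python
import json
from typing import Optional
from urllib.request import urlopen


def fetch_flags(base_url: str) -> dict:
    """Download the command-line flags the running Prometheus reports.

    base_url is an assumption, e.g. "http://localhost:9090".
    """
    with urlopen(f"{base_url}/api/v1/status/flags", timeout=5) as resp:
        return json.load(resp)


def retention_from_flags(payload: dict) -> Optional[str]:
    """Return the time-based retention setting, or None if unavailable.

    The API lists flags without the leading dashes, so the key is
    "storage.tsdb.retention.time".
    """
    if payload.get("status") != "success":
        return None
    return payload.get("data", {}).get("storage.tsdb.retention.time")
```

Calling `retention_from_flags(fetch_flags("http://localhost:9090"))` against a restarted server should return the value passed in ExecStart; the address here is a placeholder for wherever your Prometheus listens.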

Monitor the Graphics card usage [closed]

If you develop in Visual Studio 2013 or 2015, you can use the GPU Usage tool:

GPU Usage Tool in Visual Studio (video): https://www.youtube.com/watch?v=Gjc5bPXGkTE
GPU Usage, Visual Studio 2015: https://msdn.microsoft.com/en-us/library/mt126195.aspx
GPU Usage tool in Visual Studio 2013 Update 4 CTP1 (blog): http://blogs.msdn.com/b/vcblog/archive/2014/09/05/gpu-usage-tool-in-visual-studio-2013-update-4-ctp1.aspx
GPU Usage for DirectX in Visual Studio (blog): http://blogs.msdn.com/b/ianhu/archive/2014/12/16/gpu-usage-for-directx-in-visual-studio.aspx

Screenshot from MSDN: …

How can I ‘join’ two metrics in a Prometheus query?

You can use the argument list of group_left to include extra labels from the right operand (parentheses and indents for clarity):

    (
      max(consul_health_service_status{status="critical"}) by (service_name,status,node) == 1
    )
      + on(service_name,node) group_left(env)
    (
      0 * consul_service_tags
    )

The important part here is the operation + on(service_name,node) group_left(env): the + is “abused” as a join operator (fine …
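To make the matching behind + on(service_name,node) group_left(env) more concrete, here is a small simulation in plain Python. It is only a sketch of the many-to-one semantics, not Prometheus code: series are modelled as (labels, value) pairs, the function name `join_group_left` is hypothetical, and the sample label values are invented.

```python
def join_group_left(left, right, on, extra):
    """Simulate `left + on(<on>) group_left(<extra>) (0 * right)`.

    left, right: lists of (labels_dict, value) pairs.
    on: label names used to match series from both sides.
    extra: label names copied from the right ("one") side.
    """
    # Index the right side by its `on` labels; each key must be unique,
    # otherwise Prometheus rejects the query as many-to-many matching.
    index = {}
    for labels, value in right:
        key = tuple(labels.get(name, "") for name in on)
        if key in index:
            raise ValueError(f"duplicate series on the right side for {key}")
        index[key] = (labels, value)

    result = []
    for labels, value in left:
        key = tuple(labels.get(name, "") for name in on)
        if key not in index:
            continue  # left series without a partner are dropped
        right_labels, right_value = index[key]
        merged = dict(labels)
        for name in extra:
            if right_labels.get(name):
                merged[name] = right_labels[name]
            else:
                merged.pop(name, None)
        # Multiplying the right side by 0 keeps the left value unchanged.
        result.append((merged, value + 0 * right_value))
    return result


critical = [({"service_name": "web", "status": "critical", "node": "n1"}, 1.0)]
tags = [({"service_name": "web", "node": "n1", "env": "prod"}, 1.0)]
joined = join_group_left(critical, tags, on=("service_name", "node"), extra=("env",))
```

In this toy input the single critical series comes back with its original value and an additional env label taken from the tags series, which is the effect the query relies on.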

Geographically distributed, fault-tolerant and “intelligent” application/host monitoring systems

Not an answer really, but some pointers: definitely take a look at the presentation about Nagios @ Goldman Sachs. They faced the problems you mention: redundancy, scalability (thousands of hosts), and also automated configuration generation. I had a redundant Nagios setup but at a much smaller scale: 80 servers, ~1k services in total. One dedicated master server, one …

Common WQL Monitoring Queries

This is a truly great question, and it’s a shame it has not gotten more love! My basic theory of bottleneck analysis is to treat the system as a box with 4 sorts of finite resources: processor, memory, disk, and network. So I want to get basic numbers for each of these to determine the …