monitoring – Row Coding

Context Deadline Exceeded – prometheus

August 25, 2023 by Tarik

Probably the default scrape_timeout value is too short for you:

    [ scrape_timeout: <duration> | default = 10s ]

Set a bigger value for scrape_timeout:

    scrape_configs:
      - job_name: 'prometheus'
        scrape_interval: 5m
        scrape_timeout: 1m

Take a look here: https://github.com/prometheus/prometheus/issues/1438
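Prometheus refuses to load a configuration in which scrape_timeout is greater than scrape_interval, so it can help to sanity-check the two values before raising the timeout. The following is only a minimal sketch: parse_duration and timeout_fits are hypothetical helper names, and the parser covers simple Prometheus-style durations such as 10s, 5m or 1h30m rather than every form Prometheus accepts.

```python
import re

# Prometheus-style duration units expressed in seconds (y = 365d, w = 7d).
_UNITS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600,
          "d": 86400, "w": 604800, "y": 31536000}
_PART = re.compile(r"(\d+)(ms|s|m|h|d|w|y)")


def parse_duration(text: str) -> float:
    """Convert a duration such as '10s', '5m' or '1h30m' to seconds."""
    pos, total = 0, 0.0
    for match in _PART.finditer(text):
        if match.start() != pos:  # unparsed characters between two parts
            raise ValueError(f"invalid duration: {text!r}")
        total += int(match.group(1)) * _UNITS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):  # empty input or trailing garbage
        raise ValueError(f"invalid duration: {text!r}")
    return total


def timeout_fits(scrape_interval: str, scrape_timeout: str = "10s") -> bool:
    """Return True when the timeout does not exceed the interval."""
    return parse_duration(scrape_timeout) <= parse_duration(scrape_interval)


# The values from the example config: a 1m timeout inside a 5m interval.
print(timeout_fits("5m", "1m"))
# A 1m timeout with a 30s interval would be rejected.
print(timeout_fits("30s", "1m"))
```

If the check fails, raise scrape_interval along with scrape_timeout rather than the timeout alone.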

Increasing Prometheus storage retention

August 17, 2023 by Tarik

Edit the prometheus.service file:

    vi /etc/systemd/system/prometheus.service

Add "--storage.tsdb.retention.time=1y" to the argument list that follows the "ExecStart=/usr/local/bin/prometheus \" line. For 1 year of data retention, the unit file will look like this:

    [Unit]
    Description=Prometheus
    Wants=network-online.target
    After=network-online.target

    [Service]
    User=prometheus
    Group=prometheus
    Type=simple
    ExecStart=/usr/local/bin/prometheus \
        --config.file /etc/prometheus/prometheus.yml \
        --storage.tsdb.path /var/lib/prometheus/ \
        --web.console.templates=/etc/prometheus/consoles \
        --web.console.libraries=/etc/prometheus/console_libraries \
        --web.external-url=http://34.89.26.156:9090 \
        --storage.tsdb.retention.time=1y

    [Install]
    WantedBy=multi-user.target

Then reload systemd and restart the service (systemctl daemon-reload, followed by systemctl restart prometheus) so the new flag takes effect.
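Before raising retention to a full year, it is worth estimating how much disk the data directory may need. The Prometheus storage documentation gives a rough formula: retention time in seconds, times ingested samples per second, times bytes per sample (typically around 1 to 2 bytes). The sketch below applies it; the ingestion rate of 10,000 samples per second is a made-up figure for illustration, and needed_disk_bytes is a hypothetical helper name.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # matches the "1y" retention above


def needed_disk_bytes(retention_seconds: float,
                      samples_per_second: float,
                      bytes_per_sample: float = 2.0) -> float:
    """Rough capacity estimate: retention * ingestion rate * sample size."""
    return retention_seconds * samples_per_second * bytes_per_sample


# Assumption: 10,000 samples/s is illustrative; use your own server's rate.
estimate = needed_disk_bytes(SECONDS_PER_YEAR, 10_000)
print(f"{estimate / 1e9:.1f} GB")
```

The real ingestion rate can be read from the server itself, so treat the result as a planning figure and not as a guarantee.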

Monitor the Graphics card usage [closed]

August 14, 2023 by Tarik

If you develop in Visual Studio 2013 and 2015 versions, you can use their GPU Usage tool:

GPU Usage Tool in Visual Studio (video): https://www.youtube.com/watch?v=Gjc5bPXGkTE
GPU Usage Visual Studio 2015: https://msdn.microsoft.com/en-us/library/mt126195.aspx
GPU Usage tool in Visual Studio 2013 Update 4 CTP1 (blog): http://blogs.msdn.com/b/vcblog/archive/2014/09/05/gpu-usage-tool-in-visual-studio-2013-update-4-ctp1.aspx
GPU Usage for DirectX in Visual Studio (blog): http://blogs.msdn.com/b/ianhu/archive/2014/12/16/gpu-usage-for-directx-in-visual-studio.aspx

Screenshot from MSDN: … Read more

How can I ‘join’ two metrics in a Prometheus query?

July 25, 2023 by Tarik

You can use the argument list of group_left to include extra labels from the right operand (parentheses and indents for clarity):

    (
      max(consul_health_service_status{status="critical"}) by (service_name,status,node)
      == 1
    )
    + on(service_name,node) group_left(env)
    (
      0 * consul_service_tags
    )

The important part here is the operation + on(service_name,node) group_left(env): the + is "abused" as a join operator (fine … Read more
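To make the matching behaviour concrete, here is a small model of what + on(...) group_left(...) does to two sets of series. It is a simplified sketch and not how Prometheus implements vector matching: group_left_add is a hypothetical function, series are plain (labels, value) pairs, and the sample label values ("web", "n1", "prod") are invented.

```python
def group_left_add(left, right, on, include):
    """Add matching series; copy the `include` labels from the right side.

    left and right are lists of (labels_dict, value) pairs. The right side
    is the "one" side, so each match key may appear there only once.
    """
    index = {}
    for labels, value in right:
        key = tuple(labels.get(name, "") for name in on)
        if key in index:
            raise ValueError(f"duplicate right-hand series for {key!r}")
        index[key] = (labels, value)

    result = []
    for labels, value in left:
        key = tuple(labels.get(name, "") for name in on)
        if key not in index:
            continue  # series without a partner are dropped
        right_labels, right_value = index[key]
        merged = dict(labels)
        for name in include:
            if name in right_labels:
                merged[name] = right_labels[name]
        result.append((merged, value + right_value))
    return result


# Left: critical health checks (value 1). Right: tags multiplied by 0.
left = [
    ({"service_name": "web", "status": "critical", "node": "n1"}, 1.0),
    ({"service_name": "db", "status": "critical", "node": "n2"}, 1.0),
]
right = [({"service_name": "web", "node": "n1", "env": "prod"}, 0.0)]

joined = group_left_add(left, right,
                        on=("service_name", "node"), include=("env",))
print(joined)
```

Because the right-hand values are all 0, the sum keeps the left-hand value and the only visible effect is the extra env label; the "db" series has no partner and drops out of the result.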

How to monitor a text file in realtime [closed]

February 24, 2023 by Tarik

I like tools that will perform more than one task. Notepad++ is a great notepad replacement and has a Document Monitor plugin (installs with the standard MSI) that works great. It is also portable, so you can have it on a thumb drive for use anywhere. For a command line option, PowerShell (which is really a … Read more

How to measure solaris process memory usage?

November 22, 2022 by Tarik

    prstat -s rss

'-s' sorts prstat output by the rss column (see the man page for other columns). Also try the '-a' option for a per-user accumulation.

    ps -eo pid,pmem,vsz,rss,comm | sort -rnk2 | head

Top 10 RAM consumers. '-o pmem' displays the percentage of resident memory, i.e. RAM used by the process.

    ls -lh /proc/{pid}/as

Easy way to … Read more

How do I monitor multiple screens on one computer? (Say in a classroom?) [closed]

November 22, 2022 by Tarik

iTALC does just what you are looking for. As a bonus, it has a presentation mode and remote control ability. It is a front-end to (secure!) VNC servers running on each workstation. There are Windows and Linux builds available. Good luck!

Uptime Monitoring Every Second – Bad For the Server?

November 21, 2022 by Tarik

Can “any” server handle it? Probably. Should you do it? Probably not. Ask yourself a few questions: How fast will you be to respond to an outage? How many pageviews do you normally receive per second? How many consecutive errors are you willing to see before calling it “Down” and sending an alert? Do you … Read more
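The "consecutive errors" question can be made concrete with a small counter that only reports an outage after several failed checks in a row. This is a rough sketch under stated assumptions: DownDetector is a hypothetical class, the threshold of 3 is arbitrary, and the probe itself (HTTP request, ping, and so on) is left out.

```python
class DownDetector:
    """Fire an alert once a run of failed checks reaches the threshold."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = 0  # length of the current run of failed checks

    def record(self, ok: bool) -> bool:
        """Record one check result; return True when an alert should fire."""
        if ok:
            self.failures = 0  # any success resets the run
            return False
        self.failures += 1
        # Fire only at the moment the threshold is reached, not on every
        # later failure, so one outage produces one alert.
        return self.failures == self.threshold


detector = DownDetector(threshold=3)
checks = [False, False, True, False, False, False, False]
alerts = [detector.record(ok) for ok in checks]
print(alerts)
```

With a threshold of 3, checking every second detects an outage about three seconds in, while checking every minute takes about three minutes; whether that difference matters depends on how fast anyone can respond.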

Geographically distributed, fault-tolerant and “intelligent” application/host monitoring systems

November 21, 2022 by Tarik

Not an answer really, but some pointers: definitely take a look at the presentation about Nagios @ Goldman Sachs. They faced the problems you mention – redundancy, scalability: thousands of hosts, also automated configuration generation. I had a redundant Nagios setup but at a much smaller scale – 80 servers, ~1k services in total. One dedicated master server, one … Read more

Common WQL Monitoring Queries

November 21, 2022 by Tarik

This is a truly great question, and it’s a shame it has not gotten more love! My basic theory of bottleneck analysis is to treat the system as a box with 4 sorts of finite resources: processor, memory, disk, and network. So I want to get basic numbers for each of these to determine the … Read more