Matt Self's Blog

Measure All The Things (Day 2)

Day 2 - Learning by doing

Why is sending, storing, and using time-series data awesome? It's all about feedback.

Before we start working on the time-series system, we'll go on a slight tangent to discuss why capturing metrics about your applications is important.

The speed at which we receive feedback directly determines how quickly we can move through the software development cycle. The shorter the feedback loop, the faster we can react, and the faster (and more often) we can push change and innovate. We want to transform our organizations into fast-moving, high-performing environments... and to do so we must create a software factory that shortens the cycles for pushing change and receiving feedback, because that is how we gain a competitive advantage. Without short feedback loops we are rate-limiting our own ability to change.

What should be measured? Everything.

We'll cover application, system, and network measurements.

Application Metrics

Any function that provides value or insight into an application should be timed or counted, and those measurements should be sent off for storage.
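
To make that concrete, here's a minimal Go sketch (Go being what we'll be building in) that times a function and hands the measurement to a send function. The metric name and the send implementation are placeholders of my own; in a real setup send would ship the value off the box over UDP or HTTP.

    package main

    import (
        "fmt"
        "time"
    )

    // timeIt measures how long fn takes and reports the duration in milliseconds.
    func timeIt(metric string, fn func()) {
        start := time.Now()
        fn()
        send(metric, float64(time.Since(start))/float64(time.Millisecond))
    }

    // send is a stand-in for whatever transport ships measurements off the box.
    func send(metric string, value float64) {
        fmt.Printf("%s %.2f %d\n", metric, value, time.Now().Unix())
    }

    func main() {
        timeIt("app.checkout.duration_ms", func() {
            time.Sleep(25 * time.Millisecond) // pretend work
        })
    }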

System

Understanding the health of your servers is of paramount importance. The standard set of information to capture is memory, CPU, and disk usage. See CollectD.

Network

Having visibility into the health of the network is also important when troubleshooting. To begin with, use tools that live with the system/machine to get a view of network health as the machine perceives its own network interface. Ultimately, use correlation identifiers as tracer "bullets" that capture points in time as requests move through our networked systems. See Zipkin, Dapper, CollectD, Ping Script.
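
As an illustration, here's a minimal Go sketch of propagating a correlation identifier through an HTTP service. The "X-Correlation-ID" header name is a common convention of my choosing, not something mandated by Zipkin or Dapper.

    package main

    import (
        "crypto/rand"
        "encoding/hex"
        "log"
        "net/http"
    )

    // withCorrelationID reuses an incoming correlation ID or mints a new one,
    // and logs it so a request can be traced across services.
    func withCorrelationID(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            id := r.Header.Get("X-Correlation-ID")
            if id == "" {
                buf := make([]byte, 8)
                rand.Read(buf)
                id = hex.EncodeToString(buf)
            }
            w.Header().Set("X-Correlation-ID", id)
            log.Printf("correlation_id=%s path=%s", id, r.URL.Path)
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        hello := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("ok"))
        })
        log.Fatal(http.ListenAndServe(":8080", withCorrelationID(hello)))
    }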

Error counts

Another valuable data point is your error logs. Error logs can be parsed and counted, making it possible to monitor for fluctuations in error rates and to trend on specific error strings. You can send these to Graphite, but this is also a great place to use Logstash, Elasticsearch, and Kibana.
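
To illustrate, here's a rough Go sketch that scans a log stream and tallies error lines by message. The "ERROR" marker is just an assumed log format; in practice you'd tail a real file and flush the counts to Graphite or statsd on an interval.

    package main

    import (
        "bufio"
        "fmt"
        "io"
        "strings"
    )

    // countErrors tallies log lines containing "ERROR", keyed by the error
    // string itself so specific errors can be trended, not just the total.
    func countErrors(r io.Reader) map[string]int {
        counts := make(map[string]int)
        scanner := bufio.NewScanner(r)
        for scanner.Scan() {
            line := scanner.Text()
            if i := strings.Index(line, "ERROR"); i >= 0 {
                counts[line[i:]]++
            }
        }
        return counts
    }

    func main() {
        logs := "INFO started\nERROR db timeout\nERROR db timeout\nERROR disk full\n"
        for msg, n := range countErrors(strings.NewReader(logs)) {
            fmt.Printf("%d x %q\n", n, msg)
        }
    }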

Capture Change Events (track every release)

Changes to our application code are opportunities for issues to be introduced. By tracking change events (deploys) we can correlate metric fluctuations with the changes that caused them. Application metrics plus change events are key to providing confidence and enabling change (i.e., continuous deployment).
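
For example, Graphite can store events alongside metrics. Here's a minimal Go sketch that POSTs a deploy event to its events API so releases can be overlaid on graphs. The host is a placeholder, and the exact payload shape (tags as a string vs. a list) varies by Graphite version, so treat this as an outline rather than gospel.

    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "log"
        "net/http"
    )

    // recordDeploy posts a deploy event so it can be drawn over metric graphs.
    func recordDeploy(app, version string) error {
        event := map[string]string{
            "what": "deploy",
            "tags": "deploy " + app,
            "data": version,
        }
        body, _ := json.Marshal(event)
        resp, err := http.Post("http://graphite.example.com/events/", "application/json", bytes.NewReader(body))
        if err != nil {
            return err
        }
        defer resp.Body.Close()
        if resp.StatusCode >= 300 {
            return fmt.Errorf("graphite returned %s", resp.Status)
        }
        return nil
    }

    func main() {
        if err := recordDeploy("checkout-service", "v1.2.3"); err != nil {
            log.Println("failed to record deploy event:", err)
        }
    }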

Amplify the feedback

Display a wallboard, for goodness' sake. Use a large display that always shows relevant, easy-to-understand graphs. Atlassian explains it well here and here.

[Image: Wallboard]

Monitoring (Application, System, Network metrics)

Use the Humans:

Amplify the feedback (graphs) by using a wallboard. Humans watch the graphs for fluctuations during deploys and are happy.

Use the Machines:

In order to use the machines for monitoring, we must first talk about what is normal. What is abnormal? Most monitoring is done with static thresholds for alarms and alerts, but in real applications the normal range changes over time and at different times of the day. Our systems might see a huge variance in response, yet we would likely receive no alerts if we have set a high static warning threshold. Large spikes during business peaks can be normal, and the troughs during lows can be just as extreme, so thresholds need to be dynamic. One way we can better monitor and alert is to use something like the Holt-Winters exponential smoothing algorithm to predict confidence bands (upper and lower bounds) and to plot aberrations.
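
To make that concrete, here's a rough Go sketch of additive Holt-Winters smoothing with deviation-based confidence bands (in the spirit of Brutlag's aberrant-behavior technique, which is roughly what Graphite's holtWintersConfidenceBands does). The smoothing constants, the band scale, and the crude initialization are illustrative, not tuned values.

    package main

    import (
        "fmt"
        "math"
    )

    // holtWinters returns the one-step forecast and lower/upper confidence
    // bands for each point. m is the seasonal period; a, b, g are the level,
    // trend, and seasonal smoothing constants; scale widens the bands.
    func holtWinters(y []float64, m int, a, b, g, scale float64) (forecast, lower, upper []float64) {
        n := len(y)
        // Crude initialization: level is the first season's mean, seasonal
        // components are deviations from it, trend spans the first season.
        var mean float64
        for i := 0; i < m; i++ {
            mean += y[i]
        }
        mean /= float64(m)
        level, trend := mean, (y[m]-y[0])/float64(m)
        season := make([]float64, n+m)
        dev := make([]float64, n+m)
        for i := 0; i < m; i++ {
            season[i] = y[i] - mean
        }
        forecast = make([]float64, n)
        lower = make([]float64, n)
        upper = make([]float64, n)
        for t := 0; t < n; t++ {
            pred := level + trend + season[t]
            forecast[t] = pred
            lower[t], upper[t] = pred-scale*dev[t], pred+scale*dev[t]
            // Smoothed absolute deviation drives the dynamic thresholds.
            dev[t+m] = g*math.Abs(y[t]-pred) + (1-g)*dev[t]
            // Standard additive Holt-Winters updates.
            newLevel := a*(y[t]-season[t]) + (1-a)*(level+trend)
            trend = b*(newLevel-level) + (1-b)*trend
            season[t+m] = g*(y[t]-newLevel) + (1-g)*season[t]
            level = newLevel
        }
        return forecast, lower, upper
    }

    func main() {
        // Three "days" of a daily pattern sampled 8 times, with a spike at t=19.
        y := []float64{10, 20, 30, 40, 35, 25, 15, 12,
            11, 22, 29, 41, 36, 24, 16, 11,
            10, 21, 31, 60, 34, 26, 14, 13}
        _, lo, hi := holtWinters(y, 8, 0.5, 0.1, 0.3, 3)
        for t := 16; t < len(y); t++ { // skip warm-up while the deviation estimate settles
            if y[t] < lo[t] || y[t] > hi[t] {
                fmt.Printf("aberration at t=%d: %.1f outside [%.1f, %.1f]\n", t, y[t], lo[t], hi[t])
            }
        }
    }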

Summary

Whether you use Graphite, what we -eventually- build, or something else entirely, you should definitely be capturing and monitoring your application and system metric data. Now that you know why it is important, I hope you'll be -at least a bit more- excited to build a time-series system.

Next, we'll do a little bit of architecture/design.

Learning Go by Building a Time-Series System (Intro - Day 1)

Day 1 - Learning by doing

Let’s build something in Go together

I’ve been programming Go for a little while now, and have been plugging along, but I haven’t built anything truly useful or challenging. So, in an effort to “learn by doing”, I’ll be attempting to build something non-trivial and along the way I’ll share all of the gory details from soup to nuts.

Instead of building yet another blog engine or a to-do list application I’ve decided to build a time-series data store, services, and graphing engine. Much of what we build will be similar to functionality in -the oh so awesome- Graphite (and statsd). While I don’t think we’ll build all of the parts required, we will attempt to write the software ourselves in places where learning opportunity and low difficulty intersect. We’ll leverage and collaborate with open source software when the task is too great or doesn’t contribute to learning Go.

The high level plan will look like this:

  • We'll start with some context (i.e. why measure all the things) about why a time-series system is cool to build.
  • Then we'll do some design and architecture.
  • Set up our programming environment and level set on Go.
  • Start coding...

Along the way I'll take a number of tangents when there's something I think is worth learning or explaining. When I do, I'll try to designate those sections as side notes.

The High Level Functional Requirements

Store numeric time series data

  • UDP for high-frequency, finely grained data points, and clients that need to fire and forget (see the sketch after this list)
  • HTTP for aggregated data
  • Low storage footprint
  • High precision for more recent data, lower precision as data ages
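
As promised above, here's a minimal sketch of the fire-and-forget UDP send. The address and the plaintext "name value timestamp" line format follow Graphite's carbon convention (carbon's UDP listener is optional and must be enabled); the wire format for our own system may well end up different.

    package main

    import (
        "fmt"
        "log"
        "net"
        "time"
    )

    func main() {
        conn, err := net.Dial("udp", "127.0.0.1:2003")
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        // Fire and forget: no response is read, and a lost packet is simply lost.
        line := fmt.Sprintf("app.requests.count %d %d\n", 1, time.Now().Unix())
        if _, err := conn.Write([]byte(line)); err != nil {
            log.Println("send failed:", err) // even errors are best-effort
        }
    }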

Retrieve numeric time series data

  • HTTP
  • Server side data functions support

Graphs

  • Simple, good looking, plotted graphs
  • Doesn’t need to be interactive

Additionally, we want:

  • Testable software
  • Decent test coverage
  • Documentation
  • Fun (if it isn’t fun to build it, we won’t)
  • Learning (we’ll take some detours along the way when appropriate)
  • A design/architecture that supports change in key areas

Following along at home

I’ll be using git / github to manage the changes over time. I’ll be providing commit ids along the way for the folks that would like to follow along.

Summary

I'm going to build a time-series system and while I do so I will write down all of the ugly details along the way. You can follow along and we'll learn something together.

Favorite Software Engineering / Architecture / Tech Books

  • Object-Oriented Analysis and Design with Applications by Grady Booch
  • Gödel, Escher, Bach by Douglas Hofstadter
  • The Design of Design by Fred Brooks
  • The Pragmatic Programmer by Andrew Hunt and David Thomas
  • Coders at Work by Peter Seibel
  • JavaScript: The Good Parts by Douglas Crockford
  • Scaling Lean and Agile by Craig Larman
  • Applying UML and Patterns by Craig Larman
  • Agile Software Development: Principles, Patterns, and Practices by Robert C. Martin
  • Code Complete by Steve McConnell
  • Object Design by Rebecca Wirfs-Brock and Alan McKean
  • Domain-Driven Design by Eric Evans
  • High Performance Browser Networking by Ilya Grigorik
  • Thinking in Systems by Donella H. Meadows
  • Made to Stick by Chip and Dan Heath
  • Are Your Lights On? by Gause and Weinberg
  • CODE by Charles Petzold
  • Hackers and Painters: Big Ideas from the Computer Age by Paul Graham
  • Refactoring by Martin Fowler
  • The Myths of Innovation by Scott Berkun