Logging — Best Practices

Logging — Best Practices

logs

Logs are supposed to be a consistent & reliable source of information which makes it easier for us to troubleshoot issues in our systems: sometimes the logs provide too much information (like printing large objects) and other times they do not provide enough data (e.g. a request log without the request parameters). In this post, I’ll try to go over some of the lessons learned we accumulated over the past few years.

I’ll use Javascript for the examples but the same idea can be applied, most of the time, to any other language.

Printing an Object

Here’s how to NOT print an object:

logger.info(organization);

The reason for that is that we’ll get objects printed in the log without context as to where it was printed from and why. Further, some objects may contain a lot of uninteresting information and we’ll be SPAMing the logs. And last, we can’t always control how an object evolves, maybe today it has only two attributes and both are interesting and may be helpful, but within a year from now, it may contain PII or other sensitive information (like a password, for example) that we do not want to get printed to the log.

So here’s how we should do it:

logger.info("updateOrganization finished committing all changes", {org: organization.id});

Bottom line:

  • provide some context
  • ideally, print only the object ID or only a specific subset of the object properties — if we want to figure out all the other properties of an object, we have a great tool for the job called DB.

Remember: Logs are for context, Databases are for Data!

Use Correct Log Levels

In my opinion, it’s sufficient to have: silly, debug, info, warn and error log levels.

Some engineers like to have additional levels like verbose, trace, alert, critical, fatal, and some creative developers even invent additional levels…

Please don’t do that. Less is more!

Let’s discuss the different levels:

ERROR - Log entries that indicate a system error that may be critical. For example, an HTTP call failed. A former colleague of mine wrote a nice piece that includes a section on logging exceptions which I recommend reading

WARN - Significant events indicate an error, but the behavior is either expected or not critical. This could be a failed charge where there’s a retry in place, for example.

INFO - Useful information related to the operation of the system. Our production systems are tuned to report this as the minimum level logged.

DEBUG - Intended to be used in pre-prod environments or locally. Use this log level when you don’t want to remove a log-print when merging (because it may prove useful), but also don’t want it to show up in production. There should be a flag/environment-variable that can turn on debug printing (even in production) but it should be used scarcely.

Logging HTTP Request/Response

In ecosystems that work synchronously (like JVM), we can print the id of the thread/servlet (or session-id) that handles the request at the beginning of every log-print. This way we can filter all the prints of the same HTTP request and easily see the execution path. I cannot stress enough how much this capability is important for operations!

The problem starts when we work in ecosystems that are async by nature.

Hello Nodejs, yes, I’m talking to you!

There are ways to do it (here’s an example that uses a middleware to store context on CLS), but I still haven’t seen one that is easy, straightforward, and bullet-proof…

What about logging the entire request/response?

My answer is: Don’t.

There are two good reasons why you shouldn’t log the entire request/response:

  1. It will clutter your logs
  2. The risk of logging PII and hence not being compliant with GDPR, CCPA, etc

That said, there are also two motives to save the requests somewhere:

  1. If/when needed — the requests can be replayed (that’s a weak excuse IMO but still valid)
  2. debugging: for example if we see that our server crashed when a request was sent to a specific route, being able to repro the scenario can save a lot of time!

So if you decided to log the entire request, try to figure out a policy where your logs TTL within a few weeks so that all this data will get erased. Further, I recommend that it will be done into a system that is separate from your logs: this way you can restrict access to it and keep your customer information safe!

Two more things that we can do to improve our requests logging:

  • print the time that it took to process the request
  • print the IP of the client that made the request

Logging a scheduled/batch job

Here it’s much easier to keep the context in the logs, for example, adding to all the prints the job-ID of the batch job that is running, will help us filter out anything that is not related. Probably needless to say but just in case: use UUIDs, do not use Integers as IDs.

Log the filename and line-number

Again, some ecosystems make it very easy to do so, but some of them…

Yes, Nodejs, I’m talking about you, again…

Not sure why I need to go through hula-hoops to do something so simple in other technologies.

Named Loggers

We can create different loggers inside different components and these loggers will print, by default, their name as well (in addition to a timestamp and a contextId/threadId). Doing this can provide many benefits, one of them is that it makes it easier for us to analyze the percentages of errors that are coming from the different components which we can use as a baseline when we test new versions before they are released.

Structured Logging

What is structured logging?

Imagine that the “lines” in your logs, instead of being plain-text, look something like this:

{
"address": {
"street": "1234 Cheatham Creek Court",
"city": "Cumming",
"state": "ga",
"zipcode": "30040",
"input_id": "r856o27l",
"zip": "30040"
},
"contextId": "iuoyiyert-34rf34r-fegrg",
"filename": "data-vendors/wiser_streets/index.js:109",
"level": "warn",
"message": "property-service failed to return info from wiser-streets!",
"timestamp": "2021-11-03T20:44:08.460Z"
}

Moving to structured logging has the following advantages

  • Creates standardization: having a single way, which works automagically, to log an object reduces cognitive load and allows new developers easier on-boarding
  • Creates consistency: having a consistent structure makes it easier to search the logs and creates clear expectations
  • Creates cleaner & refactored code: by using structured logs we’re applying SRP (single responsibility principle), now whenever we have a new important property added/renamed/removed — there will be one place where object log formats are defined and they will include as few fields as necessary to troubleshoot actions on that object
  • Helps Triaging Issues: due to the consistency & standard applied it becomes easier to run queries across different components and reduce MTTR
  • Enables Analytics: structured logs could be digested by big-data tools (Elasticsearch, Athena, Big-Query, etc) which can prove useful not only for vertical-searches (request context) but also horizontal-queries which can help generate an analysis of funnels, retention, etc.

There are open-source projects like Fluentd and Logstash (the “L” in ELK, built on top of Elasticsearch) which can be used to digest structured (and also unstructured) logs, giving you full control over your logs but also paying the price of having to maintain it.

There are also 3rd party providers that will collect your logs and will provide a nice UI tool that you’ll be able to use to search the logs, set alerts on specific prints (in configured frequencies, etc) such as logz.io, Sumo Logic, Loggly, and many others.

Last words

If you found this article interesting, I also recommend reading an excellent post written by my friend Osher, which discusses misconceptions about logging from infrastructure perspectives such as: using a centralized logging system and using structured logs.


Logging — Best Practices was originally published in Everything Full Stack on Medium, where people are continuing the conversation by highlighting and responding to this story.

Senior Software Engineer

Backend Group

Thank you for your interest!

We will contact you as soon as possible.

Want to Know More?

Oops, something went wrong
Please try again or contact us by email at info@tikalk.com
Thank you for your interest!

We will contact you as soon as possible.

Let's talk

Oops, something went wrong
Please try again or contact us by email at info@tikalk.com