Security, Observability, Best Practices

Token in the Logs at 02:13: An Incident Retrospective on Log Exposure

Thomas Nelson, May 26, 2026, 8 min read

At 02:13, a token appeared in our logs alongside a 500 error. By 03:00 we had a SIEM redaction in place, but the token had already spread to other log destinations, including a contractor’s laptop. We wanted to know who else might have seen it, and we had no way to find out.

By then, downstream masking was too late.

Issues that contributed:

  • Log Proliferation: Too many log destinations increase exposure.
  • Unsafe Defaults: Frameworks often log sensitive data by default.
  • Fragmented Ownership: Logging is owned by different teams, so fixes end up in the SIEM.
  • Detection Challenges: Secrets are varied and hard to identify.
  • Pressure to Act: “Log everything” settings tend to stick around after crises.

Why conventional solutions fail:

  • Regex Failures: Often miss context and formats.
  • Delayed Masking: Raw data gets exposed quickly.
  • Blind Drops: Can lead to observability issues.
  • Encryption Limitations: Doesn’t prevent initial exposure.
  • Quick Fix Illusions: Rapid exposure complicates cleanup.
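To make the regex point concrete, here is a minimal sketch of a masking rule that looks reasonable and still misses the same token in two other formats. The pattern and the sample lines are illustrative assumptions, not the rule or the data from the incident.

```python
import re

# Illustrative pattern: masks tokens only when they follow the word "Bearer".
BEARER = re.compile(r"Bearer\s+[A-Za-z0-9._\-]+")


def mask(line: str) -> str:
    """Replace any 'Bearer <token>' occurrence with a redaction marker."""
    return BEARER.sub("Bearer [REDACTED]", line)


# Hypothetical log lines carrying the same made-up token in three formats.
samples = [
    "Authorization: Bearer abc.def.ghi",
    '{"access_token": "abc.def.ghi"}',
    "GET /callback?token=abc.def.ghi",
]

masked = [mask(line) for line in samples]
# Lines where the token is still readable after masking.
leaked = [line for line in masked if "abc.def.ghi" in line]
```

Only the header form is caught. The JSON body and the query string pass through unchanged, which is the "context and formats" gap in practice.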

Preventive Measures:

  • Keep sensitive data minimal from the start.
  • Enforce strict logging policies.

For effective logs:

  • Use allowlists for specific event types.
  • Implement derived identifiers.
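The two ideas above can be sketched together: a per-event allowlist decides which fields survive, and a keyed hash stands in for the raw identifier. The event names, field names, and key below are hypothetical, and a real key would come from a secret manager rather than source code.

```python
import hashlib
import hmac

# Hypothetical per-event allowlist: only these fields may be logged.
ALLOWED_FIELDS = {
    "login_failed": {"user_ref", "reason", "status"},
    "payment_declined": {"user_ref", "amount_cents", "status"},
}

# Assumption: placeholder key for illustration only.
PSEUDONYM_KEY = b"example-key-do-not-use"


def derived_id(raw_value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Return a stable keyed hash that stands in for the raw identifier."""
    digest = hmac.new(key, raw_value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def build_event(event_type: str, fields: dict) -> dict:
    """Keep only allowlisted fields; unknown event types keep no fields."""
    allowed = ALLOWED_FIELDS.get(event_type, set())
    kept = {name: value for name, value in fields.items() if name in allowed}
    return {"event": event_type, **kept}


event = build_event(
    "login_failed",
    {
        "user_ref": derived_id("alice@example.com"),
        "reason": "bad_password",
        "session_token": "tok_abc123",  # not allowlisted, so it is dropped
    },
)
```

The derived identifier still lets you correlate events for one user, while the log line itself carries neither the email address nor the token.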

Policy enforcement:

  • Use logging SDKs that accept only approved fields.
  • Make verbose logging in production opt-in.
  • Exclude sensitive headers.
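One way to enforce this at the call site is a thin wrapper around the standard `logging` module that rejects unapproved fields instead of silently accepting them, and strips sensitive headers before anything is written. This is a sketch under assumptions: `ApprovedFieldLogger`, the approved field names, and the header list (including the conventional but non-standard `X-API-Key`) are all hypothetical.

```python
import json
import logging

# Assumption: headers treated as sensitive in this sketch.
SENSITIVE_HEADERS = {
    "authorization",
    "proxy-authorization",
    "cookie",
    "set-cookie",
    "x-api-key",
}

# Hypothetical approved field names for request events.
APPROVED_FIELDS = {"route", "status", "duration_ms", "headers"}


def safe_headers(headers: dict) -> dict:
    """Drop sensitive headers, matching names case-insensitively."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in SENSITIVE_HEADERS
    }


class ApprovedFieldLogger:
    """Wrapper that refuses to log fields outside the approved set."""

    def __init__(self, logger: logging.Logger):
        self._logger = logger

    def info(self, event: str, **fields) -> None:
        unapproved = set(fields) - APPROVED_FIELDS
        if unapproved:
            # Fail loudly so the mistake surfaces in tests, not in production logs.
            raise ValueError(f"unapproved log fields: {sorted(unapproved)}")
        if "headers" in fields:
            fields["headers"] = safe_headers(fields["headers"])
        self._logger.info(json.dumps({"event": event, **fields}, sort_keys=True))
```

Raising on an unapproved field is a deliberate choice here. Dropping it silently would also keep the secret out, but the author of the call would never learn that the field was rejected.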

Checks to have in place:

  • Integrate CI testing.
  • Use runtime guards.
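Both checks can be sketched with the standard `logging` module. The CI test plants a canary value as the token and asserts it never reaches the captured output. The runtime guard is a `logging.Filter` that drops any record containing a value it has been told is secret. The canary, the handler under test, and the class and function names are hypothetical.

```python
import io
import logging

CANARY = "canary-secret-7f3a9c"  # hypothetical planted value, not a real credential


class KnownSecretGuard(logging.Filter):
    """Runtime guard: drop records whose rendered message contains a known secret."""

    def __init__(self, secrets):
        super().__init__()
        self._secrets = tuple(secrets)

    def filter(self, record):
        message = record.getMessage()
        return not any(secret in message for secret in self._secrets)


def handle_request(logger, token):
    """Hypothetical handler under test: logs the outcome, never the token."""
    logger.info("request failed status=%s", 500)


def capture_logs(func, token, guard=None):
    """Run func with a temporary handler and return everything it logged."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    if guard is not None:
        handler.addFilter(guard)
    logger = logging.getLogger("canary-test")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        func(logger, token)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


def test_token_never_logged():
    """CI check: the canary passed as a token must not appear in log output."""
    assert CANARY not in capture_logs(handle_request, CANARY)
```

The guard only knows the exact values it is given, so it backs up the allowlist rather than replacing it.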

To reduce impact:

  • Use separate sinks for raw and sanitized data, and keep the raw sink short-lived.
  • Limit access.

Verification strategies:

  • Map out routes and track exposure times.
  • Use versioned policy-as-code.
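Tracking exposure time per route can start as simple arithmetic. The sketch below uses the two times from this incident (logged at 02:13, SIEM redaction by 03:00) and treats a destination with no known containment time as open. The route names are hypothetical labels, and the helper assumes both times fall on the same day.

```python
from datetime import datetime
from typing import Optional


def parse_clock(value: str) -> datetime:
    """Parse an HH:MM clock time (assumes all times are on the same day)."""
    return datetime.strptime(value, "%H:%M")


def exposure_minutes(first_logged: str, contained: Optional[str]) -> Optional[int]:
    """Minutes between first logging and containment; None if never contained."""
    if contained is None:
        return None
    delta = parse_clock(contained) - parse_clock(first_logged)
    return int(delta.total_seconds() // 60)


# Hypothetical route map: destination -> (first logged, contained or None).
routes = {
    "siem": ("02:13", "03:00"),
    "contractor_laptop": ("02:13", None),  # containment time unknown
}

report = {name: exposure_minutes(*times) for name, times in routes.items()}
```

A `None` in the report is the case to worry about: the route exists, but nobody can say when, or whether, the exposure ended.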

Effective logging tips:

  • Exclude request bodies and auth headers.
  • Set up alerts for unknown fields.
  • Use safe exception details.
  • Default to sanitized logs.
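For safe exception details, one option is to log the exception type and the location where it was raised while leaving out the message, since exception messages often echo the input that caused them. A minimal sketch, with `safe_exception_details` as a hypothetical helper name:

```python
import traceback


def safe_exception_details(exc: BaseException) -> dict:
    """Describe an exception without including its message text."""
    frames = traceback.extract_tb(exc.__traceback__)
    if frames:
        innermost = frames[-1]
        location = f"{innermost.name}:{innermost.lineno}"
    else:
        location = "unknown"
    return {"error_type": type(exc).__name__, "location": location}


def parse_auth(header_value: str) -> str:
    """Hypothetical function whose error message echoes its input."""
    raise ValueError(f"malformed header: {header_value}")


try:
    parse_auth("Bearer tok_abc123")
except ValueError as exc:
    details = safe_exception_details(exc)
```

The type and location are usually enough to find the failing code path, and the token that triggered the error stays out of the log.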

Consider downstream masking only for third-party or legacy systems.

To improve:

  • Map critical paths and reduce logging.
  • Use minimal schemas with allowlists.
  • Run CI tests for confidentiality.
  • Configure log agents to discard disallowed fields.
  • Rotate access credentials regularly.

Tradeoffs:

  • Allowlists might slow things down; mitigate with good design.
  • On-path guards add some latency but boost security.
  • Focus first on the legacy paths that carry the highest risk.
  • Test downstream masking after deployment.

In conclusion, once data is logged, control is lost. Be proactive, validate with testing, and only use downstream masking as a last resort.