
Here are some of the ways you can discover your data processing jobs are too slow:

Customers start complaining about slow or failed jobs.
Jobs start getting killed when they hit timeouts.
Your cloud computing bill is twice what it was last month.

While these notification mechanisms do work, it's probably best not to rely on them. Life is easier when jobs finish successfully, customers are happy, and you have plenty of money left over in your budget.

When you first start implementing these sorts of long-running tasks, you can reasonably assume that your code is inefficient. So to begin with, you can profile jobs at random, ideally in production, and use the profiling results to identify places where your code is too slow or uses too much memory. You fix the bottlenecks, measure again, and iterate until eventually you've created a sufficiently efficient baseline.

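As an illustration of profiling a random sample of jobs in production, here is a minimal sketch using Python's built-in cProfile. The 1% sampling rate, the process_job() callable, and the output path are assumptions for the example, not a prescribed setup; measuring memory usage would need a separate memory profiler.

```python
import cProfile
import random
import time


def run_job_maybe_profiled(job_id, process_job, sample_rate=0.01):
    """Run a job, profiling a random sample of executions (rate is an example)."""
    if random.random() >= sample_rate:
        return process_job(job_id)

    profiler = cProfile.Profile()
    profiler.enable()
    try:
        return process_job(job_id)
    finally:
        profiler.disable()
        # Persist the profile so it can be inspected later with pstats or a
        # visualization tool; the path is illustrative.
        profiler.dump_stats(f"/tmp/profile-{job_id}-{int(time.time())}.prof")
```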
At this point most jobs are fast, but occasionally they're slow. Perhaps because of environmental reasons, perhaps because different inputs give different behavior. This is where the situation gets more complex.

Whatever the cause, the first step to fixing the underlying problem is identifying the specific jobs that are outliers: the jobs that are running more slowly than expected.

Modeling performance and identifying outliers with logging

One approach is to use logging, something you probably want to do anyway to help with debugging and diagnostics.

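As a rough sketch of what that logging might look like (the log field names, the input_size parameter, and the process_job() callable are assumptions for illustration, not the actual implementation), you could record each job's inputs and duration so outliers can be found later by querying the logs:

```python
import logging
import time

logger = logging.getLogger("jobs")


def run_logged_job(job_id, input_size, process_job):
    """Run a job, logging enough context to spot slow outliers afterwards."""
    start = time.monotonic()
    logger.info("job started: id=%s input_size=%d", job_id, input_size)
    try:
        result = process_job(job_id)
    except Exception:
        logger.exception("job failed: id=%s", job_id)
        raise
    elapsed = time.monotonic() - start
    # Recording the input size next to the duration lets you model expected
    # runtime as a function of the input, and flag jobs that deviate from it.
    logger.info("job finished: id=%s input_size=%d elapsed_secs=%.1f",
                job_id, input_size, elapsed)
    return result
```

With these records collected centrally, a slow job can be compared against other jobs with similar inputs rather than against a single global threshold.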