Detecting a rogue Nodejs function in production

aesher9o1
3 min readFeb 25, 2022

--

Node process with 100% CPU usage

Unlike a node process that uses 100% MEM, where the node generally breaks due to heap exhaustion failure. When the CPU is at 100%, the thread is blocked, and the node is unable to accept new connections. This is frequently caused by an infinite loop in your code, which are notoriously difficult to locate. Worse, your single-threaded node process may not restart, causing that server to just sit there devouring your resources.

In my opinion, you may ask the Kubernetes scheduler to terminate such pods if you do proper liveness checks. But today’s not the time to choose that path.

Let’s set the mood with a dummy rogue code.

function one() {
two()
}
function two() {
three()
}
function three() {
four()
}
function four() {
rogue()
}
function rogue() {
const str = 'username:kenp username:dillonf'
let match
while ((match = /username:(\w+)/g.exec(str)) !== null){
console.log(match)
}
}
one()

The above code simulates a production-like environment in which one function calls the next, which calls another, and somewhere down the ladder lies a function with an edge case that you may not be aware of. YET

If you execute the code, you may see what is known as an unending print statement written in this manner.

... ["username:kenp", "kenp", index: 0, input: "username:kenp username:dillonf"] ["username:kenp", "kenp", index: 0, input: "username:kenp username:dillonf"] ["username:kenp", "kenp", index: 0, input: "username:kenp username:dillonf"] ["username:kenp", "kenp", index: 0, input: "username:kenp username:dillonf"] ["username:kenp", "kenp", index: 0, input: "username:kenp username:dillonf"] ["username:kenp", "kenp", index: 0, input: "username:kenp username:dillonf"] ["username:kenp", "kenp", index: 0, input: "username:kenp username:dillonf"] ["username:kenp", "kenp", index: 0, input: "username:kenp username:dillonf"] ...

Your system is unstable now. Right off the bat what you might want to see would be the processes running on the host. Here’s our first command.

> topPID    COMMAND      %CPU  TIME     #TH    #WQ  #PORT MEM    PURG53763  Terminal     111.9 02:34.96 9/1    3    328-  4967M+ 8992K-53913  node         53.1  01:08.65 7      0    29    219M   0B

The top command displays a dynamic real-time picture of the system in use. This command often displays the system’s summary information as well as a list of processes or threads that are presently controlled by the Linux Kernel.

My mac masks the CPU used by the node as the CPU used by the terminal. Oh my god! there is 111% CPU usage!

Step 1: Enter debugging mode

kill -SIGUSR1 53913

Replace 533913 with the PID you got with top the command.

Don’t worry this will not kill the node process. This only signals node to start listening for debugging messages.

Step 2: Inspect the code

node inspect -p 53913

This will open a debugger for you in your command line.

If you are on your local computer and want to spawn a whole V8 debugger on chrome you can instead use

node inspect -p 53913

But if you are like me and do not want to go through a hierarchy to allow a firewall rule that might allow port forwarding and other networking messes. You will be better off with the first command.

Your terminal should now look like this

debug>

Step 3: See where the code is paused

Use the command pause to see where the code is paused at

debug> pause
break in node:internal/console/constructor:359
357
358 const consoleMethods = {
>359 log(...args) {
360 this[kWriteToConsole](kUseStdout, this[kFormatForStdout](args));
361 },
debug>

And voila. Node tells you it’s stuck at a console.log statement.

This might not look too helpful in this example. But to be honest, your production would be more than a console log.

--

--

aesher9o1

Sometimes it is the people no one can imagine anything of, do the things no one can imagine.