Careful with native NodeJS fetch() in serverless setups (with any API, not just the Marketo API)

For years, we’ve leaned heavily and happily on AWS Lambda to stretch Marketo into other realms.

A simple Lambda lets you integrate with, say, event platforms, transforming their generic outbound webhooks to valid inbound Marketo API calls. Especially cool are “loopback integrations”: a Marketo webhook calls a Lambda that swoops back in to call the Marketo API, allowing juiced-up flows of all sorts.
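In its simplest form, such a Lambda is just an async handler that parses the inbound payload, reshapes it, and fires off a fetch() of its own. A rough sketch (the endpoint URL and field names are made up, and error handling is omitted):

// Minimal sketch of a webhook-to-API Lambda behind API Gateway or a Function URL.
// Endpoint and field names are illustrative only; error handling omitted.
export const handler = async (event) => {
  const inbound = JSON.parse(event.body ?? "{}"); // payload from the source platform

  // Reshape the source payload into whatever the downstream API expects
  const outbound = { email: inbound.Email, firstName: inbound.FirstName };

  const response = await fetch("https://api.example.com/endpoint", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(outbound)
  });

  return { statusCode: response.ok ? 200 : 502, body: "" };
};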

Serverless architectures pioneered by AWS are perfect when a service doesn’t need to run hot 24/7 but must still be available and scalable 24/7, i.e. when a second of spin-up time after idling or increased load isn’t a problem.[1]

But the dynamic spin-down/spin-up comes with undocumented risks. One rears its head with newer (but not older!) versions of NodeJS Lambdas. To make matters worse, you won’t see the problem when developing your Lambda offline, only when it’s really running within AWS.[2]

A simple example

A snippet from a Lambda that receives an inbound payload, transforms it into a proper Push Lead payload, gets an OAuth access token, and sends the result on to Marketo:

let outboundJSON = transformCventToMarketo(inboundJSON);
fetch( marketoIdentityURL )
.then( extractAccessToken )
.then( accessToken => 
  fetch( 
    marketoPushLeadURL, 
    { 
      method: "POST",
      headers: { 
        ...accessToken, 
        ...contentTypeJSON 
      },
      body: outboundJSON 
    })
    .then( successOrRetry )
)
.catch( failure )

To be clear, there’s nothing wrong with this logic. (It implies the access token isn’t cached, but that’s tricky to do in Lambda anyway.[3])

Yet when you deploy to AWS and start putting the code through its paces, you’ll sporadically see a fatal error:

"TypeError: fetch failed"

If you switch back to a version of NodeJS that doesn’t have native fetch and instead use a polyfill like node-fetch, you won’t get the errors!
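(The swap itself is trivial: bundle node-fetch with the Lambda and import it, and the same calls go through the polyfill instead of the built-in. ESM import shown; node-fetch v2 also ships a CommonJS build.)

// node-fetch must be bundled with the Lambda; on runtimes that do have native fetch,
// this import shadows the global
import fetch from "node-fetch";

// ...the fetch() calls in the snippet above are otherwise unchanged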

What’s the deal?

Weirdly, nobody really knows.

We know NodeJS’s native fetch is a compiled-in version of the renowned Undici module, maintained by some of the greatest HTTP developers in the world. Undici has been part of NodeJS since v18 and is continually upgraded.

And we know there aren’t reports of this problem with regular NodeJS (i.e. on your own server/container), only with NodeJS on Lambda. A very diligent user wrote up his findings in April 2024, but so far there isn’t any closure. And we can attest that it still happens with the latest NodeJS v22 on Lambda.

The direct cause seems to be Undici’s keep-alive connection pool being clobbered by an external “resource governor” — whatever proprietary controls AWS has in place to ensure Lambdas don’t exceed memory or execution time limits. The rest is a black box.[4]

The workaround

Using node-fetch will work, but seems like a step backward. We’ve instead told Undici not to keep connections alive, like so:

let outboundJSON = transformCventToMarketo(inboundJSON);
fetch( marketoIdentityURL, { 
  headers: { 
    Connection: "close" 
  } 
})
.then( extractAccessToken )
.then( accessToken => 
  fetch( 
    marketoPushLeadURL, 
    { 
      method: "POST",
      headers: { 
        Connection: "close", 
        ...accessToken, 
        ...contentTypeJSON 
      },
      body: outboundJSON 
    })
    .then( successOrRetry )
)
.catch( failure )

This solved the problem for our high-traffic Lambdas.
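If you'd rather not repeat the header on every call, a tiny wrapper keeps it in one place (fetchAndClose is our made-up name, and it assumes plain-object headers as in the snippets above):

// Force Connection: close on every request so Undici won't reuse a pooled connection
const fetchAndClose = (url, options = {}) =>
  fetch(url, {
    ...options,
    headers: {
      ...options.headers,
      Connection: "close"
    }
  });

// Drop-in usage, mirroring the snippet above:
// fetchAndClose( marketoIdentityURL ).then( extractAccessToken ) ...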

Notes

[1] In the event platform example, if there aren't any registrations for a while, the running Lambda environments spin down to save resources. They automatically spin back up when a registrant comes in, but that first request might take an extra second. The delay would never be felt by an end user, since it's a backend-to-backend connection.

[2] Kinda like when you write working Velocity code offline, then paste it into a token and go Wha??? The deployed version always holds some surprises.

[3] Yes, you can cache data in the top-level (module) scope, but as more containers spin up, each one ends up with its own cache.
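For the curious, per-container caching looks something like this — a sketch that reuses the marketoIdentityURL variable from the snippets above and assumes the standard access_token/expires_in fields of Marketo's Identity response:

// Module scope survives warm invocations of the same container,
// but every freshly spun-up container starts with an empty cache.
let cachedToken = null;
let cachedTokenExpiry = 0;

async function getAccessToken() {
  if (cachedToken && Date.now() < cachedTokenExpiry) {
    return cachedToken; // warm container, token still valid
  }
  const response = await fetch( marketoIdentityURL, { headers: { Connection: "close" } } );
  const { access_token, expires_in } = await response.json();
  cachedToken = access_token;
  cachedTokenExpiry = Date.now() + (expires_in - 60) * 1000; // refresh a minute early
  return cachedToken;
}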

[4] We’re not onboard with developer mcollina’s suggestion that it’s due to the entire NodeJS process being suspended/spun down; the error happens with a Lambda instance that stays running across invocations. But we’re definitely convinced some Lambda special sauce is responsible, not NodeJS.