Node.js: A Deep Dive into Timeouts Under Massive Concurrent Fetch Requests

2025-03-03 11:45:14

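Massive Concurrent Fetch Requests and Delayed Timeouts, Explained

When you fire a large number of concurrent fetch requests from Node.js, a request can take far longer than its configured timeout. For example, with a 2000ms timeout the measured runtime may exceed 5000ms. This post looks at why that happens and what you can do about it.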

Reproducing the Problem

The following code reproduces the issue:

const run = async (i) => {
  const start = Date.now();
  const abortController = new AbortController();
  const timeout = setTimeout(() => {
    abortController.abort();
  }, 2000);
  try {
    const v = await (
      await fetch("https://slowendpoint.com", { // assume this endpoint responds slowly
        method: "POST",
        body: JSON.stringify([{}]),
        signal: abortController.signal,
      })
    ).text();
  } catch (e) {
    console.error(e);
  } finally {
    console.log(i, "runtime", Date.now() - start);
    clearTimeout(timeout);
  }
};
for (let i = 0; i < 10000; i++) {
  run(i);
}

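If slowendpoint.com responds slowly, the runtime logged for each request ends up well above the 2000ms timeout, even though every request should have been aborted at 2000ms. Why?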

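Root Cause Analysis

There is rarely a single cause. The delay usually comes from a combination of the following factors:

Event loop capacity: The Node.js event loop is efficient, but it runs your JavaScript on a single thread. Starting 10000 requests at once queues a large amount of work on it (socket setup, TLS processing, callbacks), so each loop iteration takes longer. The setTimeout callback that calls abort() runs on the same loop, so it cannot fire until the loop gets to it, and the delay accumulates.
Request queuing: Browsers cap concurrent connections per host (commonly about six for HTTP/1.1), and Node.js HTTP clients can be configured with a similar cap (for example maxSockets on http.Agent). When a cap applies, requests beyond it wait in a queue until earlier ones finish. When no cap applies, every request tries to open its own connection, which shifts the pressure to the factors below.
Operating system limits: The OS limits how many network connections a process can hold open at once (file descriptors, ephemeral ports). Once a limit is reached, new connections are delayed or refused.
DNS resolution: slowendpoint.com has to be resolved before connecting, and the first lookup can take a while. In Node.js, dns.lookup runs on the libuv thread pool, which has four threads by default, so thousands of simultaneous lookups queue behind each other.
Server capacity: The slowendpoint.com server has limited capacity, and a flood of requests can make its responses even slower.
TLS handshake: Setting up an HTTPS connection requires a TLS handshake (negotiating cipher suites, verifying certificates, exchanging keys). Under high concurrency these steps can become a bottleneck.
Network conditions: Congestion and packet loss increase response times.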
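To make the event loop point concrete, here is a minimal sketch that shows a timer firing late when the loop is busy. The helper names blockFor and measureTimerDrift are made up for this illustration, and the busy-wait merely stands in for the real work that 10000 simultaneous requests put on the loop.

```javascript
// Busy-wait for roughly `ms` milliseconds to simulate a saturated event loop.
function blockFor(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) {
    // spin: nothing else can run on the event loop meanwhile
  }
}

// Schedule a timer, then block the loop, and resolve with how long
// the timer actually took to fire (in milliseconds).
function measureTimerDrift(timerMs, blockMs) {
  return new Promise((resolve) => {
    const scheduled = performance.now();
    setTimeout(() => resolve(performance.now() - scheduled), timerMs);
    blockFor(blockMs); // the timer callback cannot run until this returns
  });
}

measureTimerDrift(50, 200).then((elapsed) => {
  console.log(`timer set for 50ms fired after ${elapsed.toFixed(0)}ms`);
});
```

The same thing happens to the 2000ms abort timer in the reproduction: it only fires once the loop is free to run it.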

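Solutions

Here are some ways to fix or mitigate the problem:

Limit concurrency:

Rationale: Avoid firing too many requests at once. This lowers the pressure on the event loop and reduces queuing.
Implementation: Use p-limit or a similar library to cap the number of requests in flight.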

import pLimit from 'p-limit';

const limit = pLimit(100); // allow at most 100 requests in flight

const run = async (i) => {
  const start = Date.now();
  const abortController = new AbortController();
  const timeout = setTimeout(() => {
    abortController.abort();
  }, 2000);
  try {
    const v = await (
      await fetch("https://slowendpoint.com", {
        method: "POST",
        body: JSON.stringify([{}]),
        signal: abortController.signal,
      })
    ).text();
  } catch (e) {
    console.error(e);
  } finally {
    console.log(i, "runtime", Date.now() - start);
    clearTimeout(timeout);
  }
};

const input = Array.from({ length: 10000 }, (_, i) => i);

Promise.all(input.map(i => limit(() => run(i))));
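If you would rather not add a dependency, the same idea can be sketched in a few lines. This is only an illustration of the technique, not how p-limit is implemented; runWithLimit is a hypothetical helper name, and it assumes the worker handles its own errors (as run does above), since a rejection would stop the whole Promise.all.

```javascript
// Process `items` with at most `limit` workers running at the same time.
// Results are returned in the same order as `items`.
async function runWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item nobody has claimed yet

  // Each lane keeps claiming the next unclaimed item until none are left.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}

// Usage sketch with the article's `run` and `input`:
// await runWithLimit(input, 100, (i) => run(i));
```

Because a request only starts when a lane picks it up, its 2000ms timer also starts at that moment rather than when the whole job was submitted.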

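Use HTTP/2:
- Rationale: HTTP/2 supports multiplexing: many requests share one connection in parallel, which cuts connection setup cost and avoids the application-level head-of-line blocking of HTTP/1.1 (head-of-line blocking at the TCP level remains).
- Implementation: Both the server and the client must support HTTP/2. Do not assume fetch negotiates it automatically: Node.js's built-in fetch has historically defaulted to HTTP/1.1, so check your client's documentation or use the node:http2 module directly. To confirm which protocol was used, check the negotiated ALPN protocol or the server's access logs rather than the response headers.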

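Reuse connections (Keep-Alive):

Rationale: HTTP/1.1 connections are persistent by default. Reusing an established connection avoids the cost of repeatedly opening and closing connections, including the TCP and TLS handshakes.
Implementation: Make sure both server and client keep connections alive. With the http/https modules, or a library built on them such as node-fetch, pass an http.Agent or https.Agent configured with keepAlive: true. Node.js's built-in fetch ignores the agent option; it already reuses connections, and its pool is configured through an undici dispatcher instead.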

import https from 'https';
import fetch from 'node-fetch'; // assumption: node-fetch, which honours `agent`; the built-in fetch ignores it

// keepAlive reuses sockets; maxSockets caps concurrent sockets per host
const agent = new https.Agent({ keepAlive: true, maxSockets: 100 });

const run = async (i) => {
  const start = Date.now();
  const abortController = new AbortController();
  const timeout = setTimeout(() => {
    abortController.abort();
  }, 2000);
  try {
    const v = await (
      await fetch("https://slowendpoint.com", {
        method: "POST",
        body: JSON.stringify([{}]),
        signal: abortController.signal,
        agent, // use the custom agent
      })
    ).text();
  } catch (e) {
    console.error(e);
  } finally {
    console.log(i, "runtime", Date.now() - start);
    clearTimeout(timeout);
  }
};

for (let i = 0; i < 10000; i++) {
  run(i);
}

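Optimize DNS resolution:

Rationale: Resolving the hostname ahead of time removes the DNS lookup from later requests.
Implementation: Resolve the hostname with dns.lookup up front and cache the result.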

import dns from 'dns';
import https from 'https';
import fetch from 'node-fetch'; // assumption: node-fetch, so the agent (and its lookup) is honoured

const dnsCache = new Map(); // hostname -> { address, family }

// Resolve the hostname once and cache the result.
async function prefetchDNS(hostname) {
  try {
    const { address, family } = await dns.promises.lookup(hostname);
    dnsCache.set(hostname, { address, family });
    console.log(`Prefetched DNS for ${hostname}: ${address}`);
  } catch (err) {
    console.error(`Failed to prefetch DNS for ${hostname}`, err);
  }
}

// Custom lookup: answer from the cache, fall back to dns.lookup on a miss.
function cachedLookup(hostname, options, callback) {
  const hit = dnsCache.get(hostname);
  if (!hit) return dns.lookup(hostname, options, callback);
  // Some Node.js versions ask for all addresses and expect an array
  if (options.all) return callback(null, [hit]);
  callback(null, hit.address, hit.family);
}

// fetch itself has no lookup option; the custom lookup belongs on the agent
const agent = new https.Agent({ keepAlive: true, lookup: cachedLookup });

async function run(i) {
  const start = Date.now();
  const abortController = new AbortController();
  const timeout = setTimeout(() => {
    abortController.abort();
  }, 2000);
  try {
    const v = await (
      await fetch("https://slowendpoint.com", {
        method: "POST",
        body: JSON.stringify([{}]),
        signal: abortController.signal,
        agent,
      })
    ).text();
  } catch (e) {
    console.error(e);
  } finally {
    console.log(i, "runtime", Date.now() - start);
    clearTimeout(timeout);
  }
}

// Prefetch once before starting, so 10000 requests do not each trigger a lookup
await prefetchDNS(new URL("https://slowendpoint.com").hostname);
for (let i = 0; i < 10000; i++) {
  run(i);
}

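Security note: Only use DNS servers you trust, and watch out for DNS poisoning. Also keep in mind that a cache like this never expires its entries, so it ignores DNS TTLs.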

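Send requests in batches:
- Rationale: Split the requests into smaller batches and pause briefly between batches.
- Implementation: Use a loop that waits for each batch to finish, then pauses with setTimeout before starting the next one.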

// Start every request in the batch at once and wait for all of them to settle.
async function runBatch(start, end) {
  const batch = [];
  for (let i = start; i < end; i++) {
    batch.push(run(i)); // run() never rejects: it catches its own errors
  }
  await Promise.all(batch);
}

async function runInBatches(total, batchSize, delay) {
  for (let i = 0; i < total; i += batchSize) {
    const start = i;
    const end = Math.min(i + batchSize, total);
    await runBatch(start, end);
    await new Promise((resolve) => setTimeout(resolve, delay)); // pause between batches
  }
}

runInBatches(10000, 100, 500); // 10000 requests in total, 100 per batch, 500ms between batches

async function run(i) {
  const start = Date.now();
  const abortController = new AbortController();
  const timeout = setTimeout(() => {
    abortController.abort();
  }, 2000);
  try {
    const v = await (
      await fetch("https://slowendpoint.com", {
        method: "POST",
        body: JSON.stringify([{}]),
        signal: abortController.signal,
      })
    ).text();
  } catch (e) {
    console.error(e);
  } finally {
    console.log(i, "runtime", Date.now() - start);
    clearTimeout(timeout);
  }
}

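Optimize the server:
- If you control slowendpoint.com, look for server-side bottlenecks, for example:
  - Whether database queries are efficient.
  - Whether any expensive computation runs per request.
  - Whether caching is used sensibly.
  - Whether the hardware (CPU, memory, bandwidth) is sufficient.
Client and server clock synchronization:
- Rationale: Clock skew does not change when the timeout fires, because both the setTimeout timer and the Date.now() measurement use the client's own clock. It does matter when you compare client timestamps with server logs to work out where the time went.
- Implementation: Keep both machines synchronized with NTP (Network Time Protocol). Most operating systems ship with an NTP client.
Adjust the AbortController timeout:
- The 2000ms you set is only when the abort signal is scheduled to fire. The fetch itself may need longer for any of the reasons above, and the signal can fire late when the event loop is busy. If the cause is a brief network or server hiccup, raising the timeout somewhat may be enough to keep healthy requests from being cancelled.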
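Before raising the timeout, it can help to see where the extra time actually goes. The sketch below records when abort() really ran and how long the whole call took; runWithDeadline and its result fields are hypothetical names chosen for this example, and the task is passed in as a function so that the helper works with any fetch implementation.

```javascript
// Run `task(signal)` with a deadline and report the timings, in milliseconds.
//   abortedAt: when abort() actually ran, or null if the task finished first
//   total:     how long the whole call took
async function runWithDeadline(task, timeoutMs) {
  const controller = new AbortController();
  const start = performance.now();
  let abortedAt = null;
  const timer = setTimeout(() => {
    abortedAt = performance.now() - start;
    controller.abort();
  }, timeoutMs);

  const result = { ok: false, value: undefined, error: undefined };
  try {
    result.value = await task(controller.signal);
    result.ok = true;
  } catch (error) {
    result.error = error;
  } finally {
    clearTimeout(timer);
  }
  return { ...result, abortedAt, total: performance.now() - start };
}

// Usage sketch with the article's endpoint:
// const r = await runWithDeadline(
//   (signal) =>
//     fetch("https://slowendpoint.com", {
//       method: "POST",
//       body: JSON.stringify([{}]),
//       signal,
//     }).then((res) => res.text()),
//   2000
// );
// console.log(r.abortedAt, r.total);
```

If abortedAt is far above the configured timeout, the timer itself fired late, which points at a busy event loop. If total is far above abortedAt, the time was lost between the abort and the rejection being handled.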

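Advanced Tips

Connection pool monitoring: A Node.js http.Agent or https.Agent (http.globalAgent, https.globalAgent, or a custom agent) exposes its state through the sockets, freeSockets, and requests properties. Sampling them shows how many connections are active, how many are idle, and how many requests are queued waiting for a socket. Per-request events on http.ClientRequest, such as 'socket' and 'close', show when a connection is assigned and released. Note that these agents only cover requests that actually go through them.
Custom DNS resolution: The Node.js dns module offers lower-level resolution (for example dns.Resolver and dns.resolve4). You can build your own resolution logic on top of it, such as a custom load-balancing strategy.
Use tools for detailed performance analysis: tcpdump and Wireshark for network packets; the built-in Node.js profiler (node --prof) or a tool like clinic for CPU usage.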
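As a small illustration of connection pool monitoring, the sketch below summarizes an agent's state from its sockets, freeSockets, and requests properties, each of which maps a host key to an array. The poolStats and countAll names are invented for this example, and the agent is assumed to be the http.Agent or https.Agent your requests really use.

```javascript
// Count entries across one of an Agent's per-host maps (host key -> array).
function countAll(map) {
  return Object.values(map).reduce((sum, list) => sum + list.length, 0);
}

// Summarize the pool: sockets in use, idle keep-alive sockets,
// and requests still waiting for a free socket.
function poolStats(agent) {
  return {
    active: countAll(agent.sockets),
    idle: countAll(agent.freeSockets),
    queued: countAll(agent.requests),
  };
}

// Usage sketch, given an https.Agent named `agent`:
// const timer = setInterval(() => console.log(poolStats(agent)), 1000);
// timer.unref(); // do not keep the process alive just for logging
```

A steadily growing queued count alongside a flat active count suggests that requests are waiting on the maxSockets cap rather than on the network.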