All posts by dotte

JWT implementation with Refresh Token in Node.js example | MongoDB

In the previous post, we learned how to build Token-based Authentication & Authorization with Node.js, JWT and MongoDB. This tutorial continues by adding a JWT Refresh Token to the Node.js Express application. You will see how to expire the JWT, then renew the Access Token with the Refresh Token.

Related Posts:
– Node.js, Express & MongoDb: Build a CRUD Rest Api example
– How to upload/store images in MongoDB using Node.js, Express & Multer
– Using MySQL/PostgreSQL instead: JWT Refresh Token implementation in Node.js example

Associations:
– MongoDB One-to-One relationship with Mongoose example
– MongoDB One-to-Many Relationship tutorial with Mongoose examples
– MongoDB Many-to-Many Relationship with Mongoose examples

The code in this post is based on a previous article that you need to read first:
Node.js + MongoDB: User Authentication & Authorization with JWT

Overview of JWT Refresh Token with Node.js example

We already have a Node.js Express & MongoDB application in that:

  • Users can sign up for a new account or log in with a username & password.
  • Based on the User’s role (admin, moderator, user), we authorize the User to access resources

With APIs:

Methods  Urls              Actions
POST     /api/auth/signup  signup new account
POST     /api/auth/signin  login an account
GET      /api/test/all     retrieve public content
GET      /api/test/user    access User’s content
GET      /api/test/mod     access Moderator’s content
GET      /api/test/admin   access Admin’s content

For more details, please visit this post.

We’re gonna add Token Refresh to this Node.js & JWT Project.
The final result can be described with following requests/responses:

– Send a /signin request: the response returns a refreshToken along with the accessToken.

– Access a protected resource successfully with the accessToken.

– When the accessToken is expired, the user cannot use it anymore.

– Send a /refreshtoken request: the response returns a new accessToken.

– Access the resource successfully with the new accessToken.

– Send an expired Refresh Token: the request is rejected.

– Send a Refresh Token that does not exist in the database: the request is rejected.

– Axios Client to check this: Axios Interceptors tutorial with Refresh Token example

– You can also use a React, Vue, or Angular client.

Flow for JWT Refresh Token implementation

The diagram below shows the flow of how we implement the authentication process with Access Token and Refresh Token.

[Diagram: jwt-refresh-token-node-js-example-flow]

– A valid JWT must be added to the HTTP header if the Client accesses protected resources.
– A refreshToken is provided at the time the user signs in.

How to Expire JWT Token in Node.js

The Refresh Token has a different value and expiration time from the Access Token.
We usually configure the Refresh Token’s expiration time to be longer than the Access Token’s.

Open config/auth.config.js:

module.exports = {
  secret: "bezkoder-secret-key",
  jwtExpiration: 3600,           // 1 hour
  jwtRefreshExpiration: 86400,   // 24 hours

  /* for test */
  // jwtExpiration: 60,          // 1 minute
  // jwtRefreshExpiration: 120,  // 2 minutes
};

Update middlewares/authJwt.js file to catch TokenExpiredError in verifyToken() function.

const jwt = require("jsonwebtoken");
const config = require("../config/auth.config");
const db = require("../models");
...
const { TokenExpiredError } = jwt;

const catchError = (err, res) => {
  if (err instanceof TokenExpiredError) {
    return res.status(401).send({ message: "Unauthorized! Access Token was expired!" });
  }

  return res.status(401).send({ message: "Unauthorized!" });
}

const verifyToken = (req, res, next) => {
  let token = req.headers["x-access-token"];

  if (!token) {
    return res.status(403).send({ message: "No token provided!" });
  }

  jwt.verify(token, config.secret, (err, decoded) => {
    if (err) {
      return catchError(err, res);
    }
    req.userId = decoded.id;
    next();
  });
};

Create Refresh Token Model

This Mongoose model has a one-to-one relationship with the User model. It contains an expiryDate field whose value is set by adding the config.jwtRefreshExpiration value above to the current time.

There are 2 static methods:

  • createToken: uses the uuid library to create a random token and saves the new object into the MongoDB database
  • verifyExpiration: compares expiryDate with the current date/time to check whether the token has expired

const mongoose = require("mongoose");
const config = require("../config/auth.config");
const { v4: uuidv4 } = require('uuid');

const RefreshTokenSchema = new mongoose.Schema({
  token: String,
  user: {
    type: mongoose.Schema.Types.ObjectId,
    ref: "User",
  },
  expiryDate: Date,
});

RefreshTokenSchema.statics.createToken = async function (user) {
  let expiredAt = new Date();

  expiredAt.setSeconds(
    expiredAt.getSeconds() + config.jwtRefreshExpiration
  );

  let _token = uuidv4();

  let _object = new this({
    token: _token,
    user: user._id,
    expiryDate: expiredAt.getTime(),
  });

  console.log(_object);

  let refreshToken = await _object.save();

  return refreshToken.token;
};

RefreshTokenSchema.statics.verifyExpiration = (token) => {
  return token.expiryDate.getTime() < new Date().getTime();
}

const RefreshToken = mongoose.model("RefreshToken", RefreshTokenSchema);

module.exports = RefreshToken;

Don’t forget to export this model in models/index.js:

const mongoose = require('mongoose');
mongoose.Promise = global.Promise;

const db = {};

db.mongoose = mongoose;

db.user = require("./user.model");
db.role = require("./role.model");
db.refreshToken = require("./refreshToken.model");

db.ROLES = ["user", "admin", "moderator"];

module.exports = db;

Node.js Express Rest API for JWT Refresh Token

Let’s update the payloads for our Rest APIs:
– Requests:

  • { refreshToken }

– Responses:

  • Signin Response: { accessToken, refreshToken, id, username, email, roles }
  • Message Response: { message }
  • RefreshToken Response: { accessToken, refreshToken }

In the Auth Controller, we:

  • update the method for /signin endpoint with Refresh Token
  • expose the POST API for creating new Access Token from received Refresh Token

controllers/auth.controller.js

const config = require("../config/auth.config");
const db = require("../models");
const { user: User, role: Role, refreshToken: RefreshToken } = db;

const jwt = require("jsonwebtoken");
const bcrypt = require("bcryptjs");

...
exports.signin = (req, res) => {
  User.findOne({
    username: req.body.username,
  })
    .populate("roles", "-__v")
    .exec(async (err, user) => {
      if (err) {
        res.status(500).send({ message: err });
        return;
      }

      if (!user) {
        return res.status(404).send({ message: "User Not found." });
      }

      let passwordIsValid = bcrypt.compareSync(
        req.body.password,
        user.password
      );

      if (!passwordIsValid) {
        return res.status(401).send({
          accessToken: null,
          message: "Invalid Password!",
        });
      }

      let token = jwt.sign({ id: user.id }, config.secret, {
        expiresIn: config.jwtExpiration,
      });

      let refreshToken = await RefreshToken.createToken(user);

      let authorities = [];

      for (let i = 0; i < user.roles.length; i++) {
        authorities.push("ROLE_" + user.roles[i].name.toUpperCase());
      }
      res.status(200).send({
        id: user._id,
        username: user.username,
        email: user.email,
        roles: authorities,
        accessToken: token,
        refreshToken: refreshToken,
      });
    });
};

exports.refreshToken = async (req, res) => {
  const { refreshToken: requestToken } = req.body;

  if (requestToken == null) {
    return res.status(403).json({ message: "Refresh Token is required!" });
  }

  try {
    let refreshToken = await RefreshToken.findOne({ token: requestToken });

    if (!refreshToken) {
      res.status(403).json({ message: "Refresh token is not in database!" });
      return;
    }

    if (RefreshToken.verifyExpiration(refreshToken)) {
      RefreshToken.findByIdAndRemove(refreshToken._id, { useFindAndModify: false }).exec();
      
      res.status(403).json({
        message: "Refresh token was expired. Please make a new signin request",
      });
      return;
    }

    let newAccessToken = jwt.sign({ id: refreshToken.user._id }, config.secret, {
      expiresIn: config.jwtExpiration,
    });

    return res.status(200).json({
      accessToken: newAccessToken,
      refreshToken: refreshToken.token,
    });
  } catch (err) {
    return res.status(500).send({ message: err });
  }
};

In refreshToken() function:

  • First, we get the Refresh Token from the request data
  • Next, we look up the RefreshToken object { id, user, token, expiryDate } for that raw token using the RefreshToken model
  • We verify the token (expired or not) based on its expiryDate field. If the Refresh Token has expired, we remove it from the MongoDB database and return a message
  • Then we use the user _id field of the RefreshToken object as the parameter to generate a new Access Token using the jsonwebtoken library
  • Return { accessToken, refreshToken } if everything succeeds
  • Otherwise, send an error message

Define Route for JWT Refresh Token API

Finally, we need to determine how the server will respond at this endpoint by setting up the route.
In routes/auth.routes.js, add one line of code:

...
const controller = require("../controllers/auth.controller");

module.exports = function(app) {
  ...
  app.post("/api/auth/refreshtoken", controller.refreshToken);
};
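
With the route in place, a client can ask for a new Access Token whenever the old one expires. For illustration, here is a client-side sketch (not part of the tutorial code) of calling the endpoint with axios; the base URL, port and helper name are my own assumptions:

// hypothetical client helper, assuming the server from this post listens on port 8080
const axios = require("axios");

async function refreshAccessToken(refreshToken) {
  const { data } = await axios.post("http://localhost:8080/api/auth/refreshtoken", {
    refreshToken: refreshToken,
  });
  // data = { accessToken, refreshToken }
  return data.accessToken;
}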

Conclusion

Today we’ve walked through a JWT Refresh Token implementation in a Node.js example using an Express Rest API and MongoDB. You also know how to expire a JWT Token and renew the Access Token.

The code in this post is based on a previous article that you need to read first:
Node.js + MongoDB: User Authentication & Authorization with JWT

If you want to use MySQL/PostgreSQL instead, please visit:
JWT Refresh Token implementation in Node.js example

You can test this Rest API with:
– Axios Client: Axios Interceptors tutorial with Refresh Token example
– React, Vue, or Angular Client

Happy learning! See you again.

Further Reading

Fullstack CRUD application:
– MEVN: Vue.js + Node.js + Express + MongoDB example
– MEAN:
Angular 8 + Node.js + Express + MongoDB example
Angular 10 + Node.js + Express + MongoDB example
Angular 11 + Node.js + Express + MongoDB example
Angular 12 + Node.js + Express + MongoDB example
– MERN: React + Node.js + Express + MongoDB example

Source Code

You can find the complete source code for this tutorial on Github.

From: JWT implementation with Refresh Token in Node.js example | MongoDB – BezKoder

A General Solution for Message Idempotency (Deduplication)

Message middleware is a common component in distributed systems, widely used for asynchronous processing, decoupling, peak shaving and more. We usually regard message middleware as a reliable component: reliable here means that as long as a message is successfully handed to the middleware, it will not be lost; the middleware guarantees the message will be successfully consumed at least once. This is one of the most basic properties of message middleware, the well-known “AT LEAST ONCE” semantics: the message will be “successfully consumed” at least once.

For example, a message M is sent to the middleware and delivered to consumer program A. A receives the message and starts consuming it, but the program restarts halfway through. Since the message was never marked as successfully consumed, it will keep being delivered to this consumer until consumption succeeds; only then does the middleware stop delivering it.

However, this reliability means a message may be delivered more than once. Continuing the example: program A receives message M, finishes its consumption logic, and is just about to tell the middleware “I have consumed this successfully” when the program restarts. To the middleware, the message was never successfully consumed, so it keeps delivering it. From program A’s point of view, the message clearly was consumed successfully, yet the middleware keeps redelivering it.

In RocketMQ terms, this shows up as a message with the same messageId being delivered repeatedly.

Delivery reliability (not losing messages) has the higher priority, so the task of avoiding duplicates shifts to the application itself. This is why RocketMQ’s documentation stresses that consumption logic must be idempotent. The underlying logic: not-lost and not-duplicated are contradictory (in a distributed setting), but duplicate messages have workable solutions, whereas lost messages are very painful.

A Simple Message Deduplication Solution

For example, suppose our business consumption logic is: insert a row into an order table, then update the inventory:

insert into t_order values .....
update t_inv set count = count-1 where good_id = 'good123';

To make this message consumption idempotent, we might adopt a scheme like this:

select * from t_order where order_no = 'order123'

if (order != null) {
    return; // duplicate message, return directly
}

In many cases this works quite well, but under concurrency there are still problems.

Concurrent Duplicate Messages

Suppose the whole consumption routine takes 1 second, and a duplicate message arrives within that second (say, 100 milliseconds later), for example because the producer quickly resent it or the Broker restarted. It is then quite likely that the dedup check above still finds no data (because the first message has not finished consuming and has not yet written the order),

so the duplicate slips through the check, and the repeated consumption reaches business code that is not idempotent-safe, causing duplicate-consumption problems (a primary-key conflict throwing an exception, inventory being deducted twice without being released, and so on).

One Solution for Concurrent Deduplication

To solve the message idempotency problem in the concurrent scenario above, a viable approach is to open a transaction and change the select into a select ... for update statement, locking the record.

select * from t_order where order_no = 'THIS_ORDER_NO' for update  // open a transaction
if (order.status != null) {
    return; // duplicate message, return directly
}

But wrapping the consumption in a transaction makes each message take longer to consume and lowers concurrency.

There are of course more advanced solutions, for example using an optimistic lock when updating the order status and re-consuming the message if the update fails. But these require more complex, business-specific code and table design, which is beyond the scope of this article.

Whether it is select for update or an optimistic lock, these solutions deduplicate on the business tables themselves, which adds complexity to business development. A large share of the requests in a business system are handled via MQ; if every consumer has to implement its own business-specific deduplication/idempotency, that is a lot of tedious work. This article looks for a general approach to message idempotency that can be abstracted into utility code reusable across business scenarios.

Exactly Once

Message middleware has the notion of delivery semantics, one of which is “Exactly Once”: the message is guaranteed to be consumed successfully, and consumed only once. Here is Alibaba Cloud’s explanation of Exactly Once:

Exactly-Once means that a message sent to the messaging system is processed by the consumer once and only once: even if the producer retries sending and the message is delivered more than once, it is consumed only once on the consumer side.

For the purposes of business-level message idempotency, if the business consumption code is guaranteed to run, and to run only once, we can call that Exactly Once.

Finding a general solution for this in a distributed setting is close to impossible. However, if the consumption logic is based on a database transaction, it is actually feasible.

Inserting into a Message Table within a Relational-Database Transaction

Suppose our business consumption logic is: update the status of an order row in a MySQL database:

update t_order set status = 'SUCCESS' where order_no = 'order123';

To achieve Exactly Once, i.e. the message is consumed only once (and is guaranteed to be consumed once), we can do the following: add a message-consumption record table to the same database, insert the message into that table, and put this insert and the original order update into the same transaction. Then the message can only be consumed once:

  1. Open a transaction
  2. Insert into the message table (handling primary-key conflicts properly)
  3. Update the order table (the original consumption logic)
  4. Commit the transaction

Notes:

  1. If the consumption succeeds and the transaction commits, the message-table insert has succeeded. Even if RocketMQ has not yet recorded the consume offset and delivers the message again, the insert will fail and the message will be treated as already consumed, so only the consume offset gets updated. This guarantees the consumption code runs only once.
  2. If the service dies before the transaction commits (e.g. a restart), the local transaction never ran, so the order was not updated and the message-table insert did not happen either. On the RocketMQ side the consume offset was not updated, so the message is delivered again; this time the insert into the message table succeeds and consumption proceeds. This guarantees the message is not lost.
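
To make the four steps above concrete, here is a minimal sketch in Node.js (my own illustration; the article’s reference implementation is in Java/SQL). It assumes a mysql2/promise pool and a dedup table t_msg_dedup with a unique key on msg_key:

const mysql = require("mysql2/promise");

async function consumeOnce(pool, msgKey, orderNo) {
  const conn = await pool.getConnection();
  try {
    await conn.beginTransaction();                                            // 1. open the transaction
    await conn.execute(
      "INSERT INTO t_msg_dedup (msg_key) VALUES (?)", [msgKey]);              // 2. insert into the message table
    await conn.execute(
      "UPDATE t_order SET status = 'SUCCESS' WHERE order_no = ?", [orderNo]); // 3. original consumption logic
    await conn.commit();                                                      // 4. commit both together
    return true;                                                              // ack the message
  } catch (err) {
    await conn.rollback();
    if (err.code === "ER_DUP_ENTRY") return true;                             // duplicate message: already consumed, just ack
    throw err;                                                                // real failure: let the MQ redeliver
  } finally {
    conn.release();
  }
}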

In fact, Alibaba Cloud ONS implements its EXACTLY-ONCE semantics in a similar way, on top of database transactions. For details see: https://help.aliyun.com/document_detail/102777.html

This approach can indeed be extended to different applications, because its implementation does not depend on the business itself — only on a message table.

But it has limitations:

  1. The consumption logic must rely on a relational-database transaction. If the consumption also modifies data sources that do not support transactions, such as Redis, those changes cannot be rolled back.
  2. All the data must live in one database; it does not work across databases.

Note: the message table should not be keyed by the message ID; using the business key (e.g. the order number) is more reasonable, so that producer resends are also covered. Alibaba Cloud’s deduplication only keys on RocketMQ’s messageId, so when the producer resends manually for some reason (e.g. the upstream repeated a request for the same transaction) it provides no deduplication/idempotency, because the message IDs differ.

More Complex Business Scenarios

As noted above, this implementation of Exactly Once semantics has many limitations that keep it from being broadly applicable, and because it is transaction-based it can cause performance problems such as long lock-holding times.

Take a fairly common order-request message as an example; it may involve the following steps (referred to below as steps X):

  1. Check inventory (RPC)
  2. Lock inventory (RPC)
  3. Open a transaction, insert into the order table (MySQL)
  4. Call some other downstream services (RPC)
  5. Update the order status
  6. Commit the transaction (MySQL)

In this situation, if we use the message table + local transaction approach, many of the sub-steps cannot be rolled back: even with the transaction, the whole operation is not atomic. For instance, the first message might get as far as step 2 (locking inventory) when the service restarts; the inventory has already been locked in another service and cannot be rolled back. The message will of course be delivered again to guarantee at-least-once consumption, which means the lock-inventory RPC itself still has to be idempotent.

Moreover, wrapping such a long, time-consuming chain in a transaction greatly reduces the system’s concurrency. So in practice we usually handle deduplication for this kind of message the way described at the beginning — with business-specific logic such as select for update or an optimistic lock.

So, is there a common solution that combines deduplication, generality and good performance?

Splitting Up the Message Processing

One idea is to split the steps above into several different sub-messages, for example:

  1. The inventory system consumes message A: checks and locks inventory, then sends message B to the order service
  2. The order system consumes message B: inserts into the order table (MySQL), then sends message C for the downstream system to consume
  3. The downstream system consumes message C: handles its part of the logic, then sends message D to the order system
  4. The order system consumes message D: updates the order status

Note: each step must keep its local transaction and the outgoing message in one transaction (or at least eventually consistent), which involves transactional/distributed messages — a topic not covered here.

With this approach each step is fairly atomic; atomic means a small transaction, and a small transaction means the message table + transaction scheme becomes feasible.

However, this is far too complicated! It tears a previously continuous piece of code into message exchanges across multiple systems. You might as well just add locking in the business code.

A More General Solution

The message table + local transaction scheme has its limitations and its concurrency shortcomings fundamentally because it relies on a relational-database transaction, and that transaction has to wrap the entire message consumption.

If we could deduplicate messages without relying on a transaction, the scheme could be extended to more complex scenarios such as RPC calls and cross-database operations.

For example, what if we keep the message table but drop the transaction, and instead add a consumption status to the message table — would that solve the problem?

A Non-Transactional Scheme Based on a Message Idempotency Table

[Diagram: dedup-solution-01]

The flow above is the de-transactionalized message idempotency scheme. As you can see, there is no transaction; instead the message table itself distinguishes between two states: consuming and consumed. Only messages already marked consumed are treated as duplicates and skipped. For a message still in the consuming state, a later duplicate triggers delayed consumption (in RocketMQ, it is sent to the RETRY TOPIC). The reason for delaying rather than skipping is to handle the concurrent case where the second message arrives before the first has finished: if we acknowledged the duplicate outright (same message id) and the first message then failed, the broker would not redeliver it, and the message would be lost.

We will not walk through the flow in detail here (the GitHub source is linked later), but let us check whether the problems we set out to solve are actually solved:

  1. If the message has already been consumed successfully, a second copy is simply deduplicated (treated as consumed successfully).
  2. Under concurrency, duplicate messages still cannot slip through the idempotency barrier.
  3. Business-level duplicates caused by upstream producers resending are also handled.

The first point is clearly solved, so we will not discuss it further.

How is the second point solved? The control point is the insert into the message table. Suppose we use MySQL as the message table’s storage (with the message’s unique ID as the primary key): only one insert will succeed; later inserts fail on the primary-key conflict and take the delayed-consumption branch, and when they are re-consumed later the situation reduces to the first case above.

As for the third point, as long as the dedup key is designed to support business keys (e.g. order number, request serial number) rather than only the messageId, it is not a problem either.

Does this scheme risk losing messages?

A careful reader may notice a logical hole here, in the second (concurrency) problem above: under concurrency we rely on the message status for control, so the duplicate second message keeps being delayed (retried). But what if the first message also fails to complete for some abnormal reason (the machine restarted, an external error failed the consumption)? Then every retry of the delayed duplicate still sees the consuming status, and in the end it is treated as a consumption failure and lands in the dead-letter topic (RocketMQ retries 16 times by default).

This concern is valid! Our answer is that every row inserted into the message table must carry a maximum consumption-expiry time, for example 10 minutes: if a message has been in the consuming state for more than 10 minutes, it must be removed from the message table (the application has to implement this itself). The final flow for such a message then looks like this:

[Diagram: dedup-solution-01]

A More Flexible Storage Medium for the Message Table

Since this scheme needs no transaction, it only needs a central storage medium, so we can choose something more flexible, such as Redis. Using Redis has two benefits:

  1. Lower performance overhead
  2. The expiry time mentioned above can be implemented directly with Redis’s built-in TTL

Of course, Redis is weaker than MySQL in data durability and consistency, so users have to weigh that trade-off themselves.
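
As a small illustration of the Redis variant (my own sketch, not the library described below; the key naming and TTL handling are assumptions), the consuming / consumed states map naturally onto SET with NX and an expiry:

const Redis = require("ioredis");
const redis = new Redis();

// Returns "OK" only for the first caller; concurrent duplicates get null and should be delayed/retried
async function tryMarkConsuming(dedupKey, ttlSeconds) {
  return redis.set(`dedup:${dedupKey}`, "CONSUMING", "EX", ttlSeconds, "NX");
}

// Called after the business logic succeeds; later duplicates see "CONSUMED" and are skipped
async function markConsumed(dedupKey, ttlSeconds) {
  return redis.set(`dedup:${dedupKey}`, "CONSUMED", "EX", ttlSeconds);
}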

Source Code: RocketMQDedupListener

A Java implementation of this scheme for RocketMQ has been open-sourced on GitHub; see the documentation at https://github.com/Jaskey/RocketMQDedupListener

Below is just one sample from the README, using Redis for deduplication, to show how simple it is to add message deduplication/idempotency with this utility:

        // use Redis as the idempotency table
        DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("TEST-APP1");
        consumer.subscribe("TEST-TOPIC", "*");

        String appName = consumer.getConsumerGroup();  // in most cases the consumer group name can be used directly
        StringRedisTemplate stringRedisTemplate = null; // obtaining the StringRedisTemplate is omitted here
        DedupConfig dedupConfig = DedupConfig.enableDedupConsumeConfig(appName, stringRedisTemplate);
        DedupConcurrentListener messageListener = new SampleListener(dedupConfig);

        consumer.registerMessageListener(messageListener);
        consumer.start();

Most of this code is standard RocketMQ boilerplate; the only thing you need to add is a DedupConcurrentListener instance, in which you specify your consumption logic and the business key used for deduplication (the default is the messageId).

For more usage details, see the README on GitHub.

Is this implementation a silver bullet?

At this point the scheme looks fairly complete: every message can quickly be made deduplicated, fully decoupled from the concrete business logic. Does that mean all the deduplication work is done?

Unfortunately not, and the reason is simple: to guarantee a message is successfully consumed at least once, the message may fail halfway through and be retried. Consider the order flow X above again:

  1. Check inventory (RPC)
  2. Lock inventory (RPC)
  3. Open a transaction, insert into the order table (MySQL)
  4. Call some other downstream services (RPC)
  5. Update the order status
  6. Commit the transaction (MySQL)

Suppose consumption fails at step 3 because MySQL has a problem, and the message is retried. Because we delete the idempotency-table record before the retry, the retried message re-enters the consumption code, and steps 1 and 2 run again. If step 2 is not itself idempotent, this business message is still not fully idempotent.

What is this implementation worth, then?

If it cannot fully solve message idempotency, what is it worth? A lot! It is not a silver bullet for message idempotency (in fact, software engineering has essentially no silver bullets), but it conveniently solves:

1. Duplicate deliveries caused by the Broker, load balancing/rebalancing and similar mechanisms

2. Business-level duplicate messages produced by upstream producers

3. The concurrency window for duplicate messages: even when duplicates arrive, they cannot enter the consumption logic at the same time

Some Other Suggestions for Message Deduplication

In other words, under normal consumption (no exceptions, no abnormal exits) this approach covers all the idempotency work, whether the duplicates come from the business side or from RocketMQ’s own behavior.

In practice this already solves 99% of duplicate-message problems, since abnormal scenarios are the minority. If you also want idempotency to hold in abnormal scenarios, the following measures reduce the problem rate:

  1. Roll back properly when consumption fails. If failure handling includes a rollback, retries have no side effects.
  2. Shut consumers down gracefully, to avoid as far as possible retries caused by the program exiting halfway through a consumption.
  3. For operations that cannot be made idempotent, at least stop consumption and raise an alert. For example, if the same business serial number has already locked inventory once and a second lock is triggered, and idempotency cannot be achieved, the consumption should at least fail with an exception (e.g. a primary-key conflict).
  4. With #3 in place, monitor consumption, and when retries keep failing, manually perform the rollback from #1 so that the next retry succeeds.

Teach Yourself Node.js in 10 Steps

Let’s start by talking about modularity, to grasp the differences between Node and code in the browser. I prepared a special something you can use as you read this article: you can find every example in this article on GitHub, nicely packed for you to start playing right away.

Modularity in Node

Node implements CommonJS Modules/1.1, which allow you to keep files self-contained. You can learn all about Node Modules from their increasingly useful documentation.

Modules can expose an API through the module.exports convention.

// modules/math.js

var api = {
    sum: function(a, b){
        return a + b;
    }
};

module.exports = api;

Note the variable api won’t make its way to the global object. You can learn why from the docs. Globals work differently in Node. The top-level scope of a module is local to that module, but you can still access a few globals on your own, such as process and console. Setting up your own globals on the global object is discouraged.

Consequently, module isn’t a global, but rather a local variable, private to the module we are currently working on.

Modules can be referenced using the require function. You can provide a package name (more on that later), or a physical path relative to the file you are invoking require from.

// modules/app.js

var math = require('./math.js');
var result = math.sum(1, 2);

console.log('I can\'t believe the result is:', result, '!');

Both files are assumed to be in the same directory. The math variable will now equal the value of module.exports in math.js.

What does console.log do? It’s actually just syntactic sugar for process.stdout.write, and it will append a new line \n at the end. This is deliberately done all over Node to help ease your on-boarding onto the platform by leveraging the conventions and objects you are already used to from your experience in writing client-side JavaScript.
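
As a rough illustration (a simplification of mine — console.log also formats and joins multiple arguments), the call above boils down to something like:

process.stdout.write('I can\'t believe the result is: 3 !\n');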

Sidebar. You might want to read the actual console API documentation.

Note that requiring a file multiple times in the same process will only execute the code in the module once. So if you were to require the app.js module several times, it would still be executed a single time. As a result, the output would only be buffered once.

// modules/several.js

require('./app.js');
require('./app.js');
require('./app.js');

Those are modules all right, but where’s the asynchronicity Node is supposedly so popular for?

Asynchronous Convention

Node is an event-based language, and most of the code written for Node follows a really simple convention that helps modules look inspiringly similar to each other, as far as coding conventions go.

Our math module would probably look more like this if we wanted to play nice with the Node community at large.

// async/math.js

var api = {
    sum: function(a, b, done){
        process.nextTick(function(){
            done(null, a + b);
        });
    }
};

module.exports = api;

process.nextTick is kind of hard to wrap our heads around at first, but let’s just imagine it’s setTimeout(fn, 0), which we might have used while trying hacky fixes in the browser.

I’ve used process.nextTick to turn an otherwise synchronous function into an asynchronous one. When we are done processing, we pass the result as the second parameter of the done callback. The first parameter is reserved for errors: if an error occurs, we pass it as the first parameter rather than throwing an exception. If no error occurs, we are fine passing any falsy value.

Consuming this module is still really easy.

// async/app.js

var math = require('./math.js');

math.sum(1, 2, function(err, result){
    if(err){
        throw err;
    }
    
    console.log('I can\'t believe the result is:', result, '!');
});

We are now waiting reactively for the sum function to let us know when it’s done. This is the most basic of asynchronous examples in Node.js. Note how we changed modes and use throw here; this is fine as long as we are in a synchronous path. Throwing errors should always have the end result of process termination, so keep that in mind when you are dealing with these situations. This is acceptable for our console application; in a web application we would probably prefer to just return an HTTP status code 500, internal server error, for the current request.

You’ll also have to consider the option of bubbling errors through multiple asynchronous calls. This, for example, might not be the best error-handling approach:

// async/wrong.js

var math = require('./math.js');

math.sum(1, 2, function(err, result){
    if(err){
        throw err;
    }

    math.sum(result, 3, function(err, result){
        if(err){
            throw err;
        }

        console.log('I can\'t believe the result is:', result, '!');
    });
});

A more sensible approach might be to avoid throwing errors all over the place, but handle those in a centralized location.

// async/better.js

var math = require('./math.js');

math.sum(1, 2, function(err, result){
    if(err){
        return then(err);
    }

    math.sum(result, 3, function(err, result){
        if(err){
            return then(err);
        }

        then(err, result);
    });
});

function then(err, result){
    if(err){
        throw err;
    }

    console.log('I can\'t believe the result is:', result, '!');
}

This is however, getting pretty verbose. Let me skip to a module for a bit, and then come back and explain what’s going on. We’re going to use the control flow module called async, to improve the readability of our code.

// async/right.js

var async = require('async');
var math = require('./math.js');

async.waterfall([
    function(next){
        math.sum(1, 2, next);
    },
    function(result, next){
        math.sum(result, 3, next);
    }
], then);

function then(err, result){
    if(err){
        throw err;
    }

    console.log('I can\'t believe the result is:', result, '!');
}

That’s a little better. Since we’ve been following the right conventions, we can use async, which allows us to get rid of all those pesky if(err) statements and flatten our callback hell while we’re at it. waterfall‘s API is pretty simple: we give it an array of functions, and these will be called in series. When our first math.sum completes, it invokes the next callback with the (null, 3) arguments. If a function passes a truthy value as the first argument to its callback, this short-circuits the waterfall, immediately jumping to the then function with the error argument still in the first position. If no error occurs, the next function in the sequence is executed, receiving any resulting arguments (in this case, just the 3).

This is the recommended way of doing things because it flattens the structure of our code, turning our codebase into something more readable, while at the same time following the same conventions and using the same API that is used everywhere else. You must check out the async module and its comprehensive API, toy with it for a while.

That’s great and all, but where did async come from? It sure as hell isn’t part of Node.

I’m glad you asked.

Node Packaged Modules

npm is a small treasure that comes bundled with Node, and helps you manage dependencies in your projects. There is a huge repository you can search, and most people include installation instructions in their GitHub repositories. Ultimately, npm is a CLI (command-line interface) tool.

If you have been following the instructions of the learn-nodejs repository I provided, then you already have dependencies installed in your project folder. If not, just run the following command in your terminal.

$ npm install

That’s it, now you have everything you need. How does that work? Some weird magic? No, just package.json. This file helps us define the dependencies in our project. When you ran npm install in your terminal, all it did was install the dependencies listed in the package.json file.

{
  "name": "learn-nodejs",
  "description": "Simple NodeJS Application Examples",
  "homepage": "https://github.com/ponyfoo/learn-nodejs",
  "author": {
    "name": "Nicolas Bevacqua",
    "email": "nicolasbevacqua@gmail.com",
    "url": "http://www.ponyfoo.com"
  },
  "version": "0.0.1",
  "repository": {
    "type": "git",
    "url": "https://github.com/ponyfoo/learn-nodejs.gitt"
  },
  "dependencies": {
    "async": "~0.2.9"
  }
}

I rarely add dependencies manually to this definition file, in the case of async, for example, all I did was run the following command:

$ npm install async --save

That’s it. async has been added to the dependencies object. Installing a module basically just fetches it and adds it to a node_modules folder, which you should always exclude in your .gitignore settings.

If you are interested in developing your own npm module, you’ll be shocked to learn how simple that is.

Before we jump into building a decent application, let’s look at one of the most powerful constructs in Node.

Events API

Yes! Of course, I was talking about the event emitter API. What are events? Well, the documentation explains it like this:

Many objects in Node emit events: a net.Server emits an event each time a peer connects to it, a fs.readStream emits an event when the file is opened. All objects which emit events are instances of events.EventEmitter. You can access this module by doing: require('events');

Functions can then be attached to objects, to be executed when an event is emitted. These functions are called listeners. Inside a listener function, this refers to the EventEmitter that the listener was attached to.

Let’s write our own event emitter and explain a few things along the way. Then, we’ll see how it can be used.

// events/implementation.js

var util = require('util');
var EventEmitter = require('events').EventEmitter;

function Heartbeat(interval){
    EventEmitter.call(this);

    var emitter = this;
    var beats = 0;

    setInterval(function(){
        emitter.emit('beat', ++beats);
    }, interval);
}

util.inherits(Heartbeat, EventEmitter);

module.exports = Heartbeat;

Don’t laugh, that’s the best I could come up with. Here I’m simply creating a constructor function for my custom EventEmitter implementation. I’m using util.inherits, as it’s the recommended way of performing prototypal inheritance in Node applications.

Whenever our emitter .emits an event, all subscribers to that event will be notified, and receive the arguments which were provided when the event was emitted.

Remember what I mentioned about leveraging your API knowledge about the browser with that in Node? setInterval is one of those cases.

Fine, how do we use our newly born event emitter? It’s simple, really:

// events/usage.js

var Heartbeat = require('./implementation.js');
var a = new Heartbeat(400);
var b = new Heartbeat(1000);

a.on('beat', function(beats){
    console.log('Heart A beat n times:', beats);
});

b.on('beat', function(beats){
    console.log('Heart B beat n times:', beats);
});

I’m not even sure I need to explain this, but whenever the emitter invokes .emit, every listener for that event, added by .on, will have its callback triggered. This seemingly innocent API powers a lot of what Node does.

One last thing, read this quote from the documentation:

When an EventEmitter instance experiences an error, the typical action is to emit an 'error' event. Error events are treated as a special case in node. If there is no listener for it, then the default action is to print a stack trace and exit the program.

What this means is that if there is no .on('error', fn) listener, and your emitter emits an 'error' event, then your application will die a tragic death.
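
Building on the Heartbeat usage example above (an addition of mine, assuming the emitter ever emits 'error'), attaching a listener keeps the process alive:

a.on('error', function(err){
    // without this handler, an emitted 'error' event would crash the process
    console.log('Heart A is having trouble:', err.message);
});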

HTTP Server

Enough blabbering, here is an HTTP server in Node.

// http/server.js

var http = require('http');

http.createServer(function(req, res) {
    res.writeHead(200);
    res.end('Hello Node');
    console.log('I think I\'ve heard something!');
}).listen(8000);

console.log('Listening!');

That wasn’t so amusing, but it was very simple and self-describing! Let’s try something different: serving an HTML file from disk.

// http/html.js

var http = require('http');
var fs = require('fs');
var path = require('path');
var index = path.resolve(__dirname, './index.html');

http.createServer(function(req, res) {
    var stream = fs.createReadStream(index);

    stream.on('open', function(){
        res.writeHead(200, { 'Content-Type': 'text/html' });
    });

    stream.on('error', function(){
        res.writeHead(404);
        res.end();
    });

    stream.pipe(res);
}).listen(8000);

console.log('Listening!');

Couple of things. First of all, __dirname is a special local variable that contains the absolute path to the directory for our currently executing module. We just learned what events are, the fs.createReadStream method will provide us with an event emitter we can use to stream data to the response. The file will be piped straight into a chunked response, this can be achieved using the readable.pipe method. If the file isn’t found, the stream will emit an 'error' event; we can take advantage of that and respond with a 404 status code instead.

This is, however, a very convoluted thing to do to just serve a file. Enter Express.

Express Application Framework

Express is built on Connect, which expands on Node’s HTTP server. There’s also Socket.IO for implementing web socket communications, but I won’t be getting into realtime for now.

Connect just provides middleware, a nice abstraction over what the native HTTP module offers. Express builds on that, adding a lot of awesome features, and making your life more bearable. Here is a small sample application built on Express:

// http/express.js

var express = require('express');
var app = express();

app.get('/', function(req, res){
    res.send('hello world');
});

app.listen(8000);

The API is incredibly self-documenting, I wish more projects had an API as clean as Express does.
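
For contrast with the hand-rolled file server above, here is a sketch (my own, not from the original article) of serving the same index.html with Express, assuming the file sits next to this module:

// http/express-html.js (hypothetical file name)

var express = require('express');
var path = require('path');
var app = express();

app.get('/', function (req, res) {
    // sendFile sets the Content-Type header and reports missing files through the callback
    res.sendFile(path.resolve(__dirname, './index.html'), function (err) {
        if (err) {
            res.status(404).end();
        }
    });
});

app.listen(8000);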

Enough already. You are mean for laughing at all of my stupid examples. You know what else is mean?

MongoDB, ExpressJS, AngularJS and NodeJS

The MEAN Stack is not a hipster thing, as delusional people try to assert with no real reasoning behind their empty statements. The MEAN stack is a very real thing. Here’s a slideshare for you to look at.

The glaring benefit of using a stack such as this is the ease with which you can transfer objects through your application without having to resort to different interfaces, data presentation alternatives, and programming languages. You can really get away with just using JavaScript everywhere.

From: https://ponyfoo.com/articles/teach-yourself-nodejs-in-10-steps

Node.js Best Practices

Welcome! 3 Things You Ought To Know First

1. You are reading dozens of the best Node.js articles – this repository is a summary and curation of the top-ranked content on Node.js best practices, as well as content written here by collaborators

2. It is the largest compilation, and it is growing every week – currently, more than 80 best practices, style guides, and architectural tips are presented. New issues and pull requests are created every day to keep this live book updated. We’d love to see you contributing here, whether that is fixing code mistakes, helping with translations, or suggesting brilliant new ideas. See our writing guidelines here

3. Best practices have additional info – most bullets include a Read More link that expands on the practice with code examples, quotes from selected blogs, and more information

Table of Contents

  1. Project Structure Practices (5)
  2. Error Handling Practices (12)
  3. Code Style Practices (12)
  4. Testing And Overall Quality Practices (13)
  5. Going To Production Practices (19)
  6. Security Practices (25)
  7. Performance Practices (2) (Work In Progress)
  8. Docker Practices (15)

 

1. Project Structure Practices

✔ 1.1 Structure your solution by components

TL;DR: The worst large applications pitfall is maintaining a huge code base with hundreds of dependencies – such a monolith slows down developers as they try to incorporate new features. Instead, partition your code into components, each gets its folder or a dedicated codebase, and ensure that each unit is kept small and simple. Visit ‘Read More’ below to see examples of correct project structure

Otherwise: When developers who code new features struggle to realize the impact of their change and fear to break other dependent components – deployments become slower and riskier. It’s also considered harder to scale-out when all the business units are not separated

link Read More: structure by components

 

✔ 1.2 Layer your components, keep the web layer within its boundaries

TL;DR: Each component should contain ‘layers’ – a dedicated object for the web, logic, and data access code. This not only draws a clean separation of concerns but also significantly eases mocking and testing the system. Though this is a very common pattern, API developers tend to mix layers by passing the web layer objects (e.g. Express req, res) to business logic and data layers – this makes your application dependent on and accessible only by specific web frameworks

Otherwise: App that mixes web objects with other layers cannot be accessed by testing code, CRON jobs, triggers from message queues, etc

link Read More: layer your app

 

✔ 1.3 Wrap common utilities as npm packages

TL;DR: In a large app that constitutes a large codebase, cross-cutting-concern utilities like a logger, encryption and alike, should be wrapped by your code and exposed as private npm packages. This allows sharing them among multiple codebases and projects

Otherwise: You’ll have to invent your deployment and the dependency wheel

link Read More: Structure by feature

 

✔ 1.4 Separate Express ‘app’ and ‘server’

TL;DR: Avoid the nasty habit of defining the entire Express app in a single huge file – separate your ‘Express’ definition to at least two files: the API declaration (app.js) and the networking concerns (WWW). For even better structure, locate your API declaration within components

Otherwise: Your API will be accessible for testing via HTTP calls only (slower and much harder to generate coverage reports). It probably won’t be a big pleasure to maintain hundreds of lines of code in a single file

link Read More: separate Express ‘app’ and ‘server’

 

✔ 1.5 Use environment aware, secure and hierarchical config

TL;DR: A perfect and flawless configuration setup should ensure (a) keys can be read from file AND from environment variable (b) secrets are kept outside committed code (c) config is hierarchical for easier findability. There are a few packages that can help tick most of those boxes, like rc, nconf, config, and convict.

Otherwise: Failing to satisfy any of the config requirements will simply bog down the development or DevOps team. Probably both

link Read More: configuration best practices

 

Return to top

2. Error Handling Practices

✔ 2.1 Use Async-Await or promises for async error handling

TL;DR: Handling async errors in callback style is probably the fastest way to hell (a.k.a the pyramid of doom). The best gift you can give to your code is using a reputable promise library or async-await instead which enables a much more compact and familiar code syntax like try-catch
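
Code example (a sketch added here for illustration, not from the original repository; userService, orderService and logger are assumed names):

// async-await keeps error handling in one familiar try/catch block
async function getUserOrders(userId) {
  try {
    const user = await userService.find(userId);    // assumed service call
    return await orderService.listByUser(user.id);  // assumed service call
  } catch (error) {
    logger.error('failed to load orders', error);   // assumed logger
    throw error;                                     // let a centralized handler decide what happens next
  }
}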

Otherwise: Node.js callback style, function(err, response), is a promising way to un-maintainable code due to the mix of error handling with casual code, excessive nesting, and awkward coding patterns

link Read More: avoiding callbacks

 

✔ 2.2 Use only the built-in Error object

TL;DR: Many throw errors as a string or as some custom type – this complicates the error handling logic and the interoperability between modules. Whether you reject a promise, throw an exception or emit an error – using only the built-in Error object (or an object that extends the built-in Error object) will increase uniformity and prevent loss of information. There is a no-throw-literal ESLint rule that strictly checks that (although it has some limitations which can be solved when using TypeScript and setting the @typescript-eslint/no-throw-literal rule)
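
Code example (an illustration added here, not taken from the repository):

// Do: reject/throw with the built-in Error (or a subclass) so the stack trace survives
throw new Error('order 123 was not found');

// Avoid: throwing a string or a plain object loses the stack trace and breaks instanceof checks
throw 'order 123 was not found';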

Otherwise: When invoking some component, being uncertain which type of errors come in return – it makes proper error handling much harder. Even worse, using custom types to describe errors might lead to loss of critical error information like the stack trace!

link Read More: using the built-in error object

 

✔ 2.3 Distinguish operational vs programmer errors

TL;DR: Operational errors (e.g. API received an invalid input) refer to known cases where the error impact is fully understood and can be handled thoughtfully. On the other hand, programmer error (e.g. trying to read an undefined variable) refers to unknown code failures that dictate to gracefully restart the application

Otherwise: You may always restart the application when an error appears, but why let ~5000 online users down because of a minor, predicted, operational error? the opposite is also not ideal – keeping the application up when an unknown issue (programmer error) occurred might lead to an unpredicted behavior. Differentiating the two allows acting tactfully and applying a balanced approach based on the given context

link Read More: operational vs programmer error

 

✔ 2.4 Handle errors centrally, not within a middleware

TL;DR: Error handling logic such as mail to admin and logging should be encapsulated in a dedicated and centralized object that all endpoints (e.g. Express middleware, cron jobs, unit-testing) call when an error comes in
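
Code example (a minimal sketch of the idea; logger, sendMailToAdminIfCritical and isOperational are assumed names):

// errorHandler.js – every entry point (Express middleware, cron job, test) delegates here
module.exports.handleError = async function (error) {
  logger.error(error);                     // assumed mature logger
  await sendMailToAdminIfCritical(error);  // hypothetical helper
  return error.isOperational === true;     // tell the caller whether the process may keep running
};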

Otherwise: Not handling errors within a single place will lead to code duplication and probably to improperly handled errors

link Read More: handling errors in a centralized place

 

✔ 2.5 Document API errors using Swagger or GraphQL

TL;DR: Let your API callers know which errors might come in return so they can handle these thoughtfully without crashing. For RESTful APIs, this is usually done with documentation frameworks like Swagger. If you’re using GraphQL, you can utilize your schema and comments as well.

Otherwise: An API client might decide to crash and restart only because it received back an error it couldn’t understand. Note: the caller of your API might be you (very typical in a microservice environment)

link Read More: documenting API errors in Swagger or GraphQL

 

✔ 2.6 Exit the process gracefully when a stranger comes to town

TL;DR: When an unknown error occurs (a developer error, see best practice 2.3) – there is uncertainty about the application healthiness. Common practice suggests restarting the process carefully using a process management tool like Forever or PM2

Otherwise: When an unfamiliar exception occurs, some object might be in a faulty state (e.g. an event emitter which is used globally and not firing events anymore due to some internal failure) and all future requests might fail or behave crazily

link Read More: shutting the process

 

✔ 2.7 Use a mature logger to increase error visibility

TL;DR: A set of mature logging tools like Pino or Log4js will speed up error discovery and understanding. So forget about console.log

Otherwise: Skimming through console.logs or manually through messy text file without querying tools or a decent log viewer might keep you busy at work until late

link Read More: using a mature logger

 

✔ 2.8 Test error flows using your favorite test framework

TL;DR: Whether professional automated QA or plain manual developer testing – Ensure that your code not only satisfies positive scenarios but also handles and returns the right errors. Testing frameworks like Mocha & Chai can handle this easily (see code examples within the “Gist popup”)

Otherwise: Without testing, whether automatically or manually, you can’t rely on your code to return the right errors. Without meaningful errors – there’s no error handling

link Read More: testing error flows

 

✔ 2.9 Discover errors and downtime using APM products

TL;DR: Monitoring and performance products (a.k.a APM) proactively gauge your codebase or API so they can automagically highlight errors, crashes, and slow parts that you were missing

Otherwise: You might spend great effort on measuring API performance and downtimes, probably you’ll never be aware which are your slowest code parts under real-world scenario and how these affect the UX

link Read More: using APM products

 

✔ 2.10 Catch unhandled promise rejections

TL;DR: Any exception thrown within a promise will get swallowed and discarded unless a developer remembered to explicitly handle it. Even if your code is subscribed to process.uncaughtException! Overcome this by registering to the event process.unhandledRejection
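
Code example (a short sketch of registering the process-level hooks; errorHandler is an assumed central module, see 2.4):

process.on('unhandledRejection', (reason) => {
  // a promise rejected and nobody attached .catch – rethrow so the uncaughtException handler sees it
  throw reason;
});

process.on('uncaughtException', (error) => {
  errorHandler.handleError(error);            // assumed centralized handler
  if (!error.isOperational) process.exit(1);  // restart only for unknown, programmer errors
});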

Otherwise: Your errors will get swallowed and leave no trace. Nothing to worry about

link Read More: catching unhandled promise rejection

 

✔ 2.11 Fail fast, validate arguments using a dedicated library

TL;DR: Assert API input to avoid nasty bugs that are much harder to track later. The validation code is usually tedious unless you are using a very cool helper library like ajv and Joi
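
Code example (a hedged sketch using Joi; the schema and field names are made up for illustration):

const Joi = require('joi');

const memberSchema = Joi.object({
  password: Joi.string().min(3).max(30),
  birthYear: Joi.number().integer().min(1900).max(2013),
  discount: Joi.number().min(0).required(),
});

function addNewMember(newMember) {
  const { error } = memberSchema.validate(newMember); // fail fast before any business logic runs
  if (error) throw new Error(`invalid member: ${error.message}`);
  // ...business logic goes here
}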

Otherwise: Consider this – your function expects a numeric argument “Discount” which the caller forgets to pass, later on, your code checks if Discount!=0 (amount of allowed discount is greater than zero), then it will allow the user to enjoy a discount. OMG, what a nasty bug. Can you see it?

link Read More: failing fast

 

✔ 2.12 Always await promises before returning to avoid a partial stacktrace

TL;DR: Always do return await when returning a promise to benefit full error stacktrace. If a function returns a promise, that function must be declared as async function and explicitly await the promise before returning it
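
Code example (an illustration only; db.users.findById stands in for any promise-returning call):

// Do: awaiting before returning keeps this frame in the async stack trace
async function loadUser(id) {
  return await db.users.findById(id);
}

// Avoid: returning the bare promise drops this function from the stack trace on failure
function loadUserWithoutTrace(id) {
  return db.users.findById(id);
}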

Otherwise: The function that returns a promise without awaiting won’t appear in the stacktrace. Such missing frames would probably complicate the understanding of the flow that leads to the error, especially if the cause of the abnormal behavior is inside of the missing function

link Read More: returning promises

 

Return to top

3. Code Style Practices

✔ 3.1 Use ESLint

TL;DR: ESLint is the de-facto standard for checking possible code errors and fixing code style, not only to identify nitty-gritty spacing issues but also to detect serious code anti-patterns like developers throwing errors without classification. Though ESLint can automatically fix code styles, other tools like prettier and beautify are more powerful in formatting the fix and work in conjunction with ESLint

Otherwise: Developers will focus on tedious spacing and line-width concerns and time might be wasted overthinking the project’s code style

link Read More: Using ESLint and Prettier

 

✔ 3.2 Node.js specific plugins

TL;DR: On top of ESLint standard rules that cover vanilla JavaScript, add Node.js specific plugins like eslint-plugin-node, eslint-plugin-mocha and eslint-plugin-node-security

Otherwise: Many faulty Node.js code patterns might escape under the radar. For example, developers might require(variableAsPath) files with a variable given as a path which allows attackers to execute any JS script. Node.js linters can detect such patterns and complain early

 

✔ 3.3 Start a Codeblock’s Curly Braces on the Same Line

TL;DR: The opening curly braces of a code block should be on the same line as the opening statement

Code Example

// Do
function someFunction() {
  // code block
}

// Avoid
function someFunction()
{
  // code block
}

Otherwise: Deferring from this best practice might lead to unexpected results, as seen in the StackOverflow thread below:

link Read more: “Why do results vary based on curly brace placement?” (StackOverflow)

 

✔ 3.4 Separate your statements properly

No matter if you use semicolons or not to separate your statements, knowing the common pitfalls of improper linebreaks or automatic semicolon insertion, will help you to eliminate regular syntax errors.

TL;DR: Use ESLint to gain awareness about separation concerns. Prettier or Standardjs can automatically resolve these issues.

Otherwise: As seen in the previous section, JavaScript’s interpreter automatically adds a semicolon at the end of a statement if there isn’t one, or considers a statement as not ended where it should, which might lead to some undesired results. You can use assignments and avoid using immediately invoked function expressions to prevent most of the unexpected errors.

Code example

// Do
function doThing() {
    // ...
}

doThing()

// Do

const items = [1, 2, 3]
items.forEach(console.log)

// Avoid — throws exception
const m = new Map()
const a = [1,2,3]
[...m.values()].forEach(console.log)
> [...m.values()].forEach(console.log)
>  ^^^
> SyntaxError: Unexpected token ...

// Avoid — throws exception
const count = 2 // it tries to run 2(), but 2 is not a function
(function doSomething() {
  // do something amazing
}())
// put a semicolon before the immediate invoked function, after the const definition, save the return value of the anonymous function to a variable or avoid IIFEs altogether

link Read more: “Semi ESLint rule” link Read more: “No unexpected multiline ESLint rule”

 

✔ 3.5 Name your functions

TL;DR: Name all functions, including closures and callbacks. Avoid anonymous functions. This is especially useful when profiling a node app. Naming all functions will allow you to easily understand what you’re looking at when checking a memory snapshot

Otherwise: Debugging production issues using a core dump (memory snapshot) might become challenging as you notice significant memory consumption from anonymous functions

 

✔ 3.6 Use naming conventions for variables, constants, functions and classes

TL;DR: Use lowerCamelCase when naming constants, variables and functions and UpperCamelCase (capital first letter as well) when naming classes. This will help you to easily distinguish between plain variables/functions, and classes that require instantiation. Use descriptive names, but try to keep them short

Otherwise: JavaScript is the only language in the world that allows invoking a constructor (“Class”) directly without instantiating it first. Consequently, Classes and function-constructors are differentiated by starting with UpperCamelCase

3.6 Code Example

// for class name we use UpperCamelCase
class SomeClassExample {}

// for const names we use the const keyword and lowerCamelCase
const config = {
  key: "value",
};

// for variables and functions names we use lowerCamelCase
let someVariableExample = "value";
function doSomething() {}

 

✔ 3.7 Prefer const over let. Ditch the var

TL;DR: Using const means that once a variable is assigned, it cannot be reassigned. Preferring const will help you to not be tempted to use the same variable for different uses, and make your code clearer. If a variable needs to be reassigned, in a for loop, for example, use let to declare it. Another important aspect of let is that a variable declared using it is only available in the block scope in which it was defined. var is function scoped, not block-scoped, and shouldn’t be used in ES6 now that you have const and let at your disposal

Otherwise: Debugging becomes way more cumbersome when following a variable that frequently changes

link Read more: JavaScript ES6+: var, let, or const?

 

✔ 3.8 Require modules first, not inside functions

TL;DR: Require modules at the beginning of each file, before and outside of any functions. This simple best practice will not only help you easily and quickly tell the dependencies of a file right at the top but also avoids a couple of potential problems

Otherwise: Requires are run synchronously by Node.js. If they are called from within a function, it may block other requests from being handled at a more critical time. Also, if a required module or any of its dependencies throw an error and crash the server, it is best to find out about it as soon as possible, which might not be the case if that module is required from within a function

 

✔ 3.9 Require modules by folders, as opposed to the files directly

TL;DR: When developing a module/library in a folder, place an index.js file that exposes the module’s internals so every consumer will pass through it. This serves as an ‘interface’ to your module and eases future changes without breaking the contract

Otherwise: Changing the internal structure of files or the signature may break the interface with clients

3.9 Code example

// Do
module.exports.SMSProvider = require("./SMSProvider");
module.exports.SMSNumberResolver = require("./SMSNumberResolver");

// Avoid
module.exports.SMSProvider = require("./SMSProvider/SMSProvider.js");
module.exports.SMSNumberResolver = require("./SMSNumberResolver/SMSNumberResolver.js");

 

✔ 3.10 Use the === operator

TL;DR: Prefer the strict equality operator === over the weaker abstract equality operator ==. == will compare two variables after converting them to a common type. There is no type conversion in ===, and both variables must be of the same type to be equal

Otherwise: Unequal variables might return true when compared with the == operator

3.10 Code example

"" == "0"; // false
0 == ""; // true
0 == "0"; // true

false == "false"; // false
false == "0"; // true

false == undefined; // false
false == null; // false
null == undefined; // true

" \t\r\n " == 0; // true

All statements above will return false if used with ===

 

✔ 3.11 Use Async Await, avoid callbacks

TL;DR: Node 8 LTS now has full support for Async-await. This is a new way of dealing with asynchronous code which supersedes callbacks and promises. Async-await is non-blocking, and it makes asynchronous code look synchronous. The best gift you can give to your code is using async-await which provides a much more compact and familiar code syntax like try-catch

Otherwise: Handling async errors in callback style are probably the fastest way to hell – this style forces to check errors all over, deal with awkward code nesting, and makes it difficult to reason about the code flow

link Read more: Guide to async-await 1.0

 

✔ 3.12 Use arrow function expressions (=>)

TL;DR: Though it’s recommended to use async-await and avoid function parameters when dealing with older APIs that accept promises or callbacks – arrow functions make the code structure more compact and keep the lexical context of the root function (i.e. this)

Otherwise: Longer code (in ES5 functions) is more prone to bugs and cumbersome to read

link Read more: It’s Time to Embrace Arrow Functions

 

Return to top

4. Testing And Overall Quality Practices

✔ 4.1 At the very least, write API (component) testing

TL;DR: Most projects just don’t have any automated testing due to short timetables or often the ‘testing project’ ran out of control and was abandoned. For that reason, prioritize and start with API testing which is the easiest way to write and provides more coverage than unit testing (you may even craft API tests without code using tools like Postman). Afterward, should you have more resources and time, continue with advanced test types like unit testing, DB testing, performance testing, etc

Otherwise: You may spend long days on writing unit tests to find out that you got only 20% system coverage

 

✔ 4.2 Include 3 parts in each test name

TL;DR: Make the test speak at the requirements level so it’s self-explanatory also to QA engineers and developers who are not familiar with the code internals. State in the test name what is being tested (unit under test), under what circumstances, and what is the expected result

Otherwise: A deployment just failed, a test named “Add product” failed. Does this tell you what exactly is malfunctioning?

link Read More: Include 3 parts in each test name

 

✔ 4.3 Structure tests by the AAA pattern

TL;DR: Structure your tests with 3 well-separated sections: Arrange, Act & Assert (AAA). The first part includes the test setup, then the execution of the unit under test, and finally the assertion phase. Following this structure guarantees that the reader spends no brain CPU on understanding the test plan
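
Code example (a sketch in Mocha/Chai style; customerClassifier is an assumed unit under test):

describe('Customer classifier', () => {
  it('When customer spent more than 500$, should be classified as premium', () => {
    // Arrange
    const customer = { spent: 505, joined: new Date() };

    // Act
    const classification = customerClassifier.classify(customer);

    // Assert
    expect(classification).to.equal('premium');
  });
});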

Otherwise: Not only you spend long daily hours on understanding the main code, but now also what should have been the simple part of the day (testing) stretches your brain

link Read More: Structure tests by the AAA pattern

 

✔ 4.4 Detect code issues with a linter

TL;DR: Use a code linter to check the basic quality and detect anti-patterns early. Run it before any test and add it as a pre-commit git-hook to minimize the time needed to review and correct any issue. Also check Section 3 on Code Style Practices

Otherwise: You may let pass some anti-pattern and possible vulnerable code to your production environment.

 

✔ 4.5 Avoid global test fixtures and seeds, add data per-test

TL;DR: To prevent test coupling and easily reason about the test flow, each test should add and act on its own set of DB rows. Whenever a test needs to pull or assume the existence of some DB data – it must explicitly add that data and avoid mutating any other records

Otherwise: Consider a scenario where deployment is aborted due to failing tests, team is now going to spend precious investigation time that ends in a sad conclusion: the system works well, the tests however interfere with each other and break the build

link Read More: Avoid global test fixtures

 

✔ 4.6 Constantly inspect for vulnerable dependencies

TL;DR: Even the most reputable dependencies such as Express have known vulnerabilities. This can get easily tamed using community and commercial tools such as npm audit and snyk.io that can be invoked from your CI on every build

Otherwise: Keeping your code clean from vulnerabilities without dedicated tools will require to constantly follow online publications about new threats. Quite tedious

 

✔ 4.7 Tag your tests

TL;DR: Different tests must run on different scenarios: quick smoke, IO-less tests should run when a developer saves or commits a file, full end-to-end tests usually run when a new pull request is submitted, etc. This can be achieved by tagging tests with keywords like #cold #api #sanity so you can grep with your testing harness and invoke the desired subset. For example, this is how you would invoke only the sanity test group with Mocha: mocha --grep 'sanity'

Otherwise: Running all the tests, including tests that perform dozens of DB queries, any time a developer makes a small change can be extremely slow and keeps developers away from running tests

 

✔ 4.8 Check your test coverage, it helps to identify wrong test patterns

TL;DR: Code coverage tools like Istanbul/NYC are great for 3 reasons: they come for free (no effort is required to benefit from their reports), they help to identify a decrease in testing coverage, and last but not least they highlight testing mismatches: by looking at colored code coverage reports you may notice, for example, code areas that are never tested like catch clauses (meaning that tests only invoke the happy paths and not how the app behaves on errors). Set it to fail builds if the coverage falls under a certain threshold

Otherwise: There won’t be any automated metric telling you when a large portion of your code is not covered by testing

 

✔ 4.9 Inspect for outdated packages

TL;DR: Use your preferred tool (e.g. npm outdated or npm-check-updates) to detect installed outdated packages, inject this check into your CI pipeline and even make a build fail in a severe scenario. For example, a severe scenario might be when an installed package is 5 patch commits behind (e.g. local version is 1.3.1 and repository version is 1.3.8) or it is tagged as deprecated by its author – kill the build and prevent deploying this version

Otherwise: Your production will run packages that have been explicitly tagged by their author as risky

 

✔ 4.10 Use production-like environment for e2e testing

TL;DR: End to end (e2e) testing which includes live data used to be the weakest link of the CI process as it depends on multiple heavy services like DB. Use an environment which is as close to your real production environment as possible, for example by spinning up the dependent services (database, cache, etc.) with docker-compose so that every environment, including developers’ machines, runs the same stack

Otherwise: Without docker-compose, teams must maintain a testing DB for each testing environment including developers’ machines, keep all those DBs in sync so test results won’t vary across environments

 

✔ 4.11 Refactor regularly using static analysis tools

TL;DR: Using static analysis tools helps by giving objective ways to improve code quality and keeps your code maintainable. You can add static analysis tools to your CI build to fail when it finds code smells. Its main selling points over plain linting are the ability to inspect quality in the context of multiple files (e.g. detect duplications), perform advanced analysis (e.g. code complexity), and follow the history and progress of code issues. Two examples of tools you can use are Sonarqube (2,600+ stars) and Code Climate (1,500+ stars).

Otherwise: With poor code quality, bugs and performance will always be an issue that no shiny new library or state of the art features can fix

link Read More: Refactoring!

 

✔ 4.12 Carefully choose your CI platform (Jenkins vs CircleCI vs Travis vs Rest of the world)

TL;DR: Your continuous integration platform (CICD) will host all the quality tools (e.g. test, lint) so it should come with a vibrant ecosystem of plugins. Jenkins used to be the default for many projects as it has the biggest community along with a very powerful platform at the price of a complex setup that demands a steep learning curve. Nowadays, it has become much easier to set up a CI solution using SaaS tools like CircleCI and others. These tools allow crafting a flexible CI pipeline without the burden of managing the whole infrastructure. Eventually, it’s a trade-off between robustness and speed – choose your side carefully

Otherwise: Choosing some niche vendor might get you blocked once you need some advanced customization. On the other hand, going with Jenkins might burn precious time on infrastructure setup

link Read More: Choosing CI platform

✔ 4.13 Test your middlewares in isolation

TL;DR: When a middleware holds some immense logic that spans many requests, it is worth testing it in isolation without waking up the entire web framework. This can be easily achieved by stubbing and spying on the {req, res, next} objects

Otherwise: A bug in Express middleware === a bug in all or most requests

link Read More: Test middlewares in isolation
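A minimal sketch of testing an Express-style middleware in isolation by faking the {req, res, next} objects; the middleware under test (requireAuthHeader) is hypothetical:

const assert = require('assert');

// Hypothetical middleware under test
function requireAuthHeader(req, res, next) {
  if (!req.headers.authorization) {
    res.statusCode = 401;
    return res.end();
  }
  next();
}

// Fake {req, res, next} instead of booting the whole web framework
const req = { headers: {} };
let endedWithStatus;
const res = { statusCode: 200, end: () => { endedWithStatus = res.statusCode; } };
let nextCalled = false;

requireAuthHeader(req, res, () => { nextCalled = true; });

assert.strictEqual(endedWithStatus, 401); // request was rejected
assert.strictEqual(nextCalled, false);    // the chain was not continued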

 

arrow_up Return to top

5. Going To Production Practices

✔ 5.1. Monitoring

TL;DR: Monitoring is a game of finding out issues before customers do – obviously this should be assigned unprecedented importance. The market is overwhelmed with offers thus consider starting with defining the basic metrics you must follow (my suggestions inside), then go over additional fancy features and choose the solution that ticks all boxes. Click ‘The Gist’ below for an overview of the solutions

Otherwise: Failure === disappointed customers. Simple

link Read More: Monitoring!

 

✔ 5.2. Increase transparency using smart logging

TL;DR: Logs can be a dumb warehouse of debug statements or the enabler of a beautiful dashboard that tells the story of your app. Plan your logging platform from day 1: how logs are collected, stored and analyzed to ensure that the desired information (e.g. error rate, following an entire transaction through services and servers, etc) can really be extracted

Otherwise: You end up with a black box that is hard to reason about, then you start re-writing all logging statements to add additional information

link Read More: Increase transparency using smart logging

 

✔ 5.3. Delegate anything possible (e.g. gzip, SSL) to a reverse proxy

TL;DR: Node is awfully bad at doing CPU intensive tasks like gzipping, SSL termination, etc. You should use ‘real’ middleware services like nginx, HAproxy or cloud vendor services instead

Otherwise: Your poor single thread will stay busy doing infrastructural tasks instead of dealing with your application core and performance will degrade accordingly

link Read More: Delegate anything possible (e.g. gzip, SSL) to a reverse proxy

 

✔ 5.4. Lock dependencies

TL;DR: Your code must be identical across all environments, but amazingly npm lets dependencies drift across environments by default – when you install packages at various environments it tries to fetch packages’ latest patch version. Overcome this by using npm config files, .npmrc, that tell each environment to save the exact (not the latest) version of each package. Alternatively, for finer grained control use npm shrinkwrap. *Update: as of NPM5, dependencies are locked by default. The new package manager in town, Yarn, also got us covered by default

Otherwise: QA will thoroughly test the code and approve a version that will behave differently in production. Even worse, different servers in the same production cluster might run different code

link Read More: Lock dependencies

 

✔ 5.5. Guard process uptime using the right tool

TL;DR: The process must go on and get restarted upon failures. For simple scenarios, process management tools like PM2 might be enough but in today’s ‘dockerized’ world, cluster management tools should be considered as well

Otherwise: Running dozens of instances without a clear strategy and too many tools together (cluster management, docker, PM2) might lead to DevOps chaos

link Read More: Guard process uptime using the right tool

 

✔ 5.6. Utilize all CPU cores

TL;DR: At its basic form, a Node app runs on a single CPU core while all others are left idling. It’s your duty to replicate the Node process and utilize all CPUs – For small-medium apps you may use Node Cluster or PM2. For a larger app consider replicating the process using some Docker cluster (e.g. K8S, ECS) or deployment scripts that are based on Linux init system (e.g. systemd)

Otherwise: Your app will likely utilize only 25% of its available resources(!) or even less. Note that a typical server has 4 CPU cores or more, naive deployment of Node.js utilizes only 1 (even using PaaS services like AWS beanstalk!)

link Read More: Utilize all CPU cores
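A minimal sketch using the built-in cluster module to fork one worker per CPU core (PM2 or a container orchestrator achieves the same without code):

const cluster = require('cluster');
const os = require('os');
const http = require('http');

if (cluster.isMaster) {
  // Fork one worker per available CPU core
  os.cpus().forEach(() => cluster.fork());
  // Replace a worker that crashed
  cluster.on('exit', () => cluster.fork());
} else {
  http
    .createServer((req, res) => res.end(`handled by pid ${process.pid}`))
    .listen(3000);
}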

 

✔ 5.7. Create a ‘maintenance endpoint’

TL;DR: Expose a set of system-related information, like memory usage and REPL, etc in a secured API. Although it’s highly recommended to rely on standard and battle-tested tools, some valuable information and operations are easier done using code

Otherwise: You’ll find that you’re performing many “diagnostic deploys” – shipping code to production only to extract some information for diagnostic purposes

link Read More: Create a ‘maintenance endpoint’

 

✔ 5.8. Discover errors and downtime using APM products

TL;DR: Application monitoring and performance products (a.k.a. APM) proactively gauge codebase and API so they can auto-magically go beyond traditional monitoring and measure the overall user-experience across services and tiers. For example, some APM products can highlight a transaction that loads too slow on the end-user’s side while suggesting the root cause

Otherwise: You might spend great effort on measuring API performance and downtimes, yet you’ll probably never know which are your slowest code parts under real-world scenarios and how they affect the UX

link Read More: Discover errors and downtime using APM products

 

✔ 5.9. Make your code production-ready

TL;DR: Code with the end in mind, plan for production from day 1. This sounds a bit vague so I’ve compiled a few development tips that are closely related to production maintenance (click Gist below)

Otherwise: A world champion IT/DevOps guy won’t save a system that is badly written

link Read More: Make your code production-ready

 

✔ 5.10. Measure and guard the memory usage

TL;DR: Node.js has controversial relationships with memory: the v8 engine has soft limits on memory usage (1.4GB) and there are known paths to leak memory in Node’s code – thus watching Node’s process memory is a must. In small apps, you may gauge memory periodically using shell commands but in medium-large apps consider baking your memory watch into a robust monitoring system

Otherwise: Your process memory might leak a hundred megabytes a day, as happened at Walmart

link Read More: Measure and guard the memory usage
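A minimal sketch of an in-process memory gauge; in a real deployment this metric would be shipped to the monitoring stack rather than logged:

// Report heap usage every 5 minutes; export to your monitoring system in real apps
setInterval(() => {
  const { rss, heapUsed, heapTotal } = process.memoryUsage();
  const toMb = (bytes) => (bytes / 1024 / 1024).toFixed(1);
  console.log(`rss=${toMb(rss)}MB heapUsed=${toMb(heapUsed)}MB heapTotal=${toMb(heapTotal)}MB`);
}, 5 * 60 * 1000);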

 

✔ 5.11. Get your frontend assets out of Node

TL;DR: Serve frontend content using dedicated middleware (nginx, S3, CDN) because Node performance really gets hurt when dealing with many static files due to its single-threaded model

Otherwise: Your single Node thread will be busy streaming hundreds of html/images/angular/react files instead of allocating all its resources for the task it was born for – serving dynamic content

link Read More: Get your frontend assets out of Node

 

✔ 5.12. Be stateless, kill your servers almost every day

TL;DR: Store any type of data (e.g. user sessions, cache, uploaded files) within external data stores. Consider ‘killing’ your servers periodically or use ‘serverless’ platform (e.g. AWS Lambda) that explicitly enforces a stateless behavior

Otherwise: Failure at a given server will result in application downtime instead of just killing a faulty machine. Moreover, scaling-out elasticity will get more challenging due to the reliance on a specific server

link Read More: Be stateless, kill your Servers almost every day

 

✔ 5.13. Use tools that automatically detect vulnerabilities

TL;DR: Even the most reputable dependencies such as Express have known vulnerabilities (from time to time) that can put a system at risk. This can be easily tamed using community and commercial tools that constantly check for vulnerabilities and warn (locally or at GitHub), some can even patch them immediately

Otherwise: Keeping your code clean from vulnerabilities without dedicated tools will require you to constantly follow online publications about new threats. Quite tedious

link Read More: Use tools that automatically detect vulnerabilities

 

✔ 5.14. Assign a transaction id to each log statement

Also known as correlation id / transit id / tracing id / request id / request context / etc.

TL;DR: Assign the same identifier, transaction-id: {some value}, to each log entry within a single request. Then when inspecting errors in logs, you can easily conclude what happened before and after. Until version 14 of Node, this was not easy to achieve due to Node’s async nature, but since AsyncLocalStorage came to town, this became possible and easier than ever. See code examples inside

Otherwise: Looking at a production error log without the context – what happened before – makes it much harder and slower to reason about the issue

link Read More: Assign ‘TransactionId’ to each log statement
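A minimal sketch using the built-in AsyncLocalStorage to attach a transaction id to every log line; the Express app and the log helper are illustrative (crypto.randomUUID assumes Node 14.17+):

const { AsyncLocalStorage } = require('async_hooks');
const { randomUUID } = require('crypto');
const express = require('express');

const requestContext = new AsyncLocalStorage();
const app = express();

// Store a transaction id for the lifetime of each request
app.use((req, res, next) => {
  requestContext.run({ transactionId: randomUUID() }, () => next());
});

// Any logger can later read the id, no matter how deep in the call chain
function log(message) {
  const store = requestContext.getStore() || {};
  console.log(`transaction-id=${store.transactionId} ${message}`);
}

app.get('/order', (req, res) => {
  log('fetching order'); // prints the same id for every log line of this request
  res.json({ ok: true });
});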

 

✔ 5.15. Set NODE_ENV=production

TL;DR: Set the environment variable NODE_ENV to ‘production’ or ‘development’ to flag whether production optimizations should get activated – many npm packages determine the current environment and optimize their code for production

Otherwise: Omitting this simple property might greatly degrade performance. For example, when using Express for server-side rendering omitting NODE_ENV makes it slower by a factor of three!

link Read More: Set NODE_ENV=production

 

✔ 5.16. Design automated, atomic and zero-downtime deployments

TL;DR: Research shows that teams who perform many deployments lower the probability of severe production issues. Fast and automated deployments that don’t require risky manual steps and service downtime significantly improve the deployment process. You should probably achieve this using Docker combined with CI tools as they became the industry standard for streamlined deployment

Otherwise: Long deployments -> production downtime & human-related error -> team unconfident in making deployment -> fewer deployments and features

 

✔ 5.17. Use an LTS release of Node.js

TL;DR: Ensure you are using an LTS version of Node.js to receive critical bug fixes, security updates and performance improvements

Otherwise: Newly discovered bugs or vulnerabilities could be used to exploit an application running in production, and your application may become unsupported by various modules and harder to maintain

link Read More: Use an LTS release of Node.js

 

✔ 5.18. Don’t route logs within the app

TL;DR: Log destinations should not be hard-coded by developers within the application code, but instead should be defined by the execution environment the application runs in. Developers should write logs to stdout using a logger utility and then let the execution environment (container, server, etc.) pipe the stdout stream to the appropriate destination (i.e. Splunk, Graylog, ElasticSearch, etc.).

Otherwise: Application handling log routing === hard to scale, loss of logs, poor separation of concerns

link Read More: Log Routing

 

✔ 5.19. Install your packages with npm ci

TL;DR: You have to be sure that production code uses the exact version of the packages you have tested it with. Run npm ci to strictly do a clean install of your dependencies matching package.json and package-lock.json. Using this command is recommended in automated environments such as continuous integration pipelines.

Otherwise: QA will thoroughly test the code and approve a version that will behave differently in production. Even worse, different servers in the same production cluster might run different code.

link Read More: Use npm ci

 

arrow_up Return to top

6. Security Best Practices

54 items

✔ 6.1. Embrace linter security rules

 

TL;DR: Make use of security-related linter plugins such as eslint-plugin-security to catch security vulnerabilities and issues as early as possible, preferably while they’re being coded. This can help catch security weaknesses like using eval, invoking a child process or importing a module with a string literal (e.g. user input). Click ‘Read more’ below to see code examples that will get caught by a security linter

Otherwise: What could have been a straightforward security weakness during development becomes a major issue in production. Also, the project may not follow consistent code security practices, leading to vulnerabilities being introduced, or sensitive secrets committed into remote repositories

link Read More: Lint rules

 

✔ 6.2. Limit concurrent requests using a middleware

TL;DR: DOS attacks are very popular and relatively easy to conduct. Implement rate limiting using an external service such as cloud load balancers, cloud firewalls, nginx, rate-limiter-flexible package, or (for smaller and less critical apps) a rate-limiting middleware (e.g. express-rate-limit)

Otherwise: An application could be subject to an attack resulting in a denial of service where real users receive a degraded or unavailable service.

link Read More: Implement rate limiting
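A minimal sketch with the express-rate-limit middleware; the window and limit shown are illustrative, not a recommendation:

const rateLimit = require('express-rate-limit');
const express = require('express');

const app = express();

// Allow at most 100 requests per IP per 15 minutes on the API routes
const apiLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 100,
});

app.use('/api/', apiLimiter);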

 

✔ 6.3 Extract secrets from config files or use packages to encrypt them

 

TL;DR: Never store plain-text secrets in configuration files or source code. Instead, make use of secret-management systems like Vault products, Kubernetes/Docker Secrets, or using environment variables. As a last resort, secrets stored in source control must be encrypted and managed (rolling keys, expiring, auditing, etc). Make use of pre-commit/push hooks to prevent committing secrets accidentally

Otherwise: Source control, even for private repositories, can mistakenly be made public, at which point all secrets are exposed. Access to source control for an external party will inadvertently provide access to related systems (databases, apis, services, etc).

link Read More: Secret management

 

✔ 6.4. Prevent query injection vulnerabilities with ORM/ODM libraries

TL;DR: To prevent SQL/NoSQL injection and other malicious attacks, always make use of an ORM/ODM or a database library that escapes data or supports named or indexed parameterized queries, and takes care of validating user input for expected types. Never just use JavaScript template strings or string concatenation to inject values into queries as this opens your application to a wide spectrum of vulnerabilities. All the reputable Node.js data access libraries (e.g. Sequelize, Knex, mongoose) have built-in protection against injection attacks.

Otherwise: Unvalidated or unsanitized user input could lead to operator injection when working with MongoDB for NoSQL, and not using a proper sanitization system or ORM will easily allow SQL injection attacks, creating a giant vulnerability.

link Read More: Query injection prevention using ORM/ODM libraries

 

✔ 6.5. Collection of generic security best practices

TL;DR: This is a collection of security advice that is not related directly to Node.js – the Node implementation is not much different than any other language. Click read more to skim through.

link Read More: Common security best practices

 

✔ 6.6. Adjust the HTTP response headers for enhanced security

TL;DR: Your application should be using secure headers to prevent attackers from using common attacks like cross-site scripting (XSS), clickjacking and other malicious attacks. These can be configured easily using modules like helmet.

Otherwise: Attackers could perform direct attacks on your application’s users, leading to huge security vulnerabilities

link Read More: Using secure headers in your application

 

✔ 6.7. Constantly and automatically inspect for vulnerable dependencies

TL;DR: With the npm ecosystem it is common to have many dependencies for a project. Dependencies should always be kept in check as new vulnerabilities are found. Use tools like npm audit or snyk to track, monitor and patch vulnerable dependencies. Integrate these tools with your CI setup so you catch a vulnerable dependency before it makes it to production.

Otherwise: An attacker could detect your web framework and attack all its known vulnerabilities.

link Read More: Dependency security

 

✔ 6.8. Protect Users’ Passwords/Secrets using bcrypt or scrypt

TL;DR: Passwords or secrets (e.g. API keys) should be stored using a secure hash + salt function like bcrypt, scrypt, or worst case pbkdf2.

Otherwise: Passwords and secrets that are stored without using a secure function are vulnerable to brute forcing and dictionary attacks that will lead to their disclosure eventually.

link Read More: User Passwords
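A minimal sketch with the bcrypt package; the cost factor (here 12) should be tuned to your hardware:

const bcrypt = require('bcrypt');

async function register(plainPassword) {
  // bcrypt generates a per-password salt; cost factor 12 is illustrative
  const passwordHash = await bcrypt.hash(plainPassword, 12);
  return passwordHash; // store the hash, never the plain password
}

async function login(plainPassword, storedHash) {
  // The library handles the comparison against the stored hash
  return bcrypt.compare(plainPassword, storedHash);
}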

 

✔ 6.9. Escape HTML, JS and CSS output

TL;DR: Untrusted data that is sent down to the browser might get executed instead of just being displayed, this is commonly referred to as a cross-site-scripting (XSS) attack. Mitigate this by using dedicated libraries that explicitly mark the data as pure content that should never get executed (i.e. encoding, escaping)

Otherwise: An attacker might store malicious JavaScript code in your DB which will then be sent as-is to the poor clients

link Read More: Escape output

 

✔ 6.10. Validate incoming JSON schemas

 

TL;DR: Validate the incoming requests’ body payload and ensure it meets expectations, fail fast if it doesn’t. To avoid tedious validation coding within each route you may use lightweight JSON-based validation schemas such as jsonschema or joi

Otherwise: Your generosity and permissive approach greatly increases the attack surface and encourages the attacker to try out many inputs until they find some combination to crash the application

link Read More: Validate incoming JSON schemas
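A minimal sketch validating a request body with joi before it reaches the business logic; the schema fields are illustrative:

const Joi = require('joi');

const memberSchema = Joi.object({
  username: Joi.string().alphanum().min(3).max(30).required(),
  password: Joi.string().min(8).required(),
  birthYear: Joi.number().integer().min(1900),
});

// Express-style validation middleware that fails fast on bad payloads
function validateNewMember(req, res, next) {
  const { error } = memberSchema.validate(req.body);
  if (error) {
    return res.status(400).json({ error: error.message });
  }
  next();
}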

 

✔ 6.11. Support blocklisting JWTs

TL;DR: When using JSON Web Tokens (for example, with Passport.js), by default there’s no mechanism to revoke access from issued tokens. Once you discover some malicious user activity, there’s no way to stop them from accessing the system as long as they hold a valid token. Mitigate this by implementing a blocklist of untrusted tokens that are validated on each request.

Otherwise: Expired, or misplaced tokens could be used maliciously by a third party to access an application and impersonate the owner of the token.

link Read More: Blocklist JSON Web Tokens

 

✔ 6.12. Prevent brute-force attacks against authorization

TL;DR: A simple and powerful technique is to limit authorization attempts using two metrics:

  1. The first is number of consecutive failed attempts by the same user unique ID/name and IP address.
  2. The second is number of failed attempts from an IP address over some long period of time. For example, block an IP address if it makes 100 failed attempts in one day.

Otherwise: An attacker can issue unlimited automated password attempts to gain access to privileged accounts on an application

link Read More: Login rate limiting

 

✔ 6.13. Run Node.js as non-root user

TL;DR: There is a common scenario where Node.js runs as a root user with unlimited permissions. For example, this is the default behaviour in Docker containers. It’s recommended to create a non-root user and either bake it into the Docker image (examples given below) or run the process on this user’s behalf by invoking the container with the flag “-u username”

Otherwise: An attacker who manages to run a script on the server gets unlimited power over the local machine (e.g. change iptable and re-route traffic to his server)

link Read More: Run Node.js as non-root user

 

✔ 6.14. Limit payload size using a reverse-proxy or a middleware

 

TL;DR: The bigger the body payload is, the harder your single thread works in processing it. This is an opportunity for attackers to bring servers to their knees without a tremendous amount of requests (DOS/DDOS attacks). Mitigate this by limiting the body size of incoming requests on the edge (e.g. firewall, ELB) or by configuring express body parser to accept only small-size payloads

Otherwise: Your application will have to deal with large requests, unable to process the other important work it has to accomplish, leading to performance implications and vulnerability towards DOS attacks

link Read More: Limit payload size
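A minimal sketch limiting the JSON body size at the Express layer (an edge/firewall limit should still exist in front of it); the 300kb value and the error handling, which follows body-parser’s error conventions, are illustrative:

const express = require('express');
const app = express();

// Reject JSON bodies larger than 300kb before they reach route handlers
app.use(express.json({ limit: '300kb' }));

// Translate the body-parser "too large" error into a 413 response
app.use((err, req, res, next) => {
  if (err.type === 'entity.too.large') {
    return res.status(413).send('Payload too large');
  }
  next(err);
});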

 

✔ 6.15. Avoid JavaScript eval statements

  

TL;DR: eval is evil as it allows executing custom JavaScript code during run time. This is not just a performance concern but also an important security concern due to malicious JavaScript code that may be sourced from user input. Another language feature that should be avoided is new Function constructor. setTimeout and setInterval should never be passed dynamic JavaScript code either.

Otherwise: Malicious JavaScript code finds a way into text passed into eval or other real-time evaluating JavaScript language functions, and will gain complete access to JavaScript permissions on the page. This vulnerability is often manifested as an XSS attack.

link Read More: Avoid JavaScript eval statements

 

✔ 6.16. Prevent evil RegEx from overloading your single thread execution

TL;DR: Regular Expressions, while being handy, pose a real threat to JavaScript applications at large, and the Node.js platform in particular. A user input for text to match might require an outstanding amount of CPU cycles to process. RegEx processing might be inefficient to an extent that a single request that validates 10 words can block the entire event loop for 6 seconds and set the CPU on fire. For that reason, prefer third-party validation packages like validator.js instead of writing your own Regex patterns, or make use of safe-regex to detect vulnerable regex patterns

Otherwise: Poorly written regexes could be susceptible to Regular Expression DoS attacks that will block the event loop completely. For example, the popular moment package was found vulnerable with malicious RegEx usage in November of 2017

link Read More: Prevent malicious RegEx

 

✔ 6.17. Avoid module loading using a variable

  

TL;DR: Avoid requiring/importing another file with a path that was given as parameter due to the concern that it could have originated from user input. This rule can be extended for accessing files in general (i.e. fs.readFile()) or other sensitive resource access with dynamic variables originating from user input. Eslint-plugin-security linter can catch such patterns and warn early enough

Otherwise: Malicious user input could find its way to a parameter that is used to require tampered files, for example, a previously uploaded file on the file system, or access already existing system files.

link Read More: Safe module loading

 

✔ 6.18. Run unsafe code in a sandbox

  

TL;DR: When tasked to run external code that is given at run-time (e.g. plugin), use any sort of ‘sandbox’ execution environment that isolates and guards the main code against the plugin. This can be achieved using a dedicated process (e.g. cluster.fork()), serverless environment or dedicated npm packages that act as a sandbox

Otherwise: A plugin can attack through an endless variety of options like infinite loops, memory overloading, and access to sensitive process environment variables

link Read More: Run unsafe code in a sandbox

 

✔ 6.19. Take extra care when working with child processes

  

TL;DR: Avoid using child processes when possible and validate and sanitize input to mitigate shell injection attacks if you still have to. Prefer using child_process.execFile which by definition will only execute a single command with a set of attributes and will not allow shell parameter expansion.

Otherwise: Naive use of child processes could result in remote command execution or shell injection attacks due to malicious user input passed to an unsanitized system command.

link Read More: Be cautious when working with child processes
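A minimal sketch preferring execFile over exec so user input is passed as an argument array and never interpolated into a shell string; the 'convert' binary (ImageMagick) is illustrative:

const { execFile } = require('child_process');

function resizeImage(userSuppliedFileName, callback) {
  // Arguments are passed as an array: no shell, no parameter expansion
  execFile('convert', [userSuppliedFileName, '-resize', '100x100', 'out.png'], (error, stdout) => {
    if (error) return callback(error);
    callback(null, stdout);
  });
}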

 

✔ 6.20. Hide error details from clients

TL;DR: Express’s built-in error handler hides the error details by default. However, chances are good that you implement your own error handling logic with custom Error objects (considered by many as a best practice). If you do so, ensure not to return the entire Error object to the client, as it might contain some sensitive application details

Otherwise: Sensitive application details such as server file paths, third party modules in use, and other internal workflows of the application which could be exploited by an attacker, could be leaked from information found in a stack trace

link Read More: Hide error details from client

 

✔ 6.21. Configure 2FA for npm or Yarn

TL;DR: Any step in the development chain should be protected with MFA (multi-factor authentication), npm/Yarn are a sweet opportunity for attackers who can get their hands on some developer’s password. Using developer credentials, attackers can inject malicious code into libraries that are widely installed across projects and services. Maybe even across the web if published in public. Enabling 2-factor-authentication in npm leaves almost zero chances for attackers to alter your package code.

Otherwise: Have you heard about the eslint developer whose password was hijacked?

 

✔ 6.22. Modify session middleware settings

TL;DR: Each web framework and technology has its known weaknesses - telling an attacker which web framework we use is a great help for them. Using the default settings for session middlewares can expose your app to module- and framework-specific hijacking attacks in a similar way to the X-Powered-By header. Try hiding anything that identifies and reveals your tech stack (E.g. Node.js, express)

Otherwise: Cookies could be sent over insecure connections, and an attacker might use session identification to identify the underlying framework of the web application, as well as module-specific vulnerabilities

link Read More: Cookie and session security

 

✔ 6.23. Avoid DOS attacks by explicitly setting when a process should crash

TL;DR: The Node process will crash when errors are not handled. Many best practices even recommend exiting even though an error was caught and got handled. Express, for example, will crash on any asynchronous error - unless you wrap routes with a catch clause. This opens a very sweet attack spot for attackers who recognize what input makes the process crash and repeatedly send the same request. There’s no instant remedy for this, but a few techniques can mitigate the pain: alert with critical severity anytime a process crashes due to an unhandled error, validate the input and avoid crashing the process due to invalid user input, and wrap all routes with a catch and consider not crashing when an error originated within a request (as opposed to what happens globally)

Otherwise: This is just an educated guess: given many Node.js applications, if we try passing an empty JSON body to all POST requests - a handful of applications will crash. At that point, we can just repeat sending the same request to take down the applications with ease

 

✔ 6.24. Prevent unsafe redirects

TL;DR: Redirects that do not validate user input can enable attackers to launch phishing scams, steal user credentials, and perform other malicious actions.

Otherwise: If an attacker discovers that you are not validating external, user-supplied input, they may exploit this vulnerability by posting specially-crafted links on forums, social media, and other public places to get users to click it.

link Read More: Prevent unsafe redirects

 

✔ 6.25. Avoid publishing secrets to the npm registry

TL;DR: Precautions should be taken to avoid the risk of accidentally publishing secrets to public npm registries. An .npmignore file can be used to ignore specific files or folders, or the files array in package.json can act as an allow list.

Otherwise: Your project’s API keys, passwords or other secrets are open to be abused by anyone who comes across them, which may result in financial loss, impersonation, and other risks.

link Read More: Avoid publishing secrets

arrow_up Return to top

7. Draft: Performance Best Practices

Our contributors are working on this section. Would you like to join?

 

✔ 7.1. Don’t block the event loop

TL;DR: Avoid CPU intensive tasks as they will block the mostly single-threaded Event Loop and offload those to a dedicated thread, process or even a different technology based on the context.

Otherwise: As the Event Loop is blocked, Node.js will be unable to handle other requests, thus causing delays for concurrent users. 3000 users are waiting for a response, the content is ready to be served, but one single request blocks the server from dispatching the results back

link Read More: Do not block the event loop

 

✔ 7.2. Prefer native JS methods over user-land utils like Lodash

TL;DR: It’s often more penalising to use utility libraries like lodash and underscore over native methods as it leads to unneeded dependencies and slower performance. Bear in mind that with the introduction of the new V8 engine alongside the new ES standards, native methods were improved in such a way that it’s now about 50% more performant than utility libraries.

Otherwise: You’ll have to maintain less performant projects where you could have simply used what was already available or dealt with a few more lines in exchange for a few more files.

link Read More: Native over user land utils

 

arrow_up Return to top

8. Docker Best Practices

medal_sports Many thanks to Bret Fisher from whom we learned many of the following practices

 

✔ 8.1 Use multi-stage builds for leaner and more secure Docker images

TL;DR: Use multi-stage build to copy only necessary production artifacts. A lot of build-time dependencies and files are not needed for running your application. With multi-stage builds these resources can be used during build while the runtime environment contains only what’s necessary. Multi-stage builds are an easy way to get rid of overweight and security threats.

Otherwise: Larger images will take longer to build and ship, build-only tools might contain vulnerabilities and secrets only meant for the build phase might be leaked.

Example Dockerfile for multi-stage builds

FROM node:14.4.0 AS build

WORKDIR /home/node/app
COPY . .
RUN npm ci && npm run build


FROM node:14.4.0-slim

WORKDIR /home/node/app
# Install only production dependencies in the final, slim image
COPY --from=build /home/node/app/package.json /home/node/app/package-lock.json ./
RUN npm ci --production
# Copy only the built artifacts, not the sources or dev tooling
COPY --from=build /home/node/app/dist ./dist

USER node
EXPOSE 8080

CMD [ "node", "dist/app.js" ]

link Read More: Use multi-stage builds

 

✔ 8.2. Bootstrap using node command, avoid npm start

TL;DR: use CMD ["node", "server.js"] to start your app, avoid using npm scripts which don’t pass OS signals to the code. This prevents problems with child-processes, signal handling, graceful shutdown and having zombie processes.

Otherwise: When no signals are passed, your code will never be notified about shutdowns. Without this, it will lose its chance to close properly possibly losing current requests and/or data.

Read More: Bootstrap container using node command, avoid npm start

 

✔ 8.3. Let the Docker runtime handle replication and uptime

TL;DR: When using a Docker run time orchestrator (e.g., Kubernetes), invoke the Node.js process directly without intermediate process managers or custom code that replicate the process (e.g. PM2, Cluster module). The runtime platform has the highest amount of data and visibility for making placement decision – It knows best how many processes are needed, how to spread them and what to do in case of crashes

Otherwise: A container that keeps crashing due to lack of resources will get restarted indefinitely by the process manager; if Kubernetes were aware of this, it could relocate it to a different, roomier instance

link Read More: Let the Docker orchestrator restart and replicate processes

 

✔ 8.4. Use .dockerignore to prevent leaking secrets

TL;DR: Include a .dockerignore file that filters out common secret files and development artifacts. By doing so, you might prevent secrets from leaking into the image. As a bonus the build time will significantly decrease. Also, ensure not to copy all files recursively rather explicitly choose what should be copied to Docker

Otherwise: Common personal secret files like .env, .aws and .npmrc will be shared with anybody with access to the image (e.g. Docker repository)

link Read More: Use .dockerignore

 

✔ 8.5. Clean-up dependencies before production

TL;DR: Although Dev-Dependencies are sometimes needed during the build and test life-cycle, eventually the image that is shipped to production should be minimal and clean from development dependencies. Doing so guarantees that only necessary code is shipped and the amount of potential attacks (i.e. attack surface) is minimized. When using multi-stage build (see dedicated bullet) this can be achieved by installing all dependencies first and finally running npm ci --production

Otherwise: Many of the infamous npm security breaches were found within development packages (e.g. eslint-scope)

link Read More: Remove development dependencies

 

✔ 8.6. Shutdown smartly and gracefully

TL;DR: Handle the process SIGTERM event and clean up all existing connections and resources. This should be done while responding to ongoing requests. In Dockerized runtimes, shutting down containers is not a rare event but a frequent occurrence that happens as part of routine work. Achieving this demands some thoughtful code to orchestrate several moving parts: the load balancer, keep-alive connections, the HTTP server and other resources

Otherwise: Dying immediately means not responding to thousands of disappointed users

link Read More: Graceful shutdown
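A minimal sketch of handling SIGTERM: stop accepting new connections, let in-flight requests finish, then exit; the timeout value and the DB/consumer cleanup are illustrative:

const http = require('http');

const server = http.createServer((req, res) => res.end('ok'));
server.listen(8080);

process.on('SIGTERM', () => {
  console.log('SIGTERM received, draining connections');
  // Stop accepting new connections; existing requests are allowed to finish
  server.close(() => {
    // Close DB pools / message consumers here, then exit cleanly
    process.exit(0);
  });
  // Safety net: force-exit if draining takes too long
  setTimeout(() => process.exit(1), 10000).unref();
});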

 

✔ 8.7. Set memory limits using both Docker and v8

TL;DR: Always configure a memory limit using both Docker and the JavaScript runtime flags. The Docker limit is needed to make thoughtful container placement decisions; the v8 flag --max-old-space-size is needed to kick off the GC on time and prevent under-utilization of memory. Practically, set v8’s old space memory to be just a bit less than the container limit

Otherwise: The Docker definition is needed to make thoughtful scaling decisions and prevent starving other citizens. Without also defining v8’s limits, it will under-utilize the container resources - without explicit instructions it crashes when utilizing ~50-60% of its host resources

link Read More: Set memory limits using Docker only

 

✔ 8.8. Plan for efficient caching

TL;DR: Rebuilding a whole docker image from cache can be nearly instantaneous if done correctly. The less frequently changed instructions should be at the top of your Dockerfile and the ones constantly changing (like app code) should be at the bottom.

Otherwise: Docker build will be very long and consume a lot of resources even when making tiny changes

link Read More: Leverage caching to reduce build times

 

✔ 8.9. Use explicit image reference, avoid latest tag

TL;DR: Specify an explicit image digest or versioned label, never refer to latest. Developers are often led to believe that specifying the latest tag will provide them with the most recent image in the repository however this is not the case. Using a digest guarantees that every instance of the service is running exactly the same code.

In addition, referring to an image tag means that the base image is subject to change, as image tags cannot be relied upon for a deterministic install. Instead, if a deterministic install is expected, a SHA256 digest can be used to reference an exact image.

Otherwise: A new version of a base image could be deployed into production with breaking changes, causing unintended application behaviour.

link Read More: Understand image tags and use the “latest” tag with caution

 

✔ 8.10. Prefer smaller Docker base images

TL;DR: Large images lead to higher exposure to vulnerabilities and increased resource consumption. Using leaner Docker images, such as Slim and Alpine Linux variants, mitigates this issue.

Otherwise: Building, pushing, and pulling images will take longer, unknown attack vectors can be used by malicious actors and more resources are consumed.

link Read More: Prefer smaller images

 

✔ 8.11. Clean-out build-time secrets, avoid secrets in args

TL;DR: Avoid secrets leaking from the Docker build environment. A Docker image is typically shared in multiple environments like CI and a registry that are not as sanitized as production. A typical example is an npm token which is usually passed to a dockerfile as an argument. This token stays within the image long after it is needed and allows the attacker indefinite access to a private npm registry. This can be avoided by copying a secret file like .npmrc and then removing it using multi-stage build (beware, build history should be deleted as well) or by using Docker BuildKit’s secret feature which leaves zero traces

Otherwise: Everyone with access to the CI and docker registry will also get access to some precious organization secrets as a bonus

link Read More: Clean-out build-time secrets

 

✔ 8.12. Scan images for multi layers of vulnerabilities

TL;DR: Besides checking code dependency vulnerabilities, also scan the final image that is shipped to production. Docker image scanners check the code dependencies but also the OS binaries. This E2E security scan covers more ground and verifies that no bad guy injected bad things during the build. Consequently, it is recommended to run this as the last step before deployment. There are a handful of free and commercial scanners that also provide CI/CD plugins

Otherwise: Your code might be entirely free from vulnerabilities. However it might still get hacked due to vulnerable version of OS-level binaries (e.g. OpenSSL, TarBall) that are commonly being used by applications

link Read More: Scan the entire image before production

 

✔ 8.13 Clean NODE_MODULE cache

TL;DR: After installing dependencies in a container remove the local cache. It doesn’t make any sense to duplicate the dependencies for faster future installs since there won’t be any further installs – A Docker image is immutable. Using a single line of code tens of MB (typically 10-50% of the image size) are shaved off

Otherwise: The image that will get shipped to production will weigh 30% more due to files that will never get used

link Read More: Clean NODE_MODULE cache

 

✔ 8.14. Generic Docker practices

TL;DR: This is a collection of Docker advice that is not related directly to Node.js – the Node implementation is not much different than any other language. Click read more to skim through.

link Read More: Generic Docker practices

 

✔ 8.15. Lint your Dockerfile

TL;DR: Linting your Dockerfile is an important step to identify issues in your Dockerfile which differ from best practices. By checking for potential flaws using a specialised Docker linter, performance and security improvements can be easily identified, saving countless hours of wasted time or security issues in production code.

Otherwise: Mistakenly the Dockerfile creator left Root as the production user, and also used an image from an unknown source repository. This could be avoided with just a simple linter.

link Read More: Lint your Dockerfile

from:https://github.com/goldbergyoni/nodebestpractices

Distributed Transactions

[About the author] Li Wenhua, Xiaomi Information Technology Department, Overseas Mall team

As internet technology keeps evolving, systems grow ever more complex, and almost every IT company has already moved from a monolithic architecture to a distributed one, so distributed systems are practically everywhere. Once we talk about distributed systems, and microservice architectures in particular, we have to talk about distributed transactions. This article walks through distributed transactions and the solutions commonly used for them.

Basic theory

Before going into concrete solutions, it is worth reviewing some of the basic theory that distributed transactions build on.

Transactions

A transaction is a tightly bound series of operations in an application: all operations must complete successfully, otherwise every change made by any of them is undone. In other words, a transaction is atomic; the series of operations inside it either all succeed or none take effect. A transaction should have four properties: atomicity, consistency, isolation, and durability, commonly known as the ACID properties.

Distributed transactions

A distributed transaction is one whose participants, transaction-supporting servers, resource servers, and transaction manager sit on different nodes of different distributed systems. For example, in a large e-commerce system the order-placement API typically deducts inventory, applies discounts, and generates an order id, while the order service and the inventory, discount, and order-id components are all separate services. Whether the order-placement call succeeds depends not only on the local db operations but also on the results of third-party systems; a distributed transaction guarantees that these operations either all succeed or all fail. Essentially, distributed transactions exist to keep data consistent across different databases.

Strong consistency, weak consistency, eventual consistency

Strong consistency

Every read returns the data written by the most recent write. All processes in the system observe operations in the same order as under a global clock. In short, at any moment the data on all nodes is identical.

Weak consistency

If, after data is updated, it is acceptable for subsequent reads to see only part of the update or none of it, the system is weakly consistent.

Eventual consistency

There is no guarantee that the same piece of data is identical on every node at any given moment, but over time the copies on different nodes converge. Put simply, after some period of time the data across nodes eventually reaches a consistent state.

The CAP theorem

The CAP principle, also known as the CAP theorem, states that in a distributed system Consistency, Availability, and Partition tolerance cannot all be achieved at the same time.

Consistency (C):

Whether all replicas of the data in the distributed system hold the same value at the same moment (equivalently, all nodes see the same, latest copy of the data).

Availability (A):

Whether the cluster as a whole can still serve clients’ read and write requests after some nodes fail (i.e. the system remains highly available for data updates).

Partition tolerance (P):

In practical terms, a partition is a time limit on communication. If the system cannot reach data consistency within that limit, a partition has occurred, and the system must choose between C and A for the current operation.

The essence of CAP is that a system can be AP, CP, or CA, but never all of CAP. If a distributed system keeps no replicas of its data, it trivially satisfies strong consistency, because with a single copy there can be no inconsistency, so C and P hold. But if a network partition or a crash occurs, some data becomes unreachable and availability can no longer be satisfied; in that case we get a CP system, and CAP cannot be satisfied simultaneously.

BASE theory

BASE theory refers to Basically Available, Soft State, and Eventual Consistency. Its core idea is that even when strong consistency cannot be achieved, the system should use an appropriate approach to guarantee eventual consistency.

BASE is short for Basically Available, Soft State, Eventual Consistency:
BA: Basically Available - when failures occur, the distributed system is allowed to lose some availability, as long as the core functionality stays available.
S: Soft State - the system is allowed to have intermediate states, and those intermediate states do not affect overall availability.
E: Eventual Consistency - all data replicas in the system reach a consistent state after some period of time.
BASE theory is essentially an extension of CAP and a complement to CAP’s AP option.

Flexible transactions

Unlike rigid ACID transactions, the notion of flexible transactions arises in distributed scenarios based on BASE theory. Achieving eventual consistency through flexible transactions relies on certain properties. Not every solution requires all of them, because different solutions have different requirements, but if none of them hold, flexible transactions are impossible.

Idempotent operations

In programming, an idempotent operation is one whose effect is the same whether it is executed once or many times. An idempotent function or method can be called repeatedly with the same parameters and always produce the same result, without changing system state, so repeated execution is harmless. For example, in a payment flow, when a third-party payment system notifies us that an order has been paid, the callback endpoint that receives this notification should return success no matter how many times it is invoked (assuming the network is healthy), as sketched below.
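A minimal sketch of an idempotent payment-callback handler; the order store and its API here are hypothetical:

// Hypothetical sketch: repeated notifications for the same order have no extra effect
async function onPaymentNotified(orderId, paidAmount) {
  const order = await orderStore.findById(orderId); // hypothetical data-access helper
  if (order.status === 'PAID') {
    return { ok: true }; // already processed: acknowledge again, change nothing
  }
  await orderStore.markPaid(orderId, paidAmount);
  return { ok: true };
}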

Use cases for distributed transactions

Bank transfers

A transfer is the classic distributed transaction scenario. Suppose user A uses a banking app to initiate an inter-bank transfer to user B: the bank system first deducts the money from user A, then increases the balance of user B’s account. Two failure modes can occur: 1. the deduction from user A succeeds but crediting user B’s balance fails; 2. the deduction from user A fails but crediting user B succeeds. Neither situation is acceptable for a banking system, so a distributed transaction is needed to guarantee the transfer completes correctly.

Placing an order and deducting inventory

In an e-commerce system, placing an order is the most common user action. The order-placement API necessarily generates an order id, deducts inventory, and so on. In a microservice architecture, the order-id and inventory components are usually independent services, so a distributed transaction is needed to guarantee the whole order-placement call succeeds.

Synchronous call timeouts

Staying with the e-commerce example: under a microservice architecture, payment and orders exist as separate systems. An order’s payment status depends on a notification from the payment system. Suppose our payment system receives a notification from the third-party payment provider saying an order has been paid; the endpoint receiving that notification must synchronously call the order service’s status-update API to mark the order as paid. As the flow diagram shows, there are two calls here: the third-party provider calls the payment service, and the payment service calls the order service. Either call can time out, and without a distributed transaction the order’s actual payment state can end up differing from what the user eventually sees.

Solutions for distributed transactions

Two-phase commit / XA

Two-phase commit, as the name suggests, commits in two steps. A transaction manager coordinates the local resource managers, which are usually implemented by databases. In the first phase the transaction manager asks every resource manager whether it is ready; if every reply is yes, it commits the transaction in the second phase, and if any reply is no, it rolls the transaction back.

The rough flow:

Phase one (prepare): the transaction manager sends a request to all local resource managers asking whether they are ready; every participant reports back whether its part of the transaction can succeed.
Phase two (commit/rollback): based on all the feedback, the transaction manager tells every local resource manager to commit or roll back on all branches in lockstep.

Known problems:

Synchronous blocking: when participants hold shared resources, a participant occupying a resource forces the other participants to block and wait until it is released.

Single point of failure: once the transaction manager fails, the whole system becomes unavailable.

Data inconsistency: in phase two, if the transaction manager has sent only some of the commit messages when a network failure occurs, only some participants receive the commit; only those participants commit, leaving the system’s data inconsistent.

Uncertainty: after the transaction manager sends commit, if only one participant has received it and then both that participant and the transaction manager crash, the newly elected transaction manager cannot tell whether that message was committed successfully.

TCC

The concept of TCC (Try-Confirm-Cancel) was first proposed by Pat Helland in his 2007 paper “Life beyond Distributed Transactions: an Apostate’s Opinion”. Compared with the XA approach described above, TCC addresses several of its shortcomings:

  1. It removes the single-point coordinator: the business activity is initiated and completed by the main business service, and the business activity manager becomes a multi-node, clustered component.
  2. Synchronous blocking: timeouts are introduced and compensation runs after a timeout; the whole resource is no longer locked, since resources are expressed in business-logic form with much finer granularity.
  3. Data consistency: with the compensation mechanism in place, consistency is controlled by the business activity manager.

TCC (Try, Confirm, Cancel)
Try phase: tentative execution; perform all business checks (consistency) and reserve the required business resources (quasi-isolation).
Confirm phase: actually execute the business logic, without any further business checks, using only the resources reserved in the Try phase. Confirm must be idempotent and must be retried if it fails.
Cancel phase: cancel execution and release the resources reserved in the Try phase. Cancel must also be idempotent, and exceptions in the Cancel phase are handled essentially the same way as exceptions in the Confirm phase.

In the Try phase, the business system is checked and resources are reserved. For an order/inventory operation, for instance, you check that the remaining stock is sufficient and reserve it, typically by introducing a separate ‘available stock’ field that the Try phase operates on.
Implementing a distributed transaction with TCC splits what used to be a single API into three (Try, Confirm, Cancel), so the implementation complexity is relatively high.

Local message table

The local message table approach was first published by eBay architect Dan Pritchett in a 2008 ACM article. The scheme has two roles, a message producer and a message consumer. Suppose system A is the producer and system B is the consumer; the rough flow is:

  1. When system A is called by another system and performs a database update, it first updates the business table and then inserts a row into a message table in the same database, with both writes inside the same local transaction.
  2. A scheduled job in system A polls the local message table and writes each message to MQ, retrying if sending fails.
  3. System B consumes the messages from MQ and runs its business logic. If its local transaction fails, it keeps retrying by re-consuming the message; if the failure is a business-level failure, it can notify system A to roll back.

Prerequisites of the local message table approach:

  1. Both the consumer’s and the producer’s interfaces must be idempotent.
  2. The producer needs an extra message table.
  3. Compensation logic must be provided: if the consumer fails at the business level, the producer must support rollback.

Fault tolerance:

  1. If step 1 fails, the transaction simply rolls back.
  2. Failures writing to MQ (step 2) or consuming from MQ (step 3) are retried.
  3. If step 3 fails at the business level, system B asks system A to roll back the transaction.

The core of this scheme is to run the work that needs distributed handling asynchronously via a message log. The message log can live in a local file, a database, or a message queue, and retries can be triggered automatically by business rules or manually. Manual retries are mostly used in payment scenarios, where a reconciliation system handles issues after the fact. The producer side is sketched below.
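A minimal sketch of the producer side of the local message table pattern, assuming a hypothetical Knex-style db client with transactions, a polling job, and a hypothetical mq client:

// Hypothetical sketch: business row and message row are written in ONE local transaction
async function transferOut(accountId, amount) {
  await db.transaction(async (trx) => {
    await trx('accounts').where({ id: accountId }).decrement('balance', amount);
    await trx('local_messages').insert({
      status: 'PENDING',
      payload: JSON.stringify({ accountId, amount }),
    });
  });
}

// A separate job polls PENDING messages and publishes them to MQ, retrying on failure
async function relayPendingMessages() {
  const pending = await db('local_messages').where({ status: 'PENDING' });
  for (const msg of pending) {
    await mq.publish('transfer-topic', msg.payload); // hypothetical MQ client
    await db('local_messages').where({ id: msg.id }).update({ status: 'SENT' });
  }
}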

Reliable messages with eventual consistency

The rough flow:

  1. System A first sends a prepare message to MQ; if sending the prepare message fails, the operation is simply cancelled.
  2. If the message is sent successfully, system A executes its local transaction.
  3. If the local transaction succeeds, system A sends a confirm message to MQ; if it fails, it sends a rollback message.
  4. System B consumes the confirm messages from MQ, executes its local transaction, and sends an ack. If system B’s local transaction fails, it keeps retrying; if the failure is a business-level failure, it asks system A to roll back.
  5. MQ periodically polls all prepared messages and calls an interface provided by system A to query how each message was handled; if the local transaction for a prepare message succeeded, the confirm message is re-sent, otherwise the message is rolled back.

The biggest difference from the local message table is that the local message table is removed; the step where the message table drives retries of writing to MQ is replaced by polling the state of prepare messages to retry or roll them back. The prerequisites and fault tolerance are essentially the same as the previous scheme. At the moment, the only MQ on the market implementing this scheme is Alibaba’s RocketMQ.

Best-effort notification

Best-effort notification is the simplest kind of flexible transaction. It suits businesses that are not time-sensitive about reaching eventual consistency and where the passive party’s result does not affect the active party’s result.

The idea is roughly:

  1. After system A’s local transaction finishes, it sends a message to MQ;
  2. A dedicated service consumes the MQ and calls system B’s interface;
  3. If system B succeeds, we are done; if system B fails, the best-effort notification service periodically retries calling system B up to N times and finally gives up.

Distributed transactions in practice

Two-phase commit / XA

Alipay currently uses the two-phase commit idea in its Distributed Transaction Service (DTS), a distributed transaction framework that guarantees eventual transaction consistency in large-scale distributed environments. See Alipay’s official documentation: https://tech.antfin.com/docs/2/46887

TCC

TCC requires each transactional interface to provide try, confirm, and cancel operations, which increases programming complexity. It depends on the business side cooperating to provide these interfaces and is hard to roll out, so this approach is generally not recommended.

Reliable messages with eventual consistency

At the moment the only MQ on the market supporting this scheme is Alibaba’s RocketMQ. The scheme has quite a few use cases, such as sending an email after a user registers successfully, or an e-commerce system sending coupons to users - anywhere eventual consistency must be guaranteed.

Local message table

Inter-bank transfers can be implemented with this scheme.
User A initiates a transfer to user B. The system first deducts the amount from user A’s account and writes the transfer message into the message table; if the transaction fails, the transfer fails. If it succeeds, a scheduled job polls the message table and writes the transfer message to MQ, retrying on failure. The MQ message is consumed in near real time to credit user B’s account, and failed executions are retried continuously.

In Xiaomi’s overseas mall, order status changes are recorded in a message table; a job writes the order-status messages to MQ, and the consumers ultimately send the user email, SMS, push notifications, and so on.

Best-effort notification

The most common best-effort notification scenario is the payment callback: after the payment service receives the success notification from the third-party provider, it first updates the order’s payment status in its own database and then synchronously notifies the order service. If that synchronous notification fails, an asynchronous job keeps retrying the call to the order service’s interface.

In Xiaomi’s overseas mall, besides payment callbacks, the most common scenario is order data synchronization. For example, when systems A and B synchronize data and system A’s order data changes, the change message is first written into Xiaomi’s notify system (equivalent to an MQ), and the notify system then processes the message asynchronously, calling the interface provided by system B and retrying up to a maximum number of times.

Summary

This article introduced some of the basic theory behind distributed transactions, explained the commonly used solutions, and, in the second half, gave typical scenarios for each of them. Distributed transactions are an inherently hard technical problem; which solution to use depends on the characteristics of your own business, and every solution involves many details and substantial complexity in practice. So, whenever it is not strictly necessary, avoid distributed transactions altogether.

References:

  1. “Best practices for consistency in distributed service-oriented systems” https://mp.weixin.qq.com/s/khAwfJvWcwgbAYbBHbU8aQ

  2. “Common distributed transaction solutions” https://blog.csdn.net/u010425776/article/details/79516298?tt_from=weixin&utm_source=weixin&utm_medium=toutiao_ios&utm_campaign=client_share&wxshare_count=1

  3. “A deep dive into distributed transactions” http://www.codeceo.com/article/distributed-transaction.html

  4. CAP theorem https://baike.baidu.com/item/CAP%E5%8E%9F%E5%88%99

  5. Transaction https://baike.baidu.com/item/%E4%BA%8B%E5%8A%A1/5945882

  6. Distributed transaction https://baike.baidu.com/item/%E5%88%86%E5%B8%83%E5%BC%8F%E4%BA%8B%E5%8A%A1

  7. “Atomic Distributed Transactions: a RESTful Design”

from:https://xiaomi-info.github.io/2020/01/02/distributed-transaction/