[Notice] Some nodes running iotex-core v1.x are in a limbo state

Starting around 10 PM on 8/19, we observed that some nodes running iotex-core v1.x were probated and did not recover on their own. The core-dev team immediately took action by:

  • Notifying the operators of stuck nodes to restart them, which brings those nodes back into a normal operational state.
  • Digging into the logs and code to identify the underlying cause.

Note that the blockchain itself is functioning well, and transfer/execution actions have not been impacted, thanks to the fault tolerance built into the protocol and the support of all delegate operators.

So far, our hypothesis is that the message processing queue on the affected nodes got stuck for some reason. We are still working to pin down the root cause and will deliver a hot patch to iotex-core v1.x as soon as possible.

Stay tuned: updates will be posted in this thread!


The core-dev team identified the root cause of this issue around 6 PM PDT today: processing one type of message from the network blocked the processing of all other messages. We took immediate action to mitigate it. The network is now healthy, and all nodes have recovered from the limbo state.

As a next step, we will implement enhancements over the next few days to separate the processing of different message types from the network. A new binary will then be released, and we will recommend that delegates upgrade to it!
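For readers curious what "separating the processing of different messages" can look like, here is a minimal Go sketch of the general technique: one queue and one worker goroutine per message type, so a handler that blocks on one type cannot stall the others. This is an illustration under stated assumptions, not the actual iotex-core patch. The names `MsgType`, `Message`, `Dispatcher`, `NewDispatcher`, and `Dispatch` are hypothetical, as is the drop-when-full policy.

```go
package main

import "sync"

// MsgType identifies a category of network message (hypothetical names,
// not iotex-core's actual message types).
type MsgType int

const (
	MsgConsensus MsgType = iota
	MsgBlockSync
	MsgAction
)

// Message is a hypothetical network message.
type Message struct {
	Type    MsgType
	Payload string
}

// Dispatcher gives each message type its own buffered queue and worker,
// so a slow or stuck handler for one type cannot block the others
// (the head-of-line blocking described above).
type Dispatcher struct {
	queues map[MsgType]chan Message
	wg     sync.WaitGroup
}

// NewDispatcher starts one worker goroutine per registered message type.
func NewDispatcher(handlers map[MsgType]func(Message), queueSize int) *Dispatcher {
	d := &Dispatcher{queues: make(map[MsgType]chan Message)}
	for t, h := range handlers {
		q := make(chan Message, queueSize)
		d.queues[t] = q
		d.wg.Add(1)
		go func(q chan Message, h func(Message)) {
			defer d.wg.Done()
			for m := range q {
				h(m) // only this type's worker waits if h blocks
			}
		}(q, h)
	}
	return d
}

// Dispatch enqueues a message without blocking the caller. If the type is
// unregistered or its queue is full, the message is dropped and false is
// returned (one possible back-pressure policy, assumed for illustration).
// Dispatch must not be called after Close.
func (d *Dispatcher) Dispatch(m Message) bool {
	q, ok := d.queues[m.Type]
	if !ok {
		return false
	}
	select {
	case q <- m:
		return true
	default:
		return false
	}
}

// Close stops accepting messages and waits for all workers to drain.
func (d *Dispatcher) Close() {
	for _, q := range d.queues {
		close(q)
	}
	d.wg.Wait()
}
```

With a single shared queue, a block-sync handler that never returns would starve action and consensus messages. With per-type queues, only block-sync messages pile up, and other traffic keeps flowing.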

What does not kill it makes it stronger :grinning:
