Debugging Deadlocks using Java Synchronization Aids

by Horațiu Dan

One of the most famous deadlocks is the one encountered in the well-known problem of the ‘dining philosophers’. Briefly, it is said that ‘n’ philosophers sit at a round table aiming for Chinese food. On the table, there are ‘n’ chopsticks, one between every two philosophers. As the venue is a pleasant and productive one, they are not only eating, but also thinking, alternating between the two. In order to be able to eat, each needs to acquire two chopsticks first, eat, then put them back on the table and get back to thinking. Without getting into further details, one can easily observe that in the situation where each philosopher grabs the chopstick to his right and then waits for the one on the left without releasing the former, the deadlock appears.

Moving to threads, the simplest deadlock example is when one thread holds a lock forever while the others, aiming to acquire the same lock, block waiting. Put more simply, for two threads: if the former holds lock A and aims for lock B, while the latter holds lock B and aims for lock A, both will wait forever.
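
To make this concrete, below is a minimal sketch of the two-thread scenario just described. The class, the lock names and the artificial pause are purely illustrative and are not part of the POC that follows; with the opposite acquisition order, each thread ends up holding one lock and waiting forever for the other.

public class TwoLockDeadlock {

    private static final Object LOCK_A = new Object();
    private static final Object LOCK_B = new Object();

    public static void main(String[] args) {
        // thread 1 acquires lock A, then tries to acquire lock B
        Thread thread1 = new Thread(() -> {
            synchronized (LOCK_A) {
                pause(); // give the other thread time to grab lock B
                synchronized (LOCK_B) {
                    System.out.println("thread1 acquired both locks");
                }
            }
        });

        // thread 2 acquires lock B, then tries to acquire lock A - the opposite order
        Thread thread2 = new Thread(() -> {
            synchronized (LOCK_B) {
                pause(); // give the other thread time to grab lock A
                synchronized (LOCK_A) {
                    System.out.println("thread2 acquired both locks");
                }
            }
        });

        thread1.start();
        thread2.start(); // with the pauses in place, both threads block forever
    }

    private static void pause() {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}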

Similarly, at the database level, deadlocks appear when two or more concurrent transactions are blocked, each waiting for the others to release a resource they need.

Abstract

A while ago, I was confronted with a database deadlock which unfortunately appeared in one of our production environments. The problem was identified and successfully fixed. Yet, what kept my attention during this investigation was the attempt to reproduce the issue in an isolated manner, by trying to guarantee the concurrent execution of the deadlocking actions. In this article, first a “straightforward” deadlock is fabricated, then the focus is set on how to execute two actions “as simultaneously as possible” by leveraging two Java synchronization aids – CyclicBarrier and CountDownLatch.

Set-Up

The proof of concept is built with:

  • Java 21
  • PostgreSQL 13
  • PostgreSQL Driver version 42.7.5

For simplicity, Spring Boot version 3.4.4, Flyway 10.20.1 and Apache Maven 3.9.9 are also used.

The Deadlock

Let Entity1 and Entity2 be two simple entities. They are similar and described only by a unique identifier (the primary key) and a text.

@Entity
@Table(name = "entity1")
public class Entity1 {

    @Id
    @Column(name = "id")
    private Long id;

    @Column(name = "text", nullable = false)
    private String text;
        
        ...
}

@Entity
@Table(name = "entity2")
public class Entity2 {

    @Id
    @Column(name = "id")
    private Long id;

    @Column(name = "text", nullable = false)
    private String text;

        ...
}

EntityProcessor is a component that defines two operations, each running in its own transaction, but involving the same two entities, one of type Entity1 and one of type Entity2. Obviously, it is assumed they already exist in the database.

1. process1():
        - begins the transaction
        - reads Entity1 from the database
        - modifies its text
        - reads Entity2 from the database
        - modifies its text
        - commits the transaction

2. process2():
        - begins the transaction
        - reads Entity2 from the database
        - modifies its text
        - reads Entity1 from the database
        - modifies its text
        - commits the transaction
@Service
public class EntityProcessor {

    private final Entity1Repository entity1Repo;
    private final Entity2Repository entity2Repo;

    public EntityProcessor(Entity1Repository entity1Repo, Entity2Repository entity2Repo) {
        this.entity1Repo = entity1Repo;
        this.entity2Repo = entity2Repo;
    }

    @Transactional
    public void process1(long entity1Id, long entity2Id) {
        final int index = 1;

        processEntity1(index, entity1Id);

        processEntity2(index, entity2Id);
    }

    @Transactional
    public void process2(long entity1Id, long entity2Id) {
        final int index = 2;

        processEntity2(index, entity2Id);

        processEntity1(index, entity1Id);
    }

    private void processEntity1(int index, long entityId) {
        Entity1 entity1 = entity1Repo.findById(entityId)
                .orElseThrow(() -> new RuntimeException("Entity1 not found"));

        entity1.setText("Set by process " + index);
    }

    private void processEntity2(int index, long entityId) {
        Entity2 entity2 = entity2Repo.findById(entityId)
                .orElseThrow(() -> new RuntimeException("Entity2 not found"));

        entity2.setText("Set by process " + index);
    }
}

As part of this POC, the “business” goal is to modify the text of the two entities twice, irrespective of their order, but in parallel. The above operations (transactions) lead to a deadlock if executed concurrently due to the opposite lock sequences. To resolve the problem, or at least to significantly reduce the risk of blocking, both should lock the two entities in the same order.

When process1() and process2() (as defined above) are run in parallel, the below exception is obtained, highlighting the deadlock.

[virtual-61] DEBUG SqlExceptionHelper#could not execute statement [update entity1 set text=? where id=?]
org.postgresql.util.PSQLException: ERROR: deadlock detected
  Detail: Process 36964 waits for ShareLock on transaction 29998; blocked by process 48636.
Process 48636 waits for ShareLock on transaction 29999; blocked by process 36964.
  Hint: See server log for query details.
  Where: while updating tuple (0,7) in relation "entity1"
        at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2733)
        at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2420)
        at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:372)
        ...
        at com.hcd.deadlock.service.EntityProcessor$$SpringCGLIB$$0.process2(<generated>)

The stack trace clearly outlines the issue.

The Parallel Execution

The deadlock was introduced in the previous section. The main goal of this article is to create the proper set-up for guaranteeing the actual concurrent execution of the two transactions.

For this experiment, two Java synchronization aids are used – CyclicBarrier and CountDownLatch. A unit test is created for each, with the following common set-up.

@SpringBootTest
@Rollback(false)
class Test {
    
    @Autowired
    private EntityProcessor entityProcessor;

    @Autowired
    private Entity1Repository entity1Repo;

    @Autowired
    private Entity2Repository entity2Repo;

    private Entity1 entity1;
    private Entity2 entity2;

    @BeforeEach
    void setUp() {
        entity1 = entity1Repo.save(new Entity1(1L));
        entity2 = entity2Repo.save(new Entity2(2L));
    }

    @AfterEach
    void tearDown() {
        entity1Repo.delete(entity1);
        entity2Repo.delete(entity2);
    }
}

First, Entity1 and Entity2 are created, then EntityProcessor#process1() and EntityProcessor#process2() are executed concurrently. Finally, the clean-up is done and the two entities are deleted. Thus, the state of the database remains unaltered, as these tests are not transactional, in order to simulate real use cases as closely as possible.

Aid 1 – The CountDownLatch

A CountDownLatch is “a synchronization aid that allows one or more threads to wait until a set of operations being performed in other threads completes” [1].

The test prepares and schedules two threads for executing EntityProcessor#process1() and EntityProcessor#process2() respectively and constructs a CountDownLatch. The CountDownLatch is initialized with a count of 1 and thus acts as a toggle, or a simple gate. The behavior of the threads is similar and described by the ProcessTask Runnable. They first invoke latch.await() and consequently block in a waiting state until “the gate” is opened by the main thread, which invokes latch.countDown(), brings the count to zero and starts the actual parallel processing.

@Test
void run()  {
        CountDownLatch latch = new CountDownLatch(1);

        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
                Future<?> future1 = exec.submit(new ProcessTask(latch,
                                () -> entityProcessor.process1(entity1.getId(), entity2.getId())));

                Future<?> future2 = exec.submit(new ProcessTask(latch,
                                () -> entityProcessor.process2(entity1.getId(), entity2.getId())));

                latch.countDown();

                future1.get();
                future2.get();

        } catch (ExecutionException | InterruptedException e) {
                throw new RuntimeException(e);
        }

        log.info("All processors completed.");
}

private record ProcessTask(CountDownLatch latch, Runnable runnable) implements Runnable {
        
        @Override
        public void run() {             
                try {
                        latch.await();
                } catch (InterruptedException e) {
                        throw new RuntimeException(e);
                }

                runnable.run();
        }
}

Aid 2 – The CyclicBarrier

A CyclicBarrier “allows a set of threads to all wait for each other to reach a common barrier point” [2], in this case, the start of the actual processing – the execution of EntityProcessor#process1() and EntityProcessor#process2().

Similarly to the above, the test prepares and schedules the two threads and constructs a CyclicBarrier with 3 parties – the two tasks and the main thread – then calls barrier.await() (call number one). The behavior of the threads is similar to the former and is described by the ProcessTask Runnable. They first invoke barrier.await() (calls number two and three). Once all three parties have called barrier.await(), they all proceed together and start the actual parallel processing.

@Test
void run()  {
        CyclicBarrier barrier = new CyclicBarrier(3);

        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
                Future<?> future1 = exec.submit(new ProcessTask(barrier,
                                () -> entityProcessor.process1(entity1.getId(), entity2.getId())));

                Future<?> future2 = exec.submit(new ProcessTask(barrier,
                                () -> entityProcessor.process2(entity1.getId(), entity2.getId())));

                barrier.await();

                future1.get();
                future2.get();

        } catch (ExecutionException | InterruptedException | BrokenBarrierException e) {
                throw new RuntimeException(e);
        }

        log.info("All processors completed.");
}


private record ProcessTask(CyclicBarrier barrier, Runnable runnable) implements Runnable {

        @Override
        public void run() {
                try {
                        barrier.await();
                } catch (InterruptedException | BrokenBarrierException e) {
                        log.error("Could not await().", e);
                }

                runnable.run();
        }
}

Takeaways

In the case of the CountDownLatch, the processing threads run independently and in parallel, but start the actual processing – runnable.run() – only when released by the main thread, which decrements the latch to zero.

When using the CyclicBarrier, the situation is very similar: each of the parties waits at the barrier, and the last one to arrive releases them all.

Unlike CountDownLatch, which is one-time use only, a CyclicBarrier may be reused multiple times, making it particularly useful when there are multiple phases in which a set of threads must wait for each other and synchronize repeatedly.
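
As a minimal sketch of that reuse – the phase count, the worker logic and the log reference are illustrative assumptions, not part of the POC – the same barrier instance can gate several consecutive phases, because its count resets automatically once all parties have arrived.

void runPhases() throws Exception {
    // two worker threads synchronize at the end of each phase using a single, reused barrier;
    // the optional barrier action runs once per phase, when the last party arrives
    CyclicBarrier phaseBarrier = new CyclicBarrier(2, () -> log.info("Phase completed."));

    Runnable worker = () -> {
        try {
            for (int phase = 1; phase <= 3; phase++) {
                // ... the actual work of the current phase would go here ...
                phaseBarrier.await(); // wait for the other worker; the barrier then resets itself
            }
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new RuntimeException(e);
        }
    };

    try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
        exec.submit(worker);
        exec.submit(worker);
    } // close() waits for both workers to finish all three phases
}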

This article presents two ways of testing deadlock situations in isolation which might prove quite useful when such issues are detected.

When executing either of the above tests, the deadlock appears and the corresponding exception is thrown. Nevertheless, although the purpose here is not to describe how to resolve deadlocks in general, it is worth reiterating that ordering concurrent lock operations resolves such problems or significantly reduces the risk of deadlocking.
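
For completeness, a minimal sketch of such a fix in the context of this POC – purely illustrative, and not the only possible solution – would be to rewrite process2() so that it touches Entity1 first, mirroring the order used by process1(); with both transactions acquiring the row locks in the same order, the circular wait can no longer form.

// hypothetical variant of process2() that locks the rows in the same order as process1()
@Transactional
public void process2(long entity1Id, long entity2Id) {
    final int index = 2;

    processEntity1(index, entity1Id); // Entity1 first, exactly like process1()

    processEntity2(index, entity2Id);
}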

Resources

[1], [2] – Java 21 Reference Documentation

[3] – POC source code
