Fixing the Race Condition - SemaphoreSlim in Action

Post 2 of the series: Advanced C# for Your Next Interview

Where We Left Off

In the previous post we built the simplest possible file storage. Read the whole file, deserialize, add a record, serialize back, write the whole file. Clean, readable, fully async.

Then we ran 20 parallel writes and got this:

Rows in file : 1
Lost         : 16
PROBLEM - data lost

The culprit was a read-modify-write race condition. All tasks read the file before any of them wrote back, so they all started with an empty list, and each wrote only its own record. The last writer won. Every other write was either silently overwritten or failed with an IOException when the OS refused to open a file already held by another task.

Today we fix it.

What We Need

We need mutual exclusion on the write operation. Fancy term, simple idea: only one task should be allowed to do the full read-modify-write cycle at a time. Others should wait in line.

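In synchronous code you’d use lock. But lock does not work with await: the compiler rejects an await inside a lock block, because the underlying monitor must be released by the same thread that acquired it, and after an await the method may resume on a different thread. This is where SemaphoreSlim comes in.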

SemaphoreSlim - The Basics

SemaphoreSlim is an async-friendly synchronization primitive. It works like a counter with a maximum value. When a task calls WaitAsync(), the counter goes down by one. When it calls Release(), the counter goes back up. If the counter is already at zero, WaitAsync() waits until someone releases.

var semaphore = new SemaphoreSlim(1, 1);
//                               ^  ^
//                               |  maximum count
//                               initial count

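With (1, 1) we get a binary semaphore, so only one task can be inside at any time. It behaves like a mutex that works with await, with one caveat: it has no owner and is not reentrant, so a task that calls WaitAsync() again while already holding it will wait on itself forever.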
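To make the counter behavior concrete, here is a minimal sketch that watches CurrentCount while one caller holds the semaphore and a second one queues behind it. The variable names are made up for illustration; only SemaphoreSlim, WaitAsync, Release and CurrentCount come from the framework.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var semaphore = new SemaphoreSlim(1, 1);
Console.WriteLine(semaphore.CurrentCount);          // prints 1: one slot free

await semaphore.WaitAsync();                        // first caller enters
int countWhileHeld = semaphore.CurrentCount;
Console.WriteLine(countWhileHeld);                  // prints 0: no slots left

// A second caller cannot get in yet, so its task stays incomplete.
Task second = semaphore.WaitAsync();
bool secondWasPending = !second.IsCompleted;
Console.WriteLine(secondWasPending);                // prints True

semaphore.Release();                                // slot is handed to the queued waiter
await second;
int countAfterHandOff = semaphore.CurrentCount;
Console.WriteLine(countAfterHandOff);               // prints 0: the second caller holds it now

semaphore.Release();                                // second caller leaves
int finalCount = semaphore.CurrentCount;
Console.WriteLine(finalCount);                      // prints 1: back to the initial state
```

Note that the first Release() does not bring the count back to 1 while someone is waiting. The freed slot goes straight to the queued caller.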

The key difference from lock:

// This is WRONG - you cannot await inside lock
lock (_syncObj)
{
    await File.WriteAllTextAsync(...); // compiler error
}

// This works
await _semaphore.WaitAsync();
try
{
    await File.WriteAllTextAsync(...); // perfectly fine
}
finally
{
    _semaphore.Release();
}

The Fix

Here is the updated WriteAsync:

public class ConcurrentFileStorage : IFileStorage<FileRecord>
{
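    private readonly string _filePath;
    private readonly SemaphoreSlim _writeLock = new(1, 1);

    // Constructor assumed here; it is not shown in the original snippet.
    public ConcurrentFileStorage(string filePath) => _filePath = filePath;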

    public async Task WriteAsync(FileRecord record, CancellationToken ct = default)
    {
        await _writeLock.WaitAsync(ct);
        try
        {
            var json = await File.ReadAllTextAsync(_filePath, ct);
            var records = JsonSerializer.Deserialize<List<FileRecord>>(json) ?? [];

            records.Add(record);

            await File.WriteAllTextAsync(
                _filePath,
                JsonSerializer.Serialize(records),
                ct);
        }
        finally
        {
            _writeLock.Release();
        }
    }
}

Three things to notice:

WaitAsync(ct), not Wait() - the sync version Wait() would block the thread while waiting. WaitAsync() releases the thread back to the pool and resumes when the semaphore is available. In async code always use the async version.
The finally block is not optional - if anything throws inside the try, we still need to release the lock. Without finally, one failed write would leave the semaphore at zero permanently. Every future write would wait forever. The application would hang.
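We pass ct to WaitAsync - if the operation is cancelled while waiting in line, WaitAsync throws OperationCanceledException and the semaphore is not acquired. This is correct behavior. No release is needed because we never entered, which is also why WaitAsync sits outside the try: if it were inside, the finally block would release a semaphore we never acquired.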
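The cancellation point is easy to check in isolation. The sketch below is a standalone illustration, not part of the storage class: one caller holds the semaphore, a second caller queues with a token, and the token is cancelled before the lock becomes free. The names gate, cts and waiter are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var gate = new SemaphoreSlim(1, 1);
await gate.WaitAsync();                          // first caller holds the lock

using var cts = new CancellationTokenSource();
Task waiter = gate.WaitAsync(cts.Token);         // second caller waits in line
cts.Cancel();                                    // ...and gives up before getting in

bool cancelled = false;
try
{
    await waiter;
}
catch (OperationCanceledException)
{
    cancelled = true;                            // the waiter never acquired the semaphore
}

gate.Release();                                  // only the real holder releases
int countAfter = gate.CurrentCount;
Console.WriteLine($"cancelled={cancelled}, count={countAfter}"); // prints cancelled=True, count=1
```

The count ends at 1, not 2 and not 0: the cancelled waiter neither took a slot nor needs to give one back.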

Running the Same Stress Test

var tasks = Enumerable.Range(0, 20).Select(i =>
    storage.WriteAsync(
        new FileRecord(Guid.NewGuid(), $"Record-{i}", $"Data-{i}", DateTime.UtcNow))
).ToList();

await Task.WhenAll(tasks);

Output:

[Thread 04] WRITE - Record-3  (list: 2 records)
[Thread 06] WRITE - Record-11 (list: 3 records)
[Thread 04] WRITE - Record-7  (list: 4 records)
[Thread 09] WRITE - Record-0  (list: 5 records)
...

Rows in file : 20
Lost         : 0
OK

Each task sees the list growing. No collisions, no exceptions, no data loss.
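If you want to reproduce the numbers without the rest of the project, the following is a self-contained sketch of the same test. It is a simplification under stated assumptions: plain strings stand in for FileRecord, the file lives in the temp directory, and AppendAsync is a hypothetical local function that mirrors the read-modify-write cycle of WriteAsync.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

string path = Path.Combine(Path.GetTempPath(), $"stress-{Guid.NewGuid():N}.json");
await File.WriteAllTextAsync(path, "[]");        // start from an empty JSON array

var writeLock = new SemaphoreSlim(1, 1);

// Same cycle as WriteAsync: read everything, add one item, write everything.
async Task AppendAsync(string name)
{
    await writeLock.WaitAsync();
    try
    {
        var json = await File.ReadAllTextAsync(path);
        var names = JsonSerializer.Deserialize<List<string>>(json) ?? new List<string>();
        names.Add(name);
        await File.WriteAllTextAsync(path, JsonSerializer.Serialize(names));
    }
    finally
    {
        writeLock.Release();
    }
}

// 20 writes started in parallel, as in the stress test above.
await Task.WhenAll(Enumerable.Range(0, 20).Select(i => AppendAsync($"Record-{i}")));

var stored = JsonSerializer.Deserialize<List<string>>(await File.ReadAllTextAsync(path))!;
int lost = 20 - stored.Distinct().Count();
Console.WriteLine($"Rows in file : {stored.Count}");
Console.WriteLine($"Lost         : {lost}");
File.Delete(path);
```

Removing the WaitAsync/Release pair from AppendAsync should bring back the lost rows or IOExceptions from the previous post, which makes this a convenient before/after check.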

What About Reads?

We only lock writes. Reads are free to run without the semaphore:

public async Task<FileRecord?> FindAsync(Guid id, CancellationToken ct = default)
{
    // no lock here
    var json = await File.ReadAllTextAsync(_filePath, ct);
    var records = JsonSerializer.Deserialize<List<FileRecord>>(json) ?? [];
    return records.FirstOrDefault(r => r.Id == id);
}

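This is intentional. Readers never conflict with each other, because reading does not modify anything, so any number of them can run at once. The one risk is a read that overlaps a write: it can see a half-written file and fail to deserialize, or hit an IOException while the writer has the file open.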

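If you need strict read-write isolation, for example to guarantee that a read never sees a partially written file, the simplest option in async code is to take the same SemaphoreSlim in the read path, at the cost of serializing reads too. ReaderWriterLockSlim allows multiple concurrent readers with exclusive access for writers, but it is thread-affine: the lock must be released on the thread that acquired it, so it cannot be held across an await. Worth knowing for the interview, but for our purposes SemaphoreSlim on writes is enough.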
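For completeness, here is a rough sketch of what a read looks like when it goes through the same binary semaphore as the writes. This is one possible approach, not what the series uses: fileLock and ContainsAsync are hypothetical names, and strings again stand in for FileRecord. The trade-off is that reads now wait for each other as well as for writers.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

string path = Path.Combine(Path.GetTempPath(), $"locked-read-{Guid.NewGuid():N}.json");
await File.WriteAllTextAsync(path, "[\"Record-0\",\"Record-1\"]");

// One semaphore shared by the read path and the write path.
var fileLock = new SemaphoreSlim(1, 1);

async Task<bool> ContainsAsync(string name, CancellationToken ct = default)
{
    await fileLock.WaitAsync(ct);                // queue behind any write in progress
    try
    {
        var json = await File.ReadAllTextAsync(path, ct);
        var names = JsonSerializer.Deserialize<List<string>>(json) ?? new List<string>();
        return names.Contains(name);
    }
    finally
    {
        fileLock.Release();
    }
}

bool found = await ContainsAsync("Record-1");
bool missing = await ContainsAsync("Record-9");
Console.WriteLine($"{found} {missing}");         // prints True False
File.Delete(path);
```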

SemaphoreSlim vs Other Primitives

This comes up often in interviews:

lock - synchronous only, cannot be used with await. Good for short CPU-bound critical sections.

SemaphoreSlim(1, 1) - async-friendly mutex. Use when you need to protect an async operation. This is what we used.

SemaphoreSlim(n, n) - allows up to n concurrent tasks. Useful for throttling, for example allowing max 5 concurrent HTTP requests.

ReaderWriterLockSlim - differentiates between read and write access. Multiple readers allowed simultaneously, writers get exclusive access. Good when reads are frequent and writes are rare. It is thread-affine and has no async API, so it cannot be held across an await; async code needs a dedicated async reader-writer lock instead.

Mutex - system-level; a named Mutex works across processes. Heavy, and thread-affine like lock, so it does not mix with await. Use only when you need cross-process synchronization.
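The throttling use of SemaphoreSlim(n, n) from the list above is the one interviewers most often ask you to sketch. Below is a minimal version under stated assumptions: Task.Delay stands in for the HTTP call, and DoWorkAsync, MaxParallel, running and peak are illustrative names. A plain lock guards the counters, which is fine because there is no await inside it.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int MaxParallel = 5;
var throttle = new SemaphoreSlim(MaxParallel, MaxParallel);

object sync = new();
int running = 0;   // operations currently inside the throttled section
int peak = 0;      // highest value of 'running' ever observed

async Task DoWorkAsync(int id)
{
    await throttle.WaitAsync();                  // at most MaxParallel callers get past here
    try
    {
        lock (sync)
        {
            running++;
            peak = Math.Max(peak, running);
        }

        await Task.Delay(50);                    // stand-in for an HTTP request

        lock (sync)
        {
            running--;
        }
    }
    finally
    {
        throttle.Release();
    }
}

// Start 20 operations at once; the semaphore lets them through five at a time.
await Task.WhenAll(Enumerable.Range(0, 20).Select(DoWorkAsync));

Console.WriteLine($"Peak concurrency: {peak} (limit {MaxParallel})");
```

All 20 tasks are created immediately, but the observed peak never exceeds the limit. The only change from the mutex case is the number passed to the constructor.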

What’s Still Wrong

The lock solves data corruption. But look at what every operation still does:

var json = await File.ReadAllTextAsync(_filePath, ct);              // reads ENTIRE file
var records = JsonSerializer.Deserialize<List<FileRecord>>(json);   // allocates a list
records.Add(record);
await File.WriteAllTextAsync(_filePath, JsonSerializer.Serialize(records), ct); // writes ENTIRE file

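Every single write reads the entire file into memory, deserializes it into a list, adds one item, serializes the whole thing back, and rewrites the file from scratch. The cost of one write grows with the size of the file, so the total cost grows quadratically: writing n records one at a time processes 1 + 2 + ... + n = n(n + 1)/2 records in total. For 1,000 records that is about 500,000 record serializations. For 100,000 records it is about 5 billion, and this becomes a serious problem.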

In the next post we start attacking the allocation side of this. We will switch to an append-only format and introduce Span<T> and ArrayPool<T> to eliminate the unnecessary memory pressure.