I want to amend Tudor's answer, which is a good starting point. Threads have two main overheads:
1. Starting and stopping them. This involves creating a stack and kernel objects, plus kernel transitions and global kernel locks.
2. Keeping their stacks around.
(1) is only a problem if you create and stop threads all the time. The common solution is a thread pool, and I consider this problem practically solved. Scheduling a task on a thread pool usually does not involve a trip to the kernel, which makes it very fast: the overhead is on the order of a few interlocked memory operations and a few allocations.
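To make the thread pool point concrete, here is a minimal sketch in Python (chosen only for illustration; the answer is not tied to a language). It submits many small tasks to a fixed-size pool and records which worker threads ran them. The helper name `run_on_pool` and the numbers are made up for this example, and it only shows thread reuse; it does not measure the scheduling cost.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_on_pool(num_tasks, pool_size):
    """Run num_tasks tiny tasks on a pool of at most pool_size threads.

    Returns the task results and the names of the worker threads used.
    (Hypothetical helper, for illustration only.)
    """
    seen_threads = set()
    lock = threading.Lock()

    def task(n):
        # Record which pool thread is executing this task.
        with lock:
            seen_threads.add(threading.current_thread().name)
        return n * n

    # The pool creates its worker threads once and reuses them for every
    # task, so thread start/stop cost is not paid per task.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        results = list(pool.map(task, range(num_tasks)))
    return results, seen_threads


results, seen_threads = run_on_pool(num_tasks=1000, pool_size=4)
print(len(results), "tasks ran on", len(seen_threads), "worker thread(s)")
```

However many tasks are submitted, the number of distinct worker threads never exceeds the pool size, which is why the start/stop overhead stops mattering.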
(2) only becomes important if you have many threads (more than 100 or so). In that case, async IO is a way to get rid of the threads. I found that, as long as you don't have an insane number of threads, synchronous IO, blocking included, is slightly faster than async IO (you read that right: sync IO is faster).
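As a rough sketch of how async IO removes the per-wait thread (and its stack), the Python snippet below keeps 1000 waits in flight on a single thread. `asyncio.sleep` stands in for real network or disk IO, and `fake_io` and `wait_for_many` are hypothetical names for this example. It says nothing about the sync-versus-async speed comparison above; it only shows that the waits need no extra threads.

```python
import asyncio
import threading


async def fake_io(n, delay):
    # asyncio.sleep is a stand-in for an IO wait (an assumption made for
    # illustration). The coroutine yields to the event loop here instead
    # of blocking a thread.
    await asyncio.sleep(delay)
    return n


async def wait_for_many(count, delay):
    threads_before = threading.active_count()
    # Start all the waits; each one is a small task object, not a thread.
    tasks = [asyncio.create_task(fake_io(n, delay)) for n in range(count)]
    await asyncio.sleep(0)  # give every task a chance to start waiting
    threads_during = threading.active_count()
    results = await asyncio.gather(*tasks)
    return results, threads_during - threads_before


results, extra_threads = asyncio.run(wait_for_many(count=1000, delay=0.05))
print(len(results), "waits completed;", extra_threads, "extra thread(s) created")
```

Doing the same with blocking calls would need one thread, and one stack, per outstanding wait, which is the cost described in (2).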