Asynchronous logic in Go
2025-11-07 11:38
A useful feature of the Go language: asynchronous logic models very naturally in it. The core idea is the 4-phase handshake protocol. Details are in my old post.

In Go, every such signal from one gate to another is implemented as a pair of channels:
type Handshake[Treq any, Tack any] struct {
    Req chan Treq // request: carries a value forward
    Ack chan Tack // acknowledge: completes the handshake in the reverse direction
}
Values travel one way over the request channel, and the acknowledge channel carries the reply back the other way. If only a bare signal is needed, with no payload, we pass struct{} (the analogue of void). Here is an example implementing the addition of two numbers:
func AsyncAdder(a, b, sum *Handshake[int, struct{}]) {
    for {
        // Wait for inputs to arrive
        var x, y int
        haveA, haveB := false, false
        for !(haveA && haveB) {
            select {
            case x = <-a.Req:
                haveA = true
            case y = <-b.Req:
                haveB = true
            }
        }
        // Compute sum
        s := x + y
        // Send result downstream (Req↑)
        sum.Req <- s
        // Wait for output Ack↑
        <-sum.Ack
        // Only now acknowledge inputs (completing 4-phase handshake)
        a.Ack <- struct{}{}
        b.Ack <- struct{}{}
    }
}

Full source here: add.go
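For illustration, here is a minimal harness that drives the adder end to end. This is a sketch of my own, not part of add.go: the main function, the newHS helper and the operand values are all assumptions.

package main

import "fmt"

// Assumed test harness, not from add.go: two producers raise Req with
// their operands, a consumer accepts the sum and acknowledges it, and
// only then does the adder acknowledge the inputs.
func main() {
    newHS := func() *Handshake[int, struct{}] {
        return &Handshake[int, struct{}]{
            Req: make(chan int),
            Ack: make(chan struct{}),
        }
    }
    a, b, sum := newHS(), newHS(), newHS()

    go AsyncAdder(a, b, sum)

    // Consumer: accept the result (Req↑) and acknowledge it (Ack↑).
    go func() {
        fmt.Println("sum =", <-sum.Req)
        sum.Ack <- struct{}{}
    }()

    // Producers: send the operands, then wait for the acknowledgements.
    a.Req <- 2
    b.Req <- 3
    <-a.Ack
    <-b.Ack
}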
It would be interesting to design a complete asynchronous processor this way, say riscv32, and then put it into an FPGA.

no subject
Date: 2025-11-07 20:36 (UTC)
Now if only there were somewhere to put the garbage collection, which devours up to 25% of CPU time...
And why wouldn't they™ just use reference counting on objects?
no subject
Date: 2025-11-07 21:33 (UTC)
On my workloads, compilers and the like, garbage collection overhead is unnoticeable. Nothing to worry about. Go turns out to be an excellent language.
no subject
Date: 2025-11-07 22:11 (UTC)
There the whole runtime stops, and a recursive walk over all of memory begins, coloring objects.
And everything stands still until the inventory is finished.
Counters, on the other hand, would eat time at the moment of scope exit (roughly where std::shared_ptr destructors run in C++), one object at a time.
And I like the language too. Have for about five years now.
no subject
Date: 2025-11-07 23:10 (UTC)
The current description: https://go.dev/doc/gc-guide
It has changed noticeably, apparently along with the GC algorithms.
The stop-the-world pause no longer lasts until the end of collection; it's shorter now.
Efficiency:
And all the runtime versions are published, of course.
Admittedly, I haven't read the internals.
no subject
Date: 2025-11-07 23:17 (UTC)
Parallel ownership arrows to objects always look suspicious. In C++ they usually cause a crash via double destruction.
no subject
Date: 2025-11-07 23:20 (UTC)
And as for the counters, the cost lands only in something like ~shared_ptr().
no subject
Date: 2025-11-07 23:21 (UTC)
My main point is that you need to understand a tool's limits of applicability and know its weak spots. Otherwise, as the well-known saying goes, "...and it will cut your hands." :)
P.S. Or write in Erlang - there every process has its own GC. Although, if you really want to "cut your hands", you can do that there too. There's no happiness anywhere. :)
no subject
Date: 2025-11-07 23:54 (UTC)
Here is a simple benchmark: https://github.com/sergev/vak-opensource/blob/master/languages/c%2B%2B/gc_vs_sharedptr.cpp
The difference is 4.5x.
no subject
Date: 2025-11-08 00:58 (UTC)
Updating the counter has to be an atomic operation, which generates heavy signal traffic between cores.
Counters cannot free cyclic structures. That makes graphs hard to work with.
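To make the first point concrete, here is a small Go sketch of my own (assumed, not from this thread): several goroutines hammering a single shared atomic counter, the way a hot shared reference count gets hammered.

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
    "time"
)

func main() {
    var refs atomic.Int64 // stands in for a shared reference count
    var wg sync.WaitGroup
    start := time.Now()
    for g := 0; g < 8; g++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for i := 0; i < 1_000_000; i++ {
                refs.Add(1)  // "copy the shared pointer"
                refs.Add(-1) // "drop the copy"
            }
        }()
    }
    wg.Wait()
    // On a multicore machine the cache line holding refs bounces between
    // cores on every update; compare with running the same loop in a
    // single goroutine.
    fmt.Println("contended atomic updates took", time.Since(start))
}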
no subject
Date: 2025-11-08 09:13 (UTC)
Memory management in Go is done quite well (the GC knows about types), and the compiler tries to place everything on the stack, unlike traditional GC'd languages.
The GC in Go works together with the goroutine scheduler. It performs two very short stop-the-world phases, but the main work (marking and sweeping) runs concurrently. Thanks to that, the impact of STW is minimal, especially with enough memory headroom. And when memory runs out - yes, everything slows down.
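As a companion sketch (my addition, not from the comment), here is one way to see those pauses from inside a program: churn some garbage and read the stop-the-world pause history the runtime keeps in MemStats.

package main

import (
    "fmt"
    "runtime"
)

func main() {
    // Allocate and drop memory so the collector has to run repeatedly.
    var keep [][]byte
    for i := 0; i < 2000; i++ {
        keep = append(keep, make([]byte, 1<<20)) // 1 MB per iteration
        if len(keep) == 100 {
            keep = nil // drop the references: this batch is now garbage
        }
    }

    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    // PauseNs is a ring buffer; the most recent pause sits at (NumGC+255)%256.
    fmt.Printf("GC cycles: %d, most recent STW pause: %d ns\n",
        m.NumGC, m.PauseNs[(m.NumGC+255)%256])
}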
no subject
Date: 2025-11-08 13:55 (UTC)
(In principle = in some other real world VMs, so the problem is only about transferring the technology)
no subject
Date: 2025-11-08 13:57 (UTC)
That doesn't look like a feature of the language. This is really about sequential consistency, and a lot of that is down to the software design, not language spec.
no subject
Date: 2025-11-08 14:58 (UTC)
You can easily write CSP code like this (pseudocode):

create channel a
create channel b
send x on channel a
send y on channel b
start async function(a) = { ... }
start async function(b) = { ... }
-- Reading this code, you think that the first function will receive
-- from "a" and the second function will receive from "b". But...
...
-- a very different place in the code
p <- receive from a
...
-- another very different place
q <- receive from b
-- Now you have no idea who will receive what and when.

And actually Pi calculus and Join calculus make this a bit cleaner, as all reception on a given channel must be statically defined in one place in the code, and you can't write code later, adding more reception on the same channels.
Join calculus pseudocode:
no subject
Date: 2025-11-08 16:47 (UTC)
Like, SMTP:

ehlo := <-sock
if ehlo != "EHLO" {
    break // abort
}
from := <-sock
if from != "FROM" {
    break // abort
}
...
rcpt := <-sock
for rcpt == "RCPT" {
    ...
    rcpt = <-sock
}
...

Yes, we have lots of places where we receive from sock, but it is easy to see what the happy path is, and what gets aborted and why.
If you rewrite that in join calculus, you'll have to refer to state somewhere - have we successfully received ehlo? from? rcpt? More than once? What's the correct response in each case?
Also, you have explained how receiving from channels is made explicit. How about sending? You can get races just as well, if you don't control who's sending what and when.
no subject
Date: 2025-11-08 16:59 (UTC)
E.g. a.req goes down only after receiving a.ack, and a.ack goes down only after a.req has gone down. Also, the next a.req doesn't go up while a.ack is high. Etc.
no subject
Date: 2025-11-08 17:18 (UTC)
And GC isn't really tied to a VM. In Go, for example, there is no VM.
no subject
Date: 2025-11-08 20:53 (UTC)
In CSP you cannot guarantee that nobody else is receiving, because the channel is a value available somewhere else in the code (where someone will be sending to it). Nothing prevents some other part of the code from starting to receive from that same channel.
In join calculus all state needs to be explicitly held on messages in some mailboxes. Join calculus is inherently parallel, there is nothing sequential about it. At the same time, in join calculus reception is not an imperative effect: it is a static declaration that something should happen whenever a message is received on a channel.
We could write code in join calculus like this:
declare channel SOCK with type Text
declare channel BREAK with type Text

when receive x on SOCK ->
    if x == "EHLO" then ...
    else if x == "FROM" then ...
    else if x == "RCPT" then ...
    else send "expected EHLO | FROM | RCPT, but got ${x}" on BREAK

But this code doesn't enforce any sequential order on incoming messages.
Sequential logic needs to be simulated by specific reception rules. We will need an additional channel (EXPECT) for holding the current expected state of the protocol. Here is what you would need to write to simulate a protocol:
type Protocol = EHLO | FROM | RCPT

declare channel SOCK with type Text
declare channel EXPECT with type Protocol
declare channel BREAK with type Text

-- declare the reception logic:
when receive x on SOCK and receive EHLO on EXPECT ->
    if x == "EHLO" then send FROM on EXPECT
    else send "expected EHLO, got ${x}" on BREAK
when receive x on SOCK and receive FROM on EXPECT ->
    if x == "FROM" then send RCPT on EXPECT
    else send "expected FROM, got ${x}" on BREAK
when receive x on SOCK and receive RCPT on EXPECT ->
    if x == "RCPT" then ... -- continue, send whatever else we need
    else send "expected RCPT, got ${x}" on BREAK

emit EHLO on EXPECT

The outside world needs to send us messages on the channel SOCK. If several threads send various messages concurrently, things can get jumbled. In this code we statically guarantee that reception follows the order of the protocol, assuming there is only one sending thread and that the messages aren't sent too quickly.
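Back in Go terms, here is my own rough emulation of that pattern (all names are made up): the EXPECT channel becomes a one-slot buffered channel carrying a single state token, and every protocol step "joins" one message from sock with that token.

// Assumed Go emulation of the join-calculus protocol above, not from
// the thread. The expect channel holds exactly one token naming the
// command that must come next.
type step int

const (
    wantEHLO step = iota
    wantFROM
    wantRCPT
)

func runProtocol(sock <-chan string, brk chan<- string) {
    expect := make(chan step, 1)
    expect <- wantEHLO // the single state token

    go func() {
        for {
            x := <-sock    // join: consume one incoming message...
            st := <-expect // ...together with the single state token
            switch {
            case st == wantEHLO && x == "EHLO":
                expect <- wantFROM
            case st == wantFROM && x == "FROM":
                expect <- wantRCPT
            case st == wantRCPT && x == "RCPT":
                expect <- wantRCPT // further RCPT lines may follow
            default:
                brk <- "expected another command, got " + x
                return
            }
        }
    }()
}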
In join calculus one can also statically guarantee that messages are sent one by one, sequentially, or with a given parallelism - say, at most 3 in parallel. (This code does not do that.)
I wrote tutorials on join calculus many years ago.
https://chaource.dreamwidth.org/tag/join_calculus
no subject
Date: 2025-11-08 21:24 (UTC)
How's that possible?
Or is this actually "cannot be sent [in a particular piece of software]"?
no subject
Date: 2025-11-08 21:33 (UTC)
To do that, we need the following:
- The code outside any reception should statically send at most one message on that channel.
- The channel is only available in a single local scope. The entire code that can emit on that channel must be in that scope.
- The code that receives that message will immediately emit that message after processing.
A standard example is an "async counter".
function create_counter(init_value: Int): (Channel[Unit], Channel[Unit], Channel[Unit]) = {
    declare channel COUNTER with type Int
    declare channel INCR with type Unit
    declare channel DECR with type Unit
    declare channel READ with type Int

    when receive x on COUNTER and receive () on INCR -> send x + 1 on COUNTER
    when receive x on COUNTER and receive () on DECR -> send x - 1 on COUNTER
    when receive x on COUNTER and receive () on READ ->
        send x on COUNTER
        send x on READ

    send init_value on COUNTER
    (INCR, DECR, READ)
}

This function declares a local scope in which there are four channels: COUNTER, DECR, INCR, READ.
The function returns three of those channels but keeps COUNTER encapsulated in its scope.
The code of the function sends a single message on COUNTER. When a message is received (and consumed), immediately another message is sent on that channel. So, at any time there is at most one message on that channel. Nobody can mess this up, no external code could ever change that logic.
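For comparison, here is my own rough Go analogue of this counter (assumed code, hypothetical names): the encapsulated COUNTER channel becomes a one-slot channel that never leaves the closure, and READ gets a reply channel.

// Assumed Go analogue of the join-calculus counter, not from the thread.
// The counter value lives as a single token in a buffered channel; each
// operation consumes that token and immediately puts an updated one back.
func createCounter(init int) (incr, decr chan struct{}, read chan chan int) {
    counter := make(chan int, 1) // exactly one token, never escapes this scope
    incr = make(chan struct{})
    decr = make(chan struct{})
    read = make(chan chan int)
    counter <- init

    go func() {
        for {
            x := <-counter // consume the single state token
            select {
            case <-incr:
                counter <- x + 1
            case <-decr:
                counter <- x - 1
            case reply := <-read:
                counter <- x
                reply <- x
            }
        }
    }()
    return incr, decr, read
}

A caller would then do something like incr <- struct{}{}, or r := make(chan int); read <- r; v := <-r.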
External code that uses create_counter will look like this:
no subject
Date: 2025-11-09 09:04 (UTC)
If we want to guarantee that you can't send DECR until the previous READ has been consumed or something like that, you need blocking operations: trying to send on DECR will block until previous operations are cleared. Join calculus as implemented in Jocaml has that feature (messages whose sending operator will block until some conditions are met, and then a value is returned).
no subject
Date: 2025-11-10 07:56 (UTC)
Join calculus is an improvement over the actor model; it is a full-featured, purely functional, high-level declarative concurrency framework. In my (very limited) experience, join calculus rather helps programmers to see where race conditions will happen, and it is possible to guarantee statically that certain pathways are free of race conditions. It's a significant improvement over the actor model.
However, if we don't count Jocaml, there is not a single industry-strength implementation of join calculus. It's too new and too little known.
no subject
Date: 2025-11-10 13:01 (UTC)
In my experience of proving the correctness of concurrent algorithms, the problem is not knowing which statements can "synchronize-with" which - that's pretty obvious from object field names. The problem is really in knowing which statements cannot "synchronize-with" which.
My understanding of what you described is that we can have one and only one synchronize-with edge, but this won't really simplify the problem, because the statements before and after them are going to reflect the same complexity.
For example, multiple reads in CSP will translate into one receive point with conditional like if state == X, do A else if state == Y, do B else if state == Z, do C else......
If so, this is hardly a simplification.
no subject
Date: 2025-11-10 17:44 (UTC)
I do not have a lot of experience with CSP but in my view it is easy to use CSP for one kind of code only: a protocol where there are two threads exchanging messages and synchronizing after each message. Each thread is sequential, and each waits for the next message before proceeding.
Thread 1: wait for A. Once A is received, do work, then send B. Wait for C. Once C is received, do work, then send D.
Thread 2: send A. Wait for B. Once B is received, do work, then send C. Wait for D. Once D is received, do work, etc.
CSP is declarative for this scenario. But anything more complicated than this - any kind of multiple, parallel, asynchronous messages that could arrive in any order, messages that create more threads that can again send messages, etc. - and CSP becomes spaghetti code. Am I wrong?
no subject
Date: 2025-11-10 20:13 (UTC)
That's a broad statement.
Sure, anything more complicated calls for efficient management of complexity. Then it really depends on the tools at our disposal. I don't see how join calculus can remove complexity. Maybe it offers some new way to express it. But I wouldn't say that there are many tasks where things are completely asynchronous and arrive in arbitrary order. That seems to sound like everything is a set - whereas we know from experience that most used primitives are key-value lookup tables (hashtables are possibly the cheapest) and lists.
Likewise I think communication is most often a series of messages, and very rarely unordered single-shot exchanges. Although mathematically the set may be more fundamental than the list.
> CSP becomes spaghetti code
I don't see how join calculus can solve that problem.
The way I see it, CSP makes program order explicit, and synchronisation order hard to work out. The way I understood your explanation, in join calculus synchronisation order is apparent, but then program order is hard to work out.
It looks more like a trade-off between two aspects of the same problem.
no subject
Date: 2025-11-11 11:42 (UTC)
Join calculus is declarative for problems where synchronization is most important; such as the dining philosophers problem. But sequential message protocols aren't represented declaratively; they need to be programmed via a complicated set of synchronizations.
CSP is declarative for sequential exchange protocols between a fixed number of parallel-running processes. But CSP is not good for a fully-concurrent world where the number of parallel threads is not set in advance, where a message can come at any time from anywhere and needs to start new threads that again will send new messages, etc.
no subject
Date: 2025-11-11 14:40 (UTC)
With the caveat that Golang, Java (Loom) and others let you create cheap threads and channels at will, so the available parallelism can be exploited. Of course, technically that's tons of threads to handle, but lexical scope and other language features make the complexity manageable.
However, there's also the curse of diminishing returns.
For example, the mathematically optimal number of threads can be 1M, but the difference between that and 10 threads can be purely academic.
I mean, we all know how to compute average queue wait time, or queue length for a given number of "servers", a distribution of time of arrival, and a distribution of time of departure. Playing with these parameters you can see that after some very modest number of "servers" the difference between that and theoretical optimum is very small.
So I take such tools really only as a way to express ideas cleanly, not as a way to exploit concurrency better. So instead of writing an iterator with hasNext() and next(), we write plain code that communicates with the caller via a channel - basically, it's a means of avoiding a continuation-passing transform of the logic; the channel is the link between cat C and cat OpC.
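A concrete sketch of that last point, in my own words and with made-up names: a tree walker written as plain recursive code that "yields" leaves over a channel instead of exposing hasNext()/next().

// Assumed illustration of the channel-as-iterator idea, not from the
// thread: the producer stays plain sequential code, and the channel
// replaces the hasNext()/next() interface and the CPS transform it
// would otherwise force on the logic.
type Node struct {
    Value       int
    Left, Right *Node
}

func leaves(root *Node) <-chan int {
    out := make(chan int)
    go func() {
        defer close(out)
        var walk func(*Node)
        walk = func(n *Node) {
            if n == nil {
                return
            }
            if n.Left == nil && n.Right == nil {
                out <- n.Value // "yield" one leaf to the caller
                return
            }
            walk(n.Left)
            walk(n.Right)
        }
        walk(root)
    }()
    return out
}

// The caller just ranges over the channel:
//   for v := range leaves(tree) {
//       fmt.Println(v)
//   }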