vak | Асинхронная логика на Go

Полезная особенность языка Go - на нём отлично моделируется асинхронная логика. Суть там в 4-фазном протоколе. Подробности в моём старом посте.

На Go всякий такой сигнал от одного гейта к другому делается в виде пары каналов:

type Handshake[Treq any, Tack any] struct {
    Req chan Treq
    Ack chan Tack
}

По каналу запроса передаются значения в одну сторону, по каналу ответа - в обратную. Если нужен только сигнал, без всякого значения - передаём struct{} (аналог void). Вот пример реализации сложения двух чисел:

func AsyncAdder(a, b, sum *Handshake[int, struct{}]) {
    for {
        // Wait for inputs to arrive
        var x, y int
        haveA, haveB := false, false

        for !(haveA && haveB) {
            select {
            case x = <-a.Req:
                haveA = true
            case y = <-b.Req:
                haveB = true
            }
        }

        // Compute sum
        s := x + y

        // Send result downstream (Req↑)
        sum.Req <- s

        // Wait for output Ack↑
        <-sum.Ack

        // Only now acknowledge inputs (completing 4-phase handshake)
        a.Ack <- struct{}{}
        b.Ack <- struct{}{}
    }
}

Полный текст здесь: add.go

Интересно было бы сдизайнить цельный асинхронный процессор в таком виде, скажем riscv32. А потом засунуть в FPGA.

Threaded | Top-Level Comments Only

From:

juan_gandhi

Ну это прям как в pi-calculus.

From:

vak

Идеи-то общие бродят.

From:

vlad_m

Да. Суслик с его внутренней асинхронной многозадачностью очень хорош.
Вот бы ещё куда деть сборку мусора, которая до 25% процессорного времени сжирает...
И зачем бы им ™ не сделать счётчики ссылок на объекты?

Edited Date: 2025-11-07 21:12 (UTC)

From:

chaource

Въ этомъ Golang очень легко создать race condition. Очень легко открыть каналъ и начать читать изъ него и забыть, что этотъ каналъ уже былъ открытъ въ другомъ мѣстѣ кода, и что тамъ уже изъ этого канала тоже асинхронно читаютъ (въ произвольномъ порядкѣ и съ неопредѣленной семантикой). Кодъ будетъ выглядѣть правильно, но иногда не будетъ работать.

From:

vak

Всего 25%? Фигня вопрос. Счётчики ссылок столько же съедят. А если надо без мусора - всегда ведь есть Rust.

На моих задачах типа компиляторов расходы на сборку мусора незаметны. Можно не париться. Отличный язык этот Go получается.

From:

vak

Топор тоже опасный инструмент, если ним не уметь пользоваться. 😀

From:

vlad_m

Счётчики, думаю, съедят всё таки меньше.
Там весь runtime встаёт, и начинается рекурсивный обход всей памяти с раскрашиванием.
И всё стоит пока не закончится инвентаризация.

А счётчики будут есть в момент выхода из (примерно, когда в С++ деструкторы std::shared_ptr), поштучно.

А язык и мне нравится. Лет 5 уже.

From:

vak

Если мусора много, то и счётчиков много. Причём расход на счётчики имеем всегда, а на мусор только при необходимости. Которой может быть немного при хорошем дизайне программы.

From:

ufm

Не пользуйтесь глобальными переменными и будет Вам счастье. :)

From:

ufm

А можно посмотреть где нибудь код, на котором GC начнёт 25% процессора жрать? Вот прям интересно, как такое умудриться сделать можно. :)

From:

vak

Если куча постоянно больше чем наполовину занята, может и жрать.

From:

ufm

Не, ну если вобще не думать что ты пишешь, то можно написать дичь на любом языке. Я, всё таки, интересуюсь "адекватным" кодом. А так можно и на обычном С засрать память так, что программа будет валиться по нехватке памяти не смотря на то, что память-то есть, только дико фрагментированная.

From:

vlad_m

Это я вычитал когда-то из описания GC.

Актуальное описание https://go.dev/doc/gc-guide

Оно заметно поменялось. Вместе с алгоритмами GC, похоже.

Остановка жизни теперь не до окончания сборки, а покороче.

КПД:
А все версии runtime опубликованы, конечно.
Признаться, не читал внутре.

Edited Date: 2025-11-07 23:19 (UTC)

From:

vlad_m

Ну, при желании можно и из, например, контекста main() инжекировать один chan в два объекта, где и будут порождены две читающие нити.

Параллельные стрелочки ownership объектов всегда выглядят подозрительно. В C++ обычно вызывают крах через двойную деструкцию.

From:

vlad_m

А по мне, так на мусор имеем расход всегда, даже, когда ничего не освобождаем и не собирались.
А на счётчики, только в а-ля ~shared_ptr().

From:

ufm

Оно не просто "поменялось". Оно вот прям сейчас продолжает меняться, я у себя ссылку давал на https://go.dev/blog/greenteagc

Моя основная идея в том, что нужно понимать границы применимости инструмента и знаеть его слабые стороны. Иначе, как в известной поговорке - "...и руки порежет". :)

P.S. Или на erlang писать - там у каждого процесса свой gc. Впрочем, при желании "руки порезать" можно и там. Нет нигде счастья. :)

From:

ufm

Опять таки, паралельное чтение из канала - это абсолютно штатная фича. Класический Worker-pool. Можно об него руки порезать? При желании - конечно. :)

From:

vak

Когда ничего не освобождаем и не выделяем, на мусор расход ровно 0. А на счётчики постоянно расход идёт. Бывает в разы.

Вот простой бенчмарк: https://github.com/sergev/vak-opensource/blob/master/languages/c%2B%2B/gc_vs_sharedptr.cpp

$ g++ -std=c++17 -O2 `pkgconf --cflags bdw-gc` gc_vs_sharedptr.cpp -o gc_vs_sharedptr `pkgconf --libs bdw-gc`
$ ./gc_vs_sharedptr
Benchmark: 1000000 objects
shared_ptr avg time: 21.8132 ms
Boehm GC avg time:  4.79645 ms

В 4 с половиной раза разница.

From:

ircicq

Счётчики ссылок очень медленны для много-тредовых приложений
обновление счетчика должно быть атомарной операцией, что порождает интенсивный обмен сигналами между ядрами

Счетчики не позволяют освободить циклические структуры. С графами трудно работать

Edited Date: 2025-11-08 01:00 (UTC)

From:

spamsink

Если надо минимизировать peak VM, то сборка мусора - не вариант.

From:

sab123

Он еще умеет взаимоблокироваться. Или, как альтернатива, терять сообщения.

From:

x86128

тут еще бонус в том что рантайм гошки задетектит дедлоки.
управление памятью в го довольно хорошо сделано (гц знает про типы) и стремится распологать всё на стеке в отличии от традиционных ЯП с ГЦ.
гц в гашке работает вместе с планировщиком горутин. он делает две очень короткие stop-the-world фазы, но основная работа (marking и sweeping) выполняется параллельно. благодаря этому влияние STW минимально, особенно при достаточном запасе памяти. А когда память заканчивается - да, всё замедляется

From:

x86128

а по сабжу, прикольная тема, надо бы попробовать модель risc-v собрать на таком движке

From:

sassa_nf

In principle GC pauses can be in microseconds.

(In principle = in some other real world VMs, so the problem is only about transferring the technology)

From:

sassa_nf

> создать race condition

That doesn't look like a feature of the language. This is really about sequential consistency, and a lot of that is down to the software design, not language spec.

From:

chaource

It's a feature of the CSP paradigm that you can't easily see all places where you receive messages from channels.

You can easily write CSP code like this (pseudocode):

create channel a
create channel b
send x on channel a
send y on channel b

start async function(a) = { ... }
start async function(b) = { ... }
-- Reading this code, you think that the first function will receive from "a" and the second function will receive from "b". But...
... -- a very different place in the code
p <- receive from a 
... -- another very different place
q <- receive from b
-- Now you have no idea who will receive what and when.

And actually Pi calculus and Join calculus make this a bit cleaner, as all reception on a given channel must be statically defined in one place in the code, and you can't write code later, adding more reception on the same channels.

Join calculus pseudocode:

create channels a, b, c   -- Message types will be type-inferred later.
when a receives Some(x) -> ...
when a receives None ->
when b receives y and c receives True -> ...
when b receives y and c receives False -> ...

-- Finished a static definition of reception from a, b, c. It is inferred that "a" has messages of type Option[A] and "c" has messages of type Boolean, and probably the message type of "b" is also inferred from the further code.
-- There may be many clauses here, but it's a static definition (like a "sealed trait" in Scala, no more cases).

send 123 on a

send 456 on b

-- We can send more data later but we can't redefine reception.

Edited Date: 2025-11-08 15:00 (UTC)

From:

chaource

Но проблема не въ параллельности какъ таковой. Проблема въ томъ, что мы можемъ случайно, не желая того, создать эту параллельность, потому что трудно сразу увидѣть всѣ мѣста, гдѣ въ кодѣ будетъ производиться чтенiе изъ даннаго канала. И мы можемъ всегда въ любомъ модулѣ добавить новое чтенiе, не замѣтивъ, что это создаетъ race condition.

From:

sassa_nf

Maybe. But CSP makes it clear how to write protocols, because state is implied in program order.

Like, SMTP:

ehlo := <-sock
if ehlo != "EHLO" ....break
from := <-sock
if from != "FROM" ....break
....
rcpt := <-sock
for rcpt == "RCPT" {
...
rcpt = <-sock
}
....

Yes, we have lots of places, where we receive from sock, but it is easy to see what is the happy path, and what gets aborted and why.

If you rewrite that in join calculus, you'll have to refer to state somewhere - have we successfully received ehlo? from? rcpt? More than once? What's the correct response in each case?

Also, you have explained how receiving from channels is made explicit. How about sending? You can get races just as well, if you don't control who's sending what and when.

From:

sassa_nf

It seems there needs to be more logic here.

Eg a.req goes down only after receiving a.ack, and a.ack goes down only after receiving a.req going down. Also, next a.req doesn't go up while a.ack is high. Etc.

From:

vak

Есть сборщики мусора с локальностью.

From:

vak

For most applications pauses in tens of milliseconds are insensitive. Most of user space.

And GC are not quite related to VM. In Go there is no VM for example.

From:

vak

Вот поэтому в асинхронной логике и существует 4-фазный протокол, а не просто обмен сообщениями.

From:

vak

Ага. Именно сложить его из элементарных асинхронных кубиков снизу вверх.

From:

vak

Right, strictly speaking there must be one more channel in type Handshake. But that’s for hardware. For modeling in Go it’s not needed.

From:

vak

Асинхронный 4-фазный протокол не взаимоблокируется и ничего не теряет.

From:

vak

В 4-фазном протоколе не бывает дедлоков. Доказывается математически.

From:

chaource

When you write a sequential program that performs a sequence of "receive" operations from a channel then yes, you can guarantee a certain order of events. But only if nobody else is receiving from that channel in another thread.

In CSP you cannot guarantee that nobody else is receiving, because the channel is a value available somewhere else in the code (Where someone will be sending to it). Nothing prevents some other part of the code to start receiving from that same channel.

In join calculus all state needs to be explicitly held on messages in some mailboxes. Join calculus is inherently parallel, there is nothing sequential about it. At the same time, in join calculus reception is not an imperative effect: it is a static declaration that something should happen whenever a message is received on a channel.

We could write code in join calculus like this:

declare channels SOCK with type Text
declare channel BREAK with type Text

when receive x on SOCK ->
  if x == "EHLO" then ...
  else if x == "FROM" then ...
  else if x == "RCPT" then ...
  else send "expected EHLO | FROM | RCPT, but got ${x}" on BREAK

But this code doesn't enforce any sequential order on incoming messages.

Sequential logic needs to be simulated by specific reception rules. We will need an additional channel (EXPECT) for holding the current expected state of the protocol. Here is what you would need to write to simulate a protocol:

type Protocol = EHLO | FROM | RCPT

declare channels SOCK with type Text
declare channel EXPECT with type Protocol
declare channel BREAK with type Text

-- declare the reception logic:
when receive x on SOCK and receive EHLO on EXPECT ->
  if x == "EHLO"
  then send FROM on EXPECT
  else send "expected EHLO, got ${x}" on BREAK

when receive x on SOCK and receive FROM on EXPECT ->
  if x == "FROM"
  then send RCPT on EXPECT
  else send "expected FROM, got ${x}" on BREAK

when receive x on SOCK and receive RCPT on EXPECT ->
  if x == "RCPT"
  then ... -- continue, send whatever else we need
  else send "expected RCPT, got ${x}" on BREAK

emit EHLO on EXPECT

The outside world needs to send us on messages on the channel SOCK. If several threads send concurrently various messages then things can get jumbled. In this code we statically guarantee that reception will be in order of the protocol, assuming that there is only one sending thread and assuming that the messages aren't sent too quickly.

In join calculus one can also statically guarantee that messages are sent one by one, sequentially, or with a given parallelism - say, at most 3 in parallel. (This code does not do that.)

I wrote tutorials on join calculus many years ago.
https://chaource.dreamwidth.org/tag/join_calculus

Edited Date: 2025-11-08 21:02 (UTC)

From:

sassa_nf

> messages are not sent in parallel

How's that possible?

Or is this actually "cannot be sent [in a particular piece of software]"?

From:

chaource

A program in join calculus can guarantee statically that there is at most one message on a given channel at any time.

To do that, we need two things:

- The code outside any reception should statically send at most one message on that channel.
- The channel is only available in a single local scope. The entire code that can emit on that channel must be in that scope.
- The code that receives that message will immediately emit that message after processing.

A standard example is an "async counter".

function create_counter(init_value: Int): (Channel[Unit], Channel[Unit], Channel[Unit]) = {

declare channel COUNTER with type Int
declare channel INCR with type Unit
declare channel DECR with type Unit
declare channel READ with type Int

when receive x on COUNTER and receive () on INCR ->
  send x + 1 on COUNTER

when receive x on COUNTER and receive () on DECR ->
  send x - 1 on COUNTER

when receive x on COUNTER and receive () on READ ->
  send x on COUNTER
  send x on READ


send init_value on COUNTER

(INCR, DECR, READ)
}

This function declares a local scope in which there are four channels: COUNTER, DECR, INCR, READ.
The function returns three of those channels but keeps COUNTER encapsulated in its scope.

The code of the function sends a single message on COUNTER. When a message is received (and consumed), immediately another message is sent on that channel. So, at any time there is at most one message on that channel. Nobody can mess this up, no external code could ever change that logic.

External code that uses create_counter will look like this:

(INCR, DECR, READ) = create_counter(0)

send () on INCR

...

send () on DECR 

...

when receive x on READ -> ...

Edited Date: 2025-11-08 21:35 (UTC)

From:

sassa_nf

Yes, so if someone sends INCR, someone else sends DECR, and some other two send READ, we are going to have the standard IRIW race with undefined outcomes.

From:

chaource

That is correct. We have only guaranteed that the counter is sequentially updated. That is, we have guaranteed that there are no races for the counter: you can send INCR and DECR concurrently, in any order, as many times as you like, and eventually they will be all consumed and then the counter will have the correct value.

If we want to guarantee that you can't send DECR until the previous READ has been consumed or something like that, you need blocking operations: trying to send on DECR will block until previous operations are cleared. Join calculus as implemented in Jocaml has that feature (messages whose sending operator will block until some conditions are met, and then a value is returned).

Edited Date: 2025-11-09 09:12 (UTC)

From:

sassa_nf

OK, so we don't get it for free, and the engineer needs to find all the uses where it is not synchronised properly.

From:

chaource

We can use the blocking-send feature to make sure it's always synchronized and then we will have reproduced the single-threaded behavior where you can only send and receive synchronously.

Join calculus is an improvement over the actor model, it is a full-featured, purely functional and high-level declarative concurrency framework. In my (very limited) experience, join calculus rather helps programmers to see where race conditions will happen, and it is possible to guarantee statically that certain pathways are free of race conditions. It's a significant improvement over the actor model.

However, if we don't consider Jocaml, there is not a single industry-strengh implementation of join calculus. It's too new and too little known.

From:

sassa_nf

I don't see a compelling reason why join calculus would be better.

In my experience of proving the correctness of concurrent algorithms, the problem is not knowing which statements can "synchronize-with" which - that's pretty obvious from object field names. The problem is really in knowing which statements cannot "synchronize-with" which.

My understanding of what you described is that we can have one and only one synchronize-with edge, but this won't really simplify the problem, because the statements before and after them are going to reflect the same complexity.

For example, multiple reads in CSP will translate into one receive point with conditional like if state == X, do A else if state == Y, do B else if state == Z, do C else......

If so, this is hardly a simplification.

From:

chaource

In join calculus, everything is by default asynchronous; all messages are sent asynchronously and arrive in arbitrary order eventually with no time guarantees. Each synchronization is an explicit piece of code.

I do not have a lot of experience with CSP but in my view it is easy to use CSP for one kind of code only: a protocol where there are two threads exchanging messages and synchronizing after each message. Each thread is sequential, and each waits for the next message before proceeding.

Thread 1: wait for A. Once A is received, do work, then send B. Wait for C. Once C is received, do work, then send D.

Thread 2: send A. Wait for B. Once B is received, do work, then send C. Wait for D. Once D is received, do work, etc.

CSP is declarative for this scenario. But anything more complicated than this - any kind of multiple, parallel, asynchronous messages that could arrive in any order, messages that create more threads that can again send messages, etc. - and CSP becomes spaghetti code. Am I wrong?

Edited Date: 2025-11-10 18:50 (UTC)

From:

sassa_nf

> anything more complicated

That's a broad statement.

Sure, anything more complicated calls for efficient management of complexity. Then it really depends on the tools at our disposal. I don't see how join calculus can remove complexity. Maybe it offers some new way to express it. But I wouldn't say that there are many tasks where things are completely asynchronous and arrive in arbitrary order. That seems to sound like everything is a set - whereas we know from experience that most used primitives are key-value lookup tables (hashtables are possibly the cheapest) and lists.

Likewise I think communication is most often a series of messages, and very rarely unordered single-shot exchanges. Although mathematically the set may be more fundamental than the list.

> CSP becomes spaghetti code

I don't see how join calculus can solve that problem.

The way I see it, CSP makes program order explicit, and synchronisation order hard to work out. The way I understood your explanation, in join calculus synchronisation order is apparent, but then program order is hard to work out.

It looks more like a trade-off between two aspects of the same problem.

From:

chaource

This is a good point that I agree with.

Join calculus is declarative for problems where synchronization is most important; such as the dining philosophers problem. But sequential message protocols aren't represented declaratively; they need to be programmed via a complicated set of synchronizations.

CSP is declarative for sequential exchange protocols between a fixed number of parallel-running processes. But CSP is not good for a fully-concurrent world where the number of parallel threads is not set in advance, where a message can come at any time from anywhere and needs to start new threads that again will send new messages, etc.

From:

sassa_nf

Yes, something like that.

With a caveat that Golang, Java (Loom) and others allow to create cheap threads and channels at will, so available parallelism can be exploited. Of course, technically that's tons of threads to handle, but lexical scope and other language features make the complexity manageable.

However, there's also the curse of diminishing returns.

For example, the mathematically optimal number of threads can be 1M, but the difference between that and 10 threads can be purely academic.

I mean, we all know how to compute average queue wait time, or queue length for a given number of "servers", a distribution of time of arrival, and a distribution of time of departure. Playing with these parameters you can see that after some very modest number of "servers" the difference between that and theoretical optimum is very small.

So I take such tools really only as a way to express ideas cleanly, not really exploit concurrency better. So instead of writing an iterator with hasNext() and next(), we write plain code that communicates with the caller via a channel - basically, it is the means to avoid Continuation-passing transform of the logic; the channel is the link between cat C and cat OpC.

Threaded | Top-Level Comments Only

Профиль

Посетители