vak | Arrays in B

A couple of years ago a sharp fellow wrote a compiler for the B language targeting x86_64 on Linux. It generates assembly code for the GNU assembler. Bugs turned up in the compiler, though. I took on the job of getting it into shape and fixed a few things. Then I unexpectedly ran into a serious problem with arrays. The author had implemented arrays the way C does, and it turned out that this does not work: B represents arrays differently. In particular, B allows a value to be assigned to an array variable. In C that is impossible, because an array name is a constant.

I asked Grok, and at first it did not get it. But once I showed it that B allows code like "auto a[3]; a = 123;", it understood and produced a detailed explanation. I quote it here in full.

You’re absolutely right, and I stand corrected—my earlier statement about B preventing assignment to an array name like auto a[3]; a = 123; was inaccurate. Let’s clarify this based on how B actually works, drawing from its design and implementation as documented in historical sources like the B Reference Manual and surviving code.
B’s Behavior with Array Assignment
In B, when you declare:
    auto a[3];
a is allocated as a local variable on the stack, and the [3] specifies that 3 additional words of memory are reserved immediately following a. However, unlike C, where an array name is a non-assignable symbol tied to a fixed memory block, B treats a as a regular variable—a word that initially holds the address of the first element of the 3-word block. Because B is typeless and lacks the strict type enforcement of C, a is not a "constant pointer" or an immutable array name; it’s just a variable that can be reassigned.

So, this is perfectly legal in B:
    auto a[3];
    a = 123;
After this, a no longer points to the original 3-word block—it now contains the value 123. The 3 words allocated for the array are still there on the stack, but a no longer references them unless you explicitly reset it (e.g., with something like a = &a + 1, though B’s syntax and stack behavior make this tricky).
Why This Works in B
Typeless Nature:
In B, all variables are words (18 bits on the PDP-7, 16 bits on the PDP-11), and there’s no distinction between integers, pointers, or array names at the language level. a is just a word, and you can store anything in it, address or not.
Array Declaration Mechanics:
When auto a[3]; is compiled, B’s compiler (like the one Thompson wrote for the PDP-7 or PDP-11) allocates 4 words total on the stack: 1 for a itself (holding the address of the first element) and 3 for the array elements. Initially, a is set to the address of a[0] (i.e., &a + 1 in a conceptual sense), but nothing in B’s semantics prevents overwriting that value.
Contrast this with C, where int a[3]; makes a a compile-time symbol tied to the array’s base address, not a variable that can be assigned to.
No Lvalue Protection:
B doesn’t enforce immutability for array names. In C, a = 123; fails because an expression of array type is not a modifiable lvalue, so it cannot appear on the left side of an assignment. B has no such restriction, so a behaves like any other variable.
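The difference matters to anyone generating code for B. The following is a minimal sketch in C, assuming memory is modeled as an array of words and an address is simply an index into it. The names mem, b_declare_vector, c_style_addr, and b_style_addr are illustrative and not taken from any real compiler. It contrasts the two ways of computing the address of a[i]:

```c
#include <assert.h>

typedef long word;            /* one typeless machine word (assumed model) */

enum { MEM_WORDS = 256 };
static word mem[MEM_WORDS];   /* simulated memory; an address is an index */

/* Model of `auto a[3];` in B: the word at a_addr is the variable a,
   the next n words are the elements, and a starts out holding the
   address of the first element. */
static void b_declare_vector(word a_addr, word n)
{
    assert(a_addr >= 0 && a_addr + n < MEM_WORDS);
    mem[a_addr] = a_addr + 1;
}

/* C-style code generation: the base is a constant derived from where
   the array lives, so the contents of a are never consulted. */
static word c_style_addr(word a_addr, word i)
{
    return a_addr + 1 + i;
}

/* B-style code generation: the base is loaded from the variable a at
   run time, so assigning to a changes what a[i] refers to. */
static word b_style_addr(word a_addr, word i)
{
    return mem[a_addr] + i;
}
```

With a placed at address 10, both functions give 11 for a[0] right after the declaration. After the model's equivalent of a = 123; (that is, mem[10] = 123), b_style_addr(10, 0) gives 123 while c_style_addr(10, 0) still gives 11, which is the mismatch a C-like implementation of B arrays runs into.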
Example and Implications
Here’s what happens step-by-step:
    auto a[3];
    a[0] = 10;  /* Sets first element to 10; a holds address of a[0] */
    a = 123;    /* Overwrites a with 123 */
    a[0] = 20;  /* Undefined behavior: treats 123 as an address */
Initially, a points to the stack-allocated block, and a[0] = 10 writes 10 to that location.
After a = 123, a now contains 123, which could be interpreted as an address if used as a pointer.
a[0] = 20 would attempt to write 20 to memory location 123, likely causing a crash or memory corruption on real hardware, since 123 isn’t a valid stack address.
This flexibility (or lack of safety) reflects B’s minimalist design—it was built for programmers who understood the machine’s memory layout and didn’t need hand-holding.
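The four statements above can be traced in a small simulation. This is only a sketch under stated assumptions: memory is a 256-word array, the variable a is placed at an arbitrary address A_ADDR, and the names struct trace, store_elem, and run_example are made up for the illustration. Unlike real hardware, the model checks that the target address is inside the simulated memory.

```c
#include <assert.h>

typedef long word;

enum { MEM_WORDS = 256, A_ADDR = 200 };  /* A_ADDR: arbitrary spot for a */

struct trace {
    word mem[MEM_WORDS];   /* simulated word-addressed memory */
};

/* Store v into a[i], where the base address is whatever a holds now. */
static void store_elem(struct trace *t, word i, word v)
{
    word addr = t->mem[A_ADDR] + i;
    assert(addr >= 0 && addr < MEM_WORDS);  /* the model traps; real hardware would not */
    t->mem[addr] = v;
}

static void run_example(struct trace *t)
{
    for (word k = 0; k < MEM_WORDS; k++)
        t->mem[k] = 0;
    t->mem[A_ADDR] = A_ADDR + 1;   /* auto a[3];  a holds the address of a[0] */
    store_elem(t, 0, 10);          /* a[0] = 10;  writes word 201 */
    t->mem[A_ADDR] = 123;          /* a = 123;    a no longer refers to the block */
    store_elem(t, 0, 20);          /* a[0] = 20;  writes word 123 */
}
```

After run_example, word 201 (the original a[0]) still holds 10, and the value 20 has landed in word 123, outside the block reserved for the array.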
Comparison to C Again
Your earlier analogy, int buf[3], *a = buf; in C, still holds as a way to mimic B’s behavior, but with a twist:
In C:
    int buf[3];
    int *a = buf;
    a[0] = 10;  /* Works: modifies buf[0] */
    a = (int *)123;  /* Needs a cast in C: a now holds address 123 */
Here, a is explicitly a pointer, separate from buf, and can be reassigned.
In B:
    auto a[3];
    a[0] = 10;  /* Works: modifies first element */
    a = 123;    /* Works: a now holds 123 */
The difference is that B doesn’t distinguish between buf and a—there’s no separate array object; a is both the "pointer" and the entry point to the array, and it’s mutable.
C’s int a[3]; prevents a = 123; to enforce array integrity, a refinement over B’s free-for-all approach.
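One way to picture B's arrangement in C terms is a pointer cell bundled with the storage it initially refers to. The sketch below is hypothetical: struct bvec3 and its helper functions are invented for illustration, and a C struct only approximates B's layout, where the cell and the elements are adjacent words of the same size.

```c
#include <assert.h>

/* Hypothetical C rendering of B's `auto a[3];`: a pointer cell
   bundled with the storage it initially refers to. */
struct bvec3 {
    int *a;          /* plays the role of the B variable a */
    int storage[3];  /* the three reserved words */
};

/* Point a at its own storage, as B does on declaration. Calling this
   again later corresponds to resetting a to the original block. */
static void bvec3_init(struct bvec3 *v)
{
    v->a = v->storage;
}

/* Returns 1 while a still refers to its own storage, 0 after it has
   been reassigned elsewhere. */
static int bvec3_is_attached(const struct bvec3 *v)
{
    return v->a == v->storage;
}
```

Assigning a different address to v.a leaves storage untouched but unreachable through a, just as in the B case; calling bvec3_init again restores the original association.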
Historical Context
This behavior aligns with B’s origins on the PDP-7 and PDP-11, where memory was scarce, and the language trusted programmers to manage it. The B Reference Manual (circa 1972) confirms that array names are variables, not immutable symbols, and examples from early Unix code in B (before the C rewrite) show similar liberties with memory manipulation.
Conclusion
You’re correct: B allows a = 123; after auto a[3];, unlike C’s array declarations. This makes B’s "arrays" more like pointers with pre-allocated space than C’s distinct array types. It’s a subtle but critical difference, highlighting B’s raw, low-level nature versus C’s added structure. Thanks for catching that—it’s a great example of how B’s simplicity shaped its quirks!

chaource

https://www.t3x.org/

You might enjoy this too!

vak

Thank you! A nice collection of books.

spamsink

It explains things well. They must have fed it a lot of material.

It is a rare sapiens who can explain things this well. Then again, sapiens get fed a great deal too over their school and university years.

ymz5

So a revolution in software engineering, and possibly in the teaching of it too, is happening right before our eyes. :)

Indeed it is!
