Discussion:
#table is not O(1)
Roman Tsisyk
2014-09-03 17:30:54 UTC
Permalink
Hi!

I wonder why #table (length operator for table) is not O(1).

```
MSize LJ_FASTCALL lj_tab_len(GCtab *t)
{
MSize j = (MSize)t->asize;
if (j > 1 && tvisnil(arrayslot(t, j-1))) {
MSize i = 1;
while (j - i > 1) {
MSize m = (i+j)/2;
if (tvisnil(arrayslot(t, m-1))) j = m; else i = m;
}
return i-1;
}
if (j) j--;
if (t->hmask <= 0)
return j;
return unbound_search(t, j);
}
```

How am I supposed to work with Lua after that?
--
WBR,
Roman Tsisyk <roman-***@public.gmane.org>
http://tarantool.org/ - an efficient in-memory data store and a Lua
application server
http://try.tarantool.org/ - try your Lua code **online**
Coda Highland
2014-09-03 17:42:17 UTC
Permalink
Post by Roman Tsisyk
Hi!
I wonder why #table (length operator for table) is not O(1).
```
MSize LJ_FASTCALL lj_tab_len(GCtab *t)
{
MSize j = (MSize)t->asize;
if (j > 1 && tvisnil(arrayslot(t, j-1))) {
MSize i = 1;
while (j - i > 1) {
MSize m = (i+j)/2;
if (tvisnil(arrayslot(t, m-1))) j = m; else i = m;
}
return i-1;
}
if (j) j--;
if (t->hmask <= 0)
return j;
return unbound_search(t, j);
}
```
How am I supposed to work with Lua after that?
Easy -- you're overreacting. ;) There was a jumbo thread over on lua-l
about this not too long ago
(http://lua-users.org/lists/lua-l/2014-08/msg00426.html is somewhere
in the middle of the conversation because the thread got forked).

It's O(log n) worst case, which is pretty fast to begin with, and its
initial guess is generally pretty good, which means you don't hit the
worst case very often. Furthermore, consider that the alternative
would be to keep the length updated with every table write, which
would add overhead to every modification -- and tables generally get
modified WAY more frequently than you query their length.

Consider that you often query the length right before iterating it --
you're doing a loop over n elements, with a one-time O(log n) cost at
the beginning. If you're modifying every entry in this loop, then
doing a per-write length update would make what's right now O(log n)
an O(n) operation.

/s/ Adam
Mike Pall
2014-09-03 17:42:09 UTC
Permalink
Post by Roman Tsisyk
I wonder why #table (length operator for table) is not O(1).
[...]
How am I supposed to work with Lua after that?
This is both well known and primarily a Lua language issue, not a
LuaJIT issue. Please take the time to review the Lua mailing list
archives for this question.

WARNING: Any discussion about the length operator is deliberately
off-topic here. By looking at the lua-l archives, you can easily
see there's no resolution that makes everyone happy. Everytime
this topic is raised, it causes endless and fruitless discussions.
Mainly because few participants in these discussions understand
the (suprisingly complex) design and implementation choices behind
it. There's no need to repeat this another time.

--Mike
Roman Tsisyk
2014-09-04 05:37:17 UTC
Permalink
Post by Mike Pall
Post by Roman Tsisyk
I wonder why #table (length operator for table) is not O(1).
[...]
How am I supposed to work with Lua after that?
This is both well known and primarily a Lua language issue, not a
LuaJIT issue. Please take the time to review the Lua mailing list
archives for this question.
WARNING: Any discussion about the length operator is deliberately
off-topic here.
Sorry, I've been thinking that Lua table is flexible and can be
either HASH or ARRAY.
After discovered this code I realized that Lua table is actually always HASH.

Why don't add some flag (e.g. high bit of t->asize) to indicate that
table is array and use t->asize as array size?
--
WBR,
Roman Tsisyk <roman-***@public.gmane.org>
http://tarantool.org/ - an efficient in-memory data store and a Lua
application server
http://try.tarantool.org/ - try your Lua code **online**
Coda Highland
2014-09-04 05:53:04 UTC
Permalink
Post by Roman Tsisyk
Post by Mike Pall
Post by Roman Tsisyk
I wonder why #table (length operator for table) is not O(1).
[...]
How am I supposed to work with Lua after that?
This is both well known and primarily a Lua language issue, not a
LuaJIT issue. Please take the time to review the Lua mailing list
archives for this question.
WARNING: Any discussion about the length operator is deliberately
off-topic here.
Sorry, I've been thinking that Lua table is flexible and can be
either HASH or ARRAY.
After discovered this code I realized that Lua table is actually always HASH.
Why don't add some flag (e.g. high bit of t->asize) to indicate that
table is array and use t->asize as array size?
It is how it is for a reason, and Mike asked that we not devolve into
a discussion on changing it.

/s/ Adam
Mike Pall
2014-09-05 12:41:26 UTC
Permalink
but I would like to try to extract any wisdom that you think
there might be in understanding any non-obvious complexity in the design
question.
Lua tables present a generic map-like interface. Layering an
auto-sizing array abstraction on top of it causes an impedance
mismatch. The concept of an "array length" is commonly only defined
for dense arrays. This conflicts with the need of a map to support
sparse integer keys.

The user expectation that the maximum (non-negative) integer key
defines the array length cannot be implemented efficiently.
Basically, any store might grow or shrink (*) the array length by
arbitrary steps. Updating an explicit length property would
pessimize the fast path of stores. Considering common use cases,
it's better to treat the length as implicit and pessimize the array
length operation instead.

Tightening the abstraction to give useful results only for dense
arrays by relaxing the constraint (any n where t[n] ~= nil and
t[n+1] == nil) helps to improve the average time complexity of the
array length operation to O(log N), but the resulting ambiguity
often causes confusion amongst (new) users.

The current behavior and definition of the length operator on Lua
tables is a compromise between user expectations and implementation
efficiency. The root of the problem is a leaky abstraction.

[Corollary: one should definitely separate the concepts of a map and
an array if one were to design a new computer language.]

(*) Remember that Lua defines a store of a nil value as a delete
operation. But it's both additions and deletions that cause issues.
Using an explicit delete operator wouldn't help efficiency.

--Mike
Evan Wies
2014-09-05 13:56:57 UTC
Permalink
Post by Mike Pall
[Corollary: one should definitely separate the concepts of a map
and an array if one were to design a new computer language.]
Given the various extensions to Lua included in LuaJIT, have you
considered addressing this?
For existing true array uses (where the table is never sparse), would
there be significant benefits (besides the #length operator performance
and accuracy)?

I recognize that so far these extensions have all been implemented as
libraries (albeit with deep LuaJIT integration), rather than any
enhancement to Lua syntax. Except for some potential array constructor
nicety/performance (i.e. array.new{1,2,3,"a","b","c"} not creating an
intermediate table), I don't think syntax would need to change.

I use FFI arrays extensively, but they have the limitation of being
homogeneous. I recall mentions (by agentzh) of string buffer
extensions, but know little about it; I assume they are homogeneous on
8-bit chars? Could aspects of this work dovetail with a heterogeneous
array data structure?

Regards,
Evan

P.S. If English is not your native language, you have an amazing command
of it.
Post by Mike Pall
but I would like to try to extract any wisdom that you think
there might be in understanding any non-obvious complexity in the design
question.
Lua tables present a generic map-like interface. Layering an
auto-sizing array abstraction on top of it causes an impedance
mismatch. The concept of an "array length" is commonly only defined
for dense arrays. This conflicts with the need of a map to support
sparse integer keys.
The user expectation that the maximum (non-negative) integer key
defines the array length cannot be implemented efficiently.
Basically, any store might grow or shrink (*) the array length by
arbitrary steps. Updating an explicit length property would
pessimize the fast path of stores. Considering common use cases,
it's better to treat the length as implicit and pessimize the array
length operation instead.
Tightening the abstraction to give useful results only for dense
arrays by relaxing the constraint (any n where t[n] ~= nil and
t[n+1] == nil) helps to improve the average time complexity of the
array length operation to O(log N), but the resulting ambiguity
often causes confusion amongst (new) users.
The current behavior and definition of the length operator on Lua
tables is a compromise between user expectations and implementation
efficiency. The root of the problem is a leaky abstraction.
[Corollary: one should definitely separate the concepts of a map and
an array if one were to design a new computer language.]
(*) Remember that Lua defines a store of a nil value as a delete
operation. But it's both additions and deletions that cause issues.
Using an explicit delete operator wouldn't help efficiency.
--Mike
Coda Highland
2014-09-05 14:40:15 UTC
Permalink
Post by Mike Pall
[Corollary: one should definitely separate the concepts of a map
and an array if one were to design a new computer language.]
Given the various extensions to Lua included in LuaJIT, have you considered
addressing this?
LuaJIT supports statically-typed arrays. If your use case can be
represented as an array of homogeneous cdata, then if you really want
to you can use that and manage the length yourself through
reallocations and copies.

/s/ Adam
Mike Pall
2014-09-05 16:00:17 UTC
Permalink
Post by Evan Wies
Post by Mike Pall
[Corollary: one should definitely separate the concepts of a map
and an array if one were to design a new computer language.]
Given the various extensions to Lua included in LuaJIT, have you
considered addressing this?
Well, you'd have to define the semantics of an array type first ...

Ok, let's assume you want to have JavaScript-style arrays with an
explicit length property: they have auto-grow, but not auto-shrink
semantics, i.e. stores can never shrink an array. OTOH composite
operations like pop() or explicit assignments to the length property
*do* change the array length. IMHO this is an acceptable compromise
for a generic-valued array type.

It's not hard to implement such an array type on top of a Lua table
as a user-defined type. The other option would be to have a separate
array type in the VM.

In either case, you'll get into deep trouble when exposing such a
new type to existing Lua libraries or to the Lua/C API -- this is
the real catch!

E.g. preventing the classic Lua table library operations from
messing up the length for a user-defined type will be difficult.
OTOH none of these would work out-of-the-box for a VM-defined type
either. I bet many people have written such an array type or even
released libraries for it. But none of them have been widely
adopted.

In practice, it's simply not that troublesome to use plain Lua
tables as arrays. The efficiency of the length operator doesn't
matter that much. And you can work around it, where it really
matters.
Post by Evan Wies
For existing true array uses (where the table is never sparse),
would there be significant benefits (besides the #length operator
performance and accuracy)?
A VM-defined array type would be more or less on par with a plain
Lua table wrt. load/store performance. But this is only, because
this use case of Lua tables has been optimized to death in LuaJIT.

A user-defined array type on top of Lua tables will be less
efficient than either one.

Given the significant downsides of introducing a separate array
type to the existing Lua ecosystem, I don't think it makes sense
to do so.


Or, to get back to my point: the trade-off would be quite different
for a newly designed language. The key considerations are still user
expectations and semantics, where IMHO a separate array type makes
more sense. Obviously, compiling operations for such a type is much
easier than the artistry required to make a map perform well when
used like an array.

All abstractions have a cost. Either the implementer pays for it or
the user. Or both.

--Mike
Dibyendu Majumdar
2014-09-07 16:31:56 UTC
Permalink
Post by Mike Pall
A VM-defined array type would be more or less on par with a plain
Lua table wrt. load/store performance. But this is only, because
this use case of Lua tables has been optimized to death in LuaJIT.
Surely a VM defined dense array type will be more efficient than the
existing table type? Assuming that there are special op codes for handling
such a type?

I was stepping through the Lua VM (can't step through Luajit code sadly) -
to see what happens in a simple case like this:
x = {1,2,3}
x[4] = 4

For the last operation it sees that the array part is not large enough - so
then it does also searches the hash table to see if the key exists there. I
assume having a type that acts as array or hash table adds to the
complexity of this simple operation.

Of course, agree that introducing such a type will be problematic given the
existing eco system.

Regards
Dibyendu
Coda Highland
2014-09-07 17:29:47 UTC
Permalink
On Sun, Sep 7, 2014 at 9:31 AM, Dibyendu Majumdar
Post by Dibyendu Majumdar
Post by Mike Pall
A VM-defined array type would be more or less on par with a plain
Lua table wrt. load/store performance. But this is only, because
this use case of Lua tables has been optimized to death in LuaJIT.
Surely a VM defined dense array type will be more efficient than the
existing table type? Assuming that there are special op codes for handling
such a type?
I was stepping through the Lua VM (can't step through Luajit code sadly) -
x = {1,2,3}
x[4] = 4
For the last operation it sees that the array part is not large enough - so
then it does also searches the hash table to see if the key exists there. I
assume having a type that acts as array or hash table adds to the complexity
of this simple operation.
Of course, agree that introducing such a type will be problematic given the
existing eco system.
Regards
Dibyendu
As I've mentioned before, LuaJIT does have typed dense arrays. Use
'em. They're good.

/s/ Adam
Mike Pall
2014-09-08 22:18:27 UTC
Permalink
Post by Dibyendu Majumdar
Post by Mike Pall
A VM-defined array type would be more or less on par with a plain
Lua table wrt. load/store performance. But this is only, because
this use case of Lua tables has been optimized to death in LuaJIT.
Surely a VM defined dense array type will be more efficient than the
existing table type? Assuming that there are special op codes for handling
such a type?
Not really. For numeric keys, the first thing it does is to check
the hidden array part of a Lua table. The bounds check and the
loads/stores have the same overhead as for a VM-defined (generic)
array type.
Post by Dibyendu Majumdar
I was stepping through the Lua VM (can't step through Luajit code sadly) -
x = {1,2,3}
x[4] = 4
For the last operation it sees that the array part is not large enough - so
then it does also searches the hash table to see if the key exists there.
The array part grows exponentially, so for stores to consecutive
indexes that happens only every log N operations. The extra check
in the hash part (or even a rehash) only changes a constant factor.
Post by Dibyendu Majumdar
I assume having a type that acts as array or hash table adds to the
complexity of this simple operation.
It adds to the implementation complexity, but not the asymptotic
time complexity of consecutive stores.

If you're still worried about the overhead of array growth, have a
look at the table.new() function in the LuaJIT 2.1 docs. Or use
typed FFI arrays.

--Mike

demetri
2014-09-06 02:42:51 UTC
Permalink
but I would like to try to extract any wisdom that you think
there might be in understanding any non-obvious complexity in the design
question.
Lua tables present a generic map-like interface ... <snip>
Thanks Mike.

Someone else commented on your great patience in responding to this
list and I wanted to second that. The time and effort you put into
educating
the LuaJIT community on matters big and small is a great gift to all of us.

I know that from your perspective some of these clarifications may be
trivialities, but I wanted to take a moment to highlight how much I value
these opportunities to learn from your experience.

Cheers,
Demetri
Elias Barrionovo
2014-09-03 17:42:50 UTC
Permalink
Post by Roman Tsisyk
I wonder why #table (length operator for table) is not O(1).
(...)
How am I supposed to work with Lua after that?
Not sure if trolling, but I'd guess it's not O(1) so it doesn't add an
overhead to *every* table insertion / deletion.
--
NI!

() - www.asciiribbon.org
/\ - ascii ribbon campaign against html e-mail and proprietary attachments
Yichun Zhang (agentzh)
2014-09-03 19:51:35 UTC
Permalink
Hello!
Post by Elias Barrionovo
Post by Roman Tsisyk
I wonder why #table (length operator for table) is not O(1).
This is also why lj_tab_len can usually be hot in our online on-CPU
flame graphs for LuaJIT :) Better cache and maintain the table length
yourself.
Post by Elias Barrionovo
Not sure if trolling, but I'd guess it's not O(1) so it doesn't add an
overhead to *every* table insertion / deletion.
For the same reason, the table.insert(tb, val) call also suffers from
lj_tab_len in flame graphs :) And it's always recommended to use
something like below

for i = 1, #tb do
tb[i] = newvals[i]
end

rather than

local insert = table.insert
for i = 1, #tb do
insert(tb, newvals[i])
end

The latter can also be hot on the lj_tab_len call.

Alternatively, we can just explicitly specify the insertion position
in the table.insert call to eliminate lj_tab_len.

Regards,
-agentzh
Continue reading on narkive:
Loading...