Discussion:
Workarounds for LuaJIT's 4GB limit
Nicholas Hutchinson
2014-10-15 22:28:31 UTC
Permalink
Hi all,

A (64-bit Linux) product I'm working on is running into issues relating to:
- LuaJIT's requirement that the interpreter state reside in the first 4GB of your process' address space
- its default allocator (see lj_alloc.c) getting memory from the system using mmap() with the MAP_32BIT flag, which further restricts LuaJIT to the first 2GB of your process' address space

I gather there are cases where we're trying to initialise a Lua interpreter state quite late in the piece, and that this fails as we've exhausted that lower 2GB portion of our address space. I'd love to hear other people's experiences and how they're working around this limitation. The suggestions I've read seem to boil down to:

- switching to a different memory allocator such as tcmalloc or jemalloc -- tcmalloc can be configured to use mmap() rather than sbrk() to request memory from the system, and jemalloc seems to always use mmap(). As I understand it, mmap() generally (always?) returns memory from >= 4GB, so the lower portion would then be left free for LuaJIT.
- customising LuaJIT's own memory allocator -- at process startup, allocate some suitably large chunk of memory from the < 4GB region and dole it out to LuaJIT in chunks (a rough sketch of this follows below).
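
Here's an untested sketch of the reservation half of that second idea,
assuming one would then patch lj_alloc.c so its mmap wrappers carve
chunks out of the pool instead of calling mmap() themselves; the
low_pool_* names are invented for illustration:

/* Illustrative only: reserve a pool below the 4GB line at startup and
 * dole it out in chunks. A patched lj_alloc.c would call
 * low_pool_alloc() instead of mmap(). */
#define _GNU_SOURCE  /* for MAP_32BIT on Linux */
#include <stddef.h>
#include <sys/mman.h>

static char  *low_pool;
static size_t low_pool_size;
static size_t low_pool_used;

/* Call this first thing in main(), before the low region fragments. */
int low_pool_reserve(size_t size)
{
  void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
  if (p == MAP_FAILED)
    return -1;
  low_pool = p;
  low_pool_size = size;
  low_pool_used = 0;
  return 0;
}

/* Dole out 16-byte-aligned chunks. A real version would also need to
 * handle freeing; this bump pointer never reclaims anything. */
void *low_pool_alloc(size_t size)
{
  size = (size + 15) & ~(size_t)15;
  if (low_pool_used + size > low_pool_size)
    return NULL;
  void *p = low_pool + low_pool_used;
  low_pool_used += size;
  return p;
}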

Am I missing something simpler?

Finally, is there any chance this restriction will be relaxed in a future version of LuaJIT?

Thanks,
Nick
Alex
2014-10-15 23:30:12 UTC
Permalink
The 4GB limit only applies to Lua objects and to cdata allocated with
ffi.new or via ffi.typeof constructors. If you call malloc directly (by
declaring it with ffi.cdef and then calling it as ffi.C.malloc), you can
use memory outside of the limit.
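
For example (a minimal sketch; the size and error handling here are
arbitrary):

local ffi = require("ffi")

ffi.cdef[[
void *malloc(size_t size);
void free(void *ptr);
]]

-- 1GB from the system allocator; this lives outside the GC-managed
-- region and does not count against LuaJIT's limit.
local n = 1024 * 1024 * 1024
local buf = ffi.C.malloc(n)
assert(buf ~= nil, "malloc failed")

local bytes = ffi.cast("uint8_t *", buf)
bytes[0] = 42
print(bytes[0])

ffi.C.free(buf)

Note that such memory is not garbage-collected: either free it manually
as above, or attach a finalizer with ffi.gc(buf, ffi.C.free).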

--
Sincerely,
Alex Parrill
Alex Wilson
2014-10-16 01:25:21 UTC
Permalink
Post by Alex
The 4GB limit only applies to Lua objects and cdata allocated with
ffi.new or from ffi.typeof constructors. If you use malloc directly (via
declaring it with ffi.cdef then using it with ffi.C.malloc), you can use
memory outside of the limit.
There is another limit as well that applies to lightuserdata pointers --
I run into this a lot on Solaris and BSD variants where the stack is
allocated all the way up near the top of memory. On these platforms you
can't create a lightuserdata from something that's on the stack, or
allocated up high in the address space in general. This regularly
crashes LPEG and sometimes very recent versions of nginx-lua.

I assume this is due to tagged pointers using the high bits of the
address, though, rather than limitations in the size of jump operands
and such, which is what the low 2GB limit relates to?
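
A trivial way to see it (a sketch; whether it actually aborts depends
on where the platform puts the C stack):

#include <lua.h>
#include <lauxlib.h>
#include <stdio.h>

int main(void)
{
  lua_State *L = luaL_newstate();
  int on_the_stack = 0;

  /* On platforms that map the C stack near the top of the address
   * space, this aborts under LuaJIT with "bad light userdata pointer",
   * because the pointer won't fit the tagged value representation. */
  lua_pushlightuserdata(L, &on_the_stack);
  printf("pushed %p as lightuserdata\n", (void *)&on_the_stack);

  lua_close(L);
  return 0;
}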
Florian Weimer
2014-10-21 11:59:06 UTC
Permalink
Post by Alex Wilson
There is another limit as well that applies to lightuserdata pointers --
I run into this a lot on Solaris and BSD variants where the stack is
allocated all the way up near the top of memory. On these platforms
you can't create a lightuserdata from something that's on the stack,
or allocated up high in the address space in general. This regularly
crashes LPEG and sometimes very recent versions of nginx-lua.
I cannot reproduce this, using the following module and test script:

#include <lua5.1/lua.h>
#include <lua5.1/lauxlib.h>

/* Combine two 32-bit integers into a 64-bit value and push it as a
   lightuserdata, so arbitrary pointer bit patterns can be tested. */
static int
l_tolud(lua_State *L)
{
  lua_Integer a = lua_tointeger(L, 1);
  lua_Integer b = lua_tointeger(L, 2);
  unsigned au = a;
  unsigned bu = b;
  unsigned long long ap = au;
  unsigned long long bp = bu;
  unsigned long long combined = (ap << 32) | bp;
  lua_pushlightuserdata(L, (void *) combined);
  return 1;
}

int
luaopen_tolud(lua_State *L)
{
  struct luaL_Reg funcs[] = {
    {"tolud", l_tolud},
    {NULL, NULL},
  };
  lua_newtable(L);
  luaL_register(L, NULL, funcs);
  return 1;
}

//////////////////////////////////////////////////////////////////////

local tolud = (require "tolud").tolud
local ffi = require "ffi"

local values = {
  {0, 0},
  {0, 1},
  {1, 0},
  {0x7fff, 0xffffffff},
  {0x8000, 0},
}

for i = 1, #values do
  local val = values[i]
  local lud = tolud(val[1], val[2])
  print(val[1], val[2], lud)
end

----------------------------------------------------------------------

This program prints for me:

0 0 userdata: NULL
0 1 userdata: 0x00000001
1 0 userdata: 0x0100000000
32767 4294967295 userdata: 0x7fffffffffff
luajit: bad light userdata pointer
stack traceback:
[C]: in function 'tolud'
tolud_test.lua:14: in main chunk
[C]: at 0x00405320
From this, I infer that all valid userspace pointers on x86_64 are
representable as lightuserdata values. So you must be observing
another issue.
Mike Pall
2014-10-21 12:32:56 UTC
Permalink
Post by Florian Weimer
From this, I infer that all valid userspace pointers on x86_64 are
representable as lightuserdata values.
Solaris also uses the negative x64 address space in user mode (for
stacks and default mmap). I have no plans to deal with this oddity.

--Mike
Florian Weimer
2014-10-21 12:39:51 UTC
Permalink
Post by Mike Pall
Solaris also uses the negative x64 address space in user mode (for
stacks and default mmap). I have no plans to deal with this oddity.
Wow. I didn't know this was even supported by the silicon.
Sean Conner
2014-10-21 15:57:53 UTC
Permalink
Post by Florian Weimer
Post by Mike Pall
Solaris also uses the negative x64 address space in user mode (for
stacks and default mmap). I have no plans to deal with this oddity.
Wow. I didn't know this was even supported by the silicon.
Why not? Addresses presented to processes are virtual, and as long as,
for example, address $FFC0000012345670 is mapped into the process, I
don't see a problem with "negative" addresses.

-spc (Sad to hear that LuaJIT will probably never see the light of day on
Solaris ... )
Florian Weimer
2014-10-21 16:11:19 UTC
Permalink
Post by Sean Conner
Why not? Addresses presented to processes are virtual, and as long as,
for example, address $FFC0000012345670 is mapped into the process, I
don't see a problem with "negative" addresses.
The architecture is limited to 48 bits, and you can't map stuff at
arbitrary addresses (the topmost 16 bits must all be equal, i.e.
addresses must be sign-extended to be canonical). I mistakenly assumed
that the negative addresses were reserved for kernel data structures.

Anyway, the 48-bit limit will fall rather soon, due to NVRAM.
Sean Conner
2014-10-21 17:17:44 UTC
Permalink
Post by Florian Weimer
The architecture is limited to 48 bits,
Physical. There's a difference between virtual (or logical) addresses and
physical addresses.
Post by Florian Weimer
and you can't map stuff at arbitrary addresses (the topmost 16 bits
must all be equal).
Is that a hardware limitation, or an OS limitation?

-spc
Javier Guerra Giraldez
2014-10-21 19:35:43 UTC
Permalink
Post by Sean Conner
Post by Florian Weimer
The architecture is limited to 48 bits,
Physical. There's a difference between virtual (or logical) addresses and
physical addresses.
Originally the logical/physical limits were 48/40 bits, but at some
point the physical limit was raised to 48 without changing the logical
one.

I really don't know about the details, but I've read somewhere that the
current IOMMU table imposes another limit at 52 bits. Not sure if it's
only physical or if it limits logical addresses too.
--
Javier
René Rebe
2014-10-22 10:31:30 UTC
Permalink
Post by Javier Guerra Giraldez
Post by Sean Conner
Post by Florian Weimer
The architecture is limited to 48 bits,
Physical. There's a difference between virtual (or logical) addresses and
physical addresses.
originally the logical/physical limits were 48/40 bits, but at some
point the physical limit was raised to 48 without changing the logical
one.
i really don't know about the details, but i've read somewhere that
the current IOMMU table imposes another limit at 52 bits. not sure if
it's only physical or if it limits logical addresses too.
Some data points from /proc/cpuinfo:

model name : AMD Opteron(TM) Processor 6274
address sizes : 48 bits physical, 48 bits virtual

model name : Dual-Core AMD Opteron(tm) Processor 1218
address sizes : 40 bits physical, 48 bits virtual

model name : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
address sizes : 36 bits physical, 48 bits virtual

model name : Intel(R) Atom(TM) CPU 330 @ 1.60GHz
address sizes : 32 bits physical, 48 bits virtual
René
--
ExactCODE GmbH, Lietzenburger Str. 42, DE-10117 Berlin
http://exactcode.com | http://exactscan.com | http://ocrkit.com | http://t2-project.org | http://rene.rebe.de
Theo Schlossnagle
2014-10-21 16:49:26 UTC
Permalink
LuaJIT runs fine on Solaris and even better on Illumos; that is our primary
platform. The limitations being outlined are inconveniences and have caused
us very little headache.
Post by Sean Conner
-spc (Sad to hear that LuaJIT will probably never see the light of day on
Solaris ... )
--
Theo Schlossnagle

http://omniti.com/is/theo-schlossnagle
Sean Conner
2014-10-21 17:23:26 UTC
Permalink
Post by Theo Schlossnagle
LuaJIT runs fine on Solaris and even better on Illumos; that is our primary
platform. The limitations being outlined are inconveniences and have caused
us very little headache.
Ah, perhaps I should clarify---at work, the programs I work on run on
SPARC-based Solaris boxes (64-bit).

-spc
Theo Schlossnagle
2014-10-21 12:49:32 UTC
Permalink
Post by Mike Pall
Solaris also uses the negative x64 address space in user mode (for
stacks and default mmap). I have no plans to deal with this oddity.
Womp womp. Sad panda. This hasn't been a show stopper for us, but it is
pretty annoying.

I'll restate my idea for "solving" this 2/4GB issue. Currently, when you
create a new lua state it expects to be able to manage its own memory in
0x00000000-0x100000000 (or less). It would solve most (if not all) of our
problems if, when creating a new lua state, we could specify a base address
so that it could mmap into a range (e.g. 0xfffe00000000 - 0xffff00000000)
that will never collide with anything in my app...

In our particular app, we have many threads that each run their own
isolated lua states. This would allow us to have each of them
self-contained in their own virtual memory segment.
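
To make it concrete: the reservation side is easy today, it's the
state-creation side that would need a new, entirely hypothetical, entry
point:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
  /* Our 0xfffe00000000 range is Solaris-specific; Linux user space
   * ends at 2^47, so the hint here is lower just for illustration. */
  void *hint = (void *)0x500000000000ULL;
  size_t len = (size_t)1 << 32;  /* one private 4GB window per state */
  void *base = mmap(hint, len, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  if (base == MAP_FAILED) { perror("mmap"); return 1; }
  printf("reserved 4GB at %p\n", base);
  /* Hypothetical -- no such function exists in LuaJIT today:
   * lua_State *L = luaJIT_newstate_in_region(base, len);
   */
  munmap(base, len);
  return 0;
}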

Best regards,

Theo
--
Theo Schlossnagle

http://omniti.com/is/theo-schlossnagle
Florian Weimer
2014-10-21 13:26:26 UTC
Permalink
Post by Theo Schlossnagle
Womp womp. Sad panda. This hasn't been a show stopper for us, but it is
pretty annoying.
Have you tried using a different malloc, and running your application
from a different thread created with pthread_create? Maybe this way,
you can get addresses from a range more useful to LuaJIT.
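
A minimal sketch of the thread trick (real_main here stands in for the
application's actual entry point):

#include <pthread.h>
#include <stdio.h>

/* Stand-in for the application's real entry point. */
static int real_main(void)
{
  int probe;
  /* A stack created by pthread_create comes from mmap(), so it may sit
   * at a lower address than the kernel-placed main stack. */
  printf("stack is near %p\n", (void *)&probe);
  return 0;
}

static void *thread_main(void *arg)
{
  *(int *)arg = real_main();
  return NULL;
}

int main(void)
{
  int rc = 1;
  pthread_t t;
  if (pthread_create(&t, NULL, thread_main, &rc) != 0)
    return 1;
  pthread_join(t, NULL);
  return rc;
}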
Post by Theo Schlossnagle
I'll restate my idea for "solving" this 2/4GB issue. Currently, when you
create a new lua state it expects to be able to manage its own memory in
0x00000000-0x100000000 (or less). It would solve most (if not all) of our
problems if, when creating a new lua state, we could specify a base address
so that it could mmap into a range (e.g. 0xfffe00000000 - 0xffff00000000)
that will never collide with anything in my app...
The base address isn't exactly trivial to add. Scaling all pointers by
8 could extend the usable range to 32 GB, and one wouldn't need to store
the base offset in some register, but at present LuaJIT uses the lower
3 bits of pointers (or so I read some time ago).
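
To illustrate the scaling arithmetic (not LuaJIT's actual
representation): storing pointers shifted right by 3 in a 32-bit field
covers the first 32 GB, at the cost of requiring 8-byte alignment, and
the low bits are then no longer free for tags:

#include <assert.h>
#include <stdint.h>

static uint32_t encode_ptr(void *p)
{
  uintptr_t u = (uintptr_t)p;
  assert((u & 7) == 0);                       /* 8-byte aligned */
  assert((u >> 3) <= (uintptr_t)UINT32_MAX);  /* below 32 GB */
  return (uint32_t)(u >> 3);
}

static void *decode_ptr(uint32_t v)
{
  return (void *)((uintptr_t)v << 3);
}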
Nicholas Hutchinson
2014-10-16 07:21:08 UTC
Permalink
Our problem is that we can't even create a new interpreter state -- luaL_newstate() fails.
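
For what it's worth, this is roughly how we observe it (a Linux-specific
sketch; on failure we dump /proc/self/maps to see what has already
claimed the low 2GB):

#include <lua.h>
#include <lauxlib.h>
#include <stdio.h>

int main(void)
{
  lua_State *L = luaL_newstate();
  if (L == NULL) {
    fputs("luaL_newstate() failed; current mappings:\n", stderr);
    FILE *f = fopen("/proc/self/maps", "r");
    if (f) {
      int c;
      while ((c = fgetc(f)) != EOF)
        fputc(c, stderr);
      fclose(f);
    }
    return 1;
  }
  lua_close(L);
  return 0;
}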
Karel Tuma
2014-10-16 13:17:10 UTC
Permalink
Post by Nicholas Hutchinson
Our problem is that we can't even create a new interpreter state -- luaL_newstate() fails.
Can you provide strace output?

Also, if you're on OSX, don't forget:
-pagezero_size 10000 -image_base 100000000 ( http://luajit.org/install.html#embed )