Discussion:
Crash in JIT'd code on Solaris
Alex Wilson
2014-08-05 13:32:49 UTC
Permalink
I'm trying to use LuaJIT to run some code on Solaris, and it's consistently crashing with a SIGSEGV.
It's built in 64-bit mode, which seems to be a bit touch and go sometimes, but...

http://hastebin.com/eratovebiw.txt
is the output from -jdump, followed by the output from mdb when it crashes (with SIGSEGV)

http://hastebin.com/egaromudom.lua
is the code it was generating a trace from

It seems to generate a movl into 0xfffffffffefde4a0, which is a pretty wild-looking pointer, way above any of the mapped regions in memory. It's right at the beginning of the trace, too -- so would it be lining up with the beginning of the IR too? So that movl is being generated from the FLOAD on func.env?

Any ideas why this might be happening?

Thanks!
Alex Wilson
2014-08-05 13:41:39 UTC
Permalink
Sorry, I should clarify:

This is with LuaJIT 2.0.3, from the .tar.gz on the website. It's running embedded inside nginx, as seen in the mappings output. LPEG 0.12 is loaded in memory as well.

If I turn JIT off there is no crash and everything behaves fine.


Mike Pall
2014-08-05 13:45:39 UTC
Permalink
Post by Alex Wilson
It seems to generate a movl into 0xfffffffffefde4a0, which is a
pretty wild-looking pointer, way above any of the mapped regions
in memory. It's right at the beginning of the trace, too -- so
would it be lining up with the beginning of the IR too?
This is the store of the trace number in G->vmstate. Which ought
to be an address in the lowest 2GB of memory (and not 4GB).
Post by Alex Wilson
Any ideas why this might be happening?
The memory allocator is apparently returning addresses from the
wrong address range. Actually, G is the first allocation.

Since I don't have Solaris, I can't debug this. You might want to
look at lj_alloc.c near #ifdef __sun__. Check what pointers are
returned by mmap() and why these don't get rejected.

--Mike
Theo Schlossnagle
2014-08-05 17:59:51 UTC
Permalink
What version of Solaris?

Solaris 10 doesn't have a MAP_32BIT option to mmap(); Illumos does. For Solaris
11, I don't have any insight, as its VM has been entirely overhauled.
--
Theo Schlossnagle

http://omniti.com/is/theo-schlossnagle
Alex Wilson
2014-08-05 23:04:05 UTC
Permalink
This is on Illumos, and it has MAP_32BIT (I added a #warning in there to verify that it's definitely finding and using the macro, too)

To be specific it's on Joyent's illumos-joyent (SmartOS), a release from around April.



Mike Pall
2014-08-05 23:13:14 UTC
Permalink
Post by Alex Wilson
This is on Illumos, and it has MAP_32BIT (I added a #warning in there to verify that it's definitely finding and using the macro, too)
To be specific it's on Joyent's illumos-joyent (SmartOS), a release from around April.
Oh ... if it uses the same name, then it better behave the same as
Linux. Which is effectively MAP_**31**BIT, i.e. the lowest 2GB.

The interpreter works fine with the lowest 4GB, but the JIT
compiler requires all internal memory to be in the lowest 2GB.

--Mike
Claire Lewis
2014-08-06 01:40:20 UTC
Permalink
Post by Mike Pall
Oh ... if it uses the same name, then it better behave the same as
Linux. Which is effectively MAP_**31**BIT, i.e. the lowest 2GB.
The interpreter works fine with the lowest 4GB, but the JIT
compiler requires all internal memory to be in the lowest 2GB.
Is this true on all (64 bit) platforms, Mike?

Thanks,
- Claire.
Mike Pall
2014-08-06 01:50:41 UTC
Permalink
Post by Claire Lewis
Post by Mike Pall
Oh ... if it uses the same name, then it better behave the same as
Linux. Which is effectively MAP_**31**BIT, i.e. the lowest 2GB.
The interpreter works fine with the lowest 4GB, but the JIT
compiler requires all internal memory to be in the lowest 2GB.
Is this true on all (64 bit) platforms, Mike?
That's only true for the x64 port. But since that's the only
64 bit platform right now ...

[All 32 bit platforms can potentially use the full 4GB address range.
But many operating systems or CPUs don't give you that much virtual
address range in 32 bit userspace. AFAIK only a Linux/x64 kernel
gives you nearly the full 4GB for 32 bit userspace. ARM (32 bit) is
limited to 2GB. Not sure about PPC32/MIPS32.]

--Mike
Alex Wilson
2014-08-05 23:17:36 UTC
Permalink
Post by Mike Pall
Post by Alex Wilson
Any ideas why this might be happening?
The memory allocator is apparently returning addresses from the
wrong address range. Actually, G is the first allocation.
Since I don't have Solaris, I can't debug this. You might want to
look at lj_alloc.c near #ifdef __sun__. Check what pointers are
returned by mmap() and why these don't get rejected.
Ok, I first tried to dtrace on the return values of the mmap syscall, and didn't see anything unusual (all returning fine, with values that look in the right range). So I also tried rebuilding everything with an fprintf to stderr inside lj_alloc.c right after mmap returns.

All I see over and over again is:
mmap returned fefbd000
mmap returned fef9c000
mmap returned fef7b000
2014/08/05 23:04:35 [alert] 81395#0: worker process 81964 exited on signal 11 (core dumped)

So mmap with MAP_32BIT is returning addresses in the low 32 bits, _just_. But these are not in the low 2GB? Instead they're right up at the top of the 32-bit address space, it seems.

I applied this quick hack patch:


--- LuaJIT-2.0.3/src/lj_alloc.c 2014-03-12 12:10:00.000000000 +0000
+++ LuaJIT-2.0.3.new/src/lj_alloc.c 2014-08-05 23:16:29.363058246 +0000
@@ -177,7 +177,7 @@
 #if LJ_64
 /* 64 bit mode needs special support for allocating memory in the lower 2GB. */
 
-#if defined(MAP_32BIT)
+#if defined(MAP_32BIT) && !defined(__sun__)
 
 /* Actually this only gives us max. 1GB in current Linux kernels. */
 static LJ_AINLINE void *CALL_MMAP(size_t size)
@@ -224,7 +224,11 @@
   }
 #endif
   for (;;) {
+#if defined(MAP_32BIT)
+    void *p = mmap((void *)alloc_hint, size, MMAP_PROT, MAP_32BIT|MMAP_FLAGS, -1, 0);
+#else
     void *p = mmap((void *)alloc_hint, size, MMAP_PROT, MMAP_FLAGS, -1, 0);
+#endif
     if ((uintptr_t)p >= MMAP_REGION_START &&
         (uintptr_t)p + size < MMAP_REGION_END) {
       alloc_hint = (uintptr_t)p + size;

and now everything seems to be working fine... not sure if that's the best way to handle it, but it's a way...
Theo Schlossnagle
2014-08-06 20:06:33 UTC
Permalink
The patches I wrote to make the alloc_hint work in Illumos are upstream
and this stuff works for us on Illumos (OmniOS) with the 2.0.3 code. Two
patches were put back: (1) support MAP_32BIT and (2) respect hints when
they can't be precisely satisfied...

If you are seeing behavior where it works until it suddenly returns
something very high in the 64bit space, then that second patch isn't in
there.

Also, there is a bug in the MAP_32BIT support where an alloc_hint == NULL
returns unexpected (not in 32-bit range) addresses. Basically, alloc_hint
here can't be less than a single page size. I believe that hack was
already put back into LuaJIT as well.

In summary, if you are using latest LuaJIT on latest Illumos, things should
be good. We use this extensively.
--
Theo Schlossnagle

http://omniti.com/is/theo-schlossnagle
Alex Wilson
2014-08-07 01:45:59 UTC
Permalink
Post by Theo Schlossnagle
The patches I wrote to make the alloc_hint work in Illumos are upstream and this stuff works for us on Illumos (OmniOS) with the 2.0.3 code. Two patches were put back: (1) support MAP_32BIT and (2) respect hints when they can't be precisely satisfied...
If you are seeing behavior where it works until it suddenly returns something very high in the 64bit space, then that second patch isn't in there.
If you meant "high in the 32-bit space" then yes, this is exactly what I'm seeing.
Post by Theo Schlossnagle
Also, there is a bug in the MAP_32BIT support where an alloc_hint == NULL returns unexpected (not in 32bit range) addresses. Basically, alloc_hint here can't be less than a single page size. I believe that hack was already put back into LuaJIT as well.
That's an interesting one. If I download the LuaJIT 2.0.3 source code from luajit.org and look in src/lj_alloc.c, at line 186, I can plainly see that if MAP_32BIT is defined, LuaJIT will always provide NULL as the alloc_hint to mmap. Is this fix only in LuaJIT master?
Theo Schlossnagle
2014-08-09 02:31:55 UTC
Permalink
Post by Alex Wilson
Post by Theo Schlossnagle
The patches I wrote to make the alloc_hint work in Illumos are upstream
and this stuff works for us on Illumos (OmniOS) with the 2.0.3 code. Two
patches were put back: (1) support MAP_32BIT and (2) respect hints when
they can't be precisely satisfied...
Sorry if I'm missing the obvious, but would you happen to have the commit
sha of the second patch in question here? I've done git log --grep=mmap and
(3785 Implement MAP_32BIT flag to mmap()), which is definitely in the build
that I'm running on (it's built from illumos-joyent at 032b223).
https://www.illumos.org/issues/3786

Looks like it went back into OmniOS, but not illumos... the illumos-omnios
git hashes are two:

b1f45cd047f0767559cc3e7acbbe5d8e36dd04f5
2e46fc2faec896191234c73a7175984cf051796e

I'm not sure if this stuff is in Joyent's branches.
Post by Alex Wilson
Post by Theo Schlossnagle
If you are seeing behavior where it works until it suddenly returns
something very high in the 64bit space, then that second patch isn't in
there.
If you meant "high in the 32-bit space" then yes, this is exactly what I'm seeing.
Post by Theo Schlossnagle
Also, there is a bug in the MAP_32BIT support where an alloc_hint ==
NULL returns unexpected (not in 32bit range) addresses. Basically,
alloc_hint here can't be less than a single page size. I believe that hack
was already put back into LuaJIT as well.
That's an interesting one. If I download the LuaJIT 2.0.3 source code from
luajit.org and look in src/lj_alloc.c, at line 186, I can plainly see
that if MAP_32BIT is defined, LuaJIT will always provide NULL as the
alloc_hint to mmap. Is this fix only in LuaJIT master?
Hrmm.. it might only be in our fork? I thought I emailed Mike about this,
but I can't seem to find a record of it now. This is what our internal
putback has:

commit 66c482fcf8ec28890e49bf3ffd286d1d09a6350d
Author: Theo Schlossnagle <jesus-EuoJsN+J0o7QT0dZR+***@public.gmane.org>
Date: Tue May 6 12:42:23 2014 -0400

Illumos mmap behaves oddly with 0x0 and MAP_32BIT.
Bump up the requested address above 1 page and things work as expected.

diff --git a/src/LuaJIT/src/lj_alloc.c b/src/LuaJIT/src/lj_alloc.c
index f856a7a..cd8dfc1 100644
--- a/src/LuaJIT/src/lj_alloc.c
+++ b/src/LuaJIT/src/lj_alloc.c
@@ -183,7 +183,12 @@ static LJ_AINLINE int CALL_MUNMAP(void *ptr, size_t size)
 static LJ_AINLINE void *CALL_MMAP(size_t size)
 {
   int olderr = errno;
+#if defined(__sun__)
+  /* Illumos MAP_32BIT implementation breaks with NULL request address */
+  void *ptr = mmap((void *)0x1000, size, MMAP_PROT, MAP_32BIT|MMAP_FLAGS, -1, 0);
+#else
   void *ptr = mmap(NULL, size, MMAP_PROT, MAP_32BIT|MMAP_FLAGS, -1, 0);
+#endif
   errno = olderr;
   return ptr;
 }
--
Theo Schlossnagle

http://omniti.com/is/theo-schlossnagle
Mike Pall
2014-08-11 11:05:47 UTC
Permalink
Post by Theo Schlossnagle
Post by Alex Wilson
That's an interesting one. If I download the LuaJIT 2.0.3 source code from
luajit.org and look in src/lj_alloc.c, at line 186, I can plainly see
that if MAP_32BIT is defined, LuaJIT will always provide NULL as the
alloc_hint to mmap. Is this fix only in LuaJIT master?
Hrmm.. it might only be in our fork? I thought I emailed Mike about this,
but I can't seem to find a record of it now.
2.0.3 was released on March 12, 2014. The change landed in the git
repo on May 28.

[Ok, so the initial allocations are in the 0-2GB range with the
address hint. But I guess it'll eventually give out addresses in
the 2GB-4GB range, which will break JIT-compiled code.]

--Mike
Theo Schlossnagle
2014-08-11 23:40:20 UTC
Permalink
Post by Mike Pall
Post by Theo Schlossnagle
Post by Alex Wilson
That's an interesting one. If I download the LuaJIT 2.0.3 source code from
luajit.org and look in src/lj_alloc.c, at line 186, I can plainly see
that if MAP_32BIT is defined, LuaJIT will always provide NULL as the
alloc_hint to mmap. Is this fix only in LuaJIT master?
Hrmm.. it might only be in our fork? I thought I emailed Mike about this,
but I can't seem to find a record of it now.
2.0.3 was released on March 12, 2014. The change landed in the git
repo on May 28.
[Ok, so the initial allocations are in the 0-2GB range with the
address hint. But I guess it'll eventually give out addresses in
the 2GB-4GB range, which will break JIT-compiled code.]
Correct. I see no way around this now... if you're out of memory in the
required address space, you're out of memory.

The reason this limitation is an issue for us: we have many LuaJIT states
(say 128 spread across 64 threads), and we manually drive garbage collection
on each. This allows us to run small computational jobs under each state
as needed and carefully control the GC operation. With 128 of these (across
threads in the same process), they all compete for the same 0-2GB space.

One thought about working around that would be to allow a LuaJIT state to
specify a base 64-bit address that gets ORed with the addresses used
internally. It seems to me this wouldn't change any of the design or
assumptions. We could use that base for the mmap hint... then each
separate state would have its own 2GB, greatly relieving the problem.
--
Theo Schlossnagle

http://omniti.com/is/theo-schlossnagle