Discussion:
LuaJIT Crash When Returning Null Pointer from FFI
Jim Burnes
2014-07-15 23:10:42 UTC
Are there any well-known issues with FFI calls crashing when they return
a NULL struct pointer? I'm making an FFI call that *only* crashes when I
set the return value to NULL. The C function runs and everything works
fine when it returns a struct pointer to a valid memory location.

When the function sets the return value to NULL and returns, it crashes
before making it back to Lua land.

I've tried making sure that Lua and C are seeing the same structure
definition, I've tried looking the issue up in the usual places, and I've
tried manipulating the structure alignment, but nothing so far has
worked. I single-stepped into the return, and the "bad access" crash is
inside LuaJIT assembler code at a "leave" instruction.

This is OSX 64 bit, clang.

I can provide much more information if necessary.
Alex
2014-07-15 23:56:21 UTC
Post by Jim Burnes
I can provide much more information if necessary.
Some example code would greatly help.
--
Sincerely,
Alex Parrill
Jim Burnes
2014-07-16 14:42:30 UTC
Okay. Here is the structure definition and the FFI call prototype:

struct Sys_FD
{
int fd;
};

Sys_FD* sys_open_result();

--
This is the Lua code that calls sys_open_result:
local fd = node9.sys_open_result()
print("sys.open(" .. path .. ") returned with value", fd)
if fd then
    return ffi.gc(fd, node9.free_fd)
else
    return nil
end

--
This is the C function itself:

Sys_FD*
sys_open_result()
{
    hproc_t* hp = (hproc_t*) up;

    // detach the node at the head of the reply queue
    QUEUE* q = QUEUE_HEAD(&hp->repq);
    QUEUE_REMOVE(q);

    N9SysReq* scall = QUEUE_DATA(q, N9SysReq, req.node);
    Sys_FD* fdes = scall->open.ret;
    if (fdes != NULL) {
        trace(TRACE_WARN, "sys_open: returning fdes %d @ %p", fdes->fd, fdes);
    }
    else {
        trace(TRACE_WARN, "sys_open: fdes is null, open failed");
    }
    // release the system call structure
    free(scall);
    return fdes;
}
Jim Burnes
2014-07-16 14:46:00 UTC
Here's the relevant lldb trace, from the return of 'fdes' to the point at
which it makes an illegal memory access:

-> 360 return fdes; // ret has a pointer to the FD, needs to be tracked by the caller
   361 }
   362
   363
(lldb) n
Process 44382 stopped
* thread #1: tid = 0x4a7c3a, 0x000000010004260a node9`lj_vm_ffi_call + 132, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x000000010004260a node9`lj_vm_ffi_call + 132
node9`lj_vm_ffi_call + 132:
-> 0x10004260a: movq %rax, 0x90(%rbx)
   0x100042611: movaps %xmm0, 0x10(%rbx)
   0x100042615: movq %rdx, 0x98(%rbx)
   0x10004261c: movaps %xmm1, 0x20(%rbx)

(lldb) n
Process 44382 stopped
* thread #1: tid = 0x4a7c3a, 0x0000000100042611 node9`lj_vm_ffi_call + 139, queue = 'com.apple.main-thread', stop reason = instruction step over
    frame #0: 0x0000000100042611 node9`lj_vm_ffi_call + 139
node9`lj_vm_ffi_call + 139:
-> 0x100042611: movaps %xmm0, 0x10(%rbx)
   0x100042615: movq %rdx, 0x98(%rbx)
   0x10004261c: movaps %xmm1, 0x20(%rbx)
   0x100042620: movq -0x8(%rbp), %rbx

(lldb) n
Process 44382 stopped
* thread #1: tid = 0x4a7c3a, 0x0000000100042615 node9`lj_vm_ffi_call + 143, queue = 'com.apple.main-thread', stop reason = instruction step over
    frame #0: 0x0000000100042615 node9`lj_vm_ffi_call + 143
node9`lj_vm_ffi_call + 143:
-> 0x100042615: movq %rdx, 0x98(%rbx)
   0x10004261c: movaps %xmm1, 0x20(%rbx)
   0x100042620: movq -0x8(%rbp), %rbx
   0x100042624: leave

(lldb) n
Process 44382 stopped
* thread #1: tid = 0x4a7c3a, 0x000000010004261c node9`lj_vm_ffi_call + 150, queue = 'com.apple.main-thread', stop reason = instruction step over
    frame #0: 0x000000010004261c node9`lj_vm_ffi_call + 150
node9`lj_vm_ffi_call + 150:
-> 0x10004261c: movaps %xmm1, 0x20(%rbx)
   0x100042620: movq -0x8(%rbp), %rbx
   0x100042624: leave
   0x100042625: ret

(lldb) n
Process 44382 stopped
* thread #1: tid = 0x4a7c3a, 0x0000000100042620 node9`lj_vm_ffi_call + 154, queue = 'com.apple.main-thread', stop reason = instruction step over
    frame #0: 0x0000000100042620 node9`lj_vm_ffi_call + 154
node9`lj_vm_ffi_call + 154:
-> 0x100042620: movq -0x8(%rbp), %rbx
   0x100042624: leave
   0x100042625: ret
   0x100042626: nop

(lldb) n
Process 44382 stopped
* thread #1: tid = 0x4a7c3a, 0x0000000100042624 node9`lj_vm_ffi_call + 158, queue = 'com.apple.main-thread', stop reason = instruction step over
    frame #0: 0x0000000100042624 node9`lj_vm_ffi_call + 158
node9`lj_vm_ffi_call + 158:
-> 0x100042624: leave
   0x100042625: ret
   0x100042626: nop
   0x100042627: nop

(lldb) n
warning: failed to set breakpoint site at 0x100000000c6 for breakpoint -30.1: Unable to read memory at breakpoint address.
Process 44382 stopped
* thread #1: tid = 0x4a7c3a, 0x00000001000827ea node9`lj_cconv_ct_ct + 1338, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x00000001000827ea node9`lj_cconv_ct_ct + 1338
node9`lj_cconv_ct_ct + 1338:
-> 0x1000827ea: movl (%r14), %eax
   0x1000827ed: testl $0x800000, %r12d
   0x1000827f4: je 0x100082888 ; lj_cconv_ct_ct + 1496
   0x1000827fa: movl %eax, %eax
(lldb)
Duncan Cross
2014-07-16 15:20:13 UTC
Post by Jim Burnes
struct Sys_FD
{
int fd;
};
Sys_FD* sys_open_result();
--
local fd = node9.sys_open_result()
print("sys.open(" .. path .. ") returned with value",fd)
if fd then
return ffi.gc(fd, node9.free_fd)
else
return nil
end
I'm not sure if this could be your problem, since you say in your
original post it is crashing before it is even making its way back to
Lua land - but note that "if fd then ..." will ALWAYS run, because FFI
null-pointer values are considered logically "true", even though they
== nil.
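
To illustrate the point about truthiness, here is a minimal sketch for LuaJIT. The anonymous struct is only a stand-in for Sys_FD, a zero-initialized pointer stands in for a NULL returned from C, and wrap_fd is a hypothetical helper name, not part of the original code.

```lua
local ffi = require("ffi")

-- Stand-in for a NULL result from C: ffi.new on a pointer type is zero-initialized.
-- The anonymous struct mirrors Sys_FD for illustration only.
local fd = ffi.new("struct { int fd; } *")

-- Any cdata object is truthy, so "if fd then" takes the true branch even for NULL.
local truthy = fd and true or false

-- A NULL pointer compares equal to nil, so this is the check to use.
local is_null = (fd == nil)

-- Hypothetical wrapper: return a real Lua nil for NULL, the pointer otherwise.
-- (In the original code the non-NULL branch would attach a finalizer with ffi.gc.)
local function wrap_fd(p)
  if p == nil then
    return nil
  end
  return p
end

print(truthy, is_null, wrap_fd(fd))
```

With this shape, the caller's "if fd then ... else ... end" only behaves as intended once the value has gone through an explicit comparison against nil.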

-Duncan
Jim Burnes
2014-07-17 05:09:11 UTC
Thanks. Yes, I changed that, thinking it might be the cause of the crash
(checking the structure field when the pointer value is nil). Apparently
it wasn't the cause of the problem, and changing it hasn't fixed
anything.

(BTW: How do I actually check for a null return value then? Actually
compare it to zero?)

In any case the print statement never actually executes. It appears to be
crashing in the assembly code that interfaces back into the Lua program
(FFI code I'm guessing).

It's probably something that I'm doing, but I'm running out of ideas.
Coda Highland
2014-07-17 05:12:24 UTC
Post by Jim Burnes
(BTW: How do I actually check for a null return value then? Actually
compare it to zero?)
Compare it to nil.

/s/ Adam
Mike Pall
2014-07-17 10:21:29 UTC
Post by Jim Burnes
In any case the print statement never actually executes. It appears to be
crashing in the assembly code that interfaces back into the Lua program
(FFI code I'm guessing).
Well, a *complete* self-contained example would help. Not some
snippets taken out of context. Simply by the process of reducing
it to that, one often finds the problem.

Returning a NULL pointer is fine, as well as printing it (it'll
print as: cdata<...>: NULL).
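
A quick way to see the default behaviour under LuaJIT, as a small sketch that uses a plain int pointer (an assumption for brevity) so that no metatype is involved:

```lua
local ffi = require("ffi")

-- A zero-initialized pointer cdata, i.e. NULL.
local p = ffi.new("int *")

-- The default FFI tostring handler formats the pointer value and does not
-- dereference it, so this is safe for NULL.
local s = tostring(p)
print(s)
```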

Also, build LuaJIT with debug symbols enabled, otherwise the
debugger output isn't very useful.

--Mike
Jim Burnes
2014-07-17 17:09:36 UTC
Mike ... more than happy to. Just need to know how much you need. I'll
enable debug syms in LuaJIT, but the rest of the app is getting pretty
large so if you can be more specific I can provide most anything. Stand by
and I'll re-run with debug syms.
Mike Pall
2014-07-17 17:43:42 UTC
Post by Jim Burnes
Mike ... more than happy to. Just need to know how much you need.
A self-contained example that reproduces the problem needn't have
more than 100 lines. If you can reduce it down to that, then I'll
have a look.

But, as I said, usually in the process of reducing while trying to
preserve the problem, you'll discover the issue in your app, anyway. :-)

--Mike
Jim Burnes
2014-07-17 18:37:12 UTC
Sure. You're of course right. I usually find that explaining a problem to
a fellow engineer solves the problem before he even gets his head around
it. I'll see if I can encapsulate such a sample.
Jim Burnes
2014-07-21 22:54:38 UTC
I'll include a larger example at the end of the day if possible. After
disabling all optimization and enabling debug symbols in LuaJIT, I have
found something interesting.

I accept that I might be causing this, but there seems to be an issue in
the FFI conversion function:

lj_cf_ffi_meta___index in lib_ffi.c around line 147

lj_cf_ffi_meta___index makes a call to:

lj_cdata_index(cts, cdataV(o), o+1, &p, &qual)

which modifies the pointer p which points to the cdata to retrieve (or some
part thereof)

When my C function returns the pointer to an existing struct everything
works fine, but when I return a null pointer lj_cdata_index makes 'p' NULL
which is then used to access the non-existent structure anyway.

This null 'p' value is eventually used by lj_cconv_ct_ct at around line 237
of lj_cconv.c (in my clone of LuaJIT)

(here 'p' is the parameter 'sp' in this function)

236: if (ssize == 4) {
237: i = *(int32_t*)sp;
238: }

Obviously since sp is 0 and an invalid address it generates an exception.

I'm guessing this is something I'm doing wrong, but I just can't find it.
Jim Burnes
2014-07-22 01:44:25 UTC
Slight addition:

After single stepping from where my C function returns (Sys_FD*)NULL to the
crash point:

1. FFI correctly determines that the value returned is a pointer type (of 8
bytes)
2. FFI sets the pointer value (p) to 0x0000000000000000
3. FFI then determines the ctype for the structure subfield (an integer, 4
bytes long)
4. FFI then computes the field offset for the single structure field 0
5. FFI then tries to retrieve the value of the structure subfield, which of
course doesn't exist
6. FFI eventually aborts trying to reference memory at 0x0000000000000000

For some reason FFI is ignoring the fact that the pointer is null and
continues as if nothing special is happening (I think).
Mike Pall
2014-07-22 07:52:06 UTC
Post by Jim Burnes
lj_cf_ffi_meta___index in lib_ffi.c around line 147
This function is only called if you're indexing the returned
pointer yourself. For example by accessing fd.fd in a __tostring
metamethod.

--Mike
Jim Burnes
2014-07-22 19:18:14 UTC
Post by Mike Pall
This function is only called if you're indexing the returned
pointer yourself. For example by accessing fd.fd in a __tostring
metamethod.
I am using __tostring on it. So am I prevented from using metamethods when
returning NULLs, or should I use some other technique?

In any case I'll look at the code path when not using metamethods.
Javier Guerra Giraldez
2014-07-22 19:21:27 UTC
Post by Jim Burnes
I am using __tostring on it. So am I prevented from using metamethods
when returning NULLs, or should I use some other technique?
I guess you should check if the pointer is NULL before reading its fields.
--
Javier
Jim Burnes
2014-07-22 19:30:34 UTC
Post by Javier Guerra Giraldez
i guess you should check if the pointer is NULL before reading its fields
It aborts before returning to my Lua code. I think the best solution may
be to simply create a singleton Sys_FD called something like FD_NONE which
contains -1 as its 'fd' field value. I simply return the pointer to that
instead of NULL. This way I can use metamethods and it's a simple check
for -1 when my Lua code receives it.
Mike Pall
2014-07-22 19:37:11 UTC
Post by Jim Burnes
Post by Javier Guerra Giraldez
i guess you should check if the pointer is NULL before reading its fields
It aborts before returning to my Lua code.
No, it does not. You said it runs lj_cf_ffi_meta___index. This is
a function that's ONLY invoked when dereferencing a cdata object
from Lua code, i.e. YOUR code.

You've probably defined a __tostring metamethod with ffi.metatype
for that Sys_FD type. And that __tostring metamethod is most
likely wrong, because it doesn't check whether the pointer is NULL.
Doing that is the responsibility of your code, not of the FFI.
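
A guarded metamethod along these lines might look like the following sketch for LuaJIT. The typedef is assumed from the struct posted earlier in the thread; the output strings and the null_fd/real_fd names are made up for illustration and are not the poster's actual code.

```lua
local ffi = require("ffi")

-- Assumed declaration, matching the struct shown earlier in the thread.
ffi.cdef[[
typedef struct Sys_FD { int fd; } Sys_FD;
]]

-- The metatype applies to the struct and to pointers to it, so __tostring
-- can be handed a NULL Sys_FD pointer and has to check before reading fields.
ffi.metatype("Sys_FD", {
  __tostring = function(self)
    if self == nil then            -- true for a NULL pointer
      return "Sys_FD: NULL"
    end
    return "Sys_FD: " .. tostring(self.fd)
  end,
})

local null_fd = ffi.new("Sys_FD *")          -- stand-in for a NULL returned from C
local real_fd = ffi.new("Sys_FD", { fd = 3 })

print(tostring(null_fd))
print(tostring(real_fd))
```

Without the nil comparison, the self.fd access on the NULL pointer is the dereference that ends in the bad access inside lj_cconv_ct_ct.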

--Mike
Jim Burnes
2014-07-22 19:45:16 UTC
Thanks. I got it (see my other post). I thought it was my code, but ..
sometimes the maze of twisty little passages...

Good to know that it's my code actually. Increases my trust in LuaJIT.
Mike Pall
2014-07-22 19:23:57 UTC
Post by Jim Burnes
Post by Mike Pall
This function is only called if you're indexing the returned
pointer yourself. For example by accessing fd.fd in a __tostring
metamethod.
I am using __tostring on it. So am I prevented from using metamethods when
returning NULLs, or should I use some other technique?
No, you can use __tostring. But you better check for a NULL
pointer before dereferencing a field.

I mean, you wrote that method, right? The default __tostring
handler of the FFI doesn't dereference pointers, so it must be
your code.

--Mike
Jim Burnes
2014-07-22 19:34:26 UTC
Post by Mike Pall
No, you can use __tostring. But you better check for a NULL
pointer before dereferencing a field.
I mean, you wrote that method, right? The default __tostring
handler of the FFI doesn't dereference pointers, so it must be
your code.
>>Light goes on<< Thx Mike. The deref is happening because my metamethod
is forcing it, not because it was part of some default de-marshalling
process.

Need more caffeine. :) I'll give it a try.