libnative: main task has bad thread local data on windows #13259

Closed
Description

@klutzy
Contributor

More investigation from #13073.

extern crate native;
extern crate green;
extern crate rustuv;

use std::libc::{c_void, LPWSTR, LPVOID, DWORD};

extern "system" {
    fn MessageBoxA(hWnd: u32, lpText: *u8, lpCaption: *u8, uType: u32) -> i32;

    fn FormatMessageW(flags: DWORD,
                      lpSrc: LPVOID,
                      msgId: DWORD,
                      langId: DWORD,
                      buf: LPWSTR,
                      nsize: DWORD,
                      args: *c_void)
                      -> DWORD;

    fn GetLastError() -> u32;
}

fn test() {
    let mut buf: [u16, ..50] = [0, ..50];
    let ret = unsafe {
        FormatMessageW(0x1000, 0 as *mut c_void, 1, 0x400,
                       buf.as_mut_ptr(), buf.len() as u32, 0 as *c_void)
    };
    if ret == 0 {
        let err = unsafe { GetLastError() };
        println!("err: {:?}", err);
    }
    let s = std::str::from_utf16(buf);
    println!("{:?}", s);

    //unsafe {
    //    MessageBoxA(0, "ABC".as_ptr(), "ABC".as_ptr(), 0);
    //}
}

fn main() {
    if cfg!(spawn) {
        spawn(proc() {
            test();
        });
    } else {
        test();
    }
}

#[start]
pub fn start(argc: int, argv: **u8) -> int {
    if cfg!(green) {
        green::start(argc, argv, rustuv::event_loop, main)
    } else if cfg!(raw) {
        main();
        0
    } else {
        native::start(argc, argv, main)
    }
}

If the code is built with --cfg green, --cfg raw, or --cfg spawn, it works as expected. However, if it is built with no cfg, it behaves strangely: FormatMessageW() fails to get the system-locale message, and MessageBoxA() shows a GUI message box with a non-system-themed border.

I guess the main task has bad thread-local data due to libnative::start().

cc @alexcrichton

Activity

alexcrichton (Member) commented on Apr 2, 2014

What happens if you add this to the top of the test function?

unsafe { ::std::rt::stack::record_sp_limit(0); }

klutzy (Contributor, Author) commented on Apr 2, 2014

Wow, that one fixed all problems!
(Currently, backtraces are not shown on Windows + libnative for the same reason. Indeed, that line also fixed that issue.)

alexcrichton (Member) commented on Apr 2, 2014

See this comment for how I discovered that. It sounds like this modification to the TIB is causing something to fail when some initial library is loaded.

At this point, I think we need to figure out the minimal set of things that need to be done to "get things loaded". This minimal set of things can be done whenever libnative starts up, but I'm not sure what that minimal set is.

The minimal set of things includes "boot libgreen and spawn a task" because that's what using libgreen will do, but it would be nice to narrow it down more than that!

Does that make sense?

klutzy (Contributor, Author) commented on Apr 3, 2014

The behavior seems to suggest that it's not a good idea to modify $fs:0x14 (or $gs:0x28 on win64), which LLVM/Rust currently uses for segmented stack.

fs:0x14 is known as "ArbitraryUserPointer", but Raymond Chen said that it is "arbitrary" for internal use, not for applications. (It's safe to believe what Raymond says!)
I had come across that article before, but I also found conflicting claims (e.g. this and this), and at the time Rust worked well, so I assumed it was OK in practice. Now we have to check whether it really is :'( Here is C code reproducing the bad behavior:

#include <windows.h>
int main() {
    asm("movl $1234, %fs:0x14");
    MessageBoxA(0, "abc", "abc", 0);
}

So I want to investigate whether LLVM can use the TIB's stack bounds instead of ArbitraryUserPointer for segmented stacks. We already use them in target_record_stack_bounds() on win64 to record the full available stack area, but if we call target_record_stack_bounds(stack_lo + RED_ZONE, stack_hi) instead, they can replace the ArbitraryUserPointer usage.
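
For reference, the offsets discussed above ($fs:0x08, $fs:0x14, $gs:0x28) follow from the leading fields of the Windows thread information block (NT_TIB). The following is a minimal sketch in current Rust, not code from this issue: the struct and helper names are made up for illustration, and the struct only models the layout so that the offsets can be computed on any platform. It does not touch the real TIB.

```rust
/// Leading fields of the Windows thread information block (NT_TIB),
/// generic over the pointer-sized word `P` (u32 models win32, u64 models win64).
/// The Rust field names are illustrative renderings chosen for this sketch.
#[repr(C)]
#[derive(Default)]
pub struct NtTib<P> {
    pub exception_list: P,
    pub stack_base: P,
    pub stack_limit: P,
    pub sub_system_tib: P,
    pub fiber_data: P,
    pub arbitrary_user_pointer: P,
    pub self_ptr: P,
}

/// Byte offset of `field` from the start of `tib`.
fn offset_in<P, F>(tib: &NtTib<P>, field: &F) -> usize {
    (field as *const F as usize) - (tib as *const NtTib<P> as usize)
}

/// Offset of StackLimit for a TIB whose words have type `P`.
pub fn stack_limit_offset<P: Default>() -> usize {
    let tib = NtTib::<P>::default();
    offset_in(&tib, &tib.stack_limit)
}

/// Offset of ArbitraryUserPointer for a TIB whose words have type `P`.
pub fn arbitrary_user_pointer_offset<P: Default>() -> usize {
    let tib = NtTib::<P>::default();
    offset_in(&tib, &tib.arbitrary_user_pointer)
}
```

With `P = u32` the helpers give the $fs offsets used in this thread, and with `P = u64` they give the corresponding $gs offsets on win64.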

klutzy (Contributor, Author) commented on Apr 3, 2014

I've replaced all occurrences (including in LLVM) of $fs:0x14 with $fs:0x08. The example code seems to work well, but it definitely needs more tests.

alexcrichton (Member) commented on Apr 3, 2014

Are we sure that the kernel won't modify those TIB values to some other internal stack limit? (just a mild concern of mine)

klutzy (Contributor, Author) commented on Apr 4, 2014

Hmm, StackLimit ($fs:0x08) actually means the lowest address of the committed pages, not the limit address of the whole stack. Windows uses it internally to detect stack usage on an uncommitted page: it changes when the "thread uses successively lower addresses in the stack". (Uh wait, the link says that pvArbitrary is theoretically safe to use!)
So it is not a good idea to use StackLimit with the native stacks that libnative currently uses.
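
To make the problem concrete, here is a deliberately simplified model in current Rust of how StackLimit moves on a native stack. It is a sketch under stated assumptions, not how Windows is implemented: the type `NativeStack`, its fields, the 4 KiB `PAGE` constant, and the "commit straight down to the touched page" rule are all illustrative (real Windows grows the committed region through guard pages).

```rust
/// Page size assumed by this model.
pub const PAGE: usize = 4096;

/// Simplified model of a native thread stack; addresses grow downward.
pub struct NativeStack {
    /// Highest address of the stack (exclusive).
    pub stack_base: usize,
    /// Lowest committed address; this is the value the StackLimit slot holds.
    pub stack_limit: usize,
    /// Lowest reserved address; going below this is a real overflow.
    pub reserve_low: usize,
}

impl NativeStack {
    /// Models the thread touching `addr`. Returns false if `addr` lies
    /// outside the reserved stack. Touching below the committed region
    /// moves `stack_limit` down to the start of the touched page.
    pub fn touch(&mut self, addr: usize) -> bool {
        if addr < self.reserve_low || addr >= self.stack_base {
            return false;
        }
        if addr < self.stack_limit {
            self.stack_limit = addr - (addr % PAGE);
        }
        true
    }
}
```

In this model, any value a runtime stored in the StackLimit slot would be overwritten as soon as the thread used a lower address, which is the conflict described above.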

BTW, the librand issue (CryptAcquireContextA) does not occur on Windows 8.1, but does occur on Windows 7. (Maybe there are some differences in internal libraries?) This suggests that it will be hard to pin down the minimal set of things needed to "get things loaded".

alexcrichton (Member) commented on Apr 4, 2014

Hm, interestingly, your C code example works for me. I think I'm on a Windows 7 VM. The CryptAcquireContext example doesn't work for me, however.

vadimcn (Contributor) commented on Apr 21, 2014

How about using the actual TLS API for this? TlsGetValue() and TlsSetValue() are both like 10 instructions long. Is this enough overhead to bother messing with undocumented TIB fields? And if it is, maybe we could write directly into TLS slots (after allocating one via TlsAlloc(), of course).
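
The idea of keeping the stack limit in a real thread-local slot rather than a TIB field can be sketched portably in current Rust. Note the assumption: this uses the standard library's `thread_local!` as a stand-in and does not call TlsAlloc(), TlsGetValue(), or TlsSetValue(); the names `STACK_LIMIT`, `record_stack_limit`, `stack_limit`, and `limit_seen_by_new_thread` are hypothetical.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // One slot per thread, initialized to 0 ("no limit recorded").
    static STACK_LIMIT: Cell<usize> = Cell::new(0);
}

/// Records the stack limit for the calling thread only.
pub fn record_stack_limit(limit: usize) {
    STACK_LIMIT.with(|slot| slot.set(limit));
}

/// Reads the stack limit recorded for the calling thread.
pub fn stack_limit() -> usize {
    STACK_LIMIT.with(|slot| slot.get())
}

/// Spawns a fresh thread and returns the limit it observes,
/// showing that the slot is not shared between threads.
pub fn limit_seen_by_new_thread() -> usize {
    thread::spawn(stack_limit).join().expect("thread panicked")
}
```

A value recorded on one thread is invisible to a newly spawned thread, which is the property the TIB slot was being used for. Whether the access cost is acceptable in every function prologue is the open question raised in this comment.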

alexcrichton (Member) commented on Apr 21, 2014

It may be possible to do that, although this is an extra bit of overhead on all function calls made in Rust (each function is preceded by this check). I also fear that the root cause is still unknown, so it's too soon to move away from the current implementation (which seems like it should work).

vadimcn (Contributor) commented on Apr 21, 2014

Actually, function prologues could use current bottom of stack (fs:[8]) for quick comparison, and only check the hard stack limit when kicked off of the fast path. The latter would happen only when stack commit limit is about to grow anyway, so it wouldn't matter for perf.
Of course changing this in LLVM would break everybody's runtime libraries, which use "ArbitraryUserPointer"...
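
The two-tier check proposed here can be sketched as a pure function in current Rust. This is only an illustration of the control flow under stated assumptions, not LLVM's prologue or any existing runtime code: `StackCheck`, `check_prologue`, and the parameter names are hypothetical, `committed_low` stands for the value at fs:[8], and `hard_limit` stands for wherever the real end of the stack would be recorded.

```rust
/// Outcome of the sketched prologue check.
#[derive(Debug, PartialEq)]
pub enum StackCheck {
    /// The new frame fits in already-committed stack; no further work.
    FastPath,
    /// The frame goes below the committed region but stays above the hard limit.
    SlowPathOk,
    /// The frame would cross the hard limit.
    Overflow,
}

/// Decides which path a function prologue would take for a frame of
/// `frame_size` bytes, given the current stack pointer `sp`.
pub fn check_prologue(
    sp: usize,
    frame_size: usize,
    committed_low: usize,
    hard_limit: usize,
) -> StackCheck {
    // Stacks grow downward, so the new frame ends at sp - frame_size.
    let new_sp = match sp.checked_sub(frame_size) {
        Some(value) => value,
        None => return StackCheck::Overflow,
    };
    if new_sp >= committed_low {
        // Quick comparison against the current bottom of the stack.
        StackCheck::FastPath
    } else if new_sp >= hard_limit {
        // Only reached when the committed region is about to grow anyway.
        StackCheck::SlowPathOk
    } else {
        StackCheck::Overflow
    }
}
```

The hard limit is consulted only on the slow path, so the common case costs a single comparison, as the comment above argues.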

Labels: O-windows (Operating system: Windows)

Participants: @Zoxc, @alexcrichton, @DanielKeep, @brson, @pnkfelix