Another one bites the apple!

Junho Jang (ramses) 2019-06-14

Hello, world! I'm Ramses, responsible for security assessment at LINE. My work is to try to hack LINE services and find ways to enhance their security. In my personal time, I also find and report security vulnerabilities in third-party services. This is my way of contributing to a more secure world. Among hackers, this is called bug hunting. Hackers hunt bugs to win a bounty, to build their reputation, or simply because they enjoy hacking. Many hackers target Apple products, and I'm writing this post to share my own journey of bug hunting in Apple products.

Scope of bug hunting

Apple offers a wide range of products such as MacBook, iMac, Mac Pro, iPhone, iPad, and Apple Watch. These products can be classified into desktop products with macOS and mobile devices with iOS.

macOS vs iOS

macOS and iOS are very similar in structure. The same kinds of daemons are launched, and similar protocols are used for communication between user processes. The OS kernels are similar in structure as well. Yet there are still some internal differences between desktop and mobile devices, such as in device drivers and in the cache system, which is tuned for efficiency on mobile devices. From a hacker's perspective, which OS do you think is an easier target for a bug hunt? With iOS, both user-level and kernel-level debugging are very difficult. Kernel debugging on the latest iOS is almost impossible without a specific vulnerability, which turns kernel research into black-box testing. On macOS, on the other hand, kernel debugging is relatively easy with a VM (Virtual Machine), and it is easier to develop and run a security fuzzer. Consequently, hackers often hunt bugs on macOS and then apply the vulnerabilities they find to iOS (although some hackers target mobile devices only, taking advantage of iOS-specific vulnerabilities). Given these characteristics, this study focused on bug hunting on macOS.

Defining the purpose of hacking

Within macOS, there are many targets for hacking. Hackers usually select the Safari browser for remote code execution, or a daemon or the kernel for local privilege escalation. If you can achieve local privilege escalation starting from Safari, such an exploit can be worth as much as USD 500,000. That's why many hackers are obsessed with finding vulnerabilities (Reference). The purpose of hacking for this study was to find a kernel vulnerability that enables local privilege escalation.

About macOS kernel

The kernel for both macOS and iOS is officially known as XNU, which is short for "X is Not Unix". XNU is a hybrid kernel developed by modifying two open-source operating systems. BSD (Berkeley Software Distribution) provides system calls and the file system, while Mach (the Mach kernel developed at Carnegie Mellon University) provides inter-process communication (IPC). The biggest difference from other Unix-like operating systems is that XNU has the Mach component.

What is the Mach kernel?

Mach defines and provides system primitives such as tasks, threads and ports. Let's take a look at how a user-level process communicates with the kernel, from a kernel bug hunter's perspective. Processes in the user space communicate with the kernel space using a protocol called Mach messages. Mach messages carry a type of RPC (Remote Procedure Call), and the RPC code is generated by a program called MiG, as explained below.

Communications between the user space and the kernel space

What is MiG (Mach Interface Generator)?

MiG is a tool to generate RPC code for client-server style Mach IPC. When communicating with the kernel, the user space works as a client while the kernel acts as a server. macOS provides MiG as a default program, and you can view the MiG manual by typing man mig in the terminal.

When you pass a definition file to MiG, it generates two files, one for the client and one for the server. Say you pass MiG a file that declares a routine named host_kernel_version. MiG generates the client RPC code in the mach_hostUser.c file and the server RPC code in the mach_hostServer.c file.

The `host_kernel_version` function generated by MiG

Creating a kernel version printing program

Let's make a program that prints out the current version of the kernel, which will help us understand the communications with the kernel. It can be done easily using the host_kernel_version function.

#include <stdio.h>
#include <mach/mach.h>
 
int main(){
    char ver[1024];
    host_name_port_t host;
 
    host = mach_host_self();
    host_kernel_version(host, ver);
 
    printf("kernel version: %s\n", ver);
     
    return 0;
}

When you compile and run the code above, it prints the version of the kernel that your macOS is running.

% ./get_kernel_version
kernel version: Darwin Kernel Version 18.2.0

Let's take a deeper look at how the host_kernel_version function works. The client code and the server code generated by MiG communicate via Mach messages. When the function is called, the user library composes a Mach message and sends it to the kernel handler. The kernel handler then interprets the message and calls a function to get the kernel version. Many functions used in user processes communicate with the kernel through Mach messages in this way.

Example of how the `host_kernel_version` function is executed via Mach message

Fuzzing, a technique to detect vulnerability using random data as inputs

Fuzzing is an automated software testing technique that feeds invalid, unexpected, or random data to a program to make it misbehave or crash, thereby exposing vulnerabilities. A fuzzer is a tool that performs fuzzing. For this study, I designed a fuzzer to test Mach message-based communication with the kernel.

Designing MiG Fuzzer

I designed the MiG fuzzer to randomly call user-process functions that reach the kernel. I used the implementation mentioned previously (the client code generated by MiG), which creates and sends a Mach message to the kernel. The kernel code in the red boxes below is the target for fuzzing. I removed the argument checks on the client side so that random values reach the kernel without restrictions.

Let's assume that the host_info function is randomly picked by MiG Fuzzer here. MiG Fuzzer picks the corresponding function, the _call_host_info function, from the function table, mig_table.

const mig_func mig_table[MIG_ENTRY_CNT] = {
    &_call__mach_make_memory_entry,
    &_call_act_get_state,
    &_call_act_set_state,
    &_call_clock_alarm,
    &_call_clock_get_attributes,
    &_call_clock_get_time,
    &_call_host_create_mach_voucher,
    &_call_get_clock_service,
    &_call_host_get_io_master,
    &_call_host_info,
    &_call_host_kernel_version,
    &_call_host_lockgroup_info,
    &_call_process_info,
    &_call_host_request_notification,
...

The _call_host_info function assigns random variable values for fuzzing and passes them to the host_info function.

kern_return_t _call_host_info(bool fake){
    fprintf(output, "host_info\n");
 
    mach_port_t req_port = get_port("host_t");
 
    host_flavor_t arg_in_0 = _make_host_flavor_t(); // host_flavor_t flavor
    host_info_t arg_out_1 = (host_info_t) malloc(272);  // host_info_t host_info_out
    uint32_t arg_out_2; // uint32_t host_info_outCnt
 
    kern_return_t kr;
    if(!fake){
        fflush(output); system("sync");
        kr = host_info(req_port, arg_in_0, arg_out_1, &arg_out_2);
    } else
        kr = 0xdeadbeef;
    return kr;
}

The host_info function creates a Mach message with the given arguments and sends it to the kernel handler.

/* Routine host_info */
mig_external kern_return_t host_info(
    host_t host,
    host_flavor_t flavor,
    host_info_t host_info_out,
    mach_msg_type_number_t *host_info_outCnt
)
{
    ...
    InP->Head.msgh_bits =
        MACH_MSGH_BITS(18, MACH_MSG_TYPE_MAKE_SEND_ONCE);
    /* msgh_size passed as argument */
    InP->Head.msgh_request_port = host;
    InP->Head.msgh_reply_port = mig_get_reply_port();
    InP->Head.msgh_id = 200;
    InP->Head.msgh_reserved = 0;
 
    __BeforeSendRpc(200, "host_info")
    msg_result = mach_msg(&InP->Head, MACH_SEND_MSG|MACH_RCV_MSG|MACH_MSG_OPTION_NONE, (mach_msg_size_t)sizeof(Request), (mach_msg_size_t)sizeof(Reply), InP->Head.msgh_reply_port, MACH_MSG_TIMEOUT_NONE, MACH_PORT_NULL);
    __AfterSendRpc(200, "host_info")
    if (msg_result != MACH_MSG_SUCCESS) {
        __MachMsgErrorWithoutTimeout(msg_result);
        { return msg_result; }
    }
    ...
}

The MiG fuzzer keeps picking functions at random, calling them with random arguments, and sending the resulting Mach messages to the kernel handler. While the fuzzer runs, the macOS kernel must also be monitored for abnormal behavior or shutdowns, since that is how vulnerabilities are identified.

Undertaking fuzzing

Next, I'll explain how to set up an environment for fuzzing and how to identify vulnerabilities. A fuzz test requires more than just a fuzzer: you need a fuzzing environment to run it in, and you need other programs to determine whether vulnerabilities are triggered and to automatically classify the cause of each crash. Let's build a fuzzing framework that groups all the required programs so that fuzzing runs automatically and seamlessly.

Building a fuzzing environment

As a fuzz test uses random data inputs, the best-case scenario is an environment where as many targets as possible can be tested in parallel. I used four MacBooks with two macOS VMs each, for a total of eight fuzzing environments.

Building a fuzzing framework

I set it up so that the MiG fuzzer runs on each VM and sends a crash report to the crash collector server, as shown in the diagram below. The crash collector server automatically classifies each crash report, which serves as a basis for identifying vulnerabilities for further analysis.

Fuzzing results

Fuzzing ran for about two weeks and produced the following results. I extracted the key message of each crash report from each VM and generated a fixed-length MD5 hash per message. These hash values were used to eliminate duplicate messages and produce a summary report that highlights candidate vulnerabilities.

Crash analysis

After analyzing the fuzzing results to pick out the crashes most likely to be exploitable, it is time to see whether the same crash can be triggered again. This confirms which code caused the crash. If you keep a record of the code used for fuzzing, reproducing the crash is much easier. Let me go through the crash report analysis, the crash reproduction, and the analysis of the cause.

Crash report analysis

When you analyze a crash report, you can find out which function caused the crash. The following is an excerpt from one of the interesting crash reports. It shows that the kernel's mach_vm_page_range_query function calls the vm_map_page_range_info_internal function, which executes the bzero function, where the crash occurred.

Crash recurrence

Based on the data from fuzzing, I wrote the following code, which is designed to cause a crash. The simplest code that triggers a crash or reveals a vulnerability is called a PoC (Proof of Concept).

task_t task = mach_task_self();
 
mach_vm_offset_t address = 0x10;
mach_vm_size_t size = 0xffffffffffffffff;
 
int result[40] = {0,};
mach_vm_address_t dispositions_addr = (mach_vm_address_t) &result;
mach_vm_size_t dispositions_count = 10;
 
mach_vm_page_range_query(task, address, size, dispositions_addr, &dispositions_count);

When you run this code, it crashes the kernel and the system reboots. If you want to try it, I strongly recommend doing so on a VM. :) This code causes a crash on macOS 10.14.4 and iOS 12.2.

Analyzing causes for crash

Let's follow each input from the PoC to find the cause of the crash. The XNU kernel source code is partially public. Using that code, I'll track down the cause of the crash within the function.

The PoC code calls the mach_vm_page_range_query function with address as the second parameter and size as the third. Their values are 0x10 and 0xffffffffffffffff (16 f's), respectively. Interpreted as a signed 64-bit (8-byte) integer, this size equals -1. These values are passed to the kernel's mach_vm_page_range_query function, where the local variables start and end become 0 and 0x1000, respectively. To calculate end, the sum of address and size is passed to the mach_vm_round_page function. This sum overflows: 0x10 plus 0xffffffffffffffff is 0x1000000000000000f, and because the leading 1 does not fit in a 64-bit integer, the value is treated as 0xf. Rounding 0xf up to a page boundary gives 0x1000. The overflow was not checked, and this is what caused the crash.

Analysis of the `mach_vm_page_range_query` function - 1

Assessing the value of crash

Now, I need to assess if this crash is meaningful as a vulnerability. I can do so by tracking the kernel source up to the crash.

Tracking the kernel source code

Using the values of the variables start and end, the value of num_pages is calculated. Then the kalloc function is called with sizes derived from num_pages to allocate kernel heap memory. Because start and end are only 0x1000 apart, the allocations are tiny: the variables info and local_disp point to 32 bytes and 4 bytes of memory, respectively.

Analysis of the `mach_vm_page_range_query` function - 2

And here we have a loop.

Analysis of the `mach_vm_page_range_query` function - 3

When the while loop starts, the vm_map_page_range_info_internal function is called. Since putting the whole code up here might confuse you, I'll briefly explain the overall flow. At the end of the first iteration of the loop above, because the initial value of size is very big, the value of curr_sz is set to 1 GB.

//inside the loop
curr_sz = MIN(mach_vm_round_page(size), MAX_PAGE_RANGE_QUERY);   // MAX_PAGE_RANGE_QUERY is 1024*1024*1024

Let's have a look at what goes on inside the vm_map_page_range_info_internal function in the second iteration of the loop. The variable curr_sz set to 1 GB is passed to the third parameter of the function, end_offset. Now, this value gets assigned to end as shown below. In the end, this value is assigned to curr_e_offset.

// end_offset is set to end
end = vm_map_round_page(end_offset, PAGE_MASK);
 
 
...
 
 
// end is set to curr_e_offset
curr_e_offset = end;

To sum up, the huge value of size results in curr_e_offset being set to 1 GB. Consequently, the bzero function zeroes about 8 MB of memory, starting at the 32-byte heap allocation that info points to.

Analysis of the `vm_map_page_range_info_internal` function

Assessment

When hackers try to achieve local privilege escalation, they usually go through the following steps: fuzz → identify multiple crashes → classify crashes → analyze meaningful crashes → launch an exploit (run specific code without causing a crash). If the kernel crashes before an exploit succeeds, the system reboots, so the key is not to cause a crash before the attack succeeds. The vulnerability identified above overwrites a 32-byte kernel heap allocation with 8 MB of data. Kernel heap memory holds the values of various objects as well as the metadata that makes up each page. When these values are corrupted, it is difficult to complete an exploit without crashing the kernel. Therefore, it is time to look for another attack technique or vulnerability that works without crashing the kernel.

Another trigger for vulnerability identified during kernel source analysis

As I was analyzing what goes on in the mach_vm_page_range_query and vm_map_page_range_info_internal functions, I noticed another vulnerability stemming from the same cause. If you pass a large value such as 0xfffffffffffee010, the bzero function is not called, but the kernel heap memory pointed to by local_disp is overwritten.

for (i = 0; i < num_pages; i++) {
    ((int*)local_disp)[i] = ((vm_page_info_basic_t)info)[i].disposition;
}

Here again, the same kind of vulnerability arises: a 4-byte heap allocation is overwritten with 1 MB of data.

Studying how to take advantage of this vulnerability

Whenever I got a chance, I tried to find ways to exploit this vulnerability. One idea I had was to use the "kernel heap feng-shui" technique, which I explain below.

What is kernel heap feng-shui?

Heap feng-shui, named after feng shui, is an attack technique in which a hacker manipulates the layout of the heap as desired. If a hacker uses heap feng-shui to place 1 MB of unneeded values starting at the heap address pointed to by local_disp, the overwrite will not crash the kernel and the vulnerability can be fully exploited.

If the kernel heap memory is arranged as shown above and a target object with a reference count is placed right after that 1 MB, this vulnerability can be converted into a use-after-free vulnerability by changing the reference count to zero. Once it takes the form of a use-after-free, hackers can exploit it in many ways to gain kernel privileges and achieve local privilege escalation.

Converting to Use After Free vulnerability with kernel heap feng-shui

Closing

Apple patched this vulnerability in the macOS 10.14.5 and iOS 12.3 updates released on May 14, 2019 (vulnerability number: CVE-2019-8576). I submitted a detailed report on the identified vulnerabilities to product-security@apple.com, following Apple's guideline on reporting a security or privacy vulnerability. After I sent the email, I first got an automatic confirmation from Apple. Later, I was contacted by the person in charge of the relevant vulnerabilities. I was informed of a brief patch schedule and asked to provide a name to be credited in the security update notes. Apple collects vulnerability reports and releases security updates regularly. For this specific vulnerability, it took about three months from report to patch. I think patches can take between one and three months after reporting, depending on the security update cycle.

This is all I have to share on how the kernel bug hunt was done and what vulnerabilities were found. I haven't found much use for this vulnerability yet, but that may change if I find better ways to control the kernel heap memory. I strongly encourage you to share ideas and join me in this research. Thank you for taking the time to read this post.
