The Mystery of Mach-O Object Structure

I’m going to tell you about the internals of the Mach-O file and give an introduction to the simple relocatable object file structure

Alex Dremov
While working on the final project for the MIPT freshman course “the assembly language and low-level architecture”, we were developing a compiled programming language. I wanted it to compile to a standard object file, but found almost no information about that file's structure. More importantly, there were few to no examples on this topic. In this article, I’m going to tell you about the internals of the Mach-O file and give an introduction to the simple relocatable object file structure.

General Structure

A Mach-O file can be divided into three main parts:

  • Header
  • Load commands
  • Data

The header contains general information and identifies the file as a Mach-O file. The header also contains other basic file type information, indicates the target architecture, and contains flags specifying options that affect the interpretation of the rest of the file.

Directly after the header is a series of variable-size load commands that specify the layout and linkage characteristics of the file. This is the core that defines the file characteristics.

Following the load commands, all Mach-O files contain segment data. Each segment has zero or more sections. Each segment defines a region of virtual memory that the dynamic linker maps into the address space of the process. Apart from segment data, other data can also be placed here, for example, the symbol table and relocations.
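The header's role as an identifier can be sketched in a few lines. This is only an illustration: the helper name looksLikeMachO64 is made up for this article, and the magic value is repeated locally so the snippet builds on systems without the Mach-O headers. It assumes the file was written in little-endian byte order, as on x86_64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Value of MH_MAGIC_64, repeated here instead of including <mach-o/loader.h>.
constexpr uint32_t kMagic64 = 0xfeedfacf;

// Hypothetical helper: true if the buffer starts with the 64-bit Mach-O
// magic number stored in little-endian byte order.
bool looksLikeMachO64(const unsigned char* bytes, size_t size) {
    assert(bytes != nullptr);
    if (size < 4) return false;  // too short to hold the magic number
    uint32_t magic = uint32_t(bytes[0]) | uint32_t(bytes[1]) << 8 |
                     uint32_t(bytes[2]) << 16 | uint32_t(bytes[3]) << 24;
    return magic == kMagic64;
}
```

On disk, the first four bytes of such a file are therefore cf fa ed fe.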

Object-specific structure

As this article focuses on object files, I will not go into details about general executable files. Even though their format is the same, load commands and data differ.

To make a workable object file, we need to define the elements below. They are listed in the order in which they are placed in the file.

  • Header
  • Load commands
      • Segment (__TEXT)
          • Text section (__text)
          • Data section (__data)
      • Symbol table (SYMTAB)
      • Dynamic symbol table (DYSYMTAB)
  • Data
      • Text section data
      • Data section data
      • Relocations
      • Symbol table data
      • String table

Header is defined by this structure:

struct mach_header_64 {
    uint32_t       magic;      /* mach magic number identifier */
    cpu_type_t     cputype;    /* cpu specifier */
    cpu_subtype_t  cpusubtype; /* machine specifier */
    uint32_t       filetype;   /* type of file */
    uint32_t       ncmds;      /* number of load commands */
    uint32_t       sizeofcmds; /* the size of all the load commands */
    uint32_t       flags;      /* flags */
    uint32_t       reserved;   /* reserved */
};
  1. magic – it’s exactly what the name says: the magic number that identifies the file as Mach-O. It holds the MH_MAGIC_64 (0xfeedfacf) constant.
  2. cputype, cpusubtype – define CPU information. For most cases, CPU_TYPE_X86_64 and CPU_SUBTYPE_X86_64_ALL can be used.
  3. filetype – as a Mach-O file can be used for multiple purposes, the file type must be specified. As we build an object file, MH_OBJECT must be used.
  4. ncmds – number of load commands following the header.
  5. sizeofcmds – the total size of the load commands (in bytes).
  6. flags – special flags, defined in <mach-o/loader.h>. For the object file, we will be using MH_SUBSECTIONS_VIA_SYMBOLS, which means that the sections of the object file can be divided into individual blocks. These blocks are dead-stripped if they are not used by other code. The available flags include:
      • MH_NOUNDEFS – The object file contained no undefined references when it was built.
      • MH_INCRLINK – The object file is the output of an incremental link against a base file and cannot be linked again.
      • MH_DYLDLINK – The file is input for the dynamic linker and cannot be statically linked again.
      • MH_TWOLEVEL – The image is using two-level namespace bindings.
      • MH_BINDATLOAD – The dynamic linker should bind the undefined references when the file is loaded.
      • MH_PREBOUND – The file’s undefined references are prebound.
      • MH_PREBINDABLE – This file is not prebound but can have its prebinding redone. Used only when MH_PREBOUND is not set.
      • MH_NOFIXPREBINDING – The dynamic linker doesn’t notify the prebinding agent about this executable.
      • MH_ALLMODSBOUND – Indicates that this binary binds to all two-level namespace modules of its dependent libraries. Used only when MH_PREBINDABLE and MH_TWOLEVEL are set.
      • MH_CANONICAL – This file has been canonicalized by unprebinding, that is, by clearing prebinding information from the file. See the redo_prebinding man page for details.
      • MH_SPLIT_SEGS – The file has its read-only and read-write segments split.
      • MH_FORCE_FLAT – The executable is forcing all images to use flat namespace bindings.
      • MH_SUBSECTIONS_VIA_SYMBOLS – The sections of the object file can be divided into individual blocks. These blocks are dead-stripped if they are not used by other code.
      • MH_NOMULTIDEFS – This umbrella guarantees there are no multiple definitions of symbols in its subimages. As a result, the two-level namespace hints can always be used.
  7. reserved – reserved bytes, not used.

Summing up, here is the code for initializing the header of an object file.

“To be modified” means that it is not possible to determine the value before constructing the file. Therefore, it will be changed afterwards.

mach_header_64 header = {};
header.magic          = MH_MAGIC_64;
header.cputype        = CPU_TYPE_X86_64;
header.cpusubtype     = CPU_SUBTYPE_X86_64_ALL;
header.filetype       = MH_OBJECT;
header.ncmds          = 0; /* to be modified */
header.sizeofcmds     = 0; /* to be modified */
header.flags          = MH_SUBSECTIONS_VIA_SYMBOLS;
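If you want to experiment on a machine without the macOS SDK, the header can be mirrored by hand. The sketch below is an assumption-laden stand-in, not the system header: Header64, makeObjectHeader, and the k* constants are names invented for this example, and the numeric values are copies of the corresponding definitions in <mach-o/loader.h> and <mach/machine.h>.

```cpp
#include <cassert>
#include <cstdint>

// Hand-written mirror of mach_header_64 (eight 4-byte fields).
struct Header64 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};
static_assert(sizeof(Header64) == 32, "the 64-bit header occupies 32 bytes");

constexpr uint32_t kMagic64               = 0xfeedfacf;  // MH_MAGIC_64
constexpr int32_t  kCpuTypeX86_64         = 0x01000007;  // CPU_TYPE_X86_64
constexpr int32_t  kCpuSubtypeX86_64All   = 3;           // CPU_SUBTYPE_X86_64_ALL
constexpr uint32_t kFileTypeObject        = 0x1;         // MH_OBJECT
constexpr uint32_t kSubsectionsViaSymbols = 0x2000;      // MH_SUBSECTIONS_VIA_SYMBOLS

// Hypothetical helper: fills the header once the load commands are known.
Header64 makeObjectHeader(uint32_t ncmds, uint32_t sizeofcmds) {
    // Every load command starts with cmd and cmdsize, so it takes >= 8 bytes.
    assert(sizeofcmds >= ncmds * 8u);
    Header64 header = {};
    header.magic      = kMagic64;
    header.cputype    = kCpuTypeX86_64;
    header.cpusubtype = kCpuSubtypeX86_64All;
    header.filetype   = kFileTypeObject;
    header.ncmds      = ncmds;
    header.sizeofcmds = sizeofcmds;
    header.flags      = kSubsectionsViaSymbols;
    return header;
}
```

Passing the two "to be modified" values as parameters simply makes the late-filling step explicit.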

Load commands

The load command structures are located directly after the header of the object file, and they specify both the logical structure of the file and the layout of the file in virtual memory.

For an object file, several load commands are needed: a segment (with its sections), symtab, and dysymtab. Every load command begins with the same two fields, uint32_t cmd and uint32_t cmdsize, but the content that follows differs.
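Because every command starts with cmd and cmdsize, a reader can walk the load command area without understanding each command. Here is a minimal sketch of that idea; LoadCommandPrefix and listLoadCommands are hypothetical names, and the function assumes the buffer holds the load commands in the host's byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// The two fields shared by every load command.
struct LoadCommandPrefix {
    uint32_t cmd;
    uint32_t cmdsize;
};

// Hypothetical helper: returns the cmd value of each load command found in
// the area of sizeofcmds bytes; stops early if a cmdsize looks malformed.
std::vector<uint32_t> listLoadCommands(const unsigned char* cmds,
                                       uint32_t sizeofcmds) {
    assert(cmds != nullptr);
    std::vector<uint32_t> found;
    uint32_t offset = 0;
    while (sizeofcmds - offset >= sizeof(LoadCommandPrefix)) {
        LoadCommandPrefix prefix;
        std::memcpy(&prefix, cmds + offset, sizeof(prefix));
        if (prefix.cmdsize < sizeof(prefix) ||
            prefix.cmdsize > sizeofcmds - offset) {
            break;  // the command would run past the load command area
        }
        found.push_back(prefix.cmd);
        offset += prefix.cmdsize;  // cmdsize covers the whole command
    }
    return found;
}
```

For the object file built in this article, such a walk should report three commands: the segment, symtab, and dysymtab.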

segment_command_64

Specifies the range of bytes in a 64-bit Mach-O file that make up a segment. Those bytes are mapped by the loader into the address space of a program. Segment structure is:

struct segment_command_64 {  /* for 64-bit architectures */
   uint32_t   cmd;           /* LC_SEGMENT_64 */
   uint32_t   cmdsize;       /* includes sizeof section_64 structs */
   char       segname[16];   /* segment name */
   uint64_t   vmaddr;        /* memory address of this segment */
   uint64_t   vmsize;        /* memory size of this segment */
   uint64_t   fileoff;       /* file offset of this segment */
   uint64_t   filesize;      /* amount to map from the file */
   vm_prot_t  maxprot;       /* maximum VM protection */
   vm_prot_t  initprot;      /* initial VM protection */
   uint32_t   nsects;        /* number of sections in segment */
   uint32_t   flags;         /* flags */
};
  1. segname – the name of the segment. There are no requirements, but it is common to start the name with a double underscore (__) and use uppercase. For example, SEG_TEXT (“__TEXT”), SEG_DATA (“__DATA”).
  2. vmaddr – the start of this segment in virtual memory.
  3. vmsize – the size of this segment in memory. For executables, this value must be a multiple of the page size. In object files, this is not needed, as the requirement is fulfilled at the linking stage.
  4. fileoff – offset of this segment in the file. This offset points to an area after the load commands.
  5. filesize – the number of bytes, starting at fileoff, to be mapped.
  6. maxprot – maximum virtual memory protection. For TEXT segment, usually, VM_PROT_READ | VM_PROT_EXECUTE | VM_PROT_WRITE .
  7. initprot – initial virtual memory protection.
  8. nsects – number of section structures that directly follow this segment command.
  9. flags – segment flags, defined in <mach-o/loader.h>. For the object file, no flags are needed.

section_64

The segment load command is directly followed by the section structures it declares.

struct section_64 {          /* for 64-bit architectures */
   char       sectname[16];  /* name of this section */
   char       segname[16];   /* segment this section goes in */
   uint64_t   addr;          /* memory address of this section */
   uint64_t   size;          /* size in bytes of this section */
   uint32_t   offset;        /* file offset of this section */
   uint32_t   align;         /* section alignment (power of 2) */
   uint32_t   reloff;        /* file offset of relocation entries */
   uint32_t   nreloc;        /* number of relocation entries */
   uint32_t   flags;         /* flags (section type and attributes)*/
   uint32_t   reserved1;     /* reserved (for offset or index) */
   uint32_t   reserved2;     /* reserved (for count or sizeof) */
   uint32_t   reserved3;     /* reserved */
};
  1. sectname – the name of the section. There are no requirements, but it is common to start the name with a double underscore (__) and use lowercase. For example, SECT_TEXT (“__text”), SECT_DATA (“__data”).
  2. segname – the name of the segment this section goes in.
  3. addr – memory address of this section. For example, if the segment vmaddr is 0x10000, then the first section address is also 0x10000.
  4. size – the size in bytes of this section in the file.
  5. offset – the offset of the file section from the start of the file.
  6. align – alignment of the section as a power of 2. For example, 1 means 2 bytes alignment, 2 means 4 bytes alignment. Specifies the alignment of the section in memory.
  7. reloff – the offset of relocations array from the file beginning.
  8. nreloc – number of relocations.
  9. flags – specify information about data contained in the section. For example, for code S_REGULAR | S_ATTR_PURE_INSTRUCTIONS | S_ATTR_SOME_INSTRUCTIONS. For the data section, S_REGULAR.
  10. reserved1, reserved2, reserved3 – unused in our case.
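Since align stores an exponent rather than a byte count, placing a section means rounding an offset up to a multiple of 2^align. A small sketch of that arithmetic (alignUp is a helper name I made up for illustration):

```cpp
#include <cassert>
#include <cstdint>

// Rounds offset up to the next multiple of 2^alignPow2, where alignPow2 is
// the value stored in the align field of a section.
uint64_t alignUp(uint64_t offset, uint32_t alignPow2) {
    assert(alignPow2 < 64);
    uint64_t alignment = uint64_t(1) << alignPow2;
    return (offset + alignment - 1) & ~(alignment - 1);
}
```

With align = 4 (16 bytes), an offset of 26 is rounded up to 32, while an offset of 32 stays where it is.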

The segment load command and its sections are the most important part of the file. An object file has only one segment and one or more sections.

Now, we can define a segment and sections associated with it.

__TEXT segment – the only segment in the object file

segment_command_64 segment = {};
/*
 * Usually, as there is only one segment in the object file,
 * placing name is omitted. 
 * strcpy(segment.segname, SEG_TEXT);
 */
segment.cmd                = LC_SEGMENT_64;
segment.cmdsize            = sizeof(segment) + 2 * sizeof(section_64);
segment.vmaddr             = 0;
segment.vmsize             = 0; /* to be modified */
segment.fileoff            = 0; /* to be modified */
segment.filesize           = 0; /* to be modified */
segment.maxprot            = VM_PROT_READ | VM_PROT_EXECUTE;
segment.initprot           = VM_PROT_READ | VM_PROT_EXECUTE;
segment.nsects             = 2; /* code and data sections */

__text section

section_64 sectionText     = {};
strcpy(sectionText.segname,  SEG_TEXT ); /* segname  <- __TEXT */
strcpy(sectionText.sectname, SECT_TEXT); /* sectname <- __text */
sectionText.addr           = 0;
sectionText.size           = 0;          /* to be modified */
sectionText.offset         = 0;          /* to be modified */
sectionText.align          = 4;          /* 2^4 code alignment */
sectionText.reloff         = 0;          /* to be modified */
sectionText.nreloc         = 0;          /* to be modified */
sectionText.flags          = S_REGULAR |
                             S_ATTR_PURE_INSTRUCTIONS |
                             S_ATTR_SOME_INSTRUCTIONS;

__data section

section_64 sectionData     = {};
strcpy(sectionData.segname,  SEG_DATA ); /* segname  <- __DATA */
strcpy(sectionData.sectname, SECT_DATA); /* sectname <- __data */
sectionData.addr           = 0;          /* = sectionText.size */
sectionData.size           = 0;          /* to be modified */
sectionData.offset         = 0;          /* = sectionText.offset */
                                         /*   + sectionText.size */
sectionData.align          = 1;          /* 2^1 data alignment */
sectionData.reloff         = 0;          /* no relocations in data section */
sectionData.nreloc         = 0;          
sectionData.flags          = S_REGULAR;

At this point, the simple object file structure is almost ready, but the SYMTAB and DYSYMTAB load commands still need to be defined, even if there are no relocations at all.

Symtab

Describes the size and location of the symbol table data structures. Its structure is:

struct symtab_command {
   uint32_t   cmd;       /* LC_SYMTAB */
   uint32_t   cmdsize;   /* sizeof(struct symtab_command) */
   uint32_t   symoff;    /* symbol table offset */
   uint32_t   nsyms;     /* number of symbol table entries */
   uint32_t   stroff;    /* string table offset */
   uint32_t   strsize;   /* string table size in bytes */
};
  1. symoff – offset to the symbol table – located after load commands somewhere further in the file.
  2. nsyms – number of symbols in symbols table.
  3. stroff – string table offset.
  4. strsize – the size of the string table in bytes.

The most straightforward description so far. It is convenient to describe a symbol table and string table here.

String table

The string table is the most straightforward structure of all listed here. It is simply strings separated by zeros.
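A string table can be assembled incrementally while remembering where each name starts, since that position is what the symbol table refers to. The class below is a hypothetical sketch, not part of any Mach-O API. It assumes the table begins with a single zero byte, the same layout as the string table in the cumulative example at the end of this article.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical builder: zero-separated strings with a leading zero byte.
class StringTableBuilder {
public:
    StringTableBuilder() : bytes_(1, '\0') {}

    // Appends name plus its terminating zero and returns the index of the
    // first character of the name from the start of the table.
    uint32_t add(const std::string& name) {
        assert(!name.empty());
        uint32_t index = static_cast<uint32_t>(bytes_.size());
        bytes_.insert(bytes_.end(), name.begin(), name.end());
        bytes_.push_back('\0');
        return index;
    }

    const std::vector<char>& bytes() const { return bytes_; }

private:
    std::vector<char> bytes_;
};
```

Adding "_someFunc0" and then "_someFuncExternal0" yields the indices 1 and 12.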

Symbol table

The symbol table consists of equally sized entries. They must be grouped by their type – local symbols (further grouped by the module they are from), defined external symbols (further grouped by the module they are from), and undefined symbols. The order of groups is not important.

struct nlist_64 {
    union {
        uint32_t  n_strx;  /* index into the string table */
    } n_un;
    uint8_t n_type;        /* type flag, see below */
    uint8_t n_sect;        /* section number or NO_SECT */
    uint16_t n_desc;       /* see <mach-o/stab.h> */
    uint64_t n_value;      /* value of this symbol (or stab offset) */
};
  1. n_strx – index of the string in the string table, that is, the position of the first character of the name from the start of the string table. For example, if the string table begins with a zero byte followed by “_print”, the index of “_print” is 1 and the index of the next string, _giveYouUp0, is 8.
  2. n_type – the type of the symbol. It defines the meaning of the symbol and is built from these masks and values:
      • N_TYPE (0x0e) – these bits define the type of the symbol. Their value is one of:
          • N_UNDF (0x0) – The symbol is undefined. Undefined symbols are symbols referenced in this module but defined in a different module. The n_sect field is set to NO_SECT.
          • N_ABS (0x2) – The symbol is absolute. The linker does not change the value of an absolute symbol. The n_sect field is set to NO_SECT.
          • N_SECT (0xe) – The symbol is defined in the section number given in n_sect.
          • N_PBUD (0xc) – The symbol is undefined and the image is using a prebound value for the symbol. The n_sect field is set to NO_SECT.
          • N_INDR (0xa) – The symbol is defined to be the same as another symbol. The n_value field is an index into the string table specifying the name of the other symbol. When that symbol is linked, both this and the other symbol have the same defined type and value.
      • N_EXT (0x01) – If this bit is on, this symbol is external: a symbol that is either defined outside this file or that is defined in this file but can be referenced by other files.
      • N_STAB (0xe0) – If any of these 3 bits are set, the symbol is a symbolic debugging table (stab) entry. In that case, the entire n_type field is interpreted as a stab value.
  3. n_sect – an integer specifying the number of the section that this symbol can be found in, or NO_SECT if the symbol is not to be found in any section.
  4. n_desc – provides additional information about the nature of this symbol for non-stab symbols (not N_STAB). The reference flags can be accessed using the REFERENCE_TYPE mask (0xF). Usually, REFERENCE_FLAG_UNDEFINED_NON_LAZY is used for external symbols. If the symbol is defined in a section (N_SECT), use REFERENCE_FLAG_DEFINED together with N_EXT in n_type if you want to make it available from other files, or REFERENCE_FLAG_PRIVATE_DEFINED without N_EXT if not. The most used values are:
      • REFERENCE_FLAG_UNDEFINED_NON_LAZY (0x0) – This symbol is a reference to an external non-lazy (data) symbol.
      • REFERENCE_FLAG_UNDEFINED_LAZY (0x1) – This symbol is a reference to an external lazy symbol, that is, to a function call.
      • REFERENCE_FLAG_DEFINED (0x2) – This symbol is defined in this module.
      • REFERENCE_FLAG_PRIVATE_DEFINED (0x3) – This symbol is defined in this module and is visible only to modules within this shared library.
      • REFERENCE_FLAG_PRIVATE_UNDEFINED_NON_LAZY (0x4) – This symbol is defined in another module in this file, is a non-lazy (data) symbol, and is visible only to modules within this shared library.
      • REFERENCE_FLAG_PRIVATE_UNDEFINED_LAZY (0x5) – This symbol is defined in another module in this file, is a lazy (function) symbol, and is visible only to modules within this shared library.
  5. n_value – information about this symbol. The format of this value is different for each type of symbol table entry (as specified by the n_type field). For the N_SECT symbol type, n_value is the address of the symbol – offset from the start of the segment. For N_UNDF | N_EXT it is not used.
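To make the bit layout of n_type concrete, here is a small decoding sketch. describeType and its output strings are my own invention for illustration; the mask values are the ones listed above, repeated locally so the snippet does not depend on <mach-o/nlist.h>.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

constexpr uint8_t kStabMask = 0xe0;  // N_STAB
constexpr uint8_t kTypeMask = 0x0e;  // N_TYPE
constexpr uint8_t kExtBit   = 0x01;  // N_EXT

// Hypothetical helper: turns an n_type byte into a readable description.
std::string describeType(uint8_t n_type) {
    if (n_type & kStabMask) return "stab";  // whole byte is a stab value
    std::string kind;
    switch (n_type & kTypeMask) {
        case 0x0: kind = "undefined"; break;  // N_UNDF
        case 0x2: kind = "absolute";  break;  // N_ABS
        case 0xe: kind = "section";   break;  // N_SECT
        case 0xc: kind = "prebound";  break;  // N_PBUD
        case 0xa: kind = "indirect";  break;  // N_INDR
        default:  kind = "unknown";   break;
    }
    return kind + ((n_type & kExtBit) ? " external" : " non-external");
}
```

For instance, N_SECT | N_EXT (0x0f) decodes to "section external" and N_UNDF | N_EXT (0x01) to "undefined external".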

This structure is one of the hardest to understand and use, so here are some examples. Notice that the symbols are grouped; this will be used later in DYSYMTAB.

The example symbol table contains four symbols in total. Two of them are defined in the current file, and two of them are undefined in it. Here are descriptions of two of these symbols:

  • Symbol #0
      1. n_strx = 34 – index of the first character of the name in the string table.
      2. n_type = N_SECT | N_EXT – symbol defined in some section of the current file and available externally.
      3. n_sect = 1 – symbol defined in the first (counting from 1) section.
      4. n_desc = REFERENCE_FLAG_DEFINED – symbol defined in the file. This information is redundant as it is already known from N_SECT.
      5. n_value = 0 – the symbol definition is located at the very beginning of the segment (zero offset).
  • Symbol #2
      1. n_strx = 1 – index of the first character of the name in the string table.
      2. n_type = N_UNDF | N_EXT – symbol is not defined in the current file, must be defined externally.
      3. n_sect = NO_SECT – no associated section.
      4. n_desc = REFERENCE_FLAG_UNDEFINED_NON_LAZY – this symbol is a reference to an external non-lazy (data) symbol.
      5. n_value = 0 – unused.

These two symbols can be constructed like this:

nlist_64 symbols[2] = {
    {34, N_SECT  | N_EXT, 1      , REFERENCE_FLAG_DEFINED           , 0},
    {1 , N_UNDF | N_EXT, NO_SECT, REFERENCE_FLAG_UNDEFINED_NON_LAZY, 0}
};

Dysymtab

It describes the sizes and locations of the parts of the symbol table used for dynamic linking. As I already noted, symtab entries must be grouped by their type. This is where that requirement is used.

struct dysymtab_command {
    uint32_t cmd;            /* LC_DYSYMTAB */
    uint32_t cmdsize;        /* sizeof(struct dysymtab_command) */
    uint32_t ilocalsym;      /* index to local symbols */
    uint32_t nlocalsym;      /* number of local symbols */

    uint32_t iextdefsym;     /* index to externally defined symbols */
    uint32_t nextdefsym;     /* number of externally defined symbols */

    uint32_t iundefsym;      /* index to undefined symbols */
    uint32_t nundefsym;      /* number of undefined symbols */

    uint32_t tocoff;         /* file offset to table of contents */
    uint32_t ntoc;           /* number of entries in table of contents */

    uint32_t modtaboff;      /* file offset to module table */
    uint32_t nmodtab;        /* number of module table entries */


    uint32_t extrefsymoff;   /* offset to referenced symbol table */
    uint32_t nextrefsyms;    /* number of referenced symbol table entries */


    uint32_t indirectsymoff; /* file offset to the indirect symbol table */
    uint32_t nindirectsyms;  /* number of indirect symbol table entries */


    uint32_t extreloff;      /* offset to external relocation entries */
    uint32_t nextrel;        /* number of external relocation entries */

    uint32_t locreloff;      /* offset to local relocation entries */
    uint32_t nlocrel;        /* number of local relocation entries */

}; 

There are a lot of fields, but only a few of them are needed for object files.

  1. ilocalsym + nlocalsym – local symbols are used only for debugging.
  2. iextdefsym + nextdefsym – external symbols.
  3. iundefsym + nundefsym – undefined symbols.

Fields with the i* prefix hold the index of the first entry of the group in the symbol table, while the n* fields hold the number of symbols in that group.
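The six index and count fields can be derived mechanically from the n_type bytes of an already grouped symbol table. The sketch below is only an illustration under stated assumptions: SymbolRanges, groupOf, and computeRanges are hypothetical names, stab entries are ignored, and the groups are assumed to appear in the order local, defined external, undefined (the order in which the dysymtab fields are declared).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct SymbolRanges {
    uint32_t ilocalsym, nlocalsym;
    uint32_t iextdefsym, nextdefsym;
    uint32_t iundefsym, nundefsym;
};

// 0 = local, 1 = defined external, 2 = undefined external.
int groupOf(uint8_t n_type) {
    bool external  = (n_type & 0x01) != 0;    // N_EXT bit
    bool undefined = (n_type & 0x0e) == 0x0;  // N_TYPE bits equal N_UNDF
    if (!external) return 0;
    return undefined ? 2 : 1;
}

// Assumes the symbols are already sorted by group as described above.
SymbolRanges computeRanges(const std::vector<uint8_t>& types) {
    uint32_t counts[3] = {0, 0, 0};
    int previous = 0;
    for (uint8_t type : types) {
        int group = groupOf(type);
        assert(group >= previous);  // symbols must already be grouped
        previous = group;
        ++counts[group];
    }
    SymbolRanges ranges = {};
    ranges.ilocalsym  = 0;
    ranges.nlocalsym  = counts[0];
    ranges.iextdefsym = counts[0];
    ranges.nextdefsym = counts[1];
    ranges.iundefsym  = counts[0] + counts[1];
    ranges.nundefsym  = counts[2];
    return ranges;
}
```

For a table holding one N_SECT | N_EXT symbol followed by one N_UNDF | N_EXT symbol, this gives no local symbols, one defined external symbol at index 0, and one undefined symbol at index 1.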

Relocations

Finally, all these structures were needed just to be able to do relocations. But why do we even need them? Consider this assembly code:

call     ...   ; call function – external or internal
mov      rax, [rip + ...] ; load global variable

In both of these cases, the address or offset is not known until the linking stage, as segments will be rearranged, combined, and placed back in some order. The linker will substitute the address or offset with the relevant one. Relocation information specifies where an address must be changed, how it must be changed, and for which symbol.

A relocation entry is defined as:
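As a worked example of the substitution, consider a PC-relative call whose target sits in the same section. The sketch below is a simplification of what a linker does (it ignores addends and final load addresses), and the helper names rel32Displacement and patchRel32 are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Displacement for a PC-relative 32-bit field. The CPU adds it to the
// address of the byte right after the field; for a call, that is the
// next instruction.
int32_t rel32Displacement(uint32_t fieldOffset, uint32_t targetOffset) {
    return static_cast<int32_t>(targetOffset) -
           static_cast<int32_t>(fieldOffset + 4);
}

// Writes the displacement into the code buffer in little-endian order.
void patchRel32(unsigned char* code, size_t codeSize,
                uint32_t fieldOffset, int32_t displacement) {
    assert(fieldOffset + 4 <= codeSize);
    uint32_t raw = static_cast<uint32_t>(displacement);
    for (int i = 0; i < 4; ++i) {
        code[fieldOffset + i] =
            static_cast<unsigned char>((raw >> (8 * i)) & 0xff);
    }
}
```

In the cumulative example at the end of this article, the second call has its address field at offset 6 and someFunc starts at offset 22, so the displacement is 22 - (6 + 4) = 12.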

struct relocation_info {
   int32_t  r_address;        /* offset in the section to */
                              /* what is being relocated */
   uint32_t r_symbolnum:24,   /* symbol index if r_extern == 1 or */
                              /* section ordinal if r_extern == 0 */
            r_pcrel:1,        /* was relocated pc relative already */
            r_length:2,       /* 0=byte, 1=word, 2=long, 3=quad */
            r_extern:1,       /* does not include value of sym referenced */
            r_type:4;         /* if not 0, machine specific relocation type */
};

Do you remember that each section may have relocations, and that they are specified in the corresponding fields of the section structure? Here are the relocations themselves.

  1. r_address – offset, from the start of the section, of the value that needs to be relocated.
  2. r_symbolnum – symbol index in the symbol table if r_extern == 1, or section ordinal (number) if r_extern == 0.
  3. r_pcrel – (1/0) Indicates whether the item containing the address to be relocated is part of a CPU instruction that uses PC-relative addressing. For addresses contained in PC-relative instructions, the CPU adds the address of the instruction to the address contained in the instruction.
  4. r_length – Indicates the length of the item containing the address to be relocated. A value of zero indicates a single byte; a value of 1 indicates a 2-byte address, a value of 2 indicates a 4-byte address, and a value of 3 indicates an 8-byte address.
  5. r_extern – (1/0) Indicates whether the r_symbolnum field is an index into the symbol table (1) or a section number (zero).
  6. r_type – Indicates the type of relocation to be performed. Possible values for this field are shared between this structure and the scattered_relocation_info data structure; see the description of the r_type field in the scattered_relocation_info data structure for more details. The two values used in this article are:
      • GENERIC_RELOC_SECTDIFF (2) – used here for relative call addresses.
      • GENERIC_RELOC_PAIR (1) – used here for RIP-relative offsets of global variables.
     Note that for x86_64 the relocation types are declared in <mach-o/x86_64/reloc.h>: X86_64_RELOC_BRANCH (2) describes the displacement of a call or jump, and X86_64_RELOC_SIGNED (1) describes a RIP-relative displacement. The generic constants above have the same numeric values.
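The second 32-bit word of a relocation entry is a bit field, so it helps to see how the pieces combine. The function below is a sketch that assumes the layout x86_64 compilers give this struct, with the first declared bit-field in the lowest bits; packRelocationBits is a hypothetical name used only here.

```cpp
#include <cassert>
#include <cstdint>

// Packs the bit-fields that follow r_address into one 32-bit word,
// assuming the first declared bit-field occupies the lowest bits.
uint32_t packRelocationBits(uint32_t symbolnum, uint32_t pcrel,
                            uint32_t length, uint32_t isExtern,
                            uint32_t type) {
    assert(symbolnum < (1u << 24));
    assert(pcrel < 2 && length < 4 && isExtern < 2 && type < 16);
    return symbolnum          // bits 0..23:  r_symbolnum
         | (pcrel    << 24)   // bit  24:     r_pcrel
         | (length   << 25)   // bits 25..26: r_length
         | (isExtern << 27)   // bit  27:     r_extern
         | (type     << 28);  // bits 28..31: r_type
}
```

A PC-relative, 4-byte, external relocation against symbol 1 with type value 2 (the value of GENERIC_RELOC_SECTDIFF) packs to 0x2D000001.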

Here’s an example of common relocation:

relocation_info relocation = {};
relocation.r_address = ...           /* some offset to the beginning */
                                     /* of relocatable address */
relocation.r_symbolnum = 0;          /* first symbol in symtab */
relocation.r_pcrel = 1;              /* let it be call instruction that */
                                     /* is PC-relative */
relocation.r_length = 2;             /* 4-bytes address */
relocation.r_extern = 1;             /* external symbol */
relocation.r_type   = GENERIC_RELOC_SECTDIFF;

Cumulative example

Here is the code that constructs a complete Mach-O object file with a call to an external function and a call to an internal function.

mach_header_64 header = {};
header.magic          = MH_MAGIC_64;
header.cputype        = CPU_TYPE_X86_64;
header.cpusubtype     = CPU_SUBTYPE_X86_64_ALL;
header.filetype       = MH_OBJECT;
header.ncmds          = 0; /* to be modified */
header.sizeofcmds     = 0; /* to be modified */
header.flags          = MH_SUBSECTIONS_VIA_SYMBOLS;

segment_command_64 segment = {};
segment.cmd                = LC_SEGMENT_64;
segment.cmdsize            = sizeof(segment) + sizeof(section_64);
segment.vmaddr             = 0;
segment.vmsize             = 0; /* to be modified */
segment.fileoff            = 0; /* to be modified */
segment.filesize           = 0; /* to be modified */
segment.maxprot            = VM_PROT_READ | VM_PROT_EXECUTE;
segment.initprot           = VM_PROT_READ | VM_PROT_EXECUTE;
segment.nsects             = 0; /* to be modified */

section_64 sectionText     = {};
strcpy(sectionText.segname,  SEG_TEXT ); /* segname  <- __TEXT */
strcpy(sectionText.sectname, SECT_TEXT); /* sectname <- __text */
sectionText.addr           = 0;
sectionText.size           = 0;          /* to be modified */
sectionText.offset         = 0;          /* to be modified */
sectionText.align          = 4;          /* 2^4 code alignment */
sectionText.reloff         = 0;          /* to be modified */
sectionText.nreloc         = 0;          /* to be modified */
sectionText.flags          = S_REGULAR |
                             S_ATTR_PURE_INSTRUCTIONS |
                             S_ATTR_SOME_INSTRUCTIONS;

const unsigned char code[] = {
        0xE8, 0x00, 0x00, 0x00, 0x00,      // call <address> - someFuncExternal
        0xE8, 0x00, 0x00, 0x00, 0x00,      // call <address> - someFunc
        0xB8, 0x01, 0x00, 0x00, 0x02,      // mov     eax, 0x2000001 ; exit (zero-extends into rax)
        0xBF, 0x00, 0x00, 0x00, 0x00,      // mov     edi, 0
        0x0F, 0x05,                        // syscall
        // someFunc:
        0x48, 0x31, 0xC0,                  // xor rax, rax
        0xC3                               // ret
};

symtab_command symtabCommand    = {};
symtabCommand.cmd               = LC_SYMTAB;
symtabCommand.cmdsize           = sizeof(symtab_command);
symtabCommand.symoff            = 0;       /* to be modified */
symtabCommand.nsyms             = 0;       /* to be modified */
symtabCommand.stroff            = 0;       /* to be modified */
symtabCommand.strsize           = 0;       /* to be modified */

const char stringTable[]        = "\0_someFunc0\0_someFuncExternal0\0";

nlist_64 symbols[2] = {
        {
            1,                      // first index in string table
            N_SECT | N_EXT,         // defined in the file, available externally
            1,                      // first section
            REFERENCE_FLAG_DEFINED, // defined in the file
            4 * 5 + 2               // offset of this symbol in the section
        },
        {
            12,                      // second string in string table
            N_UNDF  | N_EXT,         // undefined in the file,
                                     // must be defined externally
            NO_SECT,                 // no section specified
            REFERENCE_FLAG_UNDEFINED_NON_LAZY, // external non-lazy symbol
            0                        // unused
        }
};

dysymtab_command dysymtabCommand      = {};
dysymtabCommand.cmd                   = LC_DYSYMTAB;
dysymtabCommand.cmdsize               = sizeof(dysymtabCommand);
dysymtabCommand.ilocalsym             = 0; // no local (non-external) symbols
dysymtabCommand.nlocalsym             = 0;
dysymtabCommand.iextdefsym            = 0; // first symbol: defined here, external
dysymtabCommand.nextdefsym            = 1;
dysymtabCommand.iundefsym             = 1; // second symbol: undefined
dysymtabCommand.nundefsym             = 1;

relocation_info relocations[] = {
        {
            1,      // after first byte address to someFuncExternal
            1,      // second symbol
            1,      // relative call, PC counted
            2,      // 4 bytes
            1,      // external
            GENERIC_RELOC_SECTDIFF
        },
        {
            6,      // second call address
            0,      // first symbol
            1,      // relative call, PC counted
            2,      // 4 bytes
            1,      // external
            GENERIC_RELOC_SECTDIFF
        },
};

size_t offsetCounter = 0;
FILE* binary = fopen("object.o", "wb");

// Write header;
header.ncmds = 3; // segment + symtab + dysymtab
header.sizeofcmds = sizeof(segment) + sizeof(sectionText) + sizeof(symtabCommand) + sizeof(dysymtabCommand);
fwrite(&header, 1, sizeof(header), binary);
offsetCounter += sizeof(header);

// Write segment
segment.vmsize  = segment.filesize = sizeof(code);
segment.fileoff = header.sizeofcmds + sizeof(header); // we'll place code just after all load commands.
segment.nsects  = 1;
fwrite(&segment, 1, sizeof(segment), binary);
offsetCounter += sizeof(segment);

// Write section
sectionText.size   = segment.filesize;
sectionText.offset = segment.fileoff;
sectionText.reloff = segment.fileoff + segment.filesize; // just after the code
sectionText.nreloc = sizeof(relocations) / sizeof(relocations[0]); // two calls
fwrite(&sectionText, 1, sizeof(sectionText), binary);
offsetCounter += sizeof(sectionText);

// Write symtab
symtabCommand.symoff = sectionText.reloff +
                        sectionText.nreloc * sizeof(relocation_info); // just after relocations
symtabCommand.nsyms = 2; // two functions
symtabCommand.stroff = symtabCommand.symoff +
                        symtabCommand.nsyms * sizeof(nlist_64); // just after symbol table
symtabCommand.strsize = sizeof(stringTable);
fwrite(&symtabCommand, 1, sizeof(symtabCommand), binary);
offsetCounter += sizeof(symtabCommand);

// Write dysymtab
fwrite(&dysymtabCommand, 1, sizeof(dysymtabCommand), binary);
offsetCounter += sizeof(dysymtabCommand);

// Write code
fwrite(&code, 1, sizeof(code), binary);

// Write relocations
fwrite(&relocations, 1, sizeof(relocations), binary);

// Write symbol table
fwrite(&symbols, 1, sizeof(symbols), binary);

// Write string table
fwrite(&stringTable, 1, sizeof(stringTable), binary);

fclose(binary);
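To double-check the "to be modified" fields, the file offsets of the example can be computed separately. The sketch below hard-codes the structure sizes for the 64-bit x86 format (32, 72, 80, 24, 80, 8, and 16 bytes) instead of including the Mach-O headers; ObjectLayout and computeLayout are hypothetical names, and the layout is the one used above: header, one segment with one section, symtab, dysymtab, then code, relocations, symbols, and strings.

```cpp
#include <cassert>
#include <cstdint>

struct ObjectLayout {
    uint32_t sizeofcmds, codeOffset, relocOffset;
    uint32_t symbolOffset, stringOffset, fileSize;
};

constexpr uint32_t kHeaderSize   = 32;  // mach_header_64
constexpr uint32_t kSegmentSize  = 72;  // segment_command_64
constexpr uint32_t kSectionSize  = 80;  // section_64
constexpr uint32_t kSymtabSize   = 24;  // symtab_command
constexpr uint32_t kDysymtabSize = 80;  // dysymtab_command
constexpr uint32_t kRelocSize    = 8;   // relocation_info
constexpr uint32_t kNlistSize    = 16;  // nlist_64

// Hypothetical helper: offsets for a file with one segment and one section.
ObjectLayout computeLayout(uint32_t codeSize, uint32_t nreloc,
                           uint32_t nsyms, uint32_t strsize) {
    assert(codeSize > 0);
    ObjectLayout layout = {};
    layout.sizeofcmds   = kSegmentSize + kSectionSize +
                          kSymtabSize + kDysymtabSize;
    layout.codeOffset   = kHeaderSize + layout.sizeofcmds;
    layout.relocOffset  = layout.codeOffset + codeSize;
    layout.symbolOffset = layout.relocOffset + nreloc * kRelocSize;
    layout.stringOffset = layout.symbolOffset + nsyms * kNlistSize;
    layout.fileSize     = layout.stringOffset + strsize;
    return layout;
}
```

For the example above (26 bytes of code, 2 relocations, 2 symbols, and a 32-byte string table), the code starts at offset 288, the relocations at 314, the symbol table at 330, the string table at 362, and the file is 394 bytes long.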

References

  1. Developer collection – relocation_info
  2. Mach-O format reference OSX-ABI
  3. MachOViewer – check out your file structure
