author H. Peter Anvin <hpa@zytor.com> 2019-07-27 21:25:55 -0700
committer H. Peter Anvin <hpa@zytor.com> 2019-07-27 21:25:55 -0700
commit cb0753321da42e979d51a930895c2f874cb0c09a
tree ba98c63e15885a099f742541226531f01504a79b
parent 30153068ec64217828a615ae3ee46888cf2fc72c
Update ABI document
-rw-r--r-- segelf.txt 360
1 files changed, 199 insertions, 161 deletions
diff --git a/segelf.txt b/segelf.txt
index 0050316..2cc81f7 100644
--- a/segelf.txt
+++ b/segelf.txt
@@ -2,7 +2,10 @@ ABI for 16-bit real mode segmented code in ELF
----------------------------------------------
H. Peter Anvin
-Version: 2019-01-10
+Version: 2019-07-26
+
+I. General definitions
+----------------------
16-bit segmented code in ELF is implemented with a combination of
three new relocations and a set of software conventions. This document
@@ -12,47 +15,98 @@ The extensions are implemented in such a way that mixed-mode
programming is possible, as well, with the binary format explicitly
exposing segment-relative and absolute relocations.
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in RFC 2119 as published
+by the IETF.
-Requirements
-------------
-
-16-bit code relies on a combination of segment types:
+The term "section" in this document is as defined in the ELF gABI and
+i386 psABI specifications, except for the term "section group" which
+is used in the sense typically used in segmented programming.
-1. NEAR segments are addressed from a common segment base, and the
- segment registers are generally kept at a fixed value. All NEAR
- segments combined may not exceed 64K.
+The term "segment" refers to a segment base as used in the x86
+architecture, typically in real or V86 mode (although with suitable
+LDT setup and relocation processing it MAY be possible to run in
+segmented protected mode.)
-2. FAR segments are addressed from a segment base specific to that
- segment. Any one FAR segment may not exceed 64K.
-3. HUGE segments are addressed from a segment base specific to each
- data item in the segment. HUGE segments have no size limit other
- than the global address space limit of 1088K-16 bytes.
+II. Requirements
+----------------
-4. A PUBLIC segment can be combined with other segments of the same
- name using the same segment base.
+16-bit code relies on a combination of section types.
-5. A PRIVATE segment has a separate segment base for each translation
- unit.
+1. Sections can be NEAR, FAR, HUGE or FLAT. Orthogonally with this,
+ sections can be PRIVATE or PUBLIC.
-6. Multiple PUBLIC segments can be grouped together with a common
- segment base. This is mainly used for NEAR segments, in particular
- the standard _DATA, _BSS and _STACK segments (and, in the "tiny"
- memory model, the _TEXT segment) are usually combined in a group
- called "DGROUP".
+2. NEAR sections are addressed from a common segment base, and the
+ segment registers are generally kept at a fixed value. All NEAR
+ segments that are to be addressed from any particular common base
+ MUST NOT exceed 64K in total.
-Mixed-mode programming furthermore requires a way to reference any
-data item by flat linear address.
+3. FAR sections are addressed from a segment base specific to that
+ section. Any one FAR segment MUST NOT exceed 64K.
+4. HUGE sections are addressed from a segment base specific to each
+ data item in the segment. HUGE segments have no size limit other
+ than the global address space limit of 1088K-16 bytes.
-New ELF relocations
--------------------
+5. FLAT sections are typically used with 32- or 64-bit code, but MAY
+ also be used in "big real mode" (real mode where the segment limits
+ are forced to 32 bits.) A linker is not required to support PRIVATE
+ FLAT sections, and MAY treat them all as PUBLIC. All FLAT sections
+ share a common base (usually zero, but a loader MAY choose to set a
+ single nonzero base for all FLAT sections combined; the compiler or
+ programmer MUST be aware of how the loader or operating system will
+ load these sections.) However, if 64-bit code is to be supported,
+ including "x32" code which uses 32-bit pointers while running the
+ CPU in 64-bit mode, FLAT sections MUST be linked or relocated with a
+ segment base of zero.
+
+6. A PUBLIC section MUST be combined with other sections of the same
+ name such that they use the same segment base.
+
+7. A PRIVATE section SHOULD have a separate segment base for each
+ translation unit, even if named identically, except for FLAT
+ sections.
+
+8. Multiple NEAR or FAR sections MAY be grouped together with a common
+ segment base. All FLAT sections are implicitly combined in this
+ manner and MUST NOT be part of any other section group. Grouping is
+ mainly used for NEAR sections, but MAY be used for FAR sections if
+ desirable; in particular the standard _DATA, _BSS and _STACK segments
+ (and, in the "tiny" memory model, the _TEXT segment) are usually
+ combined in a group called "DGROUP". If specified by the compiler or
+ programmer, the linker MUST group these sections together.
+
+ Section groups MAY be either PRIVATE or PUBLIC; a PRIVATE section
+ group MUST NOT include PUBLIC sections.
+
+ Section groups are meaningless for HUGE sections and a compiler or
+ programmer SHOULD NOT attempt to assign them to a section group,
+ even if PUBLIC. All FLAT sections are implicitly grouped, and MUST
+ NOT be part of any other section group.
+
+9. If segmented protected mode support is intended, the handling of
+ HUGE sections and intersegment pointer arithmetic, if supported at
+ all, is operating environment specific, and may require conventions
+ that differ from this specification. The details of such support
+ are beyond the scope of this document.
+
+
+III. New ELF relocations
+------------------------
The following new relocations are added to the ELF i386 psABI:
-R_386_SEG16 45 word16 A + (S >> 4)
-R_386_SUB16 46 word16 A - S
-R_386_SUB32 47 word32 A - S
+R_386_SEG16 45 word16 A + (S >> 4) [*]
+R_386_SUB16 46 word16 A - S
+R_386_SUB32 47 word32 A - S
+R_386_SEGRELATIVE 48 word16 A + (B >> 4) [*]
+
+[*] If protected mode is used, these MUST be replaced by a segment
+ selector assigned by the linker or loader, as appropriate. Each
+ value generated by the relocation computation MUST be assigned a
+ unique selector.
In accordance with the ELF gABI specification, multiple relocations at
the same address are cumulative. This is essential for the SUB
@@ -60,37 +114,91 @@ relocations to work.
These are the only extensions to the ELF format proper.
+Note that ELF has a concept of a section group, using the SHT_GROUP
+section type. This is completely orthogonal to the section group
+concept used by segmented code as used in this specification. A
+program MAY use ELF section groups independently of segmentation
+section groups. These two concepts MUST NOT be conflated in any way.
+
+Possible future extensions, should they become necessary:
+
+1. Adding these relocations, with identical semantics, to the x86-64
+ psABI in order to support mixed mode programming which includes
+ 64-bit code.
+
+2. Some kind of analog to SHT_GROUP for segmentation section groups.
-Software conventions
---------------------
+3. Section flags to indicate 16-, 32- or 64-bit code, to aid
+ disassembly.
-1. Sections
+
+IV. Software conventions
+------------------------
+
+1. FLAT sections
+----------------
+
+A FLAT section is simply an ordinary ELF section without any special
+naming or symbols. The rest of this documentation section is
+inapplicable to FLAT sections. A compiler/assembler/programmer MAY
+choose to generate ! symbols (as specified in section 3) with an
+absolute value of 0.
+
+
+2. Sections
-----------
-A PRIVATE or HUGE segment is represented by a section without any
-special attributes. A PRIVATE or HUGE segment section must have an
-alignment of 16 or higher.
+A PRIVATE or HUGE section is simply represented by a section without
+any special attributes, unless PRIVATE sections are to be grouped (see
+above.) A PRIVATE or HUGE section MUST have an alignment of 16 or
+higher.
-NOTE: using PRIVATE segments means subsections cannot be used.
+NOTE: using PRIVATE sections, unless grouped, means named subsections
+cannot be used, e.g. in order to specify a section where strings are
+to be merged.
-A PUBLIC segment is represented as a pair of sections:
+A PUBLIC or grouped PRIVATE section is represented as a pair of ELF
+sections, which MUST be named:
section!
section$
-"section!" will contain symbols but no data (see below). "section$"
-carries the actual contents of the section. The "!" section must
-have an alignment of 16 or higher, but the "$" sections MAY have any
+"section!" MUST contain symbols but no data (see below). "section$"
+carries the actual contents of the section. The "!" section MUST have
+an alignment of 16 or higher, but the "$" sections MAY have any
alignment.
-Segment groups are handled by bundling ! sections in the linker script.
+Section groups are handled by the compiler, assembler or programmer
+bundling ! sections into a common section named after the group. A
+section group section is named with a leading "!" instead of a
+trailing one; this will naturally cause them to be sorted before other
+sections; this means that unless the linker is specifically aware of
+individual groups, all groups (or all groups in an individual
+translation unit) end up being combined. For a grouped section, the
+! section specific to that $ section MAY be omitted.
+
+Section groups MAY be given a prefix, typically shared with all
+sections that are to be included in the group. In that case, the
+prefix goes before the leading ! character. A compiler or programmer
+creating PRIVATE sections to be grouped separately from other PRIVATE
+sections SHOULD add such a prefix (which MAY be a simplified version
+of the module filename, but is not required to be.) For example, a
+module compiled for the MS-DOS "large" memory model may choose to
+create a _RODATA section to be combined with the associated _DATA
+section, naming the ELF sections:
+
+ module_!DATAGROUP
+ module_RODATA! ; Optional
+ module_RODATA$
+ module_DATA! ; Optional
+ module_DATA$
These sections are named such that sorting the sections by name will
-put all the ! sections immediately before all the $ sections for
-the same segment.
+produce the correct ordering.
-When using subsection variants intended to be merged into the same
-segment, e.g. for merged strings, the compiler/assembler needs to EITHER:
+When using sections intended to be merged into the same output
+section, e.g. for merged strings, the compiler/assembler needs to
+EITHER:
a. Combine all symbols into a single ! section, without a suffix.
@@ -105,19 +213,21 @@ b. Add any suffix *after* the ! symbol.
_DATA!.strings
_DATA$.strings
-In the interest of robustness compilers/assemblers should emit !
-sections before the first associated $ section, preferably immediately
-before.
+In the interest of robustness compilers/assemblers/programmers SHOULD
+emit ! sections, including section group sections, before any
+associated $ sections; however, this strictly falls under the
+Mr. Protocol rule of "be conservative in what you send, liberal in
+what you accept." Linkers MUST process these sections correctly even
+if the compiler or assembler emits them in a different order.
2. Symbols
----------
Symbols contain, as is normal in ELF, linear addresses, including the
-value of the segment base. Thus, a symbol located at 0x1234:0x5678
+value of any segment base. Thus, a symbol located at 0x1234:0x5678
will have a value of (0x1234 << 4) + 0x5678 = 0x179b8. This also means
-that flat 32-bit code can make direct use of this symbol in normal
-fashion.
+that flat code can make direct use of this symbol in normal fashion.
Each symbol is matched with an auxiliary symbol containing the
preferred segment base as a linear address. The name of the auxiliary
@@ -127,28 +237,42 @@ the example above, with foo at 0x1234:0x5678, we would have:
foo = 0x179b8
foo! = 0x12340
-For a PRIVATE segment, these auxiliary symbols are simply placed at
-the beginning of the section by the compiler/assembler.
+For an ungrouped PRIVATE segment, these auxiliary symbols are simply
+placed at the beginning of the section.
-For a PUBLIC segment, they are placed in the ! section corresponding
-to the segment (however, the primary symbol is placed in the $
-section.)
+For a PUBLIC section, they are placed in the ! section corresponding
+to the section, which MAY be a group section. However, the primary
+symbol is placed in the corresponding $ section regardless of any
+grouping.
-For a HUGE segment, the compiler/assembler should generate the !
-symbols so that:
+For a HUGE section, the compiler/assembler/programmer SHOULD generate
+the ! symbols so that:
symbol! = symbol & ~0xf
-Undefined (external) references to these auxiliary symbols should be
+Undefined (external) references to these auxiliary ! symbols MUST be
marked WEAK. If the auxiliary symbol would contain the absolute value
0, it does not need to be emitted. This, again, simplifies mixed-mode
programming.
+A compiler/assembler/programmer MUST NOT generate spurious external
+references for a ! symbol defined in the program as a local
+symbol. One way to avoid this is to define such symbols as local
+symbols with an absolute value of zero, as described in section 1.
-3. Use of relocations
+4. Use of relocations
---------------------
+This documentation section uses NASM syntax assembly and assumes the
+assembler has been augmented to hide details from the programmer. Some
+assemblers MAY instead choose to require the programmer or compiler to
+explicitly specify these details.
+
+Note that for an assembler to implement these features transparently,
+it needs to be aware of section types (NEAR, FAR, HUGE, FLAT, PUBLIC,
+PRIVATE) as well as section groups.
+
To access a symbol by its preferred segment base:
mov ax,SEG symbol
@@ -167,10 +291,15 @@ To access a symbol by its preferred segment base:
To access a symbol relative to a different segment base:
- mov ax,[symbol wrt DGROUP]
+ mov ax,[symbol wrt DGROUP] ; Group
+
+ R_386_16 symbol
+ R_386_SUB16 section !DGROUP + 0
+
+ mov bx,[cs:symbol wrt _TEXT] ; Section
R_386_16 symbol
- R_386_SUB16 section DGROUP! + 0
+ R_386_SUB16 section _TEXT! + 0
To access a symbol relative to the segment base of a different symbol:
@@ -193,103 +322,12 @@ To access the address of a symbol versus a fixed segment base:
R_386_16 video_rows-0x400
+R_386_SEGRELATIVE is to R_386_SEG16 what R_386_RELATIVE is to
+R_386_32; when linking PIC or PIE code, R_386_SEG16 is converted to
+R_386_SEGRELATIVE just as R_386_32 is converted to R_386_RELATIVE.
+
+
+V. Sample code
+--------------
-4. Sample linker script
------------------------
-
-This linker script is applicable to the conventional DOS memory models
-except the tiny model.
-
-SECTIONS
-{
- . = 0;
-
- far_TEXT : {
- *(SORT_BY_NAME(SORT_BY_ALIGNMENT(?*_TEXT*)))
- }
- far_DATA : {
- *(SORT_BY_NAME(SORT_BY_ALIGNMENT(?*_DATA*)))
- }
-
- _TEXT ALIGN(16) : {
- *(_START*!* _TEXT*!*)
- *(SORT_NONE(_START*))
- *(SORT_BY_ALIGNMENT(_TEXT*))
- }
-
- DGROUP (NOLOAD) ALIGN(16) : {
- *(DGROUP*!* _DATA*!* _BSS*!* _STACK*!*)
- PROVIDE(___bss_start! = .);
- PROVIDE(___bss_end! = .);
- PROVIDE(___stack_base! = .);
- PROVIDE(___stack_top! = .);
- }
-
- _DATA : {
- *(SORT_BY_ALIGNMENT(_DATA*))
- }
-
- PROVIDE(___filesize = .);
-
- _BSS : {
- PROVIDE(___bss_start = .);
- *(SORT_BY_ALIGNMENT(_BSS*) (COMMON))
- PROVIDE(___bss_end = .);
- }
-
- . = ALIGN(16);
- /* Default near stack/heap segment size, can be overridden */
- PROVIDE(___stack_size = 65536 + ADDR(DGROUP) - .);
- _STACK (NOLOAD) : {
- PROVIDE(___stack_base = .);
- . = . + ___stack_size;
- PROVIDE(___stack_top = .);
- }
-
- far_BSS ALIGN(16) : {
- PROVIDE(___farbss_start = .);
- *(SORT_BY_NAME(SORT_BY_ALIGNMENT(?*_BSS*)))
- . = ALIGN(16);
- PROVIDE(___farbss_end = .);
- }
-
- PROVIDE(___end = .);
-}
-ENTRY(__start)
-
-
-This linker script is applicable to the tiny DOS memory model.
-
-SECTIONS
-{
- . = 0;
-
- DGROUP (NOLOAD) : {
- *(*!*)
- PROVIDE(___bss_start! = .);
- PROVIDE(___bss_end! = .);
- PROVIDE(___end! = .);
- }
-
- . = 0x100;
-
- _TEXT : {
- *(SORT_NONE(_START*))
- *(SORT_BY_ALIGNMENT(_TEXT*))
- }
- _DATA : {
- *(SORT_BY_ALIGNMENT(_DATA*))
- }
-
- PROVIDE(___filesize = .);
-
- _BSS : {
- PROVIDE(___bss_start = .);
- *(SORT_BY_ALIGNMENT(_BSS*))
- *(COMMON)
- PROVIDE(___bss_end = .);
- }
-
- PROVIDE(___end = .);
-}
-ENTRY(__start)
+See https://git.zytor.com/users/hpa/segelf/samples.git.
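
The arithmetic behind the relocation table and the foo / foo! example in
the patched document can be checked with a small Python sketch. This is
illustrative only and not part of the commit: the helper names (linear,
seg16, offset16, huge_base) are hypothetical, and the sketch assumes that
an R_386_16 followed by an R_386_SUB16 at the same address is evaluated
cumulatively and then truncated to 16 bits, as the gABI rule quoted in
the document suggests.

```python
# Illustrative sketch only: the helper names are hypothetical and are not
# defined by the ABI document. Values follow its foo / foo! example.

def linear(seg, off):
    """Real-mode linear address of seg:off."""
    return (seg << 4) + off

def seg16(base, addend=0):
    """R_386_SEG16 as a word16: A + (S >> 4)."""
    return (addend + (base >> 4)) & 0xFFFF

def offset16(symbol, base, addend=0):
    """R_386_16 followed by R_386_SUB16 at the same address.

    Assumes the two relocations are cumulative: S + A for the symbol,
    minus the segment base, with the result truncated to 16 bits.
    """
    return (symbol + addend - base) & 0xFFFF

def huge_base(symbol):
    """Auxiliary ! symbol for a HUGE section: symbol & ~0xf."""
    return symbol & ~0xF

foo = linear(0x1234, 0x5678)   # primary symbol
foo_base = 0x12340             # auxiliary symbol foo!

print(hex(foo))                       # 0x179b8
print(hex(seg16(foo_base)))           # 0x1234
print(hex(offset16(foo, foo_base)))   # 0x5678

# For a HUGE section the base is the paragraph containing the data item.
print(hex(seg16(huge_base(foo))))            # 0x179b
print(hex(offset16(foo, huge_base(foo))))    # 0x8

# 0xffff:0xffff is the last addressable byte, giving 1088K-16 bytes.
print(linear(0xFFFF, 0xFFFF) + 1 == 1088 * 1024 - 16)   # True
```

Because symbols hold linear addresses, the same foo value is usable from
flat code unchanged; only the segment and offset fields need the base.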