Portable Compiled Format (PCF) font: format specification

Portable Compiled Format (PCF) is a bitmap font format originating from the X11 Window System. It closely matches the text-based BDF format, but is binary, and, unlike the previously used SNF binary format, it is platform-independent thanks to features that handle different endianness and bit order.

The overall composition of the format is straightforward: it is a more or less classic directory of type-offset-size pointers to what PCF calls "tables". Each table carries a particular piece of information about the font (metadata properties, metrics, bitmaps, mapping of glyphs to characters, etc.).
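
The directory layout described above can be sketched in a few lines of plain Python. This is only an illustration of the header and table directory as laid out in the specification below (magic, `num_tables`, then one type/format/size/offset record per table, all little-endian); it is not the code Kaitai Struct generates. The names `read_table_directory`, `TABLE_TYPES` and `build_demo_font` are hypothetical, and the demo data is synthetic.

```python
import struct

PCF_MAGIC = b"\x01fcp"

# Table type values, as listed in the `types` enum of the specification.
TABLE_TYPES = {
    0x001: "properties",
    0x002: "accelerators",
    0x004: "metrics",
    0x008: "bitmaps",
    0x010: "ink_metrics",
    0x020: "bdf_encodings",
    0x040: "swidths",
    0x080: "glyph_names",
    0x100: "bdf_accelerators",
}


def read_table_directory(data: bytes):
    """Return a list of (type, format, size, offset) tuples from a PCF header."""
    if data[:4] != PCF_MAGIC:
        raise ValueError("not a PCF font: bad magic")
    (num_tables,) = struct.unpack_from("<I", data, 4)
    entries = []
    for i in range(num_tables):
        # Each entry holds four little-endian u4 values: type, format, size, offset.
        ttype, fmt, size, offset = struct.unpack_from("<4I", data, 8 + 16 * i)
        # Unknown types are kept as raw numbers so that callers can skip them.
        entries.append((TABLE_TYPES.get(ttype, ttype), fmt, size, offset))
    return entries


def build_demo_font() -> bytes:
    """Hypothetical helper: a synthetic two-entry directory without table bodies."""
    header = PCF_MAGIC + struct.pack("<I", 2)
    directory = struct.pack("<4I", 0x01, 0, 32, 40) + struct.pack("<4I", 0x08, 0, 64, 72)
    return header + directory


if __name__ == "__main__":
    for entry in read_table_directory(build_demo_font()):
        print(entry)
```

A real reader would then seek to each `offset` and interpret `size` bytes according to the table type, skipping the types it does not support.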

File extension

pcf

KS implementation details

Minimal Kaitai Struct required: 0.9

References

This page hosts a formal specification of Portable Compiled Format (PCF) font using Kaitai Struct. This specification can be automatically translated into a variety of programming languages to get a parsing library.

Format specification in Kaitai Struct YAML

meta:
  id: pcf_font
  title: Portable Compiled Format (PCF) font
  file-extension: pcf
  xref:
    justsolve: PCF
    wikidata: Q3398726
  imports:
    - /common/bytes_with_io
  ks-version: 0.9
  encoding: UTF-8
  endian: le
doc: |
  Portable Compiled Format (PCF) is a bitmap font format originating
  from the X11 Window System. It closely matches the text-based BDF
  format, but is binary, and, unlike the previously used SNF binary
  format, it is platform-independent thanks to features that handle
  different endianness and bit order.

  The overall composition of the format is straightforward: it is a
  more or less classic directory of type-offset-size pointers to what
  PCF calls "tables". Each table carries a particular piece of
  information about the font (metadata properties, metrics, bitmaps,
  mapping of glyphs to characters, etc.).
doc-ref: https://fontforge.org/docs/techref/pcf-format.html
seq:
  - id: magic
    -orig-id: header
    contents: [0x1, "fcp"]
  - id: num_tables
    -orig-id: table_count
    type: u4
  - id: tables
    type: table
    repeat: expr
    repeat-expr: num_tables
types:
  table:
    doc: |
      Each table entry is an offset + length pointer to a particular
      table body. The "type" of the table refers to the `types` enum.
      Applications can ignore enum values which they don't support.
    seq:
      - id: type
        type: u4
        enum: types
      - id: format
        type: format
      - id: len_body
        -orig-id: size
        type: u4
      - id: ofs_body
        -orig-id: offset
        type: u4
    instances:
      body:
        pos: ofs_body
        size: len_body
        type:
          switch-on: type
          cases:
            'types::properties': properties
            # TODO: accelerators
            # TODO: metrics
            # TODO: ink_metrics
            'types::bitmaps': bitmaps
            'types::bdf_encodings': bdf_encodings
            'types::swidths': swidths
            'types::glyph_names': glyph_names
            # TODO: bdf_accelerators
    -webide-representation: "{type}"
    types:
      properties:
        doc: |
          Array of properties (key-value pairs), used to convey different X11
          settings of a font. The key is always an X font atom.
        doc-ref: 'https://fontforge.org/docs/techref/pcf-format.html#properties-table'
        seq:
          - id: format
            type: format
          - id: num_props
            type: u4
          - id: props
            type: prop
            repeat: expr
            repeat-expr: num_props
          - id: padding
            size: '(num_props & 3) == 0 ? 0 : (4 - (num_props & 3))'
            # In reality: align to next 4-byte boundary
          - id: len_strings
            type: u4
          - id: strings
            size: len_strings
            type: bytes_with_io
            doc: |
              Strings buffer. Never used directly, but instead is
              addressed by offsets from the properties.
        -webide-representation: "{num_props:dec} properties"
        types:
          prop:
            doc: |
              A property is a key-value pair: the "key" is always a
              string, and the "value" is either a string or a 32-bit
              integer, depending on an additional flag (`is_string`).

              A simple offset-based mechanism is employed to keep this
              type's sequence fixed-size and thus allow simple access
              to a property's key/value by index.
            seq:
              - id: ofs_name
                type: u4
                doc: Offset to name in the strings buffer.
              - id: is_string
                -orig-id: isStringProp
                type: u1
                doc: |
                  Designates if value is an integer (zero) or a string (non-zero).
              - id: value_or_ofs_value
                type: u4
                doc: |
                  If the value is an integer (`is_string` is false),
                  then it is stored here. If the value is a string
                  (`is_string` is true), then this field stores the
                  offset of the value in the strings buffer.
            instances:
              name:
                pos: ofs_name
                io: _parent.strings._io
                type: strz
                doc: |
                  Name of the property addressed in the strings buffer.
                -webide-parse-mode: eager
              str_value:
                pos: value_or_ofs_value
                io: _parent.strings._io
                type: strz
                if: is_string != 0
                doc: |
                  Value of the property addressed in the strings
                  buffer, if this is a string value.
                -webide-parse-mode: eager
              int_value:
                value: value_or_ofs_value
                if: is_string == 0
                doc: |
                  Value of the property, if this is an integer value.
                -webide-parse-mode: eager

              # As of Kaitai Struct 0.9, `value` fails with:
              #  
              #     "can't combine output types: StrFromBytesType(BytesTerminatedType(0,false,true,true,None),UTF-8) vs IntMultiType(false,Width4,Some(LittleEndian))"
              #
              # ... so currently it's commented out.

              #value:
              #  value: '(is_string != 0) ? str_value : int_value'
            -webide-representation: "{name} => {str_value}/{int_value}"
      bitmaps:
        doc: |
          Table containing uncompressed glyph bitmaps.
        doc-ref: 'https://fontforge.org/docs/techref/pcf-format.html#the-bitmap-table'
        seq:
          - id: format
            type: format
          - id: num_glyphs
            -orig-id: glyph_count
            type: u4
          - id: offsets
            type: u4
            repeat: expr
            repeat-expr: num_glyphs
          - id: bitmap_sizes
            type: u4
            repeat: expr
            repeat-expr: 4
      bdf_encodings:
        doc: |
          Table that allows mapping of character codes to glyphs present in the
          font. Supports 1-byte and 2-byte character codes.

          Note that this mapping is agnostic to the character encoding itself: it
          can represent ASCII, Unicode (ISO/IEC 10646), various single-byte
          national encodings, etc. If an application cares about it, the encoding
          is normally specified in the `properties` table, in the properties named
          `CHARSET_REGISTRY` / `CHARSET_ENCODING`.
        doc-ref: 'https://fontforge.org/docs/techref/pcf-format.html#the-encoding-table'
        seq:
          - id: format
            type: format
          - id: min_char_or_byte2
            type: u2
          - id: max_char_or_byte2
            type: u2
          - id: min_byte1
            type: u2
          - id: max_byte1
            type: u2
          - id: default_char
            type: u2
          - id: glyph_indexes
            -orig-id: glyphindeces
            type: u2
            repeat: expr
            repeat-expr: (max_char_or_byte2 - min_char_or_byte2 + 1) * (max_byte1 - min_byte1 + 1)
      swidths:
        doc: |
          Table containing scalable widths of characters.
        doc-ref: 'https://fontforge.org/docs/techref/pcf-format.html#the-scalable-widths-table'
        seq:
          - id: format
            type: format
          - id: num_glyphs
            type: u4
          - id: swidths
            type: u4
            repeat: expr
            repeat-expr: num_glyphs
            doc: |
              The scalable width of a character is the width of the corresponding
              PostScript character in em-units (1/1000ths of an em).
        -webide-representation: '{num_glyphs:dec} glyphs'
      glyph_names:
        doc: |
          Table containing character names for every glyph.
        doc-ref: 'https://fontforge.org/docs/techref/pcf-format.html#the-glyph-names-table'
        seq:
          - id: format
            type: format
          - id: num_glyphs
            type: u4
          - id: names
            type: string_ref
            repeat: expr
            repeat-expr: num_glyphs
            doc: |
              Glyph names are represented as string references into the strings buffer.
          - id: len_strings
            -orig-id: string_size
            type: u4
          - id: strings
            size: len_strings
            type: bytes_with_io
            doc: |
              Strings buffer which contains all glyph names.
        -webide-representation: '{num_glyphs:dec} glyphs'
        types:
          string_ref:
            seq:
              - id: ofs_string
                type: u4
            instances:
              value:
                pos: ofs_string
                io: _parent.strings._io
                type: strz
                -webide-parse-mode: eager
            -webide-representation: '{value}'
  format:
    doc: |
      Table format specifier, always 4 bytes. The original implementation
      always treats it as little-endian and makes liberal use of bitmasking
      to parse various parts of it.

      TODO: this format specification recognizes the endianness and bit
      order format bits, but it does not actually make any parsing
      decisions based on them.
    doc-ref: 'https://fontforge.org/docs/techref/pcf-format.html#file-header'
    seq:
      - id: padding1
        type: b2
      - id: scan_unit_mask
        type: b2
      - id: is_most_significant_bit_first
        -orig-id: PCF_BIT_MASK
        type: b1
      - id: is_big_endian
        -orig-id: PCF_BYTE_MASK
        type: b1
        doc: If set, then all integers in the table are treated as big-endian
      - id: glyph_pad_mask
        type: b2
        -orig-id: PCF_GLYPH_PAD_MASK
      - id: format
        type: u1
      - id: padding
        type: u2
enums:
  types:
    1:
      id: properties
      -orig-id: PCF_PROPERTIES
    2:
      id: accelerators
      -orig-id: PCF_ACCELERATORS
    4:
      id: metrics
      -orig-id: PCF_METRICS
    8:
      id: bitmaps
      -orig-id: PCF_BITMAPS
    0x10:
      id: ink_metrics
      -orig-id: PCF_INK_METRICS
    0x20:
      id: bdf_encodings
      -orig-id: PCF_BDF_ENCODINGS
    0x40:
      id: swidths
      -orig-id: PCF_SWIDTHS
    0x80:
      id: glyph_names
      -orig-id: PCF_GLYPH_NAMES
    0x100:
      id: bdf_accelerators
      -orig-id: PCF_BDF_ACCELERATORS
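
Two details of the specification above are left to the application: acting on the bits of the `format` word, and turning a character code into a position in `glyph_indexes`. The following is a minimal sketch of both, assuming the format word has been read as a little-endian 32-bit integer and that bit positions, the row-major index formula and the 0xFFFF "no glyph" marker are as described in the FontForge PCF notes referenced by the spec. The function names are hypothetical, and treating the code as a linear index when `min_byte1` and `max_byte1` are both zero is an assumption based on X11 conventions.

```python
NO_GLYPH = 0xFFFF  # index value meaning "no glyph for this code"


def decode_format(fmt: int) -> dict:
    """Split a table's 32-bit format word (read as little-endian) into fields."""
    return {
        "glyph_pad": fmt & 0x3,                            # bits 0-1
        "is_big_endian": bool(fmt & 0x4),                  # bit 2 (byte order)
        "is_most_significant_bit_first": bool(fmt & 0x8),  # bit 3 (bit order)
        "scan_unit": (fmt >> 4) & 0x3,                     # bits 4-5
        "format": (fmt >> 8) & 0xFF,                       # second byte
    }


def lookup_glyph_index(code, min_char_or_byte2, max_char_or_byte2,
                       min_byte1, max_byte1, glyph_indexes):
    """Map a character code to a glyph index, or None if it has no glyph."""
    if min_byte1 == 0 and max_byte1 == 0:
        # Single-row font: the code is used as a linear index.
        byte1, byte2 = 0, code
    else:
        # Two-byte font: high byte selects the row, low byte the column.
        byte1, byte2 = code >> 8, code & 0xFF
    if not (min_byte1 <= byte1 <= max_byte1
            and min_char_or_byte2 <= byte2 <= max_char_or_byte2):
        return None
    row_width = max_char_or_byte2 - min_char_or_byte2 + 1
    position = (byte1 - min_byte1) * row_width + (byte2 - min_char_or_byte2)
    index = glyph_indexes[position]
    return None if index == NO_GLYPH else index


if __name__ == "__main__":
    flags = decode_format(0x0000000E)
    # A reader honoring the byte-order bit would pick the struct prefix like this.
    prefix = ">" if flags["is_big_endian"] else "<"
    print(flags, prefix)
    print(lookup_glyph_index(0x22, 0x20, 0x22, 0, 0, [0, NO_GLYPH, 1]))
```

The position formula matches the `repeat-expr` of `glyph_indexes`: the array holds one entry per (byte1, byte2) pair in the declared ranges, stored row by row.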