Python pickle serialization format: Lua parsing library

Python Pickle format serializes Python objects to a byte stream, as a sequence of operations to run on the Pickle Virtual Machine.

The format is mostly implementation defined, there is no formal specification. Pickle data types are closely coupled to the Python object model. Python singletons, and most builtin types (e.g. None, int,dict, list) are serialised using dedicated Pickle opcodes. Other builtin types, and all classes (e.g. set, datetime.datetime) are serialised by encoding the name of a constructor callable. They are deserialised by importing that constructor, and calling it. So, unpickling an arbitrary pickle, using the Python's stdlib pickle module can cause arbitrary code execution.

Pickle format has evolved with Python, later protocols add opcodes & types. Later Python releases can pickle to or unpickle from any earlier protocol.

  • Protocol 0: ASCII clean, no explicit version, fields are '\n' terminated.
  • Protocol 1: Binary, no explicit version, first length prefixed types.
  • Protocol 2: Python 2.3+. Explicit versioning, more length prefixed types. https://www.python.org/dev/peps/pep-0307/
  • Protocol 3: Python 3.0+. Dedicated opcodes for bytes objects.
  • Protocol 4: Python 3.4+. Opcodes for 64 bit strings, framing, set. https://www.python.org/dev/peps/pep-3154/

Application

Python

File extension

["pickle", "pkl"]

KS implementation details

License: CC0-1.0

References

This page hosts a formal specification of Python pickle serialization format using Kaitai Struct. This specification can be automatically translated into a variety of programming languages to get a parsing library.

Lua source code to parse Python pickle serialization format