Variable length quantity, unsigned/signed integer, base128, little-endian: GraphViz block diagram (.dot) source

A variable-length unsigned/signed integer using base128 encoding. 1-byte groups consist of 1-bit flag of continuation and 7-bit value chunk, and are ordered "least significant group first", i.e. in "little-endian" manner.

This particular encoding is specified and used in:

More information on this encoding is available at https://en.wikipedia.org/wiki/LEB128

This particular implementation supports integer values up to 64 bits (i.e. the maximum unsigned value supported is 2**64 - 1), which implies that serialized values can be up to 10 bytes in length.

If the most significant 10th byte (groups[9]) is present, its has_next must be false (otherwise we would have 11 or more bytes, which is not supported) and its value can be only 0 or 1 (because a 9-byte VLQ can represent 9 * 7 = 63 bits already, so the 10th byte can only add 1 bit, since only integers up to 64 bits are supported). These restrictions are enforced by this implementation. They were inspired by the Protoscope tool, see https://github.com/protocolbuffers/protoscope/blob/8e7a6aafa2c9958527b1e0747e66e1bfff045819/writer.go#L644-L648.

KS implementation details

License: CC0-1.0
Minimal Kaitai Struct required: 0.10

References

This page hosts a formal specification of Variable length quantity, unsigned/signed integer, base128, little-endian using Kaitai Struct. This specification can be automatically translated into a variety of programming languages to get a parsing library.

GraphViz block diagram source

vlq_base128_le.dot

digraph {
	rankdir=LR;
	node [shape=plaintext];
	subgraph cluster__vlq_base128_le {
		label="VlqBase128Le";
		graph[style=dotted];

		vlq_base128_le__seq [label=<<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0">
			<TR><TD BGCOLOR="#E0FFE0">pos</TD><TD BGCOLOR="#E0FFE0">size</TD><TD BGCOLOR="#E0FFE0">type</TD><TD BGCOLOR="#E0FFE0">id</TD></TR>
			<TR><TD PORT="groups_pos">0</TD><TD PORT="groups_size">1</TD><TD>Group</TD><TD PORT="groups_type">groups</TD></TR>
			<TR><TD COLSPAN="4" PORT="groups__repeat">repeat until !(_.has_next)</TD></TR>
		</TABLE>>];
		vlq_base128_le__inst__len [label=<<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0">
			<TR><TD BGCOLOR="#E0FFE0">id</TD><TD BGCOLOR="#E0FFE0">value</TD></TR>
			<TR><TD>len</TD><TD>groups.length</TD></TR>
		</TABLE>>];
		vlq_base128_le__inst__sign_bit [label=<<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0">
			<TR><TD BGCOLOR="#E0FFE0">id</TD><TD BGCOLOR="#E0FFE0">value</TD></TR>
			<TR><TD>sign_bit</TD><TD>(len == 10 ? 9223372036854775808 : groups.last.multiplier * 64)</TD></TR>
		</TABLE>>];
		vlq_base128_le__inst__value [label=<<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0">
			<TR><TD BGCOLOR="#E0FFE0">id</TD><TD BGCOLOR="#E0FFE0">value</TD></TR>
			<TR><TD>value</TD><TD>groups.last.interm_value</TD></TR>
		</TABLE>>];
		vlq_base128_le__inst__value_signed [label=<<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0">
			<TR><TD BGCOLOR="#E0FFE0">id</TD><TD BGCOLOR="#E0FFE0">value</TD></TR>
			<TR><TD>value_signed</TD><TD>( ((sign_bit &gt; 0) &amp;&amp; (value &gt;= sign_bit))  ? -((sign_bit - (value - sign_bit))) : value)</TD></TR>
		</TABLE>>];
		subgraph cluster__group {
			label="VlqBase128Le::Group";
			graph[style=dotted];

			group__seq [label=<<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0">
				<TR><TD BGCOLOR="#E0FFE0">pos</TD><TD BGCOLOR="#E0FFE0">size</TD><TD BGCOLOR="#E0FFE0">type</TD><TD BGCOLOR="#E0FFE0">id</TD></TR>
				<TR><TD PORT="has_next_pos">0</TD><TD PORT="has_next_size">1b</TD><TD>b1be→bool</TD><TD PORT="has_next_type">has_next</TD></TR>
				<TR><TD COLSPAN="4" PORT="has_next__valid">must be equal to (idx == 9 ? false : has_next)</TD></TR>
				<TR><TD PORT="value_pos">0:1</TD><TD PORT="value_size">7b</TD><TD>b7be</TD><TD PORT="value_type">value</TD></TR>
				<TR><TD COLSPAN="4" PORT="value__valid">must be at most (idx == 9 ? 1 : 127)</TD></TR>
			</TABLE>>];
			group__inst__interm_value [label=<<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0">
				<TR><TD BGCOLOR="#E0FFE0">id</TD><TD BGCOLOR="#E0FFE0">value</TD></TR>
				<TR><TD>interm_value</TD><TD>(prev_interm_value + value * multiplier)</TD></TR>
			</TABLE>>];
		}
	}
	vlq_base128_le__seq:groups_type -> group__seq [style=bold];
	group__seq:has_next_type -> vlq_base128_le__seq:groups__repeat [color="#404040"];
	vlq_base128_le__seq:groups_type -> vlq_base128_le__inst__len [color="#404040"];
	vlq_base128_le__inst__len:len_type -> vlq_base128_le__inst__sign_bit [color="#404040"];
	group__params:multiplier_type -> vlq_base128_le__inst__sign_bit [color="#404040"];
	group__inst__interm_value:interm_value_type -> vlq_base128_le__inst__value [color="#404040"];
	vlq_base128_le__inst__sign_bit:sign_bit_type -> vlq_base128_le__inst__value_signed [color="#404040"];
	vlq_base128_le__inst__value:value_type -> vlq_base128_le__inst__value_signed [color="#404040"];
	group__params:idx_type -> group__seq:has_next__valid [color="#404040"];
	group__seq:has_next_type -> group__seq:has_next__valid [color="#404040"];
	group__params:idx_type -> group__seq:value__valid [color="#404040"];
	group__params:prev_interm_value_type -> group__inst__interm_value [color="#404040"];
	group__seq:value_type -> group__inst__interm_value [color="#404040"];
	group__params:multiplier_type -> group__inst__interm_value [color="#404040"];
}