Summary:
The LLVM Bitcode File Format documentation states that all bitstreams
begin with the magic number 'BC', and that generic bitstream analyzer
tools may check for this number in order to determine whether the
stream is a bitstream.
However, in practice:
* Only LLVM IR bitcode begins with 'BC'. Other bitstreams -- Clang
AST files and precompiled headers, Clang serialized diagnostics,
Swift modules -- do not start with 'BC'. A tool that actually checked
for 'BC' would only be able to recognize LLVM IR.
* `llvm-bcanalyzer`, arguably the most widely used generic bitstream
analyzer, does not check for the 'BC' magic number (except to
determine whether the file is LLVM IR).
Update the bitcode format documentation to make it clear that not all
bitstreams begin with 'BC', and that tools should not rely on that
particular magic number value.
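As a rough illustration of the intended behavior (not code from this change), a generic tool could use the first four bytes only to label stream types it recognizes, and treat anything else as an unknown but still acceptable bitstream. In the sketch below, `KNOWN_MAGICS` and `classify_bitstream` are hypothetical names. Only the LLVM IR entry follows from the documentation being updated; the `CPCH` and `DIAG` entries are assumed values for the Clang formats mentioned above.

```python
# Hypothetical table of known magic numbers. Only the LLVM IR entry is taken
# from the bitcode documentation ('B', 'C', then nibbles 0x0 0xC 0xE 0xD,
# which read as the bytes 0xC0 0xDE). The other entries are assumptions.
KNOWN_MAGICS = {
    b"BC\xc0\xde": "LLVM IR",
    b"CPCH": "Clang AST / PCH (assumed)",
    b"DIAG": "Clang serialized diagnostics (assumed)",
}


def classify_bitstream(data: bytes) -> str:
    """Label a stream by its 4-byte magic number without rejecting unknowns."""
    if len(data) < 4:
        raise ValueError("too short to contain a 4-byte magic number")
    # An unrecognized magic number is not an error: the stream may be a
    # new application-specific format the tool has not seen before.
    return KNOWN_MAGICS.get(data[:4], "unknown bitstream type")
```

The design point is that the lookup falls back to a neutral label instead of raising, so a hitherto unseen magic number never causes the stream to be rejected.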
Test Plan:
Build the `docs-llvm-html` target and confirm the changes render in
a Safari web browser.
Reviewers: harlanhaskins, eugenis, mehdi_amini, pcc, angerman
Reviewed By: angerman
Subscribers: angerman, llvm-commits
Differential Revision: https://reviews.llvm.org/D42002
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@322520 91177308-0d34-0410-b5e6-96231b3b80d8
Magic Numbers
-------------
-The first two bytes of a bitcode file are 'BC' (``0x42``, ``0x43``). The second
-two bytes are an application-specific magic number. Generic bitcode tools can
-look at only the first two bytes to verify the file is bitcode, while
-application-specific programs will want to look at all four.
+The first four bytes of a bitstream are used as an application-specific magic
+number. Generic bitcode tools may look at the first four bytes to determine
+whether the stream is a known stream type. However, these tools should *not*
+determine whether a bitstream is valid based on its magic number alone. New
+application-specific bitstream formats are being developed all the time; tools
+should not reject them just because they have a hitherto unseen magic number.
.. _primitives:
The magic number for LLVM IR files is:
:raw-html:`<tt><blockquote>`
-[0x0\ :sub:`4`, 0xC\ :sub:`4`, 0xE\ :sub:`4`, 0xD\ :sub:`4`]
+['B'\ :sub:`8`, 'C'\ :sub:`8`, 0x0\ :sub:`4`, 0xC\ :sub:`4`, 0xE\ :sub:`4`, 0xD\ :sub:`4`]
:raw-html:`</blockquote></tt>`
-When combined with the bitcode magic number and viewed as bytes, this is
-``"BC 0xC0DE"``.
-
.. _Signed VBRs:
Signed VBRs