Understand the binary package format
Learn about the MS-PPT binary file format that is used in previously released versions of Microsoft PowerPoint products, including the basic structures and key concepts for interacting with PowerPoint programmatically.
This article covers extracting text from a PowerPoint file and retrieving slides from PowerPoint files. It is part of a series of articles that introduce the binary file formats used by Microsoft Office products.
A file in the MS-PPT format starts with a Current User stream, then a PowerPoint Document stream, and a Pictures stream, plus some optional streams for summary information, custom XML data, and digital signatures. The Pictures stream contains embedded images that are referenced from the slide containers they appear in. The smallest unit of data in this format is a record. File data is stored sequentially by user edits. This means that if you want to re-create only the current version of the file, you can extract just the content of the last user edit and none of the data before it.
Likewise, you can get previous versions of the file by going to previous user edits in the stream. This article series deals only with advanced scenarios, such as where Microsoft PowerPoint is not installed.
Following are the main structures of interest in the MS-PPT format. All of the structures reside in the PowerPoint Document stream unless otherwise specified.
CurrentUserAtom
An atom record that specifies information about the last user to modify the file and where the most recent user edit is located. This is the only record in the CurrentUser stream.
PersistDirectoryAtom
This record specifies a persist object directory, which is a table of persist object identifiers and stream offsets to where you can find persist objects. Each user edit stores a persist object directory that identifies where you can find any new and modified persist objects.
PersistDirectoryEntry
This structure specifies a compressed table of sequential persist object identifiers and stream offsets to associated persist objects. The first 20 bits give the ID of a persist object. The next 12 bits give the number of persist offset entries in the current PersistDirectoryEntry structure. The rest of the PersistDirectoryEntry structure consists of offset entries, each of which uses 4 bytes. If there is more than one PersistOffsetEntry structure, each successive entry applies to the persist object with an ID one larger than that of the previous one.
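As a minimal sketch of the layout just described, the following Python decodes a run of PersistDirectoryEntry structures into an ID-to-offset table. It assumes little-endian byte order with the 20-bit ID in the low-order bits of the first 32-bit value; the function name is illustrative and not part of any API.

```python
import struct


def decode_persist_directory_entries(payload: bytes) -> dict:
    """Decode back-to-back PersistDirectoryEntry structures into {id: offset}."""
    directory = {}
    pos = 0
    while pos + 4 <= len(payload):
        (packed,) = struct.unpack_from("<I", payload, pos)
        first_id = packed & 0xFFFFF  # low 20 bits: ID of the first persist object
        count = packed >> 20         # high 12 bits: number of offset entries
        pos += 4
        for i in range(count):
            (offset,) = struct.unpack_from("<I", payload, pos)
            # Each successive offset belongs to the next higher ID.
            directory[first_id + i] = offset
            pos += 4
    return directory
```

A truncated payload raises `struct.error`, which is probably the safest behavior for a sketch like this, because a partial directory would silently point at the wrong objects.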
DocumentContainer
This is a container record that specifies information about the document. It includes lists of slides, notes, sounds, graphical elements, and other content. All the outline text for the current edit is stored in the Slide List of the DocumentContainer structure. Text inside shapes is stored in Shapes records.
MainMasterContainer
This is a container record that specifies a main master slide. The main master slide defines the formatting and some content, such as template graphics, for a presentation slide.
SlideContainer
This is the container record for a slide in the presentation. It includes transition settings, header and footer information, pointers and formatting for graphic elements, and a SlideAtom structure that specifies the placeholder shapes used in the slide layout.
Each text record contains an 8-byte record header, followed by a series of Unicode or partial Unicode characters.
DrawingGroupContainer
This is the part of the DocumentContainer structure that specifies images, WordArt, and other graphical content.
RecordHeader
A RecordHeader structure is an 8-byte structure at the beginning of each container record and atom record in the file. It contains four fields, of which the last two are the most interesting. The recType field specifies the type of the current record, and the recLen field gives the length, in bytes, of the data that follows the header.
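The record header is what makes a sequential scan of the stream possible. The sketch below reads a header and then walks a buffer record by record, collecting the text of every text record it meets. Several details are assumptions taken from the MS-PPT specification rather than from this article: the names recVer and recInstance for the first two fields, the convention that a recVer of 0xF marks a container, and the record type values 0x0FA0 (UTF-16 text) and 0x0FA8 (one byte per character). The function names are hypothetical.

```python
import struct
from typing import Iterator, NamedTuple

RT_TEXT_CHARS_ATOM = 0x0FA0  # assumed record type: UTF-16LE characters
RT_TEXT_BYTES_ATOM = 0x0FA8  # assumed record type: low byte of each character


class RecordHeader(NamedTuple):
    rec_ver: int       # low 4 bits of the first 16-bit word
    rec_instance: int  # high 12 bits of the first 16-bit word
    rec_type: int
    rec_len: int       # length of the data that follows the header


def read_record_header(data: bytes, offset: int = 0) -> RecordHeader:
    packed, rec_type, rec_len = struct.unpack_from("<HHI", data, offset)
    return RecordHeader(packed & 0xF, packed >> 4, rec_type, rec_len)


def iter_text(data: bytes) -> Iterator[str]:
    """Walk records front to back and yield the text of every text record."""
    pos = 0
    while pos + 8 <= len(data):
        header = read_record_header(data, pos)
        if header.rec_ver == 0xF:
            # Container: step past its header so its children are visited next.
            pos += 8
            continue
        body = data[pos + 8 : pos + 8 + header.rec_len]
        if header.rec_type == RT_TEXT_CHARS_ATOM:
            yield body.decode("utf-16-le", errors="replace")
        elif header.rec_type == RT_TEXT_BYTES_ATOM:
            yield body.decode("latin-1")
        pos += 8 + header.rec_len
```

Because the walk ignores user edits entirely, it returns text from every version stored in the stream, including text that later edits replaced.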
Extracting content from a PowerPoint document that uses the MS-PPT file format depends on the kind of content that you want to retrieve and in what condition. You can indiscriminately grab all the text from a file. You could extract all of the clip art, for example, with a similar technique, although that is beyond the scope of this article. To get the content slide by slide, you have to identify the current edits by reading PersistDirectoryEntry records. In theory, you could reconstruct a slide deck with the same approach as retrieving plain text.
But if you start by building a persist object directory, you have pointers to all of the current content and no outdated content. Create a data structure, such as a dictionary, to hold two columns of data.
The first column is for persist object IDs; the second is for the offsets where they are located in the stream. In the UserEditAtom structure, bytes 16 to 19 specify the offsetLastEdit field, and bytes 20 to 23 specify the offsetPersistDirectory field.
Populate the data structure you created by using the ID values and offsets of each PersistDirectoryEntry structure. In cases in which a PersistDirectoryEntry structure has more than one offset, assign each successive offset an ID value one larger than the previous ID in the table. Repeat the previous three steps until you run out of user edits.
Ignore any persist directory entries whose ID values conflict with existing ID values in the table, as these represent slides that have been overwritten. This is the DocumentContainer structure.
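The directory-building steps above can be sketched as follows. This is an illustration under stated assumptions, not a complete reader: it takes the PowerPoint Document stream as a bytes object and the offset of the most recent UserEditAtom (which the CurrentUserAtom supplies), it assumes that an offsetLastEdit of 0 means there is no earlier edit, and it assumes the persist directory record is an 8-byte header followed by the entries. The function name is hypothetical.

```python
import struct


def build_persist_directory(stream: bytes, offset_current_edit: int) -> dict:
    """Follow the chain of user edits, newest first, collecting ID -> offset."""
    directory = {}
    visited = set()
    offset = offset_current_edit
    while offset not in visited:  # guard against a corrupt, looping chain
        visited.add(offset)
        # Bytes 16-19 and 20-23 of the UserEditAtom, counted from the record start.
        offset_last_edit, offset_persist_dir = struct.unpack_from("<II", stream, offset + 16)
        # The length field occupies the last 4 bytes of the 8-byte record header.
        (rec_len,) = struct.unpack_from("<I", stream, offset_persist_dir + 4)
        pos = offset_persist_dir + 8
        end = pos + rec_len
        while pos < end:
            (packed,) = struct.unpack_from("<I", stream, pos)
            first_id, count = packed & 0xFFFFF, packed >> 20
            pos += 4
            for i in range(count):
                (persist_offset,) = struct.unpack_from("<I", stream, pos)
                # Newer edits are read first, so an ID that is already present
                # wins and the older, conflicting entry is ignored.
                directory.setdefault(first_id + i, persist_offset)
                pos += 4
        if offset_last_edit == 0:
            break
        offset = offset_last_edit
    return directory
```

Reading the newest edit first is what makes the "ignore conflicting IDs" rule a one-line `setdefault`: by the time an older entry shows up, the current one is already in the table.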
Within the DocumentContainer structure, find the slide list, which is a record with rh. Read the slide list. Any record of rh. For each SlidePersistItem structure in the slide list, create a slide with the corresponding text content. You can use the same approach for other slide content, such as notes, headers and footers, and formatting information.
Save the file by using a modified file name. To arrive at previous known versions, you can continue backtracking until you reach the correct UserEditAtom structure.
To restore a previous version of the file, delete everything after the previous UserEditAtom structure and update the CurrentUserAtom record accordingly.
This chapter touches on some lower level internals of Debian package management. Packages generally contain all of the files necessary to implement a set of related commands or features.
There are two types of Debian packages: binary packages and source packages.
Binary packages are distributed in a Debian-specific archive format (see "What is the format of a Debian binary package?"). Binary packages can be unpacked using the Debian utility dpkg (possibly via a frontend like aptitude); details are given in its manual page.
Source packages are packed and unpacked with the utility dpkg-source; details are provided in its manual page. The program apt-get can be used as a frontend for dpkg-source.
Installation of software by the package system uses "dependencies" which are carefully designed by the package maintainers. These dependencies are documented in the control file associated with each package. For example, the package containing the GNU C compiler (gcc) "depends" on the package binutils, which includes the linker and assembler. If a user attempts to install gcc without having first installed binutils, the package management system (dpkg) will send an error message that it also needs binutils, and stop installing gcc.
However, this facility can be overridden by the insistent user; see dpkg(8). A Debian "package", or a Debian archive file, contains the executable files, libraries, and documentation associated with a particular suite of programs or set of related programs.
Normally, a Debian archive file has a filename that ends in .deb. The internals of the Debian binary package format are described in the deb(5) manual page. The Debian binary package file names conform to the following convention: foo_VVV-RRR_AAA.deb. Note that foo is supposed to be the package name. To check the package name associated with a particular Debian archive file, you can consult the file that contains a stanza describing each package; the first field in each stanza is the formal package name. dpkg can also display information about an archive file, including, among other things, the package name.
The VVV component is the version number specified by the upstream developer. There are no standards in place here, so version number formats vary widely. The RRR component is the Debian revision number, and is specified by the Debian developer (or by an individual user who chooses to build the package themselves).
The AAA component identifies the processor for which the package was built. For details, see the description of "Debian architecture" in the manual page dpkg-architecture(1). Specifics regarding the contents of a Debian control file are provided in the Debian Policy Manual, section 5 (see "What other documentation exists on and for a Debian system?").
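To make the naming scheme concrete, here is a small sketch that splits an archive file name into the package name and the VVV, RRR, and AAA components. It assumes the components are joined as name_version-revision_architecture.deb, that the revision is whatever follows the last hyphen of the version part, and that a version without a hyphen has no revision. The class and function names are invented for this example.

```python
from typing import NamedTuple, Optional


class DebFileName(NamedTuple):
    package: str
    upstream_version: str           # the VVV component
    debian_revision: Optional[str]  # the RRR component, if present
    architecture: str               # the AAA component


def parse_deb_filename(filename: str) -> DebFileName:
    if not filename.endswith(".deb"):
        raise ValueError("expected a name ending in .deb")
    parts = filename[: -len(".deb")].split("_")
    if len(parts) != 3:
        raise ValueError("expected package_version_architecture.deb")
    package, version, architecture = parts
    # Upstream versions may themselves contain hyphens, so split on the last one.
    upstream, hyphen, revision = version.rpartition("-")
    if not hyphen:
        return DebFileName(package, version, None, architecture)
    return DebFileName(package, upstream, revision, architecture)
```

Keep in mind that the file name is only a convention; the authoritative package name and version are the ones recorded in the control file.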
The Package field gives the package name. This is the name by which the package can be manipulated by the package tools, and it is usually similar to, but not necessarily the same as, the first component string in the Debian archive file name. The Version field gives both the upstream developer's version number and (in the last component) the revision level of the Debian package of this program, as explained in "Why are Debian package file names so long?". The Depends field gives a list of packages that have to be installed in order to install this package successfully.
The Installed-Size field indicates how much disk space the installed package will consume. This is intended to be used by installation front-ends in order to show whether there is enough disk space available to install the program. The Priority field indicates how important this package is for installation, so that semi-intelligent software like apt or aptitude can sort the package into a category.
The Maintainer field gives the e-mail address of the person who is currently responsible for maintaining this package. For more information about all possible fields a package can have, please see the Debian Policy Manual, section 5, "Control files and their fields" (see "What other documentation exists on and for a Debian system?").
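The fields described above are stored as "Name: value" lines, with longer values continued on indented lines. The following sketch parses one such stanza into a dictionary. It is a simplified reading of the format rather than a full implementation of the Debian Policy rules, and the sample stanza is made-up data for illustration only.

```python
def parse_control_stanza(text: str) -> dict:
    """Parse one control-file stanza into a field -> value mapping."""
    fields = {}
    current = None
    for line in text.strip("\n").splitlines():
        if not line.strip():
            break  # a blank line ends the stanza
        if line[0] in " \t":
            # Indented lines continue the value of the previous field.
            if current is None:
                raise ValueError("continuation line before any field")
            fields[current] += "\n" + line.strip()
        else:
            name, colon, value = line.partition(":")
            if not colon:
                raise ValueError(f"not a field line: {line!r}")
            current = name.strip()
            fields[current] = value.strip()
    return fields


# Invented sample data, not taken from a real package.
SAMPLE = """\
Package: foo
Version: 1.2-3
Depends: libc6 (>= 2.31), binutils
Installed-Size: 120
Priority: optional
Maintainer: Jane Doe <jane@example.org>
Description: example package
 A longer description that continues
 on indented lines.
"""

fields = parse_control_stanza(SAMPLE)
```

With the sample above, `fields["Package"]` holds the package name and `fields["Depends"]` holds the raw dependency list as a single string.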
This ensures that local values for the contents of these files will be preserved, and is a critical feature enabling the in-place upgrade of packages on a running system.
These files are executable scripts which are automatically run before or after a package is installed or removed. Along with a file named control, all of these files are part of the "control" section of a Debian archive file.
preinst
This script is executed before the package it belongs to is unpacked from its Debian archive (".deb") file. Many 'preinst' scripts stop services for packages which are being upgraded until their installation or upgrade is completed (following the successful execution of the 'postinst' script).
postinst
This script typically completes any required configuration of the package foo once foo has been unpacked from its Debian archive (".deb") file. Many 'postinst' scripts then execute any commands necessary to start or restart a service once a new package has been installed or upgraded.
prerm
This script typically stops any daemons which are associated with a package. It is executed before the removal of files associated with the package.
Also see "What is a Virtual Package?". The files relevant to package foo begin with the name "foo" and have file extensions of "preinst", "postinst", etc. Note that the location of these files is a dpkg internal; you should not rely on it.
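The "control" section mentioned above travels inside the archive file as one of its members. According to deb(5), a .deb file is an ar archive whose members are debian-binary, a control tarball, and a data tarball. The sketch below lists the members of such an archive using only the common ar header layout (a 60-byte header with the name in the first 16 bytes and the size in bytes 48 to 57). It is a reading aid, not a replacement for dpkg-deb, and the function names are made up for this example.

```python
import io
from typing import BinaryIO, Iterator, List, Tuple

AR_MAGIC = b"!<arch>\n"


def iter_ar_members(stream: BinaryIO) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, data) for each member of an ar archive such as a .deb file."""
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    while True:
        header = stream.read(60)
        if not header:
            return
        if len(header) < 60 or header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        # Names are space padded; some ar variants also append a slash.
        name = header[0:16].decode("ascii").rstrip(" /")
        size = int(header[48:58].decode("ascii").strip())
        data = stream.read(size)
        if size % 2:
            stream.read(1)  # member data is padded to an even length
        yield name, data


def deb_member_names(blob: bytes) -> List[str]:
    """Convenience wrapper for archives that are already in memory."""
    return [name for name, _ in iter_ar_members(io.BytesIO(blob))]
```

For a file on disk, `iter_ar_members(open(path, "rb"))` works the same way; the control tarball it yields can then be opened with the standard `tarfile` module to reach the control file and the maintainer scripts.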
Each Debian package is assigned a priority by the distribution maintainers, as an aid to the package management system.
Required packages include all tools that are necessary to repair system defects. You must not remove these packages or your system may become totally broken and you may not even be able to use dpkg to put things back. Systems with only the Required packages are probably unusable, but they do have enough functionality to allow the sysadmin to boot and install more software.
Important packages are the other packages which the system will not run well or be usable without. These packages only constitute the bare infrastructure.
Standard packages are standard on any Linux system, including a reasonably small but not too limited character-mode system. Tools are included to be able to send e-mail (with mutt) and download files from FTP servers. This is what will be installed by default if users do not select anything else. It does not include many large applications, but it does include the Python interpreter and some server software like OpenSSH (for remote administration) and Exim (for mail delivery, although it can be configured for local delivery only). It also includes some common generic documentation that most users will find helpful.
Optional packages include all those that you might reasonably want to install even if you do not know what they are, or that do not have specialized requirements.
If you do a default Debian installation, all the packages of priority Standard or higher will be installed on your system. If you select pre-defined tasks, you will get lower priority packages too. Additionally, some packages are marked as Essential since they are absolutely necessary for the proper functioning of the system.
The package management tools will refuse to remove these. A virtual package is a generic name that applies to any one of a group of packages, all of which provide similar basic functionality. For example, both the konqueror and firefox-esr programs are web browsers, and should therefore satisfy any dependency of a program that requires a web browser on a system, in order to work or to be useful. They are therefore both said to provide the "virtual package" called www-browser.
Similarly, exim4 and sendmail both provide the functionality of a mail transport agent. They are therefore said to provide the virtual package "mail-transport-agent". If either one is installed, then any program depending on the installation of a mail-transport-agent will be satisfied by the presence of this virtual package.
Debian provides a mechanism so that, if more than one package which provides the same virtual package is installed on a system, then system administrators can set one as the preferred package. The relevant command is update-alternatives, and is described further in "Some users like mawk, others like gawk; some like vim, others like elvis; some like trn, others like tin; how does Debian support diversity?".
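The idea of a virtual package can be modeled in a few lines. The sketch below is a toy model, not how dpkg or APT store their data: it represents the installed system as a mapping from package name to the set of virtual packages it provides, using the package names from the examples above, and checks whether a dependency on a given name is satisfied. The function names are hypothetical.

```python
from typing import Dict, List, Set


def providers_of(name: str, installed: Dict[str, Set[str]]) -> List[str]:
    """Return installed packages that are, or that provide, the given name."""
    return sorted(
        package
        for package, provided in installed.items()
        if package == name or name in provided
    )


def is_satisfied(name: str, installed: Dict[str, Set[str]]) -> bool:
    """A dependency is met by the real package or by any provider of it."""
    return bool(providers_of(name, installed))


# Toy data following the examples in the text.
installed = {
    "konqueror": {"www-browser"},
    "firefox-esr": {"www-browser"},
    "exim4": {"mail-transport-agent"},
}
```

When `providers_of` returns more than one name, as it does for www-browser here, a preferred one has to be chosen, which is the situation update-alternatives addresses.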
The Debian package system has a range of package "dependencies" which are designed to indicate (in a single flag) the level at which Program A can operate independently of the existence of Program B on a given system:
Package A depends on Package B if B must be installed in order to run A. In some cases, A depends not only on B, but on a version of B. In this case, the version dependency is usually a lower limit, in the sense that A depends on any version of B more recent than some specified version.
Package A recommends Package B if the package maintainer judges that most users would not want A without also having the functionality provided by B.
Package A suggests Package B if B contains files that are related to (and usually enhance) the functionality of A.
Package A conflicts with Package B when A will not operate if B is installed on the system. Most often, conflicts are cases where A contains files which are an improvement over those in B.
Package A replaces Package B when files installed by B are removed and (in some cases) over-written by files in A.
Package A breaks Package B when both packages cannot be simultaneously configured in a system. The package management system will refuse to install one if the other one is already installed and configured in the system.
Package A provides Package B when all of the files and functionality of B are incorporated into A. This mechanism provides a way for users with constrained disk space to get only that part of package A which they really need.
More detailed information on the use of each of these terms can be found in the Debian Policy manual, section 7.
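In a control file, these relationships are written as comma-separated lists in which each item may carry a version constraint in parentheses and alternatives are separated by a vertical bar. That syntax comes from the Debian Policy manual rather than from the text above, and the sketch below handles only this basic form (no architecture qualifiers or build profiles). The names `Dependency` and `parse_depends` are invented for this example.

```python
import re
from typing import List, NamedTuple, Optional

# name, then an optional "(operator version)" constraint
_DEPENDENCY = re.compile(
    r"^([a-z0-9][a-z0-9+.\-]+)\s*(?:\(\s*(<<|<=|=|>=|>>)\s*([^()\s]+)\s*\))?$"
)


class Dependency(NamedTuple):
    name: str
    operator: Optional[str]
    version: Optional[str]


def parse_depends(field: str) -> List[List[Dependency]]:
    """Return one group per comma-separated clause; a group lists alternatives."""
    groups = []
    for clause in field.split(","):
        alternatives = []
        for item in clause.split("|"):
            match = _DEPENDENCY.match(item.strip())
            if match is None:
                raise ValueError(f"cannot parse dependency: {item.strip()!r}")
            alternatives.append(Dependency(*match.groups()))
        groups.append(alternatives)
    return groups
```

A lower-limit version dependency such as the one described above appears as an item like `binutils (>= 2.30)`, which this sketch returns with `operator` set to `">="`.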
In the case of most packages, dpkg will unpack the archive file of a package (i.e., its .deb file). Simplistically, unpacking means that dpkg will extract the files from the archive file that were meant to be installed on your file system, and put them in place. If those packages depend on the existence of some other packages on your system, dpkg will refuse to complete the installation (by executing its "configure" action) until the other packages are installed. However, for some packages, dpkg will refuse even to unpack them until certain dependencies are resolved.
Such packages are said to "Pre-depend" on the presence of some other packages. The Debian project provided this mechanism to support the safe upgrading of systems. There are other large upgrade situations where this method is useful.
Debian source packages can't actually be "installed"; they are just unpacked in whatever directory you want to build the binary packages they produce. Source packages are distributed on most of the same mirrors where you can obtain the binary packages.
If you set up your APT's sources. To help you in actually building the source package, Debian source packages provide the so-called build-dependencies mechanism.
This means that the source package maintainer keeps a list of other packages that are required to build their package.
To see how this is useful, run. The preferred way to do this is by using various wrapper tools. We'll show how it's done using the devscripts tools.
A binary file is a computer file that is not a text file. Binary files are usually thought of as being a sequence of bytes, which means the binary digits (bits) are grouped in eights. Binary files typically contain bytes that are intended to be interpreted as something other than text characters. Compiled computer programs are typical examples; indeed, compiled applications are sometimes referred to, particularly by programmers, as binaries. But binary files can also contain images, sounds, compressed versions of other files, etc.
Some binary files contain headers, blocks of metadata used by a computer program to interpret the data in the file. The header often contains a signature or magic number which can identify the format. For example, a GIF file can contain multiple images, and headers are used to identify and describe each block of image data.
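As a small illustration of signatures, the sketch below checks the first bytes of a file against a short table of well-known magic numbers, including those of the two container formats discussed earlier. The table is deliberately tiny and the function name is made up; real tools such as `file` use far larger databases.

```python
MAGIC_NUMBERS = [
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"!<arch>\n", "ar archive"),  # the container used by .deb packages
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE compound file"),  # e.g. binary .ppt
]


def identify(data: bytes) -> str:
    """Return a label for the first matching signature, or "unknown"."""
    for magic, label in MAGIC_NUMBERS:
        if data.startswith(magic):
            return label
    return "unknown"
```

A match only says that the file starts like a given format; as the text notes further on, the same bytes can be interpreted in more than one way.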
If a binary file does not contain any headers, it may be called a flat binary file. To send binary files through certain systems (such as email) that do not allow all data values, they are often translated into a plain text representation using, for example, Base64. The increased size may be countered by lower-level link compression, as the resulting text data will have about as much less entropy as it has increased size, so the actual data transferred in this scenario would likely be very close to the size of the original binary data.
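The size increase is easy to measure with Python's standard `base64` module. The helper below is a sketch for illustration; its name is not from any library. It encodes a buffer, confirms that decoding restores the original bytes, and returns the ratio of encoded size to original size.

```python
import base64


def base64_overhead(raw: bytes) -> float:
    """Return len(encoded) / len(raw) after a verified round trip."""
    encoded = base64.b64encode(raw)
    if base64.b64decode(encoded) != raw:
        raise ValueError("round trip failed")
    return len(encoded) / len(raw)


raw = bytes(range(256)) * 3          # 768 bytes covering every byte value
encoded = base64.b64encode(raw)      # plain ASCII text, safe for email
ratio = base64_overhead(raw)
```

Base64 maps every 3 input bytes to 4 output characters, so the ratio approaches 4/3 for any input that is large compared with the padding.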
See Binary-to-text encoding for more on this subject. A hex editor or viewer may be used to view file data as a sequence of hexadecimal (or decimal, binary or ASCII character) values for corresponding bytes of a binary file. If a binary file is opened in a text editor, each group of eight bits will typically be translated as a single character, and the user will see a (probably unintelligible) display of textual characters. If the file is opened in some other application, that application will have its own use for each byte. Another type of viewer, called a "word extractor", simply replaces the unprintable characters with spaces, revealing only the human-readable text.
This type of view is useful for quick inspection of a binary file in order to find passwords in games, find hidden text in non-text files and recover corrupted documents. If the file is itself treated as an executable and run, then the operating system will attempt to interpret the file as a series of instructions in its machine language.
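Both kinds of view are simple to approximate. The sketch below renders bytes the way a basic hex viewer might, with an offset, hexadecimal values, and an ASCII column, and then shows a "word extractor" style view that replaces unprintable bytes with spaces. The function names and the exact column layout are choices made for this example.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex values, and printable ASCII per line."""
    lines = []
    for start in range(0, len(data), width):
        chunk = data[start : start + width]
        hex_part = " ".join(f"{byte:02x}" for byte in chunk)
        text_part = "".join(chr(byte) if 32 <= byte < 127 else "." for byte in chunk)
        # Pad the hex column so the ASCII column lines up on a short last row.
        lines.append(f"{start:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


def word_view(data: bytes) -> str:
    """Replace every unprintable byte with a space, keeping readable text."""
    return "".join(chr(byte) if 32 <= byte < 127 else " " for byte in data)
```

Calling `hex_dump(open(path, "rb").read(64))` on an unfamiliar file is often enough to spot a signature or an embedded string.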
Standards are very important to binary files. For example, a binary file interpreted by the ASCII character set will result in text being displayed.
A custom application can interpret the file differently. Binary itself is meaningless until such time as an executed algorithm defines what should be done with each bit, byte, word or block.
Thus, just examining the binary and attempting to match it against known formats can lead to the wrong conclusion as to what it actually represents. This fact can be used in steganography, where an algorithm interprets a binary data file differently to reveal hidden content. Without the algorithm, it is impossible to tell that hidden content exists.
Two files that are binary compatible will have the same sequence of zeros and ones in the data portion of the file. The file header, however, may be different. The term is used most commonly to state that data files produced by one application are exactly the same as data files produced by another application. For example, some software companies produce applications for Windows and the Macintosh that are binary compatible, which means that a file produced in a Windows environment is interchangeable with a file produced on a Macintosh.
This avoids many of the conversion problems caused by importing and exporting data. One possible binary compatibility issue between different computers is the endianness of the computer. Some computers store the bytes in a file in a different order.
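The effect of byte order can be shown with the standard `struct` module. The sketch below packs the same 32-bit value in little-endian and big-endian order, and includes a hypothetical helper that shows what happens when a reader assumes the wrong order.

```python
import struct

value = 0x0A0B0C0D
little = struct.pack("<I", value)  # least significant byte first
big = struct.pack(">I", value)     # most significant byte first


def misread(number: int) -> int:
    """Write a 32-bit value little-endian, then read it back as big-endian."""
    return struct.unpack(">I", struct.pack("<I", number))[0]
```

This is why binary formats state their byte order explicitly; the readers sketched earlier for PowerPoint records use the `<` prefix for that reason.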