In this thread, one reader asked how Macintosh file metadata (e.g. a file's type and creator codes) are preserved when a file is copied to non-Macintosh storage (e.g. a Windows file server or a FAT formatted hard drive). He observed that he can copy a file to a Windows server, then copy/move the file to several different locations from Windows, and then copy the file back to a Mac and the metadata is preserved.
Here is the result of my analysis.
Since the first days of the Macintosh, files have never been as simple as they are on other computers. In addition to "standard" metadata like a file name, size and basic permissions (e.g. read-only, hidden), Mac files almost always carry additional metadata, including:
- File type. A 4-byte number (usually expressed as four text characters) identifying the type of data stored in the file. For example, "TEXT" for text files, "WXBN" for Microsoft Word XML files (aka "docx"), "FMP7" for FileMaker Pro version 7 and "MPG3" for MP3 audio.
- Creator. A 4-byte number (usually expressed as four text characters) identifying the application that created the file. For example, "ttxt" for the old "TeachText" text editor and "MSWD" for Microsoft Word.
- Dates. For creation and last modification.
- Icon. How the document is presented on your desktop. An icon is normally taken from its associated application, but it can be overridden on a per-document basis.
- Locked. Makes the file read-only. On modern versions of macOS, this is distinct from the Unix-level file permissions.
- Stationery pad. If this flag is set then the document is treated as a template. Opening it will cause the application to create a new document with the file's content.
- Comments. Users can type arbitrary text, which is stored with the file.
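To make the "4-byte number written as four characters" idea concrete, here is a small Python sketch (the language is my choice; nothing here depends on it). It assumes the classic big-endian layout of the first part of the Finder info (type, creator, then a 16-bit flags word), the Mac Roman character encoding, and the flag values kIsStationery (0x0800) and kHasCustomIcon (0x0400) from Apple's classic Finder definitions. The function names are hypothetical. Note that the locked flag and the comments are not stored in this record.

```python
import struct

# fdFlags bits from the classic Finder's FileInfo record.
K_HAS_CUSTOM_ICON = 0x0400
K_IS_STATIONERY = 0x0800


def fourcc_to_int(code: str) -> int:
    """Pack a four-character code such as "TEXT" into its 32-bit value."""
    raw = code.encode("mac_roman")
    if len(raw) != 4:
        raise ValueError(f"expected 4 bytes, got {raw!r}")
    return struct.unpack(">I", raw)[0]  # big-endian, as on classic Macs


def int_to_fourcc(value: int) -> str:
    """Render a 32-bit type or creator value as four characters."""
    return struct.pack(">I", value).decode("mac_roman")


def decode_file_info(finder_info: bytes) -> dict:
    """Decode type, creator and two flags from a 32-byte Finder info block."""
    file_type, creator, flags = struct.unpack(">4s4sH", finder_info[:10])
    return {
        "type": file_type.decode("mac_roman"),
        "creator": creator.decode("mac_roman"),
        "stationery": bool(flags & K_IS_STATIONERY),
        "custom_icon": bool(flags & K_HAS_CUSTOM_ICON),
    }


print(hex(fourcc_to_int("TEXT")))  # 0x54455854
print(int_to_fourcc(0x4D535744))   # MSWD
```

The same four bytes are just as valid read as a number or as text, which is why tools usually display them as characters.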
Additionally, Mac file systems support "forks". These are additional streams of data that are stored along with the file's normal content. Originally, the Mac supported only two forks: the data fork and the resource fork. The data fork is considered a file's main content (and on other operating systems that don't support forks, it is the entire file content).
The resource fork is a sort of database for storing "resources": objects like bitmaps, icons, fonts, string tables, and other kinds of support data. Resources are indexed by a type (a 4-byte number represented as four text characters) and a numeric ID. Classic Mac OS (versions up to and including 9) even used resources to store an application's executable code: 68K applications kept their code as a set of "CODE" resources in the application's resource fork.
Today, much of this is vestigial. Modern Mac applications don't use the resource fork (they use a completely different mechanism for accessing resources, which is beyond the scope of this article).
Even the type/creator codes are barely used today. Modern Mac systems identify a file's type based on an extension to its filename, just as other operating systems do. The system associates an application with each file type, and that application is launched when the user double-clicks on a document's icon, unless the user manually overrides the default using the Finder. But the codes are still used if the file name has no extension or if the extension is unknown: in that case, the default application will be the one that created the file (identified by its creator code), and if that application is unavailable, any other application that advertises support for the file type.
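The precedence described above can be modeled as a simple lookup chain. This Python sketch is a deliberately simplified, hypothetical model, not how Launch Services is actually implemented; the function name and all three lookup tables are invented for illustration.

```python
from typing import Dict, List, Optional


def choose_application(
    filename: str,
    creator: Optional[str],
    file_type: Optional[str],
    apps_by_extension: Dict[str, str],
    apps_by_creator: Dict[str, str],
    apps_by_type: Dict[str, List[str]],
    user_override: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical model of which app opens a document on double-click."""
    # 1. An explicit choice made by the user in the Finder wins.
    if user_override:
        return user_override
    # 2. A known filename extension selects the associated application.
    _, dot, ext = filename.rpartition(".")
    if dot and ext.lower() in apps_by_extension:
        return apps_by_extension[ext.lower()]
    # 3. No (or an unknown) extension: prefer the app that created the file.
    if creator in apps_by_creator:
        return apps_by_creator[creator]
    # 4. Otherwise, any app that advertises support for the file type.
    candidates = apps_by_type.get(file_type, []) if file_type else []
    return candidates[0] if candidates else None


# Hypothetical tables for illustration only.
by_ext = {"docx": "Microsoft Word"}
by_creator = {"ttxt": "TeachText"}
by_type = {"TEXT": ["TextEdit"]}
print(choose_application("notes", "ttxt", "TEXT", by_ext, by_creator, by_type))  # TeachText
```

A file named "notes" has no extension, so the creator code decides; if TeachText were not installed, the type code would pick any app that claims "TEXT".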
But Finder metadata isn't completely unused. macOS stores other kinds of metadata with files (e.g. a "quarantine" flag identifying files that should be checked for malware before opening), and it does this using "extended attributes". These attributes are stored as Finder metadata and in secondary forks (not the legacy resource fork, but other ones created for the purpose). Because this metadata is still important, it should be preserved when the file is copied to non-Mac systems, and if you try it, you will see that it is. But how is this done?
Disk images, archive files and special encodings
In the early days of the Macintosh, it was the user's responsibility to make sure that Finder metadata, the resource fork and other similar data were preserved when copying files to non-Mac systems. Simply copying a file to a non-Mac system would typically lose this data, with potentially catastrophic results (e.g. for an application, whose content consists almost entirely of resources in the resource fork). For this reason, various text and binary encodings were invented to preserve this data, and Mac users (especially those exchanging files over BBSs and the Internet) were expected to run application software to encode and decode these formats. Some popular examples include:
- BinHex (.hqx). A text-based format that encodes all of a Mac file's data (both forks plus the Finder metadata) as plain 7-bit text, suitable for exchanging via e-mail and other 7-bit communication channels (e.g. the early CompuServe network).
- MacBinary. A binary format that combines a file's data fork, resource fork and Finder metadata into a single "normal" file that can be copied to/from non-Mac computers.
- AppleSingle. Another binary format that combines a file's forks and metadata into a single file. This one was invented by Apple and solves some technical problems with the MacBinary format.
- AppleDouble. A binary format where the file is stored as two files. The data fork content is stored in one and all other data (Finder info, resource fork, etc.) is stored in another. It was invented for Apple's A/UX Unix platform, to allow Mac files to be stored on a Unix file system in a way that keeps them usable by Unix applications, without losing Mac-specific content.
- StuffIt. A very popular commercial (originally shareware) data compression system designed for the Mac OS, and therefore fully supporting Mac file system data.
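As an illustration of how such an encoding bundles everything into one ordinary file, here is a Python sketch that splits a MacBinary file back into its parts. It assumes the commonly documented MacBinary I/II header: 128 bytes, with the file name length at byte 1, the name at bytes 2-64, the type and creator at bytes 65-72, and big-endian data and resource fork lengths at bytes 83-90, followed by each fork padded to a multiple of 128 bytes. It ignores MacBinary II's optional secondary header and header checksum, and the function names are hypothetical, so treat it as a sketch rather than a complete decoder.

```python
import struct

HEADER_SIZE = 128


def _padded(length: int) -> int:
    """Forks are padded to a multiple of 128 bytes in MacBinary."""
    return (length + 127) // 128 * 128


def split_macbinary(blob: bytes) -> dict:
    """Split a MacBinary file into name, type/creator and both forks."""
    if len(blob) < HEADER_SIZE or blob[0] != 0:
        raise ValueError("not a MacBinary header")
    name_len = blob[1]
    if not 1 <= name_len <= 63:
        raise ValueError("bad file name length")
    data_len, rsrc_len = struct.unpack(">II", blob[83:91])
    data_start = HEADER_SIZE
    rsrc_start = data_start + _padded(data_len)  # resource fork follows data
    return {
        "name": blob[2:2 + name_len].decode("mac_roman"),
        "type": blob[65:69].decode("mac_roman"),
        "creator": blob[69:73].decode("mac_roman"),
        "data_fork": blob[data_start:data_start + data_len],
        "resource_fork": blob[rsrc_start:rsrc_start + rsrc_len],
    }
```

Because the header and both forks live in one plain data stream, any file system or transfer protocol can carry the result unchanged.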
Mac software has also been (and continues to be) frequently distributed in the form of disk images. These are (data-fork-only) files that contain an image of a complete file system. Images of Mac file systems (e.g. HFS and HFS+) can natively support all of a Mac file's forks and metadata. These are very popular for distributing collections of documents and application installers.
Finally, starting with Mac OS X 10.3 ("Panther"), Apple has supported Zip archives for Mac files, with easy integration into the Finder. If Mac files are zipped and unzipped using the Finder, Mac metadata should be archived along with the rest of the content.
Copying Mac files to non-Mac storage devices
With the above background, we can now understand what modern versions of macOS do when you copy a Mac file to a storage device that doesn't support Mac file structures (e.g. a FAT-formatted storage volume).
macOS uses a variation on the legacy AppleDouble format. When a file is copied to a non-Mac volume (or if a Mac app creates/edits a file on such a volume), two files are created. The first (having the file's normal name) contains its data fork. The second (having a name identical to the first, but prefixed with ._) contains Finder info and all other forks (resource and otherwise).
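The naming rule is simple enough to express directly. This Python sketch (the helper names are hypothetical) computes the ._ companion for a data file, and finds ._ files whose data file is no longer next to them, which is the situation in which the metadata gets separated from its file.

```python
from pathlib import Path
from typing import List


def appledouble_companion(data_file: Path) -> Path:
    """Return the '._' file macOS would pair with data_file."""
    return data_file.with_name("._" + data_file.name)


def orphaned_companions(directory: Path) -> List[Path]:
    """List '._' files in directory whose data file no longer exists."""
    orphans = []
    for entry in directory.iterdir():
        name = entry.name
        if name.startswith("._") and len(name) > 2 and entry.is_file():
            if not entry.with_name(name[2:]).exists():
                orphans.append(entry)
    return sorted(orphans)


print(appledouble_companion(Path("/Volumes/FAT/foo.docx")))  # /Volumes/FAT/._foo.docx (on macOS)
```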
If you use the Mac GetFileInfo command from a Terminal session, you can see the file's Finder info. If you run it against a file on a Mac storage volume, you might see something like:
$ GetFileInfo foo.docx
file: "/Users/.../foo.docx"
type: "WXBN"
creator: "MSWD"
attributes: avbstclinmedz
created: 08/21/2022 18:24:31
modified: 08/21/2022 18:24:31
And if you copy that file to a FAT-formatted volume, you will see the same thing:
$ cd "/Volumes/FAT"
$ GetFileInfo foo.docx
file: "/Volumes/FAT/foo.docx"
type: "WXBN"
creator: "MSWD"
attributes: avbstclinmedz
created: 08/21/2022 18:24:31
modified: 08/21/2022 18:24:31
So the metadata is being preserved, and it is stored in the file's corresponding ._ file, as you can see if you hex-dump its content:
$ ls -la ._*
-rwxrwxrwx  1 ...  staff  4096 Aug 21 18:29 ._foo.docx
$ hexdump -C ._foo.docx
00000000  00 05 16 07 00 02 00 00  4d 61 63 20 4f 53 20 58  |........Mac OS X|
00000010  20 20 20 20 20 20 20 20  00 02 00 00 00 09 00 00  |        ........|
00000020  00 32 00 00 0e b0 00 00  00 02 00 00 0e e2 00 00  |.2..............|
00000030  01 1e 57 58 42 4e 4d 53  57 44 00 00 00 00 00 00  |..WXBNMSWD......|
00000040  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000050  00 00 00 00 41 54 54 52  00 00 00 01 00 00 0e e2  |....ATTR........|
00000060  00 00 01 34 00 00 00 b3  00 00 00 00 00 00 00 00  |...4............|
00000070  00 00 00 00 00 00 00 04  00 00 01 34 00 00 00 20  |...........4... |
00000080  00 00 15 63 6f 6d 2e 61  70 70 6c 65 2e 71 75 61  |...com.apple.qua|
00000090  72 61 6e 74 69 6e 65 00  00 00 01 54 00 00 00 10  |rantine....T....|
000000a0  00 00 1a 63 6f 6d 2e 61  70 70 6c 65 2e 6c 61 73  |...com.apple.las|
000000b0  74 75 73 65 64 64 61 74  65 23 50 53 00 00 00 00  |tuseddate#PS....|
000000c0  00 00 01 64 00 00 00 2a  00 00 24 63 6f 6d 2e 61  |...d...*..$com.a|
000000d0  70 70 6c 65 2e 6d 65 74  61 64 61 74 61 3a 5f 6b  |pple.metadata:_k|
000000e0  4d 44 49 74 65 6d 55 73  65 72 54 61 67 73 00 00  |MDItemUserTags..|
000000f0  00 00 01 8e 00 00 00 59  00 00 37 63 6f 6d 2e 61  |.......Y..7com.a|
00000100  70 70 6c 65 2e 6d 65 74  61 64 61 74 61 3a 6b 4d  |pple.metadata:kM|
00000110  44 4c 61 62 65 6c 5f 6f  66 66 32 74 33 34 64 33  |DLabel_off2t34d3|
00000120  75 74 70 35 6f 37 76 73  77 70 70 66 61 72 68 72  |utp5o7vswppfarhr|
00000130  79 00 00 00 30 30 38 32  3b 36 33 30 32 62 30 39  |y...0082;6302b09|
00000140  66 3b 4d 69 63 72 6f 73  6f 66 74 5c 78 32 30 57  |f;Microsoft\x20W|
00000150  6f 72 64 3b 9f b0 02 63  00 00 00 00 66 9a 4f 18  |ord;...c....f.O.|
00000160  00 00 00 00 62 70 6c 69  73 74 30 30 a0 08 00 00  |....bplist00....|
00000170  00 00 00 00 01 01 00 00  00 00 00 00 00 01 00 00  |................|
00000180  00 00 00 00 00 00 00 00  00 00 00 00 00 09 f2 8a  |................|
00000190  e8 14 f6 73 bd 4a 7d d8  21 e3 ac e3 1c 5d ff 2b  |...s.J}.!....].+|
000001a0  b6 18 5c 7e 64 9f bf 7a  8a 7b 7a 8f 03 ff 38 03  |..\~d..z.{z...8.|
000001b0  d6 b5 40 7f c9 68 33 2c  8f f6 35 80 70 77 42 5f  |..@..h3,..5.pwB_|
000001c0  0d ae 68 66 f7 f1 fe 6e  0b c5 eb 43 7a 50 93 95  |..hf...n...CzP..|
000001d0  bb 40 65 df 61 ee 12 82  f5 77 79 1d a8 ed 86 a7  |.@e.a....wy.....|
000001e0  fa 4d 2c d2 2d 7a 4b 00  00 00 00 00 00 00 00 00  |.M,.-zK.........|
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000ee0  00 00 00 00 01 00 00 00  01 00 00 00 00 00 00 00  |................|
00000ef0  00 1e 54 68 69 73 20 72  65 73 6f 75 72 63 65 20  |..This resource |
00000f00  66 6f 72 6b 20 69 6e 74  65 6e 74 69 6f 6e 61 6c  |fork intentional|
00000f10  6c 79 20 6c 65 66 74 20  62 6c 61 6e 6b 20 20 20  |ly left blank   |
00000f20  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000fe0  00 00 00 00 01 00 00 00  01 00 00 00 00 00 00 00  |................|
00000ff0  00 1e 00 00 00 00 00 00  00 00 00 1c 00 1e ff ff  |................|
00001000
Although this is a big blob of binary data, there are some interesting things you can notice:
- The type and creator codes are here (at offset 0x32): WXBN is the file type and MSWD is the creator.
- You can see the names of four extended attributes:
- com.apple.quarantine (name starting at offset 0x83)
- com.apple.lastuseddate#PS (name starting at offset 0xa3)
- com.apple.metadata:_kMDItemUserTags (name starting at offset 0xcb)
- com.apple.metadata:kMDLabel_off2t34d3utp5o7vswppfarhry (name starting at offset 0xfb)
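The structure described above can be decoded with a short Python parser. This is a minimal sketch that assumes the layout visible in the dump: the AppleDouble header (magic number 0x00051607, a 4-byte version, a 16-byte filler, a 2-byte entry count, then 12-byte entries of ID, offset and length), where entry 9 is the Finder info and entry 2 is the resource fork. It further assumes that macOS stores extended attributes in an "ATTR" block starting 2 bytes after the 32-byte Finder info, whose entries (offset, length, flags, name length and a NUL-terminated name) are aligned to 4 bytes. That block is a macOS convention rather than part of the original AppleDouble format; the layout here is inferred from the dump and appears to match the attr_header structure in Apple's open-source xnu kernel, so verify it before relying on it. The function names are hypothetical.

```python
import struct
from typing import Dict

APPLEDOUBLE_MAGIC = 0x00051607
ENTRY_RESOURCE_FORK = 2
ENTRY_FINDER_INFO = 9


def parse_appledouble(blob: bytes) -> dict:
    """Extract type/creator, extended attributes and resource fork."""
    (magic,) = struct.unpack_from(">I", blob, 0)
    if magic != APPLEDOUBLE_MAGIC:
        raise ValueError("not an AppleDouble file")
    # Skip the 4-byte version and 16-byte filler ("Mac OS X" plus spaces).
    (count,) = struct.unpack_from(">H", blob, 24)
    entries = {}
    for i in range(count):
        entry_id, offset, length = struct.unpack_from(">III", blob, 26 + 12 * i)
        entries[entry_id] = (offset, length)

    result = {"type": None, "creator": None, "xattrs": {}, "resource_fork": b""}
    if ENTRY_FINDER_INFO in entries:
        offset, _ = entries[ENTRY_FINDER_INFO]
        result["type"] = blob[offset:offset + 4].decode("mac_roman")
        result["creator"] = blob[offset + 4:offset + 8].decode("mac_roman")
        result["xattrs"] = parse_attr_block(blob, offset + 32)
    if ENTRY_RESOURCE_FORK in entries:
        offset, length = entries[ENTRY_RESOURCE_FORK]
        result["resource_fork"] = blob[offset:offset + length]
    return result


def parse_attr_block(blob: bytes, finder_info_end: int) -> Dict[str, bytes]:
    """Parse the macOS 'ATTR' block that follows the Finder info (assumed)."""
    header = finder_info_end + 2  # 2 bytes of padding after the Finder info
    if blob[header:header + 4] != b"ATTR":
        return {}
    # Header: magic, debug tag, total size, data start, data length,
    # 3 reserved words (4 bytes each), then 2-byte flags and 2-byte count.
    (num_attrs,) = struct.unpack_from(">H", blob, header + 34)
    pos = header + 36
    attrs = {}
    for _ in range(num_attrs):
        data_off, data_len, _flags, name_len = struct.unpack_from(">IIHB", blob, pos)
        raw_name = blob[pos + 11:pos + 11 + name_len]
        attrs[raw_name.rstrip(b"\0").decode("utf-8")] = blob[data_off:data_off + data_len]
        pos += (11 + name_len + 3) & ~3  # entries are 4-byte aligned
    return attrs
```

To try it on a real volume, read the whole ._ file into memory (for example with `Path("._foo.docx").read_bytes()`) and pass the bytes to `parse_appledouble`.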
As long as the file's corresponding ._ file always accompanies the original file when it is moved or copied, the metadata will move or copy with it. If, however, the ._ file gets lost (e.g. the original file is moved without it, or the ._ file is deleted), then the metadata will be lost. For example:
$ rm ._foo.docx
$ GetFileInfo foo.docx
file: "/Volumes/FAT/foo.docx"
type: "\0\0\0\0"
creator: "\0\0\0\0"
attributes: avbstclinmedz
created: 08/21/2022 18:24:31
modified: 08/21/2022 18:24:31
Notice how the type/creator information is no longer available. That's because it was stored in the ._ file (along with other metadata), so when that file gets deleted, so does its content.
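If you manage files on such a volume with non-Mac tools, the practical consequence is that the ._ file has to travel with its data file, under the matching name. A hypothetical Python helper for moving both might look like this; it assumes dst is the full destination file name (not a directory) and does nothing about name collisions.

```python
import shutil
from pathlib import Path


def move_with_companion(src: Path, dst: Path) -> None:
    """Move a data file and, if present, its '._' AppleDouble companion."""
    src_companion = src.with_name("._" + src.name)
    dst_companion = dst.with_name("._" + dst.name)  # follows any rename
    shutil.move(str(src), str(dst))
    if src_companion.exists():
        shutil.move(str(src_companion), str(dst_companion))
```

Renaming works the same way: moving foo.docx to bar.docx also renames ._foo.docx to ._bar.docx, so macOS can still pair them.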
Copying to non-Mac file systems that support forks
Interestingly enough, it doesn't always work this way. For example, if you copy the file to an NTFS volume that is shared by a modern version of Windows (e.g. Windows 10), you will not find any ._ file stored alongside the original file. But you can copy or move it all over the Windows file system, and when you copy it back to the Mac (or open it via the Windows file share), the metadata will still be there.
So what's going on here?
The answer is that the NTFS file system (used by most Windows installations these days) supports multiple forks, just like Apple's file systems do. Microsoft calls these forks Alternate Data Streams or ADS. When macOS copies a file to a network volume and the server reports that it supports ADS, macOS will store the file's metadata (Finder info, extended attributes and forks) as alternate data streams associated with the original file. These streams are typically kept hidden from users, but you can see them if you know where to look.
If you use the DIR command without any special options, you will not see them:
C:\Users\...\tmp> dir
 Volume in drive C has no label.
 Volume Serial Number is DA9A-72B5

 Directory of C:\Users\...\tmp

08/22/2022  11:27    <DIR>          .
08/22/2022  11:27    <DIR>          ..
08/22/2022  11:26            11,878 foo.docx
               1 File(s)         11,878 bytes
               2 Dir(s)  56,669,786,112 bytes free
But if you use the /R option, then you will be able to see five alternate data streams, in addition to the original file:
C:\Users\...\tmp> dir /r
 Volume in drive C has no label.
 Volume Serial Number is DA9A-72B5

 Directory of C:\Users\...\tmp

08/22/2022  11:27    <DIR>          .
08/22/2022  11:27    <DIR>          ..
08/22/2022  11:26            11,878 foo.docx
                                 60 foo.docx:AFP_AfpInfo:$DATA
                                 16 foo.docx:com.apple.lastuseddate#PS:$DATA
                                 89 foo.docx:com.apple.metadatakMDLabel_off2t34d3utp5o7vswppfarhry:$DATA
                                 42 foo.docx:com.apple.metadata_kMDItemUserTags:$DATA
                                 32 foo.docx:com.apple.quarantine:$DATA
               1 File(s)         11,878 bytes
               2 Dir(s)  56,669,786,112 bytes free
Applications that are ADS-aware can open these alternate data streams and read them as if they were separate files. The name is what you see in the directory listing, but without the :$DATA suffix. One application bundled with Windows that is ADS-aware (and can therefore read these streams) is Notepad:
C:\Users\...\tmp> notepad foo.docx:AFP_AfpInfo
And if you do this, you will see that the AFP_AfpInfo stream contains the Finder info, including the type and creator codes.
IMPORTANT: Do not save this file! These alternate data streams contain binary data and Notepad is a text editor. If you try to save the stream, you will probably corrupt its content.
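A script can read the same stream without any risk of accidentally saving it, by opening it read-only. This Python sketch assumes the 60-byte AFP_AfpInfo layout used by Samba's vfs_fruit module (a signature starting with "AFP", a version, a reserved word and a backup time, then the 32-byte Finder info at offset 16, followed by ProDOS info and padding); that layout is an assumption worth checking against a real stream. The function names are hypothetical, and opening `name:stream` as a path only works on Windows with NTFS.

```python
AFP_INFO_SIZE = 60       # total size of the stream, as in the dir /r listing
FINDER_INFO_OFFSET = 16  # after signature, version, reserved and backup time


def parse_afp_info(stream: bytes) -> dict:
    """Extract type and creator from the bytes of an AFP_AfpInfo stream."""
    if len(stream) < AFP_INFO_SIZE or stream[:3] != b"AFP":
        raise ValueError("not an AFP_AfpInfo stream")
    finder_info = stream[FINDER_INFO_OFFSET:FINDER_INFO_OFFSET + 32]
    return {
        "type": finder_info[0:4].decode("mac_roman"),
        "creator": finder_info[4:8].decode("mac_roman"),
    }


def read_afp_info(data_file: str) -> dict:
    """Windows/NTFS only: open the named stream read-only and parse it."""
    with open(data_file + ":AFP_AfpInfo", "rb") as stream:
        return parse_afp_info(stream.read())
```

Run from the directory in the example, `read_afp_info("foo.docx")` would be expected to report WXBN and MSWD, matching what GetFileInfo showed on the Mac.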
The other named streams (obviously) correspond to the four extended attributes we saw in the hex-dump of the ._ file. The only difference between each stream's name and its attribute's name is that : characters have been replaced with a private-use Unicode character (U+F022), because colons are illegal characters in Windows file names.
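The name translation is a one-character substitution in each direction. This Python sketch models only the colon mapping described above; the helper names are hypothetical, and real clients and servers may also map other characters that are illegal in Windows file names, which this sketch ignores.

```python
PRIVATE_COLON = "\uf022"  # private-use stand-in for ':' in stream names


def xattr_to_stream_name(xattr_name: str) -> str:
    """Map a macOS extended attribute name to its NTFS stream name."""
    return xattr_name.replace(":", PRIVATE_COLON)


def stream_to_xattr_name(stream_name: str) -> str:
    """Reverse mapping: restore ':' from the private-use character."""
    return stream_name.replace(PRIVATE_COLON, ":")


def stream_path(data_file: str, xattr_name: str) -> str:
    """Build the 'file:stream' path an ADS-aware program would open."""
    return f"{data_file}:{xattr_to_stream_name(xattr_name)}"
```

Names without a colon, such as com.apple.quarantine, pass through unchanged, which is why that stream appears in the dir /r listing with exactly the same name as the attribute.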