Spotlight

It’s always been a mystery to me why the HFS+ (a.k.a. Mac OS Extended) file system is so often disrespected.

The first wave of disrespect came when Mac OS X 10.0 shipped with support for both the UFS and HFS+ disk formats. Although HFS+ was both the default and strongly recommended by Apple, numerous Unix nerds took it upon themselves to format their startup volumes using UFS. The guys who did this were the sort of people who wanted to believe that Mac OS X was more like “Unix with a Mac-like appearance” than what it really is — the Mac OS with Unix-like underpinnings.

Reasons cited for using UFS typically included such gems as:

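  • I prefer a case-sensitive file system.
  • I hate resource forks, and they’re going away, anyway.
  • I hate file type and creator metadata. And they’re going away, anyway.
  • HFS-specific features are incompatible with command-line tools.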

(The second and third reasons are often conflated by misinformed individuals who believe that type/creator metadata is stored in the resource fork; it’s not. Just like any other metadata — such as the filename and the creation/modification dates — type/creator metadata is stored in the file system, not in the file itself.)
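To make that concrete: in HFS+, a file’s type and creator codes sit in the 32 bytes of Finder info stored in the file’s catalog record, right alongside its name and dates, not in either fork. Here’s a minimal Python sketch that pulls the two four-character codes out of such a record. It assumes you already have the raw 32 bytes in hand (later versions of Mac OS X expose them as the com.apple.FinderInfo extended attribute); the sample codes are only an illustration.

```python
import struct

def parse_finder_info(finder_info: bytes) -> tuple[str, str]:
    """Return (file_type, creator) from a 32-byte Finder info record.

    The first 16 bytes are the classic FileInfo struct: a four-character
    file type, a four-character creator code, then Finder flags and
    location. The remaining 16 bytes are extended Finder info.
    """
    if len(finder_info) != 32:
        raise ValueError("Finder info is exactly 32 bytes")
    # Two big-endian four-byte OSType codes at offsets 0 and 4
    file_type, creator = struct.unpack(">4s4s", finder_info[:8])
    # OSType codes are four bytes of Mac Roman text
    return file_type.decode("mac_roman"), creator.decode("mac_roman")

# Example record: type 'TEXT', creator 'ttxt', everything else zeroed
record = b"TEXT" + b"ttxt" + bytes(24)
print(parse_finder_info(record))  # ('TEXT', 'ttxt')
```

Note that nothing here touches the file’s contents or its resource fork; the codes come entirely from file system metadata.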

The gist of it boiled down to: I’m a Unix tough guy; I have no need for that deprecated Mac OS baby stuff. For typical desktop use, there were never any actual technical reasons for using UFS in lieu of HFS+; it just felt good to hate things.

Of course, every person I’ve encountered who tried this eventually repented and reformatted their drive as HFS+. The truth is that HFS+ is — unsurprisingly — much better-suited to Mac OS X than UFS.

The next wave of HFS+ disrespect started two years ago when Apple hired Dominic Giampaolo, renowned file system design expert and creator of the highly-regarded, metadata-rich Be File System. “Aha,” exclaimed the peanut gallery, “Apple hired Dominic Giampaolo to write a brand-new file system to replace HFS+!”

Flash-forward to last week’s WWDC, where Apple announced that Mac OS X 10.4 (Tiger) will include an updated suite of command-line tools (e.g. cp and tar) that fully support HFS+ resource forks and metadata.

The only negative thing that can be said about this is that it’s about fucking time. Updating the tools to support HFS features is a much better solution than trying to ram the less-featured UFS down everyone’s throat as a sop to the limitations of 30-year-old command-line tools.

In short, 10.4 will offer more support for HFS+ features, not less.

And so what has Giampaolo been working on? One answer, we now know, is that he’s been adding metadata features on top of HFS+. Specifically, Spotlight — which is, in the words of one WWDC attendee, Giampaolo’s “baby”.

“Spotlight” is much more than just the visible UI shown during Jobs’s keynote: the vibrant blue search field in the top-right corner of the screen, and the accompanying search results window. That’s Spotlight’s user-visible front end, the part that made it into the keynote demo.

Under the hood, however, Spotlight is also a set of APIs accessible to third-party developers. It’s an entirely new metadata database — not replacing the existing HFS+ file system, but instead built on top of it.

Via email, the aforementioned source who attended the Spotlight session at WWDC sent me the following report.

Spotlight is completely, relentlessly focused on files and files’ metadata. Files are the only objects returned by Spotlight queries. Two aspects of Jobs’ keynote were thus misleading:

  • The “spotlight” effect on System Preferences was wholly unrelated to Spotlight.

  • Spotlight’s ability to show results from Apple Mail archives on Jobs’ machine was tantamount to a sham. Believe it or not, Tiger Mail has switched to an “exploded”, maildir-like storage format with a single message per file.

One implication of Spotlight’s file-centricity is that its ability to search “email” might not apply to clients other than Apple Mail — it’s the fact that the new Tiger version of Mail stores each message as a separate file that allows Spotlight to effectively return individual mail messages as search results. No other major mail client uses a one-message-per-file storage format.

[Update: The current version of Apple Mail — included with Mac OS X 10.3.x — already uses a one-message-per-file mail storage format for IMAP accounts (including .Mac accounts). For POP accounts, however, it uses “mbox” files. Beginning with 10.4, it will apparently use a one-message-per-file storage format for all mailboxes. Also, GyazMail already uses a one-message-per-file mailbox storage format.]
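Why the storage format matters to a file-centric search engine: an mbox file is one big file holding many messages, so the best a file-level search could return is the whole mailbox. Exploding it into one file per message gives each message its own path. A rough Python sketch of that conversion follows. It writes plain numbered files into a directory, which is only maildir-like (real Maildir uses tmp/new/cur subdirectories and unique filenames), and the file naming is an arbitrary assumption.

```python
import os

def split_mbox(text: str) -> list[str]:
    """Split mbox-format text into individual messages.

    Each message starts with a "From " envelope line, which is dropped here.
    Body lines beginning with "From " are stored escaped as ">From ";
    this sketch leaves that escaping as-is.
    """
    messages = []
    current = None  # lines of the message being collected, if any
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            if current is not None:
                messages.append("".join(current))
            current = []  # start a new message, skipping the envelope line
        elif current is not None:
            current.append(line)
    if current is not None:
        messages.append("".join(current))
    return messages

def explode(messages: list[str], directory: str) -> list[str]:
    """Write each message to its own file and return the new paths."""
    paths = []
    for i, message in enumerate(messages, start=1):
        path = os.path.join(directory, f"{i}.msg")
        with open(path, "w", encoding="utf-8") as f:
            f.write(message)
        paths.append(path)
    return paths
```

After explode, each message is a separate file, so anything that indexes files can point straight at a single message.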

Spotlight’s full-text search is outsourced to SearchKit, which will be considerably faster in Tiger (“3× indexing, 20× incremental search” over Panther). So, Spotlight has three places to look for information about files: its own hand-tuned substring-matching metadata store (built by Giampaolo, not part of Core Data or anything else), Carbon’s HFS+ catalog calls (so Spotlight can respond to searches for type and creator), and SearchKit’s full-text index.
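The three-source lookup can be pictured with a toy model. Everything here is hypothetical: plain dictionaries stand in for Giampaolo’s metadata store, the HFS+ catalog, and the SearchKit index, and none of the names correspond to real Spotlight, Carbon, or SearchKit APIs. It only shows the shape of the idea, a single query fanning out to all three sources and unioning the matching files.

```python
from dataclasses import dataclass, field

@dataclass
class ToySpotlight:
    # path -> {attribute: value}; stands in for the metadata store
    metadata: dict[str, dict[str, str]] = field(default_factory=dict)
    # path -> (type, creator); stands in for HFS+ catalog information
    catalog: dict[str, tuple[str, str]] = field(default_factory=dict)
    # path -> extracted text; stands in for a full-text index
    content: dict[str, str] = field(default_factory=dict)

    def search(self, term: str) -> set[str]:
        """Return every file that any of the three sources matches."""
        term = term.lower()
        # Substring match against metadata attribute values
        hits = {path for path, attrs in self.metadata.items()
                if any(term in value.lower() for value in attrs.values())}
        # Exact (case-insensitive) match against type and creator codes
        hits |= {path for path, codes in self.catalog.items()
                 if term in (code.lower() for code in codes)}
        # Substring match against full text
        hits |= {path for path, text in self.content.items()
                 if term in text.lower()}
        return hits
```

Whichever source matches, the answer is always a set of files, in keeping with Spotlight’s file-only results.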

Both metadata collection and full-text indexing depend on cooperating per-file-format Importers, written either by Apple or by third parties. Like Google, no matter how much text an Importer provides, Spotlight only cares about the first 100K of raw text.
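Here’s a sketch of the Importer arrangement as described: per-format plug-ins that each turn a file into metadata plus raw text, with the indexer keeping only the first 100K of that text. This is not Apple’s Importer interface. Registering importers by filename extension, and reading “100K” as 100 × 1024 characters, are both assumptions made for illustration.

```python
from typing import Callable

# Assumption: "100K" read as 100 * 1024 characters of extracted text
TEXT_LIMIT = 100 * 1024

# A hypothetical importer maps raw file bytes to (metadata, extracted text)
Importer = Callable[[bytes], tuple[dict[str, str], str]]

IMPORTERS: dict[str, Importer] = {}

def importer(extension: str):
    """Register a per-format importer for files with this extension."""
    def register(fn: Importer) -> Importer:
        IMPORTERS[extension] = fn
        return fn
    return register

@importer(".txt")
def import_plain_text(data: bytes) -> tuple[dict[str, str], str]:
    text = data.decode("utf-8", errors="replace")
    return {"kind": "Plain Text"}, text

def run_importer(filename: str, data: bytes) -> tuple[dict[str, str], str]:
    """Pick an importer by extension and cap the text the indexer keeps."""
    for extension, fn in IMPORTERS.items():
        if filename.endswith(extension):
            metadata, text = fn(data)
            # However much text the importer returns, keep only the first part
            return metadata, text[:TEXT_LIMIT]
    return {}, ""  # no importer for this format: nothing to index
```

A third party adding support for a new format would, in this model, just register one more function.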

Importers are fired on every file the moment it is created, saved, changed, or moved, including when files are made available through a newly mounted drive. Performance is said to be excellent in every case except network-mounted home directories, which are bedeviling on several levels and on which they’re still working.

It’s through the default set of Importers that Spotlight is able to index and search format-specific metadata, such as the ID3 tags in MP3 files.

What’s cool about this architecture is that Spotlight’s indexes will thus stay up-to-date automatically. All you need to do is save, move, or copy a file, and Spotlight’s metadata and content indexes will note the changes on the fly. Compare and contrast with the full-content file searching previously provided via Sherlock, which required periodic, monolithic re-indexing of the content of your drives.
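The difference from Sherlock-style periodic re-indexing is easy to see in miniature. In this hypothetical Python sketch, a word index is updated one file at a time in response to save, move, and delete events, so no full rescan is ever needed; a copy is just a save at the new path. The event method names are made up, and splitting on whitespace stands in for real tokenization.

```python
class IncrementalIndex:
    """A toy word index kept current by per-file events, never by rescans."""

    def __init__(self) -> None:
        self.words_by_path: dict[str, set[str]] = {}
        self.paths_by_word: dict[str, set[str]] = {}

    def _forget(self, path: str) -> None:
        # Remove every posting for this one path
        for word in self.words_by_path.pop(path, set()):
            postings = self.paths_by_word[word]
            postings.discard(path)
            if not postings:
                del self.paths_by_word[word]

    def file_saved(self, path: str, text: str) -> None:
        """Handle a create, save, or change by re-indexing only this file."""
        self._forget(path)
        words = set(text.lower().split())
        self.words_by_path[path] = words
        for word in words:
            self.paths_by_word.setdefault(word, set()).add(path)

    def file_moved(self, old: str, new: str) -> None:
        """Rewrite postings for a move; the file's content is not re-read."""
        if old == new or old not in self.words_by_path:
            return
        self._forget(new)  # whatever was indexed at the destination is gone
        words = self.words_by_path.pop(old)
        self.words_by_path[new] = words
        for word in words:
            postings = self.paths_by_word[word]
            postings.discard(old)
            postings.add(new)

    def file_deleted(self, path: str) -> None:
        self._forget(path)

    def search(self, word: str) -> set[str]:
        return set(self.paths_by_word.get(word.lower(), set()))
```

Each event costs work proportional to one file, which is what lets the index stay current without ever re-reading the whole drive.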

At the API level, Spotlight responds to a range of C-like, Google-like query modifiers: ==, -, <, >, <=, >=, and both leading and trailing *. Queries can toggle case insensitivity, and also diacritical insensitivity. High-level Cocoa APIs and comfortably low-level Carbon/CoreServices APIs are available in addition to the Finder UI.
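To illustrate what those modifiers and toggles mean, here’s a toy Python matcher. It is not Spotlight’s query syntax or API. It covers only the comparison operators and the leading/trailing * wildcard (the - modifier is left out), and it implements diacritical insensitivity by decomposing characters and dropping combining accents, which is one common approach rather than necessarily Apple’s.

```python
import operator
import unicodedata

COMPARISONS = {"==": operator.eq, "<": operator.lt, ">": operator.gt,
               "<=": operator.le, ">=": operator.ge}

def fold(s: str, case_insensitive: bool, diacritic_insensitive: bool) -> str:
    if diacritic_insensitive:
        # Decompose (e.g. "é" -> "e" + combining accent), then drop the accents
        s = "".join(ch for ch in unicodedata.normalize("NFD", s)
                    if not unicodedata.combining(ch))
    return s.casefold() if case_insensitive else s

def matches(value, op: str, operand, *, case_insensitive: bool = False,
            diacritic_insensitive: bool = False) -> bool:
    """Compare one attribute value against a query operand."""
    if isinstance(value, str) and isinstance(operand, str):
        value = fold(value, case_insensitive, diacritic_insensitive)
        operand = fold(operand, case_insensitive, diacritic_insensitive)
        if op == "==" and (operand.startswith("*") or operand.endswith("*")):
            core = operand.strip("*")
            if len(operand) > 1 and operand.startswith("*") and operand.endswith("*"):
                return core in value          # *word*: contains
            if operand.startswith("*"):
                return value.endswith(core)   # *word: ends with
            return value.startswith(core)     # word*: begins with
    return COMPARISONS[op](value, operand)
```

With both toggles on, matches("Résumé.doc", "==", "resume*", case_insensitive=True, diacritic_insensitive=True) succeeds; with both off, the same query fails because of the capital R and the accents.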

Spotlight is going to kick a lot of ass.
