Skip to content

meedstrom/org-mem

Repository files navigation

Org-mem

A cache of metadata about the contents of all your Org files – headings, links, timestamps and so on.

Builds quickly, so that there is no need to persist data across sessions. My M-x org-mem-reset:

Org-mem saw 2418 files, 8388 headings, 8624 links (3418 IDs, 4874 ID-links) in 1.95s

This library came from asking myself “what could I move out of org-node, that’d make sense in core?” Maybe a proposal for upstream, or at least a PoC.

Many notetaking packages now reinvent the wheel, when it comes to keeping track of some or many files and what may be in them.

Example: org-roam’s DB, org-node’s hash tables, and other packages just re-run grep all the time, which still leads to writing elisp to cross-reference the results.

And they must do this, because Org ships no tool to query across lots of files. You know what happens if you put 2,000 files into org-agenda-files! It needs to open each file in real-time to check anything in them, so everyday commands grind to a halt, or even crash: many OSes have a cap of 1,024 simultaneous file handles.

Quick start

Example setup:

(setq org-mem-watch-dirs '("~/org/" "/mnt/stuff/notes/"))
(setq org-mem-do-sync-with-org-id t)
(org-mem-updater-mode)

That’s it – give it a couple of seconds, and now evalling (org-mem-all-entries), (org-mem-all-links) and variants should return a lot of results. See examples of how to use them at section Elisp API!

The above example checks all files in org-mem-watch-dirs recursively, as well as files mentioned in org-id-locations and org-id-extra-files.

Two APIs

You get two different APIs to pick from, to access the same data.

  • Emacs Lisp
  • SQL

Why two? It’s free. When the data has been gathered anyway, there is no reason to only insert it into a SQLite db, nor only put it in a hash table.

Famously, org-roam uses a SQLite DB. My package org-node used simple hash tables. Now you get both, without having to install either.

Data only

A design choice: Org-mem only delivers data. It could easily ship conveniences like, let’s call it a function “org-mem-goto”:

(defun org-mem-goto (entry)
  (cl-assert (org-mem-entry-p entry))
  (find-file (org-mem-entry-file entry))
  (goto-char (org-mem-entry-pos entry))

but in my experience, that will spiral into dozens of lines over time, to handle a variety of edge cases. Since you may prefer to handle edge cases different than I do, or have different needs, it ceases to be universally applicable.

So, it is up to you to write your own “goto” function, and all else to do with user interaction.

No Org at init

A design choice: Org-mem does not use Org code to analyze your files, but a custom, more dumb parser. That’s for three reasons:

  1. Fast init. Since I want Emacs init to be fast, it’s not allowed to load Org. Yet, I want to be able to use a command like org-node-find to browse my Org stuff immediately after init.

    Or be warned about a deadline even if I don’t.

    That means the data must exist before Org has loaded.

  2. Robustness. Many users heavily customize Org, so no surprise that it sometimes breaks. In my experience, it’s very nice then to have an alternative way to browse, that does not depend on a functional Org setup.
  3. Fast rebuild. As they say, there are two hard things in computer science: cache invalidation and naming things.

    Org-mem must update its cache as the user saves, renames and deletes files. Not difficult, until you realize that files and directory listings may change due to a Git operation, OS file operations, a mv or cp -a command on the terminal, edits by another Emacs instance, or remote edits by Logseq.

    A robust approach to cache invalidation is to avoid trying: ensure that a full rebuild is fast enough that you can just do that instead.

    In fact, org-mem-updater-mode does a bit of both, because it is still important that saving a file does not lag; it does its best to update only the necessary tables on save, and an idle timer triggers a full reset every now and then.

A SQLite database, for free

Included is a drop-in for org-roam’s (org-roam-db), called (org-mem-roamy-db).

In the future we may also create something that fits org-sql’s DB schemata, or something custom, but we’ll see!

Without org-roam installed

Activating the mode creates an in-memory database by default.

(org-mem-roamy-db-mode)

Test that it works:

(emacsql (org-mem-roamy-db) [:select * :from files :limit 10])

With org-roam installed

You can use this to end your dependence on org-roam-db-sync. Set the following to overwrite the “org-roam.db” file.

(setq org-roam-db-update-on-save nil)
(setq org-mem-roamy-do-overwrite-real-db t)
(org-mem-roamy-db-mode)

Now, you have a new, all-fake org-roam.db! Test that org-roam-db-query works:

(org-roam-db-query [:select * :from files :limit 10])

N/B: because (equal (org-roam-db) (org-mem-roamy-db)), the above is equivalent to these expressions:

(emacsql (org-roam-db) [:select * :from files :limit 10])
(emacsql (org-mem-roamy-db) [:select * :from files :limit 10])

A known issue when when you use multiple Emacsen: “attempt to write a readonly database”. Get unstuck with M-: (org-roam-db--close-all).

View what info is in the DB

Use the command M-x org-mem-list-db-contents.

Elisp API

Example: Let org-agenda cast its net wide

You can’t put 2,000 files in org-agenda-files, but most contain nothing of interest for the agenda anyway, right?

Turns out I have only about 30 files worth checking, the challenge was always knowing which files ahead-of-time. Now it’s easy:

(defun my-set-agenda-files (&rest _)
  (setq org-agenda-files
        (cl-loop
         for file in (org-mem-all-files)
         unless (string-search "archive" file)
         when (seq-find (lambda (entry)
                          (or (org-mem-entry-active-timestamps entry)
                              (org-mem-entry-todo-state entry)
                              (org-mem-entry-scheduled entry)
                              (org-mem-entry-deadline entry)))
                        (org-mem-entries-in file))
         collect file)))
(add-hook 'org-mem-post-full-scan-functions #'my-set-agenda-files)

(org-mem-updater-mode)

Example: Warn about dangling clocks at init

While Org can warn about dangling clocks through the org-clock-persist setting, that requires loading Org at some point during your session. Which means that if it is a matter of concern for you to forget you had a clock going, that you effectively have to put (require 'org) in your initfiles, just in case.

Now the following is an alternative:

(defun my-warn-dangling-clock (&rest _)
  (let ((not-clocked-out (org-mem-all-entries-with-dangling-clock)))
    (when not-clocked-out
      (warn "Didn't clock out in files: %S"
            (delete-dups (mapcar #'org-mem-entry-file not-clocked-out))))))
(add-hook 'org-mem-initial-scan-hook #'my-warn-dangling-clock)

(org-mem-updater-mode)

Entries and links

We use two types of objects to help represent file contents: org-mem-entry objects and org-mem-link objects. They involve some simplifications:

  • An org-mem-link object corresponds either to a proper Org link, or to a citation fragment.
    • Check which it is with org-mem-link-citation-p.
  • The content before the first heading also counts as an “entry”, with heading level zero.
    • Some predictable differences from normal entries: the zeroth-level entry obviously cannot have a TODO state, so org-mem-entry-todo-state always returns nil, and so on.
    • Check with org-mem-entry-subtree-p.
      • Or if you’re looking at the output of (org-mem-entries-in-file FILE), the first element (the car) is always a zeroth-level entry. The rest (the cdr) are subtrees.
    • If the zeroth-level entry is absolutely empty, such that the first proper Org heading is on line 1, then (org-mem-entry-at-lnum-in-file 1 FILE) returns the entry for that heading instead of the zeroth-level entry. That is hopefully intuitive. Opinions on API design are very welcome!

Full list of functions [2025-05-30 Fri 11:26]

  • org-mem-all-entries-with-active-timestamps
  • org-mem-all-entries-with-dangling-clock
  • org-mem-all-entries
  • org-mem-all-files
  • org-mem-all-id-links
  • org-mem-all-id-nodes
  • org-mem-all-ids
  • org-mem-all-links
  • org-mem-all-roam-reflinks
  • org-mem-all-roam-refs
  • org-mem-entries-in-file
  • org-mem-entries-in-files
  • org-mem-entries-in
  • org-mem-entry-at-file-lnum
  • org-mem-entry-at-file-pos
  • org-mem-entry-at-lnum-in-file
  • org-mem-entry-at-pos-in-file
  • org-mem-entry-by-id
  • org-mem-entry-by-roam-ref
  • org-mem-entry-closed
  • org-mem-entry-crumbs
  • org-mem-entry-deadline
  • org-mem-entry-file
  • org-mem-entry-id
  • org-mem-entry-level
  • org-mem-entry-lnum
  • org-mem-entry-olpath-with-file-title-with-self
  • org-mem-entry-olpath-with-file-title
  • org-mem-entry-olpath-with-self-with-file-title
  • org-mem-entry-olpath-with-self
  • org-mem-entry-olpath
  • org-mem-entry-pos
  • org-mem-entry-priority
  • org-mem-entry-properties
  • org-mem-entry-property
  • org-mem-entry-roam-refs
  • org-mem-entry-scheduled
  • org-mem-entry-subtree-p
  • org-mem-entry-tags-inherited
  • org-mem-entry-tags-local
  • org-mem-entry-tags
  • org-mem-entry-that-contains-link
  • org-mem-entry-title-maybe
  • org-mem-entry-title
  • org-mem-entry-todo-state
  • org-mem-file-attributes
  • org-mem-file-by-id
  • org-mem-file-entries
  • org-mem-file-id-strict
  • org-mem-file-id-topmost
  • org-mem-file-line-count
  • org-mem-file-mtime-floor
  • org-mem-file-mtime
  • org-mem-file-ptmax
  • org-mem-file-size
  • org-mem-file-title-or-basename
  • org-mem-file-title-strict
  • org-mem-file-title-topmost
  • org-mem-file-truename
  • org-mem-id-by-title
  • org-mem-id-links-from-id
  • org-mem-id-links-to-entry
  • org-mem-id-links-to-id
  • org-mem-id-node-by-title
  • org-mem-id-nodes-in-files
  • org-mem-link-citation-p
  • org-mem-link-description
  • org-mem-link-file
  • org-mem-link-nearby-id
  • org-mem-link-pos
  • org-mem-link-target
  • org-mem-link-type
  • org-mem-links-from-id
  • org-mem-links-in-entry
  • org-mem-links-in-file
  • org-mem-links-of-type
  • org-mem-links-to-target
  • org-mem-links-with-type-and-path
  • org-mem-next-entry
  • org-mem-previous-entry
  • org-mem-roam-reflinks-into-file
  • org-mem-roam-reflinks-to-entry
  • org-mem-roam-reflinks-to-id

Current limitations / future work

Limitation: TRAMP

Files over TRAMP are excluded from org-mem’s database, so as far as org-mem is concerned, it is as if they do not exist.

(However, org-mem is also careful not to scrub them from your org-id-locations, so your ID-links should still work.)

This limitation comes from the fact that org-mem parses your files in many parallel subprocesses that do not inherit your TRAMP setup. It is fixable in theory.

Limitation: Encrypted and compressed files (.org.gpg, .org.gz)

When TRAMP support is fixed, we should be able to fix this too.

Limitation: Encrypted entries

Specific entries in a file may be encrypted by org-crypt. Org-mem cannot find links or active timestamps inside these.

About

Turn thousands of Org files into a database in seconds

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

  •  
  •