Get Modified Files Between Two Snapshots


project python

To make my blogging habit a little easier, I created a tool called pynote2mds3 that converts my Jupyter notebooks to Markdown that Hugo can parse. It uses nbconvert for the conversion and uploads any images to S3. However, I soon found that I had to run the command manually for each new post, so I started looking for a way to automate that. The solution is make.

make is a great and flexible tool. I’ve been using Linux for quite some time and have interacted with Makefiles and CMake before, but only as a user. I had never actually written a Makefile until recently, and I was blown away by its usefulness and simplicity. For this blog, I simply created a target in my Makefile that converts all modified or new notebooks to Markdown using my tool.

The problem is that I need a way to pick out only the files in my notebooks directory that are new or modified, because I do not want to reprocess unchanged files. I looked for a simpler solution, such as using the Linux find command to list all files modified in the last n minutes, but that was not enough for my needs. So, as a challenge, I decided to write a simple script to do it.

The idea is to use watchdog's dirsnapshot module to create a snapshot of a folder and compare it to the previous snapshot.
Using watchdog, we can create a snapshot of a directory with:

from watchdog.utils.dirsnapshot import DirectorySnapshot
snapshot = DirectorySnapshot(path)

If you have two snapshots, you can get the difference (i.e. the files that were created or modified between the older snapshot and the newer one) using the DirectorySnapshotDiff class:

from watchdog.utils.dirsnapshot import DirectorySnapshotDiff
diff = DirectorySnapshotDiff(old_snapshot, new_snapshot)
final_data = diff.files_created + diff.files_modified
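
To compare against the previous run, the old snapshot has to be stored somewhere between invocations. Here is a minimal sketch of how that could look. It assumes the snapshot object can be saved with pickle, and the function name changed_files and the state_file argument are my own illustrative choices, not necessarily what the gist does. The state file should live outside the notes directory, otherwise it would show up as a changed file itself.

```python
import os
import pickle

from watchdog.utils.dirsnapshot import DirectorySnapshot, DirectorySnapshotDiff


def changed_files(notes_dir, state_file):
    """Return files created or modified in notes_dir since the last call.

    Hypothetical helper: state_file is where the previous snapshot is
    kept (pickled) between runs.
    """
    new_snapshot = DirectorySnapshot(notes_dir, recursive=True)

    if os.path.exists(state_file):
        # Load the snapshot saved by the previous run and diff against it.
        with open(state_file, "rb") as f:
            old_snapshot = pickle.load(f)
        diff = DirectorySnapshotDiff(old_snapshot, new_snapshot)
        changed = diff.files_created + diff.files_modified
    else:
        # First run: no previous snapshot exists, so treat every file as new.
        changed = [p for p in new_snapshot.paths if not new_snapshot.isdir(p)]

    # Save the current snapshot so the next run only sees later changes.
    with open(state_file, "wb") as f:
        pickle.dump(new_snapshot, f)

    return sorted(changed)
```

Calling changed_files("notebooks", ".snapshot.pkl") twice in a row without touching any file should return an empty list the second time, which is exactly the behaviour I want from a make target.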

With this final list of files, I can then pipe it into my converter tool to create the Markdown. Now I don’t have to type the filename of each note one by one; I just call make note to convert all my modified or new notes into Markdown. Simple, right?
You can see my code for this diff in: my gist