Image Resize Cache Management
I am working on automated management of cache files in the _files/cache/* dir, and I already have plans ready for automated removal of expired folder and menu JSON files. However, managing image resize cache files is much more complicated. Below is a list of ideas, with comments on why each one is or isn't useful.
* I am posting here because I wanted to share the ideas, as I'm sure questions will be asked. I don't expect comments, but if anyone has feedback, please feel free to post!
What's the problem?
The problem is that the resize image cache dir _files/cache/images keeps accumulating cache files. Nothing deletes expired cache files for dirs or images that have been renamed, moved, or deleted, or that simply no longer exist within the root dirs. This cache dir needs to be cleaned occasionally, removing invalid or expired cache files.
Solutions
👎 #1 Loop through config root to check if hashed cache files are valid
If we keep storing cache files as we do now, named $pathhash.$time.$size.$preview_dimensions.jpg, we could loop through all files in root and check whether each one has a matching cache file. Any file in the cache dir without a matching original can be deleted. Although this would work for moderately sized root dir structures, it could take ages and might time out for massive root dirs and/or slow file systems (HDD, network disk etc.). This is especially problematic if we need to repeat the process for every user's root dir. It just doesn't seem safe or feasible.
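To make the cost of option #1 concrete, here is a minimal sketch of the scan. It assumes the path hash is md5() of the absolute image path and that cache file names begin with that hash followed by a dot. Both are assumptions for illustration, and find_orphan_cache_files() is a hypothetical helper, not an existing Files Gallery function.

```php
<?php
// Hypothetical helper: returns cache files in $cache_dir that match no image under $root.
// ASSUMPTION: cache names start with md5(absolute image path) followed by a dot.
function find_orphan_cache_files(string $root, string $cache_dir): array
{
    // Pass 1: hash every image path under the root. This full walk is the expensive part.
    $valid = [];
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        if ($file->isFile() && preg_match('/\.(jpe?g|png|gif|webp)$/i', $file->getFilename())) {
            $valid[md5($file->getPathname())] = true;
        }
    }

    // Pass 2: any cache file whose hash prefix was not seen in pass 1 is an orphan.
    $orphans = [];
    foreach (glob($cache_dir . '/*.jpg') ?: [] as $cache_file) {
        $hash = strstr(basename($cache_file), '.', true);
        if ($hash === false || !isset($valid[$hash])) {
            $orphans[] = $cache_file;
        }
    }
    return $orphans;
}
```

Pass 1 touches every file under the root, and it would have to run once per user root before anything could safely be deleted, which is exactly why this approach scales poorly.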
👎 #2 _files/cache/images/relative/path/to/image.$preview_dimensions.jpg
Originally, I planned to store resized images at their root-relative path, so we could easily check whether each image still exists relative to the root dir. However, now that we have multiple users with potentially multiple root dirs, relative paths may clash and could even cause the wrong image to be served. It would also mean that image cache is created separately for each unique root, even if the roots share many or most of the same folders, wasting storage and CPU. Furthermore, if root is changed (for default or any user), the entire cache for the old root suddenly expires and has to be recreated relative to the new root. Wasteful. Finally, storing images in relative/path/to/gallery/image.jpg uses additional inodes for each folder level, and many servers limit inode usage (not only storage). We could solve the multiple-roots problem by using {$ROOTHASH}/relative/path/to/image.jpg, but it's still inefficient: cache has to be created and stored separately for each root, and it expires when the root changes, even if the images are also available in the new root, because the relative path changes.
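A small sketch of the clash and of the {$ROOTHASH} workaround. relative_cache_path() and rooted_cache_path() are hypothetical helpers, the /home/alice and /home/bob roots are made up, and using md5() for the root hash is an assumption.

```php
<?php
// Hypothetical: cache path relative to the root, as originally planned in option #2.
function relative_cache_path(string $root, string $image, int $width): string
{
    $relative = ltrim(substr($image, strlen(rtrim($root, '/'))), '/');
    // Replace the original extension with ".<width>.jpg".
    $relative = preg_replace('/\.[^.\/]+$/', '', $relative);
    return '_files/cache/images/' . $relative . '.' . $width . '.jpg';
}

// Hypothetical: the same path, namespaced by a hash of the root (ASSUMPTION: md5).
function rooted_cache_path(string $root, string $image, int $width): string
{
    $prefix = '_files/cache/images/';
    $rest = substr(relative_cache_path($root, $image, $width), strlen($prefix));
    return $prefix . md5(rtrim($root, '/')) . '/' . $rest;
}

// Two users with different roots, each holding a different photos/cat.jpg.
$a = relative_cache_path('/home/alice', '/home/alice/photos/cat.jpg', 320);
$b = relative_cache_path('/home/bob', '/home/bob/photos/cat.jpg', 320);
var_dump($a === $b); // true: two different images map to one cache path

$a = rooted_cache_path('/home/alice', '/home/alice/photos/cat.jpg', 320);
$b = rooted_cache_path('/home/bob', '/home/bob/photos/cat.jpg', 320);
var_dump($a === $b); // false: no clash, but nothing is shared between roots either
```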
👎 #3 _files/cache/images/absolute/path/to/image.$preview_dimensions.jpg
This is similar to the approach above, except we store the images under their absolute server path. This solves a few problems, since all roots/users can now share the same image cache, regardless of where each root points. Also, if root changes, existing cache items may remain valid. However, this method exposes the absolute server path (which some might not like), and paths may become very long and deep to fit the absolute server path, e.g. "_files/cache/images/absolute/server/path/to/root/then/path/to/folder/image.jpg". This deep dir structure consumes a large number of inodes and becomes increasingly complicated to manage. Finally, even though we can now easily clear image cache files that don't have a matching absolute path, we can't know whether each cache file is still valid for any existing root... In the end, it's just not useful.
🤔 #4 Store absolute path references as array in cache.php
We could use hashed file names like now, $pathhash.jpg, and then store a reference to the absolute path inside _files/images/cache.php as '/absolute/path/to/image.jpg' => '$pathhash.jpg'. This would work similarly to the option above, without exposing the absolute path and without spending an excessive number of inodes on deep paths that mirror the original absolute path. However, it's still clumsy, because we need to update cache.php whenever we create new resized images, and we can't be sure that a cache file whose original still exists is actually valid for any root.
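A minimal sketch of how such an index could be read, written and pruned. The three helpers are hypothetical names, and storing the index as a PHP file that returns an array is an assumption based on the '/absolute/path/to/image.jpg' => '$pathhash.jpg' shape described above.

```php
<?php
// Hypothetical helpers for an index shaped like:
// ['/absolute/path/to/image.jpg' => '<pathhash>.jpg', ...]

function load_cache_index(string $index_file): array
{
    // The index is a PHP file that returns an array; a missing file means an empty index.
    return is_file($index_file) ? (include $index_file) : [];
}

function save_cache_index(string $index_file, array $index): void
{
    // var_export() emits valid PHP, so the file can be read back with include.
    file_put_contents($index_file, '<?php return ' . var_export($index, true) . ';', LOCK_EX);
}

// Drops entries whose original image no longer exists, deletes their cache
// files, and returns the pruned index.
function prune_cache_index(array $index, string $cache_dir): array
{
    foreach ($index as $original => $cache_name) {
        if (!is_file($original)) {
            $cache_file = $cache_dir . '/' . $cache_name;
            if (is_file($cache_file)) {
                unlink($cache_file);
            }
            unset($index[$original]);
        }
    }
    return $index;
}
```

Pruning costs one is_file() call per cached image rather than a walk of every root, but as noted, it only proves the original still exists, not that any current root can reach it.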
🤔 #5 Delete cache files based on last file access time fileatime()
This is a solution I initially did not want to consider, because it's not entirely accurate or instant. However, considering the options, it turns out to be an acceptable solution that "gets the job done". The idea is to delete cache files that have not been accessed for a very long time (for example 3 months), in which case it's safe to assume they have "expired". Even if they remain valid, it would still be "ok" to delete cache files that have not been in use for such a long time. Although this method is imprecise and delayed, it will ultimately keep the image cache "fresh".
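As a rough sketch of what this could look like, the hypothetical helper below deletes cache files whose fileatime() is older than a given age. One caveat worth checking before relying on it: some file systems are mounted with access-time updates disabled or reduced, in which case fileatime() may not reflect recent reads.

```php
<?php
// Hypothetical helper: deletes cache files not accessed within $max_age seconds.
// Returns the list of deleted files. $now is injectable to make testing easier.
function prune_by_access_time(string $cache_dir, int $max_age, ?int $now = null): array
{
    $now = $now ?? time();
    $deleted = [];
    foreach (glob($cache_dir . '/*.jpg') ?: [] as $cache_file) {
        $accessed = fileatime($cache_file);
        // Skip files whose access time cannot be read; delete only the stale ones.
        if ($accessed !== false && $now - $accessed > $max_age && unlink($cache_file)) {
            $deleted[] = $cache_file;
        }
    }
    return $deleted;
}

// Example: expire anything untouched for roughly 3 months.
// prune_by_access_time('_files/cache/images', 90 * 24 * 60 * 60);
```

Because it only looks at the cache dir itself, the cost is independent of how many roots exist or how large they are.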
✅ #6 /actual/path/to/folder/_files/imagename.320.jpg
This is kinda the "ultimate" solution, bypassing the issues with the other methods. By storing resized images directly in the user's folders, the cache becomes accessible to any root that has access to each dir, and remains intact and valid after dirs are moved or renamed. The cache also becomes "portable", which is useful for those who want to migrate their Files Gallery between servers (for example, creating resized images on desktop and publishing to a slow NAS server). Removing expired cache image files can be done in the same process as when a dir is loaded by PHP (e.g. when the dir changes). The big NO-NO with this solution is that it pollutes the user's own file system with Files-specific cache files, and therefore this method needs to be strictly OPTIONAL.
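A minimal sketch of the path mapping and the per-dir cleanup. Both functions are hypothetical, and it is an assumption that "imagename" means the original file name without its extension.

```php
<?php
// ASSUMPTION: "imagename" is the original file name without its extension.
function sidecar_cache_path(string $image, int $width): string
{
    $info = pathinfo($image);
    return $info['dirname'] . '/_files/' . $info['filename'] . '.' . $width . '.jpg';
}

// Hypothetical cleanup, meant to run when a dir is loaded: removes resized
// images in $dir/_files whose original is no longer present in $dir.
function clean_sidecar_cache(string $dir): array
{
    $cache_dir = $dir . '/_files';
    if (!is_dir($cache_dir)) {
        return [];
    }
    // Collect the extension-less names of the files currently in the dir.
    $originals = [];
    foreach (scandir($dir) as $name) {
        if (is_file($dir . '/' . $name)) {
            $originals[pathinfo($name, PATHINFO_FILENAME)] = true;
        }
    }
    $removed = [];
    foreach (scandir($cache_dir) as $name) {
        // Match "<imagename>.<width>.jpg" and look the image name up.
        if (preg_match('/^(.+)\.\d+\.jpg$/', $name, $m) && !isset($originals[$m[1]])) {
            if (unlink($cache_dir . '/' . $name)) {
                $removed[] = $cache_dir . '/' . $name;
            }
        }
    }
    return $removed;
}
```

Since the cleanup only compares one dir against its own _files folder, it stays cheap and needs no knowledge of roots at all.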
Conclusion
✅ Combine #4 and #5
Could we combine options #4 and #5? Together, they would be dynamite 🧨 Option #4 would make sure to always clear invalid "orphan" cache files, while option #5 would make sure to clean up cache files that may no longer be valid for any root, or that just haven't been accessed for a very long time.
✅ #6 [Optional]
Option #6 may be the most effective solution, but must remain OPTIONAL (disabled by default). We don't want Files Gallery to automatically start "polluting" the user's file system with _files folders in each dir, unless the user specifically opts in.