Synopsis

MediaKeg is a lightweight solution for importing digital photos, video, and audio. It is powerful and flexible enough to support the needs of professional content creators who want to improve their workflow, and easy enough for everyday users who want to organize or archive their family photos.

This document provides an in-depth view of MediaKeg's features and capabilities, which can help you evaluate whether MediaKeg is right for you. For advanced users, this document also discusses concepts and provides details on how to customize MediaKeg. New users who want to get up and running with MediaKeg for the first time should refer to the Quickstart Guide instead.

Overview

MediaKeg is a lightweight solution for importing multimedia content, such as digital photos, video, and audio. The term import is often used by applications that process and manage multimedia content. As such, the term's strict meaning is application-specific. However, in virtually all cases, import involves some form of copying files from a source device (such as an SD card, camera, or file share) to an application-managed folder or enclave. The import operation also often involves updating proprietary databases and catalog files, and can even include non-ingestion work such as generating preview images needed by the application.

With MediaKeg, the scope of an import operation starts and ends with ingesting files into a MediaKeg library, which is an ordinary file system folder distinguished from a non-library folder only by a small configuration file at its root. This narrow, single-function scope makes MediaKeg highly efficient at importing large volumes of content and also helps improve the day-to-day workflow of photographers and videographers who regularly ingest new content onto their workstations.

By default, MediaKeg organizes and names imported media assets according to the date and time they were captured or created. The date and time information is read from the metadata embedded in each file. Consider the following file listing:

/Volumes/Sample/Files
├── Capture0001-Edit.jpg
├── Capture0001.jpg
├── Capture0001.nef
└── Capture0001.xmp

The file list is flat (it lacks an organizational hierarchy), and the filenames are weak, meaning there is no way to tell whether a different set of files with the same names reflects the same scene. For that matter, there's no way to be sure (based on the filenames alone) whether the files belong to the same scene, although in this case there's a good chance they do because they appear side by side in the same folder and have similar filenames.

Note: For now, think of a scene as what the camera saw at the time the digital image was captured. The topic of scenes is covered in detail later in this document.

Now, consider the following file listing, which shows how the same files might be organized and named upon being imported into a library having the default configuration:

/Volumes/Photos/Libraries/Personal
├── 2018
│   └── 02
│       ├── 20180221T211619-S570000-5ZMGA-00.JPG
│       ├── 20180221T211619-S570000-5ZMGA-00.NEF
│       ├── 20180221T211619-S570000-5ZMGA-00.xmp
│       └── 20180221T211619-S570000-5ZMGA-01.JPG

As can be seen, the files are organized by year and month, and their names contain embedded information that seems to follow a convention. The year and month information corresponds to the date the images were originally captured. The embedded information does, in fact, follow the MediaKeg filename convention, which this document covers in detail.

MediaKeg also provides the ability to customize how imported assets are named and organized using library templates. Library templates contain tokens that are replaced with like-named metadata tag values during an import operation.

MediaKeg supports any multimedia asset having an audio, image, or video MIME type, provided the asset contains the required metadata information. MediaKeg also has several advanced features and capabilities that provide a high degree of control, flexibility, and correctness over how assets are imported. The following list highlights these features and capabilities:

Stateless operation
Cross-platform support (Linux, macOS, Windows)
Command-line Interface (CLI)
Flexible templates
Open and compatible
Deterministic
Scene aware
Duplicate prevention
Error detection
Write metadata option
Timeshift option
Rollback option
Detailed logging and reports
Parallel processing
Utilities

The remainder of this section covers each feature and capability in detail.

Stateless Operation

MediaKeg functions solely at the file system level and requires no databases or catalogs for maintaining state over successive import operations. The stateless design was a key goal in the development of MediaKeg because it brings several inherent benefits, as follows:

Library portability
Minimal maintenance tasks
Interoperability with other applications
Simplified installation and removal
Standalone and cross-platform operation

The tradeoffs are:

Less flexibility over how imported library assets are named
Fewer opportunities to optimize performance (as compared to a stateful design)

To enable high performance with a stateless design, MediaKeg enforces rules over how imported media assets are named. These rules are covered in the Library Layout and Filenames section, and they result in a pattern that MediaKeg leverages for optimizing import performance.

Note: The file naming rules do not preclude the user from customizing how imported assets are named.

Cross-platform Support

MediaKeg is supported as a desktop application on the following OS platforms:

Linux
macOS
Windows

The MediaKeg runtime library is currently available with a Command-line Interface (CLI), with plans to make a graphical user interface (GUI) available in the future.

Command-line Interface

The Command-line Interface (CLI) provides full access to the MediaKeg runtime library and is an efficient means of executing commands. The following example illustrates the efficiency with which an import operation can be executed using the CLI:

$ mkeg i /Volumes/DCIM

Note: This example assumes the user has already set up a default library and that the assets to import are located in the folder shown (and not a subfolder).

The MediaKeg CLI makes it quick and easy to ingest assets to a workstation or server without needing to launch a heavyweight application that combines ingestion with other import tasks.

See the MediaKeg CLI Reference Guide for more information.

Flexible Templates

MediaKeg provides control over the layout (organization) and filenames of imported media assets using layout and filename templates, respectively. These templates contain tokens that MediaKeg replaces with Exif metadata (or user-supplied values) at import time to generate asset library paths and filenames.

For added flexibility over how imported assets are organized and named, template tokens can be static or dynamic. Static tokens always receive a replacement value even if their respective backing properties are unavailable, in which case default values are used in place of actual metadata. Conversely, a dynamic token is ignored if its backing property is unavailable.

MediaKeg templates also support user tokens, which receive their backing values in the form of user input to an import operation. This capability provides an additional degree of freedom for how assets are organized and named, and can be used as an alternative to multiple libraries when wanting to separate assets according to user-defined criteria.
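To make the static/dynamic distinction concrete, here is a minimal Python sketch of layout template expansion. The `{Name}` token syntax, the `expand` function, and the convention that a caller marks a token as static by giving it a default are all hypothetical; MediaKeg's actual template syntax may differ. A user token such as `{Event}` is treated here like any other dynamic token whose value happens to come from user input.

```python
import re
from typing import Mapping

# Hypothetical token syntax: "{Name}" inside a "/"-separated template.
TOKEN = re.compile(r"\{(\w+)\}")


def expand(template: str, metadata: Mapping[str, str],
           static_defaults: Mapping[str, str]) -> str:
    """Expand a layout template into a library-relative path.

    Tokens listed in static_defaults always receive a value (the default
    when the backing tag is missing). Any other token is dynamic: a missing
    value expands to "", and a path segment that ends up empty is dropped.
    """
    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name in metadata:
            return metadata[name]
        return static_defaults.get(name, "")  # "" for missing dynamic tokens

    segments = (TOKEN.sub(replace, seg) for seg in template.split("/"))
    return "/".join(seg for seg in segments if seg)
```

For example, with `{"Model": "Unknown"}` as the static defaults, `expand("{Year}/{Model}/{Event}", {"Year": "2018", "Model": "D850"}, ...)` yields `2018/D850`, because the missing dynamic `{Event}` token removes its segment instead of producing an empty folder.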

Open and Compatible

There is an abundance of applications available for processing and managing multimedia content, each having its own import solution. It's also common for artists, especially photographers, to work with multiple similar applications where each has a unique trait or capability that makes it optimal for a particular task. MediaKeg is not a replacement for these applications. Instead, MediaKeg complements such applications by providing a consistent cross-platform and cross-application solution for how assets are ingested and structured onto a workstation (or server).

Because MediaKeg libraries are ordinary file system folders, there are no compatibility or interoperability issues when accessed from other applications. Moreover, it's expected that such applications will create or add foreign files and assets to a MediaKeg library. The term foreign is used when referring to files copied to or created in a MediaKeg library by some external means. MediaKeg does not overwrite library content and incorporates sub-indexing into its file naming scheme to avoid filename collisions. Therefore, foreign files are not at risk of being overwritten.

MediaKeg does not eliminate any additional steps that a third-party application performs as part of its own import process. Hence, the user may still need to import the files he or she wants to work with into that application's workspace. A benefit of inserting MediaKeg into the workflow is that the user can ingest files onto their workstation without needing to commit to a particular application. Once the files are ingested, the application can pick up where MediaKeg left off instead of also needing to copy the files.

Deterministic

MediaKeg prioritizes correctness over other factors when importing assets, which can lead to some assets being quarantined or an import operation being aborted. The quarantine folder is a special library folder that separates indeterminate assets from the rest of the library, so they don't contaminate the main population with misinformation. This approach also prevents such assets from being left behind at the source. Once corrected or mitigated, attempts to reimport the quarantined assets can be made.

Some of the approaches that MediaKeg uses to ensure correctness are as follows:

An asset must contain metadata that indicates its date and time of capture or creation. If not, the asset is considered indeterminate and is quarantined. Using the file last modified date as a fallback could make for a more friction-free import experience; however, this is not a reliable source of such information and is therefore not used.
An asset must contain metadata indicating its MIME type and must be of type application, audio, image, or video. If not, the asset is considered indeterminate and is quarantined, even if the asset file extension matches a known media type.
All assets are checksum verified using an MD5 hash algorithm (default) to ensure no bit errors occurred in the process of copying a file from source to library destination. The digest value is also used for duplicate detection. This is an example of how MediaKeg prioritizes correctness over added performance.
When using the option to write or update metadata values on imported assets, MediaKeg stages the write operations and verifies they have been written correctly before moving the assets to their final library destination. This comes at a significant performance penalty and is another example of how MediaKeg prioritizes correctness over added performance.
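The first two checks above can be summarized as a small decision function. This is only a sketch: it assumes metadata has already been read into a dictionary keyed by ExifTool tag names, and the choice of `DateTimeOriginal` and `CreateDate` as the date tags consulted is an assumption, not documented MediaKeg behavior.

```python
from typing import Mapping, Optional

# Assumed tags: real ExifTool tag names, but which ones MediaKeg reads is a guess.
DATE_TAGS = ("DateTimeOriginal", "CreateDate")
ALLOWED_MIME_TYPES = {"application", "audio", "image", "video"}


def quarantine_reason(metadata: Mapping[str, str]) -> Optional[str]:
    """Return why an asset is indeterminate, or None if it may be imported.

    The file's last-modified date is deliberately never used as a fallback,
    and the file extension is ignored in favor of the MIME type.
    """
    if not any(metadata.get(tag) for tag in DATE_TAGS):
        return "missing capture/creation date"
    top_level = metadata.get("MIMEType", "").split("/", 1)[0]
    if top_level not in ALLOWED_MIME_TYPES:
        return "missing or unsupported MIME type"
    return None
```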

MediaKeg provides options to relax some import requirements at the user's discretion, but it is strict by default.

See the MediaKeg CLI Reference Guide for more information.

Scene Aware

Imagine that you have two photos of the same scene — an original and an edited version of the original — and the edit has metadata removed for privacy and sharing purposes. Since MediaKeg relies on metadata for determining how imported assets are organized and named, the edit could become separated from the original if the missing metadata is referenced by the library templates. MediaKeg addresses this problem by making the import process scene aware.

By making import scene aware, MediaKeg can help ensure that assets originating from the same capture or creation (a scene) are organized and named similarly after being imported. This feature is particularly useful when deep scanning multiple drives for assets to organize into a single library, where it is not unusual for various edits (variants) of an original to exist. For photographers shooting RAW+JPEG, variants exist straight out of the camera, before the editing and sharing process even begins. In advanced scenarios, where the library template includes a custom metadata tag specific to a RAW file, the scene aware feature ensures the JPEG is organized and named in a manner consistent with the RAW file.

See the Advanced Concepts section for more information about scenes.

Duplicate Prevention

Duplicate prevention helps prevent two or more identical files from being imported into the same library. The algorithm used to detect duplicates is highly efficient and remains performant even as a library grows very large under normal use-case scenarios.

One of the scenarios where duplicate prevention plays an important role is finding and reorganizing all of one's photos into a single library. Imagine that you've accumulated years of photos spread out over multiple drives and it's time to get organized, or that you want to be sure not to lose any photos before sending the drives off to be recycled, or both. Without the proper tools, this is a painstaking process to complete without losing photos or amassing lots of duplicates by copying folders instead of files, especially if you're the type of person who likes to create multiple backups.

The scenario just described is one that MediaKeg was specifically developed to handle in a thorough yet performant manner. Simply point MediaKeg at the root of each drive to import, and it takes care of scanning for photos (and other multimedia assets if desired) and importing them into a single library, leaving behind any duplicates. Copies of an original file that have subsequently been edited do not count as duplicates. Rather, only files having identical file digests (a form of file signature or fingerprint) are considered duplicates. By default, MediaKeg uses the MD5 algorithm for calculating file digests.
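The digest rule itself can be illustrated with a naive Python sketch. It recomputes the MD5 digest of every library file on each call, so it is deliberately not MediaKeg's optimized duplicate-detection algorithm; `md5_digest` and `select_new_files` are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List, Set


def md5_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def select_new_files(sources: Iterable[Path], library: Path) -> List[Path]:
    """Return the sources whose content is not already in the library.

    Only byte-identical files count as duplicates; an edited copy has a
    different digest and is therefore kept.
    """
    seen: Set[str] = {md5_digest(p) for p in library.rglob("*") if p.is_file()}
    selected = []
    for src in sources:
        digest = md5_digest(src)
        if digest not in seen:
            seen.add(digest)  # also collapses duplicates among the sources
            selected.append(src)
    return selected
```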

Note: MediaKeg will not overwrite a previously imported asset with another asset resolving to the same library path and filename. Naming conflicts could occur if two or more edited versions of an asset having the same file extension are imported, because they are not duplicates but have the same metadata. MediaKeg incorporates a file subindex into its file naming scheme to prevent such collisions.
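A minimal sketch of subindex allocation, assuming the two-digit `-NN` suffix seen in the example filenames and counting separately per file extension (as the `-00.JPG`, `-00.NEF`, `-01.JPG` listing earlier suggests). The function name and the 99-slot limit are hypothetical.

```python
from pathlib import Path


def next_free_path(collection: Path, scene: str, ext: str,
                   max_index: int = 99) -> Path:
    """Return the first "<scene>-NN<ext>" path in collection that does not exist.

    An existing file is never returned, so nothing is overwritten; a second
    edit of the same scene with the same extension gets the next subindex.
    """
    for index in range(max_index + 1):
        candidate = collection / f"{scene}-{index:02d}{ext}"
        if not candidate.exists():
            return candidate
    raise FileExistsError(f"no free subindex for {scene}{ext} in {collection}")
```

This check-then-create pattern can race if two processes write to the same collection at once; a more robust version would create the file exclusively (for example with `os.open` and `O_CREAT | O_EXCL`) and move on to the next subindex if that fails.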

Duplicate prevention also plays a role in the workflow of ingesting multimedia content from removable storage (SD card, CFast card, etc.) to a workstation or server. Unless the removable storage device is reformatted before reuse, there is a risk of accumulating duplicates or overwriting a previously imported and edited file without some form of duplicate detection and prevention strategy.

Error Detection

Data corruption is rare in modern computing devices, but such errors can occur when a storage device is starting to fail or when uncorrected memory errors occur. The probability of hitting such an error increases when copying large volumes of data, such as when importing to and backing up multimedia libraries. Most consumer PCs and laptops lack error-correcting code (ECC) memory, which protects against the latter; ECC memory is typically reserved for high-end professional workstations because of its added cost.

To help guard against data corruption of imported media assets, MediaKeg recalculates the MD5 digest of each imported asset after the file copy operation is complete and compares it to the source digest. If the digests do not match, MediaKeg automatically re-attempts the copy (up to three times). Repeated failed attempts (where the copy operation succeeded but the digests do not match) most likely indicate a hardware issue with the source or destination device. MediaKeg issues a warning and logs such events for diagnostic purposes.
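The verify-and-retry loop might look like the following Python sketch. The function names, and the choice to count the first copy as one of the `attempts`, are assumptions rather than MediaKeg's implementation.

```python
import hashlib
import shutil
from pathlib import Path


def md5_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verified_copy(src: Path, dst: Path, attempts: int = 3) -> str:
    """Copy src to dst and confirm the destination digest matches the source.

    Returns the digest on success. If every attempt mismatches, the bad copy
    is removed and an error is raised (most likely a hardware problem).
    """
    expected = md5_digest(src)
    for attempt in range(1, attempts + 1):
        shutil.copy2(src, dst)
        if md5_digest(dst) == expected:
            return expected
        print(f"warning: digest mismatch copying {src} (attempt {attempt})")
    dst.unlink(missing_ok=True)  # do not leave a corrupt copy behind
    raise IOError(f"digest mismatch after {attempts} attempts: {src} -> {dst}")
```

Note that reading a file back immediately after writing it may be served from the operating system's page cache, so a check like this mainly catches errors along the copy path rather than errors on the storage medium itself.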

Note: MediaKeg ensures that the imported file matches the source but does not prevent a corrupted source file from being imported, unless the source file is so badly corrupted that a metadata read error occurs. In that case, the asset is quarantined in a special folder so it is not left behind while also not being added to the general population.

Write Metadata Option

MediaKeg provides the option to write new or updated metadata to media assets as they are being imported. The write values are applied to all assets processed in the import session, so keep this in mind when considering this option. For example, you probably wouldn't want to update the camera make and model information for a lifetime's worth of digital photos in a single import session. On the other hand, adding artist, copyright, or keyword information could make sense depending on how the source files are organized.

MediaKeg inserts an intermediate staging step into the import pipeline when the write option is invoked, to accomplish the following:

To perform duplicate detection on the modified binary instead of the original source.
To ensure all assets accept the update values prior to importing them into the target library.

Note: The staging folder is located at the root of the target library by default and is automatically removed at the end of an import operation. The import command provides the option to specify an alternate path.

Because updating metadata requires assets to be modified, duplicate detection operates on the modified binary instead of the original source. This guards against the same asset slipping past the duplicate detection logic if reimported using the same write values. If the original asset is later reimported with different write values (or as-is), then it is not a duplicate of its previously updated self and will be imported.

By default, MediaKeg aborts an import operation if any asset rejects a write operation (or if the written value did not take for some reason). The user can override the default behavior to be best effort, where the import operation is allowed to continue even if a write operation did not take on one or more assets. This might be necessary if the import collection spans multiple file formats, because the set of supported and writable metadata tags varies by file format.

When expanding templates to determine asset library paths and filenames, the write value is used in place of the original value whenever a template token references it. Overriding the default import behavior to best effort is therefore ill-advised in such cases. Doing so could lead to inconsistencies in how some assets are organized and named if they are reimported into another library, unless the user can somehow retrace the import operations of the existing library when reimporting.

Note: When using the write option to update asset metadata, a copy of the original is kept in a folder at the root of the target library. The import command provides the option to specify an alternate path. This feature is not intended to be a substitute for a proper backup solution and applies only to assets imported using the write option.

Note: When using the write option to update asset metadata, error detection verifies assets are copied correctly from source to stage and then from stage to final library destination. However, error detection does not check for any data corruption errors that may have occurred during the write operation itself. Therefore, it's important to always keep a backup of the original asset.

Note: MediaKeg uses ExifTool for all metadata read and write operations. ExifTool is a widely used and trusted application within the photographic community, and its open source library is also used by several other third-party applications.

See also: MediaKeg CLI Reference

--keywords, --timeshift, --timezone options
/w option

Timeshift Option

Imagine that you've just purchased a new camera, or traveled to a different region of the world, without setting or resetting the camera date and time. When the time comes to review or import your photos, it becomes apparent that the photos you took yesterday are recorded as having been taken several days, weeks, or even years ago, or that the family photo you took on a sunny beach elsewhere in the world is recorded as being captured at midnight. This is probably not the desired result if you care about such things, and it is the problem the timeshift option addresses.

The timeshift option is a special write metadata option with extra safeguards to protect against unintended consequences. Specifically, it shifts the date and time metadata indicating when an asset was captured or created by a time duration offset. MediaKeg does not allow directly setting date and time values because an import typically involves processing several assets, and it's atypical that all assets in an import collection would have been captured or created at the same instant in time.

As an additional safeguard, the timeshift option also requires that all assets from an import collection originate from the same device. The reasoning is that it's unlikely images captured from two or more different devices would need to be offset by the same duration value.
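The two safeguards (a relative offset only, and a single source device) can be sketched as follows. `Asset` is a hypothetical record type, how MediaKeg identifies a device is not specified here (so the `device` string is an assumption), and writing the shifted values back to the files is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Asset:
    """Hypothetical minimal asset record (not a MediaKeg type)."""
    device: str          # identifies the capturing device, e.g. make/model/serial
    captured: datetime   # capture time as recorded by the device


def timeshift(assets: List[Asset], offset: timedelta) -> List[datetime]:
    """Return every capture time moved by the same offset.

    Only a relative offset is accepted (never an absolute time), and all
    assets must come from one device, mirroring the safeguards above.
    """
    devices = {a.device for a in assets}
    if len(devices) > 1:
        raise ValueError(f"timeshift needs a single device, got {sorted(devices)}")
    return [a.captured + offset for a in assets]
```

For a camera whose clock was left eight hours behind local time, `timeshift(shots, timedelta(hours=8))` moves each capture time forward by eight hours while preserving the intervals between shots.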

See also: MediaKeg CLI Reference

--timeshift option

Rollback Option

MediaKeg provides the ability to roll back (undo) import operations. Rollback removes all assets previously imported into a library by a specified import operation. The typical use case is right after an import, when the user realizes he or she selected the wrong source or targeted the wrong library. However, rollback can be applied to any import operation previously used to import assets into a library.

Note: The rollback option depends on log files that MediaKeg creates for each import operation, which are saved under the target library. This is the one exception where MediaKeg requires saved state to carry out an operation. Should the logs directory be deleted, import will continue to function normally, but the user will lose the ability to roll back any previous import operation.
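Conceptually, rollback deletes exactly what an import recorded. The sketch below assumes a hypothetical JSON Lines journal with one `{"path": ...}` entry per imported asset, relative to the library root; MediaKeg's actual journal format is not described in this document. Because only recorded paths are touched, foreign files in the same collections are left alone.

```python
import json
from pathlib import Path
from typing import List


def rollback(library: Path, journal: Path) -> List[str]:
    """Remove every asset listed in a (hypothetical) import journal.

    Each non-empty journal line is a JSON object such as
    {"path": "2018/02/<scene>-00.JPG"}. Returns the relative paths that
    were actually removed; entries whose files are already gone are skipped.
    """
    removed = []
    with journal.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            rel = json.loads(line)["path"]
            target = library / rel
            if target.is_file():
                target.unlink()
                removed.append(rel)
    return removed
```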

See also: MediaKeg CLI Reference

rollback command

Detailed Logging and Reports

MediaKeg writes detailed logs for all import operations. With the exception of rolling back previous import operations, the logs are not required for MediaKeg to function. In addition to enabling rollback, logs can assist with the following scenarios:

Tracing library assets back to their original sources and filenames.
Reviewing profile data for optimizing import performance.
Reviewing warning and error details associated with an import operation.
Running scripts for auditing changes to library assets since imported.

In general, log data consumes a very small percentage of storage relative to imported media assets. If desired, logs can be safely deleted at any time to recover storage; however, this should only be done once you are certain that neither the log data nor the ability to roll back a prior import operation will be needed in the future.

Parallel Processing

MediaKeg parallelizes import workloads across multiple processes to help reduce the total amount of time needed to complete a job. By default, the total number of processes scales with the total number of processor cores available on the host computer. For most systems, maximum throughput is gated by storage performance, so there are limits to how much multiprocessing can help.

MediaKeg logs simple profiler information that can assist in finding bottlenecks and fine-tuning performance. The total number of processes assigned to an import operation can be set globally or tuned for file read (metadata), digest, and copy operations.
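As an illustration of fanning digest work out across workers, here is a Python sketch that defaults the worker count to the number of CPU cores. It uses a thread pool rather than the process pool described above, which keeps the example self-contained; CPython's `hashlib` releases the GIL while hashing large buffers, so threads can still overlap hashing with file I/O. The function names are hypothetical.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict, Iterable, Optional


def md5_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_all(paths: Iterable[Path], workers: Optional[int] = None) -> Dict[Path, str]:
    """Digest many files concurrently; workers defaults to the CPU core count."""
    workers = workers or os.cpu_count() or 1
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its digest.
        return dict(zip(path_list, pool.map(md5_digest, path_list)))
```

Because storage is usually the bottleneck, raising `workers` beyond what the drive can serve tends to add contention rather than throughput.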

See also: Performance Tuning

Utilities

MediaKeg provides several helpful utility commands in addition to the import command, such as:

Library management
Exif metadata viewer
Fast file find with regular expression matching
Timeshift calculator

See the MediaKeg CLI Reference Guide for the complete list of CLI commands and usage details.

Library Structure

A library consists of files and directories categorized as follows:

Library configuration
Resident assets
Auxiliary files
Foreign assets and files

Library configuration

A library has a hidden configuration file named .keg at its root, as shown here:

library
├── .keg

This file contains all of the information necessary to complete an import operation. This approach makes libraries portable, meaning it is possible to rename or move them without incurring additional maintenance tasks to resume import operations.
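Because a library is identified only by the `.keg` file at its root, any path inside a library can be traced back to that library without consulting a database or an absolute path stored elsewhere. A hypothetical helper that does this, shown only to illustrate the idea:

```python
from pathlib import Path
from typing import Optional

CONFIG_NAME = ".keg"  # library configuration file at the library root


def find_library_root(start: Path) -> Optional[Path]:
    """Walk up from start and return the first directory containing .keg."""
    start = start.resolve()
    for directory in (start, *start.parents):
        if (directory / CONFIG_NAME).is_file():
            return directory
    return None
```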

Resident assets

Resident assets are assets that were successfully imported into one or more library collections. A collection is a subdirectory of the library whose relative path is expanded using a layout template. The default layout template organizes assets according to the year and month they were captured or created. For example, assets captured in July 2018 appear under the following subdirectory:

library
├── 2018
│   └── 07
│       └── (assets) # "2018/07" collection

The collection name assumes the same name as its path, which is "2018/07" in the preceding example.

Note: An asset is not considered imported until it finds its way into a library collection using the process described above. Assets may be copied elsewhere in the library structure for reasons explained below, but such assets are not considered successfully imported.

Auxiliary files

Auxiliary files are the artifacts of an import operation and do not play an active role in subsequent import operations. Auxiliary files fall into one of the following groups:

Log files
Stage files
Originals (when write metadata and / or sidecar option is invoked)
Quarantined files

A folder for each group can be found at the library root, as illustrated here:

library
├── .logs
├── .stage
├── _originals
├── _quarantine

Important: Before deciding to delete auxiliary files, be sure to consider the information provided below to understand the tradeoffs.

Note: The .logs folder is the only folder that is always present after an import operation. The presence of the other folders shown is conditional, as discussed below.

Note: The import command provides the option to specify an alternate path for staged assets and originals.

Log files

The .logs directory is a hidden folder that contains detailed logs for each import operation. The logs are organized according to the date and time (UTC) an import operation starts. For example, the log files for an import operation that started on 2020-03-18T003435-07:00 (local time) are found under the following folder:

library
├── .logs
│   └── 2020
│       └── 03
│           └── 18
│               └── 073435 # UTC time
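The mapping from a local start time to the UTC folder can be checked with Python's standard `datetime` module. The helper below is hypothetical, but it reproduces the example above: 00:34:35 at UTC-07:00 is 07:34:35 UTC on the same day.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def log_folder(start: datetime) -> PurePosixPath:
    """Map a timezone-aware import start time to .logs/YYYY/MM/DD/HHMMSS (UTC)."""
    if start.tzinfo is None:
        raise ValueError("start time must be timezone-aware")
    utc = start.astimezone(timezone.utc)
    return PurePosixPath(".logs") / utc.strftime("%Y/%m/%d/%H%M%S")
```

Using UTC keeps folder names unambiguous across daylight saving changes and across hosts in different time zones; `log_folder(datetime.now().astimezone())` gives the folder for an import starting now.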

The log files contain the following types of information:

Import summary
Sources of imported media content and change log (auditing trail)
Import errors and warnings
Troubleshooting and performance tuning information
Rollback journals

The log files take up very little storage relative to imported multimedia assets. Therefore, it's recommended to keep the log files if possible, in case they are needed at a later date. If the log files are deleted, the ability to roll back an import operation is lost, which is the one exception to the stateless operation described at the start of this document.


Stage files

The .stage directory is a temporary folder for staging assets when invoking the option to write metadata properties during import. The stage directory serves two functions:

Ensure assets are updated correctly prior to importing them into the library.
Enable duplicate detection to be performed on the modified binary instead of the original source.

Once the stage directory has served its purpose, the assets it contains are copied to their final destination, and the stage directory is then deleted.

Important: Never store files in the .stage directory, even if it fails to be deleted at the end of an import operation. If present, the .stage directory is automatically deleted at the start of the next import operation.

Note: An alternate path for the staging directory can be set as an import option.

Originals

The _originals directory is populated when invoking the option to write metadata properties during import. For each asset or file that is modified, the original binary is copied into the _originals folder under a directory path reflecting the asset's library destination and scene name. The same process applies to sidecar files when invoking the option to include sidecar files, because sidecars must be modified to reflect the new filename of the asset they're associated with after being imported.

Consider a file named IMG001.CR2 that is modified and imported to the following library path:

2016/05/20160530T120936-D850S210000-7T18Q-00.CR2

In this example, the scene name is 20160530T120936-D850S210000-7T18Q. As such, a copy of the original file can be found as illustrated below:

library
├── _originals
│   └── 2016
│       └── 05
│           └── 20160530T120936-D850S210000-7T18Q
│               └── IMG001.CR2

Note that the original file also retains its original filename.
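Assuming, as the example suggests, that the scene name is the imported filename minus its trailing `-NN` subindex and extension, the originals location can be derived as follows. `originals_path` is a hypothetical helper; it reproduces the path in the tree above.

```python
from pathlib import PurePosixPath


def originals_path(library_rel: str, original_name: str) -> PurePosixPath:
    """Where the unmodified original of an imported asset is kept.

    library_rel is the imported asset's path relative to the library root,
    e.g. "2016/05/<scene>-00.CR2"; the scene name is the filename stem
    with its trailing "-NN" subindex removed.
    """
    rel = PurePosixPath(library_rel)
    scene, sep, subindex = rel.stem.rpartition("-")
    if not sep or not subindex.isdigit():
        raise ValueError(f"not a MediaKeg-style filename: {rel.name}")
    return PurePosixPath("_originals") / rel.parent / scene / original_name
```

Keying the folder on the scene name rather than on the full filename groups the originals of all variants from one scene (for example a RAW file and its JPEG) in one place.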

Note: The _originals folder is not a replacement for a proper backup solution. Only assets that are modified using the import write option are copied into the _originals folder, and a backup of the entire library should be set up and maintained on a separate drive using backup software.

Note: An alternate path for originals can be set as an import option.

Quarantined files

The _quarantine directory contains assets that cannot be imported due to insufficient metadata or metadata read errors; such assets are called indeterminate assets. Copying indeterminate assets to the quarantine folder ensures they are not left behind at the source, where they could be forgotten or overlooked. At the same time, it prevents them from contaminating the resident population with misinformation (by attempting to use unreliable information in place of metadata).

Because indeterminate assets lack the information to import, MediaKeg uses a different strategy for organizing quarantined assets (as compared to the _originals folder). Specifically, assets are copied into directories named with a hash of the following information:

Hostname
Source directory of quarantined asset

The layout structure is illustrated by the following example:

library
├── _quarantine
│   └── 5a3e126d05e75d202c3fa026a8195899
│       ├── _source.html
│       ├── seattle.png
│       └── avatar.jpg

Note: An alternate path for quarantined assets can be set as an import option.

The _source.html file records the host and directory path from which the quarantined files were copied, as in the following example. The hostname accounts for shared media such as a file share or removable storage.

{
  "host": "MacPro.local",
  "path": "/Volumes/Backups/Photos/2003"
}
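A sketch of the flat quarantine layout: hashing the hostname together with the source directory gives one stable folder per source location, and two hosts with the same path get different folders. The hash input format (MD5 over the host and path joined by a newline) is an assumption, so this will not reproduce the folder name in the example, and both function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def quarantine_folder(library: Path, host: str, source_dir: str) -> Path:
    """One flat folder per (host, source directory) pair under _quarantine.

    Assumed hash input: MD5 of the host and source directory joined by a
    newline and encoded as UTF-8.
    """
    key = hashlib.md5(f"{host}\n{source_dir}".encode("utf-8")).hexdigest()
    return library / "_quarantine" / key


def source_record(host: str, source_dir: str) -> str:
    """JSON text recording where the quarantined files came from."""
    return json.dumps({"host": host, "path": source_dir}, indent=2)
```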

The _quarantine folder structure prevents deeply nested directory structures from forming under _quarantine, which could make working with quarantined files tedious.

After each import operation or at some regular interval, it's a good idea to check for quarantined files and fix or delete them, so they do not accumulate over time. This is especially true after deep scanning entire drives for assets to import. Deep scans often encounter images downloaded from the Internet, application generated preview files, and thumbnail images, where metadata is often stripped away for privacy purposes.

Note: Use the --minsize import option to help prevent low-resolution Internet files and cache data from being imported. Such files often end up in quarantine because they are indeterminate.

The absence of metadata indicating the date and time of capture or creation is the most common reason for quarantining an asset. The quarantine folder makes it possible to efficiently discover such assets so that the user can fix the problem and attempt reimport. In such cases, the fix is to edit the metadata using a utility such as ExifTool. If the capture date and time cannot be set, then the fallback solution is to rename the file. By default, MediaKeg extracts this information from the filename, provided it conforms to the MediaKeg filename convention.
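Extracting a capture time from a conforming filename might look like this. The sketch parses only the leading `YYYYMMDDTHHMMSS` field seen in the example filenames and returns a naive `datetime`; how MediaKeg interprets the remaining fields and time zones is not covered here, and the function name is hypothetical.

```python
import re
from datetime import datetime
from typing import Optional

# Leading timestamp as in "20180221T211619-S570000-5ZMGA-00.JPG".
LEADING_TIMESTAMP = re.compile(r"(\d{8}T\d{6})(?=-)")


def timestamp_from_filename(name: str) -> Optional[datetime]:
    """Return the capture time encoded at the start of a filename, if any."""
    m = LEADING_TIMESTAMP.match(name)
    if not m:
        return None
    try:
        return datetime.strptime(m.group(1), "%Y%m%dT%H%M%S")
    except ValueError:  # digits present but not a valid date, e.g. month 13
        return None
```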

Note: Setting the import --strictness option to level 4 (Brutal) disables extracting timestamp information from filenames.

Foreign assets and files

MediaKeg libraries are ordinary file system folders, and users are expected to work with library assets using other applications. In doing so, files will inevitably be added to or created in the library structure through some means other than an import operation. Such files are called foreign assets or foreign files, depending on whether or not they are multimedia assets.

Filename Convention

MediaKeg defines a filename convention for how imported media assets are named. The convention enables MediaKeg to operate without stored state (databases, catalog files, etc.), achieve high performance, and preserve asset affinity to scenes at the file system level (i.e., all assets that are part of the same scene have the same filename except for the subindex and file extension). The tradeoff is that MediaKeg filenames are long, and there is less flexibility to customize how assets are named.

This section breaks down the parts of a fully qualified file path for an imported media asset, including the filename convention. The following illustration of an example file path serves as a visual guide to the remainder of this section:

Library Root

The library root is a directory containing a library configuration file (.keg) and is the target of an import operation. All library contents are expressed relative to the library root.

Collection

Imported media assets are organized into collections, which are subdirectories of the library root. The collection path is expressed relative to the library root, and the collection name is the same as the collection path. For example, the collection path and name from the above illustration are 2018/06.

The library layout template determines the collection path for each asset by expanding it with asset metadata (and user tag values where applicable). The default layout template organizes assets by year and month.

See Layout Templates for details on how to create custom layout templates.

Filename

Imported assets have filenames that conform to the naming convention covered by this section. The following terms are defined and applied to the convention:

Basename
File extension
Filename
Declarative part
Index
Device identifier
Subindex
Scene name

Basename

The basename is the part of the fully qualified path following the last path separator. In the following example, the basename is 20180620T153205S120000-6CC58-01.JPG:

/Volumes/Pictures/Library/2018/06/20180620T153205S120000-6CC58-01.JPG

Note: For the remainder of this section, the directory path leading up to the basename is excluded from the examples highlighting the various basename parts.

The term basename is used throughout the documentation only where the distinction between basename and filename matters. Since MediaKeg does not alter the file extension during the import process (except possibly for its letter case, depending on the library configuration), the documentation focuses primarily on the filename part.

File extension

The file extension is the suffix at the end of the basename, following the last period. The file extension indicates the media type and is also referred to as the file type throughout the documentation. In the following example, the file extension is JPG:

20180620T153205S120000-6CC58-01.JPG

Filename

Regular expression pattern:

^((?:[^\/\\.#%|<>?*":]*)((?:(?:18|19|20)(?:\d{2}))(?:0[1-9]|1[012])(?:0[1-9]|[12][0-9]|3[01]))(?:[^\/\\.#%|<>?*":]*)(?:T)((?:0\d|1\d|00|20|21|22|23)[0-5]\d[0-5]\d)(?:[^\/\\.#%|<>?*":]*)([SMFC])(\d{6})-([A-Z0-9]{5}))-(\d{2,})(?:[^\/\\.#%|<>?*":]*)?$

The filename is the basename minus the file extension. In the following example, the filename is 20180620T153205S120000-6CC58-01:

20180620T153205S120000-6CC58-01.JPG

The filename is also the concatenation of the declarative, index, deviceId, and subindex parts, with a hyphen preceding the deviceId and the subindex.
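To see how the parts fit together, the following sketch applies the filename pattern above to a basename using Python's re module. It is an illustration under assumptions rather than MediaKeg code: the function name parse_basename and the returned dictionary keys are hypothetical, and the index prefix class is written [SMFC] (a bracket expression does not need | separators).

```python
import re
from typing import Dict, Optional

# Filename pattern from this section, split across lines for readability.
# Groups: 1 scene name, 2 date, 3 time, 4 index prefix, 5 index digits,
#         6 deviceId, 7 subindex.
FILENAME_RE = re.compile(
    r'^((?:[^\/\\.#%|<>?*":]*)((?:(?:18|19|20)(?:\d{2}))(?:0[1-9]|1[012])(?:0[1-9]|[12][0-9]|3[01]))'
    r'(?:[^\/\\.#%|<>?*":]*)(?:T)((?:0\d|1\d|00|20|21|22|23)[0-5]\d[0-5]\d)'
    r'(?:[^\/\\.#%|<>?*":]*)([SMFC])(\d{6})-([A-Z0-9]{5}))-(\d{2,})(?:[^\/\\.#%|<>?*":]*)?$'
)


def parse_basename(basename: str) -> Optional[Dict[str, str]]:
    """Split a library basename into its convention parts, or return None."""
    stem, dot, ext = basename.rpartition(".")
    if not dot:  # no extension present
        stem, ext = basename, ""
    match = FILENAME_RE.match(stem)
    if match is None:
        return None
    scene, date, time, prefix, digits, device_id, subindex = match.groups()
    return {
        "scene": scene,
        "date": date,
        "time": time,
        "index": prefix + digits,
        "deviceId": device_id,
        "subindex": subindex,
        "extension": ext,
    }


print(parse_basename("20180620T153205S120000-6CC58-01.JPG"))
```

A basename without a date and time, such as IMG_0012.JPG, does not match the pattern, so the function returns None for it.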

Declarative Part

Regular expression pattern:

^(?:[^\/\\.#%|<>?*":]*)((?:(?:18|19|20)(?:\d{2}))(?:0[1-9]|1[012])(?:0[1-9]|[12][0-9]|3[01]))(?:[^\/\\.#%|<>?*":]*)(?:T)((?:0\d|1\d|00|20|21|22|23)[0-5]\d[0-5]\d)(?:[^\/\\.#%|<>?*":]*)$

The declarative part of the filename is expanded from the filename template, which includes the date and time an asset was captured or created in abbreviated ISO 8601 form. The default filename template includes just the date and time information, so the declarative part in the following example is 20180620T153205:

20180620T153205S120000-6CC58-01.JPG

The filename template can be customized to include additional information. For example, the template can be modified to have all assets start with the letter P and include device make and model information, as shown here:

P20180620T153205-NIKON-D850S120000-6CC58-01.JPG

See Filename Templates for more information.

Index

Regular expression pattern:

^([SMFC])(\d{6})$

The file index (or index) prevents filename collisions for assets captured by the same device at subsecond intervals. If the asset contains metadata providing subsecond time information, then this value is used for the index by default. If this information is not available, then MediaKeg attempts to find a suitable property fulfilling a similar role, such as shutter count or file number.

The file index is a seven-character sequence consisting of a prefix and six digits. The prefix indicates the index source, and the digits represent the time or count value. Values shorter than six digits are zero-padded: time values are padded on the left and right, and count values are padded on the left. Values longer than six digits are truncated from the right.

The following example shows a filename having an index value of S120000:

20180620T153205S120000-6CC58-01.JPG

The S prefix indicates that the index is a subsecond time value and the time value is 120 milliseconds (see below for explanation).
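The padding and truncation rules can be expressed as a short function. This is a sketch under stated assumptions: format_index is a hypothetical name, and it assumes an S-value is supplied as an integer digit string together with its resolution in decimal places (2 for centiseconds, 3 for milliseconds), which is how the left padding of time values is modeled here.

```python
INDEX_WIDTH = 6


def format_index(prefix: str, value: str, places: int = 3) -> str:
    """Build a 7-character file index such as S120000 or F000012.

    ASSUMPTION: for S-values, `value` is the subsecond count as digits and
    `places` is its resolution (2 = centiseconds, 3 = milliseconds).
    """
    if len(prefix) != 1 or prefix not in "SMFC":
        raise ValueError(f"unknown index prefix: {prefix!r}")
    if not (value.isascii() and value.isdigit()):
        raise ValueError(f"index value must be numeric (0-9): {value!r}")
    if prefix == "S":
        # Left-pad to the resolution (1 ms -> "001"), then right-pad so the
        # digits read as a fraction of a second ("001" -> "001000").
        digits = value.zfill(places).ljust(INDEX_WIDTH, "0")
    else:
        # Count values are left-padded ("12" -> "000012").
        digits = value.zfill(INDEX_WIDTH)
    # Values longer than six digits are truncated from the right.
    return prefix + digits[:INDEX_WIDTH]


print(format_index("S", "120"))   # S120000 (120 ms)
print(format_index("S", "1"))     # S001000 (1 ms)
print(format_index("F", "0012"))  # F000012
```

Right-padding time values keeps them comparable as fractions of a second, so S120000 (120 ms) sorts after S001000 (1 ms) regardless of the camera's resolution.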

The following table lists index prefixes in rank preference order and their meanings:

Prefix	Rank	Source
S | M	0	Custom list of ranked sources specified by settings.index.tags.
S	1	The asset capture subsecond time value.
M	2	Default list of ranked metadata source properties.
F	3	Parsed from the asset source filename.
C	4	Import session counter.

MediaKeg seeks index values using the sources listed, in rank order from lowest to highest. A source value must be available and numeric (digits 0-9); otherwise it is skipped, and processing continues with the next source in the list. The metadata sources (M prefix) are ranked sub-lists of metadata tags that serve as providers of index data.
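The ranked search can be sketched as a loop that returns the first available numeric value. The tag names in the example list are ExifTool-style names chosen for illustration only; they are not MediaKeg's default list (see settings.index.tags), and select_index_source is a hypothetical name. When no metadata source qualifies, the remaining F and C sources described below would be tried.

```python
from typing import Mapping, Optional, Sequence, Tuple


def select_index_source(
    metadata: Mapping[str, object],
    ranked_sources: Sequence[Tuple[str, str]],
) -> Optional[Tuple[str, str]]:
    """Return (prefix, digits) from the first ranked source with a numeric value.

    ranked_sources lists (prefix, tag name) pairs from lowest to highest rank.
    """
    for prefix, tag in ranked_sources:
        value = metadata.get(tag)
        if value is None:
            continue  # source not available: try the next one
        text = str(value).strip()
        if text.isascii() and text.isdigit():
            return prefix, text
        # Present but not numeric (0-9): skip it as well.
    return None


# Hypothetical ranked list: subsecond time first, then count-like tags.
RANKED = [("S", "SubSecTimeOriginal"), ("M", "ShutterCount"), ("M", "FileNumber")]
print(select_index_source({"SubSecTimeOriginal": "", "ShutterCount": 48213}, RANKED))
```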

S-value

The index source is metadata indicating the subsecond time value for when an asset was captured or created.

Note: Most cameras that include subsecond time values in metadata do so at centisecond (2 decimal places) or millisecond (3 decimal places) resolution. The index digits represent the digits to the right of the decimal point, where 100000 represents 100 milliseconds and 001000 represents 1 millisecond. Subsecond time values are padded at both ends: to ensure that the sort order reflects the correct sequence, and to fill out the 6-digit sequence.

M-value

The index source is a metadata property value other than subsecond time. The property selected is the first to have a numeric value from a ranked list of metadata tags. The default list can be customized using the settings.index.tags property in the library configuration.

See settings.index.tags for information on setting and listing the index tags.

F-value

The index is parsed from the asset source filename if it contains a numeric sequence matching one of two patterns.

If the filename follows the filename convention described herein, then the index already encoded in the filename is selected. This situation can occur if the import asset belongs to the same scene as a resident asset, or if the source is from another library. If the scene includes a resident asset, then its index value is selected.

If the filename does not follow the convention, then it must contain a sequence of 3 to 5 consecutive digits, inclusive (expressed as [3, 5]). Digital cameras commonly number images using a [3, 5] digit sequence. If there are multiple matches, then the most frequently occurring match is selected.

Note: A minimum of 3 digits helps prevent file copy subindex values from being selected (e.g., DSC001 Copy 1.JPG). A maximum of 5 digits is required to prevent date- and time-encoded values from being selected.

The table below provides examples of indices parsed from asset filenames belonging to the same scene. An empty cell indicates that a match could not be found, in which case the scene is assigned an auto-incremented C-value, as described below.

Scene Asset Filenames	Index	Comments
IMG_012345	F012345
IMG_0012	F000012
IMG_00012-Copy 04	F000012
IMG_12		No numeric sequence [3, 5]
IMG_123456		No numeric sequence [3, 5]
IMG_089_123456	F000089
IMG_0012, IMG_0012-Copy 1, Wedding1234	F000012	More occurrences of 0012
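The [3, 5] rule and the most-frequent selection can be sketched with a regular expression that matches only whole digit runs. This is an illustrative sketch: the function name f_value is hypothetical, it applies the [3, 5] rule exactly as stated above (so a 6-digit run never matches), it does not handle the convention-following case, and it breaks ties between equally frequent runs by first occurrence, which is an assumption.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# A whole run of 3 to 5 digits: not preceded or followed by another digit.
DIGIT_RUN = re.compile(r"(?<!\d)\d{3,5}(?!\d)")


def f_value(scene_filenames: Iterable[str]) -> Optional[str]:
    """Return an F-prefixed index for a scene, or None if no run qualifies."""
    counts = Counter()
    for name in scene_filenames:
        counts.update(DIGIT_RUN.findall(name))
    if not counts:
        return None  # caller falls back to the C-value counter
    # most_common() orders equal counts by first occurrence.
    run, _ = counts.most_common(1)[0]
    return "F" + run.zfill(6)


print(f_value(["IMG_0012", "IMG_0012-Copy 1", "Wedding1234"]))  # F000012
print(f_value(["IMG_123456"]))  # None
```

The lookbehind and lookahead are what keep a longer run such as 123456 from yielding a 5-digit substring.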

C-value

The index is set to the scene identifier, which is an auto-incremented counter for each new scene. The counter is reset at the start of each import operation.

Device identifier

Regular expression pattern:

^([A-Z0-9]{5})$

The device identifier (deviceId) helps to uniquely identify a media asset in time. The deviceId is a 5-character, base-36 encoded value derived from available device make, model, and serial number information.

Note: The virtualized tag properties *Make, *Model, and *SerialNumber are used for make, model, and serial number information, respectively.

A dash separator always precedes the device identifier.

The following example shows a filename having a device identifier of 6CC58:

20180620T153205S120000-6CC58-01.JPG

MediaKeg generates a deviceId using all available information. If one of the above tag properties is not available, then a deviceId is still generated, but the chance of another device having the same identifier is much higher than if all three properties are available. If no properties are available, then the deviceId is set to 00000.

Note: It is highly improbable for two different devices to have the same deviceId if *Make, *Model, and *SerialNumber are available for both devices.

Note: The device identifier is derived from a hash of device identifying information (make, model, and serial number), which is not reversible and therefore should not be a privacy concern. MediaKeg libraries can also use settings.salt to combine the hash input string with a user-defined value, which results in a different device identifier being generated for the same device information.
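The following sketch shows one way to reduce make, model, and serial number to a 5-character base-36 identifier. MediaKeg's actual hash function, field separator, and salting scheme are not documented here, so SHA-256, the unit-separator character, and prepending the salt are all assumptions; the sketch is not expected to reproduce identifiers such as 6CC58.

```python
import hashlib
from typing import Optional

BASE36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
WIDTH = 5  # 36**5 = 60,466,176 possible identifiers


def device_id(
    make: Optional[str],
    model: Optional[str],
    serial: Optional[str],
    salt: str = "",
) -> str:
    """Return a 5-character base-36 device identifier (sketch, see assumptions)."""
    fields = [make, model, serial]
    if not any(fields):
        return "0" * WIDTH  # no device information at all
    # ASSUMPTION: salt prepended, fields joined with U+001F, hashed with SHA-256.
    text = salt + "\x1f".join(f or "" for f in fields)
    number = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")
    number %= 36 ** WIDTH
    chars = []
    for _ in range(WIDTH):
        number, remainder = divmod(number, 36)
        chars.append(BASE36[remainder])
    return "".join(reversed(chars))


# Hypothetical device values.
print(device_id("NIKON CORPORATION", "NIKON D850", "6012345"))
```

Because only a one-way digest is kept, the identifier cannot be turned back into the serial number, and changing the salt yields a different identifier for the same device.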

Subindex

Regular expression pattern:

^(\d{2,})(?:[^\/\\.#%|<>?*":]*)?$

The subindex exists to resolve filename conflicts when two or more assets belonging to a scene have the same file extension. The subindex consists of two or more leading digits followed by zero or more characters, which may include spaces. A dash separator always precedes the subindex.

The following example shows a filename having subindex of 01:

20180620T153205S120000-6CC58-01.JPG

The filename always includes a subindex, even in the absence of filename conflicts. In the event of a filename conflict, the importer auto-increments the subindex value until a free slot is found. If a sidecar is paired with the asset, then the sidecar's target name is also considered when seeking a free slot.
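The free-slot search can be sketched as follows, assuming the importer starts at 00 and compares names exactly (a case-insensitive file system would also need case folding). next_subindex and its parameters are hypothetical names used for illustration.

```python
from typing import Iterable, Optional


def next_subindex(
    scene: str,
    extension: str,
    existing: Iterable[str],
    sidecar_extension: Optional[str] = None,
    start: int = 0,
) -> str:
    """Return the first subindex whose asset (and sidecar, if any) name is free."""
    taken = set(existing)
    n = start
    while True:
        subindex = f"{n:02d}"  # at least two digits, as the convention requires
        candidates = [f"{scene}-{subindex}.{extension}"]
        if sidecar_extension is not None:
            candidates.append(f"{scene}-{subindex}.{sidecar_extension}")
        if not any(name in taken for name in candidates):
            return subindex
        n += 1


SCENE = "20180620T153205S120000-6CC58"
print(next_subindex(SCENE, "JPG", [f"{SCENE}-00.JPG", f"{SCENE}-01.JPG"]))  # 02
```

Checking the sidecar name in the same loop keeps an asset and its sidecar on the same subindex.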

The subindex pattern provides freedom for the user and external applications to copy library assets without violating the filename convention. The following list illustrates subindices for multiple assets having the same scene name and file extension:

20180620T153205S120000-6CC58-00.JPG
20180620T153205S120000-6CC58-01.JPG
20180620T153205S120000-6CC58-01-01.JPG
20180620T153205S120000-6CC58-01 Copy 1.JPG

The last two entries in this list were created by the user or an external application, because MediaKeg always sets a purely numeric subindex.

Scene name

Regular expression pattern:

^(?:[^\/\\.#%|<>?*":]*)((?:(?:18|19|20)(?:\d{2}))(?:0[1-9]|1[012])(?:0[1-9]|[12][0-9]|3[01]))(?:[^\/\\.#%|<>?*":]*)(?:T)((?:0\d|1\d|00|20|21|22|23)[0-5]\d[0-5]\d)(?:[^\/\\.#%|<>?*":]*)([S|M|F|C])(\d{6})-([A-Z0-9]{5})$

The scene name is the basename minus the subindex and file extension. All assets that originate from the same scene also share the same scene name.

Note: The scene name is also the concatenation of the declarative, index, and deviceId parts.

The following example shows a filename having scene name of 20180620T153205S120000-6CC58:

20180620T153205S120000-6CC58-01.JPG

Library Templates

Library templates (or templates) are user-customizable strings that define how assets are organized and named in a library.

The layout template determines how assets are organized.
The filename template determines how assets are named.

The library configuration (.keg) file contains the template declarations.

Tokens

Library templates contain token parameters (or tokens), which are replaced by arguments in a process called expanding the template. The templates are expanded for each scene using asset metadata and optional user tag values.

A token is delimited by a tag pair consisting of a start tag and an end tag. The general format is as follows:

start-tag token end-tag

For example, in <%=make%> the start tag is <%=, the token is make, and the end tag is %>.

The token inclusion rules are as follows:

Token names are case-insensitive.
Token names must be alphanumeric and contiguous (i.e., a single word with no special characters).
Spaces between token and delimiters are allowed for readability, but not required.
A token may appear only once per template.

The tag delimiters are specific to each of the three (3) token types:

Timestamp tokens
Metadata tokens
User tokens

Each token type is discussed below.

Tokens have one of the following behaviors:

Static tokens
Dynamic tokens

A static token always receives a substitution value. If the mapped property is unavailable or its value is not set, then a default value is used in its place. Defaults can be specified in the library configuration file or inline in the template; if no default is set for a static token, Unknown is used.

See Static Defaults for details on setting defaults.

A dynamic token is dropped if the substitution value is unavailable.

Token Delimiters

Token delimiters are string-parsable start and end tags specific to each token type and behavior, as indicated by the following table:

Token	Start Tag	End Tag	Behavior	Comments
Timestamp	<@=	@>	Static	Dynamic tags not supported
Metadata	<%=	%>	Static
Metadata	<%?	%>	Dynamic
User	<&=	&>	Static
User	<&?	&>	Dynamic
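The delimiter table translates into a single regular expression in which the end tag must repeat the start tag's symbol. The sketch below is an illustration, not MediaKeg's parser: parse_tokens and Token are hypothetical names, the optional :default suffix follows the inline default syntax described under Static Defaults, and the leading * allowed in names covers virtual and timestamp tokens such as *date.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# <symbol mode name[:default] symbol>, e.g. <%=make%>, <&?event&>, <@=*date@>
TOKEN_RE = re.compile(r"<([@%&])([=?])\s*(\*?[A-Za-z0-9]+)(?::([^@%&>]*))?\s*\1>")
KINDS = {"@": "timestamp", "%": "metadata", "&": "user"}


@dataclass(frozen=True)
class Token:
    kind: str               # "timestamp", "metadata", or "user"
    dynamic: bool           # True for <%? and <&? tokens
    name: str               # as written; compare case-insensitively
    default: Optional[str]  # inline default, if any
    span: Tuple[int, int]   # position within the template string


def parse_tokens(template: str) -> List[Token]:
    tokens: List[Token] = []
    seen = set()
    for match in TOKEN_RE.finditer(template):
        symbol, mode, name, default = match.groups()
        if symbol == "@" and mode == "?":
            raise ValueError(f"timestamp tokens cannot be dynamic: {match.group(0)}")
        if name.lower() in seen:
            raise ValueError(f"token appears more than once: {name}")
        seen.add(name.lower())
        tokens.append(Token(KINDS[symbol], mode == "?", name,
                            default.strip() if default is not None else None,
                            match.span()))
    return tokens


for token in parse_tokens("<&?Category&>#<@=year@>#<&?Event&>#<@=month@>"):
    print(token.kind, token.dynamic, token.name)
```

Text that does not form a complete tag pair, such as <&?category> without its &> end tag, is simply not recognized as a token by this pattern.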

Static Fill

Templates can also include static fill, which is any text outside tag-delimited tokens. Except as noted below, static fill is transferred as-is to the expanded template output (meaning that static fill is not replaced by metadata or user tag values). Static fill is subject to the following rules:

Must be alphanumeric, except as noted below.
Hyphens (-) are also allowed.
Must not contain spaces.

Hyphens receive special handling as follows:

Two or more adjacent hyphens are reduced to a single hyphen.
Hyphens are trimmed from the start and end of the expanded output.

This special handling helps prevent extraneous hyphens when used alongside dynamic tokens, which may not receive a substitution value.
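The substitution and hyphen rules can be combined in a small expansion function. This is a sketch under assumptions: expand is a hypothetical name, the values mapping stands in for asset metadata, timestamp parts, and user tag values, lookups are case-insensitive, and the hyphen cleanup is applied to the whole expanded output.

```python
import re
from typing import Mapping

TOKEN_RE = re.compile(r"<([@%&])([=?])\s*(\*?[A-Za-z0-9]+)(?::([^@%&>]*))?\s*\1>")


def expand(template: str, values: Mapping[str, str], fallback: str = "Unknown") -> str:
    lookup = {key.lower(): value for key, value in values.items()}

    def substitute(match: "re.Match[str]") -> str:
        _symbol, mode, name, inline_default = match.groups()
        value = lookup.get(name.lower())
        if value:
            return value
        if mode == "?":
            return ""  # dynamic token without a value is dropped
        # Static token: inline default if given, otherwise the fallback.
        default = (inline_default or "").strip()
        return default or fallback

    raw = TOKEN_RE.sub(substitute, template)
    # Reduce runs of hyphens to one, then trim hyphens from both ends.
    return re.sub(r"-{2,}", "-", raw).strip("-")


TEMPLATE = "P<@=*date@><@=*time@>-<%?make%>-<%?model%>"
print(expand(TEMPLATE, {"*date": "20180620", "*time": "T153205",
                        "make": "NIKON", "model": "D850"}))  # P20180620T153205-NIKON-D850
print(expand(TEMPLATE, {"*date": "20180620", "*time": "T153205"}))  # P20180620T153205
```

In the second call both dynamic tokens are dropped, leaving a trailing "--" that the cleanup step reduces and then trims.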

The following example shows a filename template that starts with the letter P:

P<@=*date@><@=*time@>

In this example, the letter P is undelimited (i.e., all by itself), which causes MediaKeg to treat it as static fill. The result is that all imported media assets will have filenames starting with the letter P, as shown in the following example:

P20180620T153205S120000-6CC58-01.JPG

Although this example inserts static fill at the start of the template string, fill can be added anywhere in the template.

Timestamp Tokens

Timestamp tokens are used to organize and name assets according to the date and time they were originally captured or created.

Timestamp tokens must be listed in rank order (where used), as indicated in the table below.
Timestamp tokens need not be adjacent, meaning they can be interleaved with other token types.
Filename templates must include *DATE and *TIME tokens.

It's recommended that layout templates contain one or more timestamp tokens to prevent collections from becoming excessively large.

Timestamp tokens represent the date and time parts corresponding to the *Created virtual tag, as outlined in the following table:

Timestamp Part	Rank(s)	Information Contained	Expanded Value (Example)
*DATE	1, 2, 3	Year, Month, Day	20180212
*YEAR	1	Year	2018
*MONTH	2	Month	02
*DAY	3	Day	12
*TIME	4, 5, 6	Hour, Minute, Second	T143205
*HOUR	4	Hour	14
*MINUTE	5	Minute	32
*SECOND	6	Second	05
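The table can be reproduced from a capture datetime with strftime, and the rank rule can be checked by requiring each token's lowest rank to exceed the previous token's highest rank. Both functions are hypothetical sketches; in particular, treating year and *year as the same token is an assumption based on the templates in this document using both spellings.

```python
from datetime import datetime
from typing import Dict, Iterable

FORMATS = {
    "*DATE": "%Y%m%d", "*YEAR": "%Y", "*MONTH": "%m", "*DAY": "%d",
    "*TIME": "T%H%M%S", "*HOUR": "%H", "*MINUTE": "%M", "*SECOND": "%S",
}
RANKS = {
    "*DATE": (1, 3), "*YEAR": (1, 1), "*MONTH": (2, 2), "*DAY": (3, 3),
    "*TIME": (4, 6), "*HOUR": (4, 4), "*MINUTE": (5, 5), "*SECOND": (6, 6),
}


def normalize(name: str) -> str:
    """Map year, *year, and *YEAR to the same key."""
    return "*" + name.upper().lstrip("*")


def timestamp_values(created: datetime) -> Dict[str, str]:
    """Expanded value of every timestamp token for a *Created datetime."""
    return {token: created.strftime(fmt) for token, fmt in FORMATS.items()}


def in_rank_order(names: Iterable[str]) -> bool:
    """True if timestamp tokens appear in rank order (each at most once)."""
    highest = 0
    for name in names:
        low, high = RANKS[normalize(name)]
        if low <= highest:
            return False
        highest = high
    return True


values = timestamp_values(datetime(2018, 2, 12, 14, 32, 5))
print(values["*DATE"], values["*TIME"])  # 20180212 T143205
print(in_rank_order(["year", "month"]), in_rank_order(["month", "year"]))  # True False
```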

Metadata Tokens

Metadata tokens map to metadata tags of the same name, as the following table illustrates:

Metadata Token	Metadata Tag	Expanded Value Example
Artist	Artist	Ansel Adams
Make	Make	Canon
Model	Model	EOS 5R

Virtual Metadata Tokens

Virtual metadata tokens (or virtual tokens) are metadata tokens that map to virtual metadata tags. Except for this distinction, virtual tokens behave like real metadata tokens.

The following table lists the virtual tokens that library templates support:

Virtual Token Name	Virtual Tag	Comment
*Make	*Make	Device manufacturer name.
*MediaType	*MediaType	See mediaTypeTransforms setting.
*Model	*Model	Device model name.
*SerialNumber	*SerialNumber	Device serial number.

Note: Use timestamp tokens for tag values corresponding to the *Created virtual tag.

Note: Using real metadata tokens (i.e., Make, Model, SerialNumber) in place of the corresponding virtual tokens is supported but not recommended, because virtual tokens have a higher chance of being backed by metadata.

Application Notes

For a list of known tags, enter the following command using the MediaKeg CLI:

$ mkeg list-tags

For known metadata tags, metadata tokens are case-insensitive. For all other metadata tags, metadata tokens should be entered in proper case.

Note: The list-tags command contains several options for filtering the output, including by tag name search pattern and tag category. Refer to the documentation for usage details.

When customizing library templates using metadata tokens, consider that the tags set on an asset vary according to capture device make and model. In general, try to limit usage to the most common tags to help ensure that replacement values are always or usually available. For static tokens, a default value is assigned if the tag value is unavailable.

To see whether an asset contains a tag, enter the following CLI command:

$ mkeg tags <path-to-file>

User Tokens

User tokens map to user tags, which are provided as inputs to the import process alongside their respective values. User tokens can be static or dynamic, but declaring them as dynamic is generally the best option, so that they remain dormant unless activated via user input.

Consider the following layout template and an import operation for assets captured in June 2019:

    "layout": {
      "template": "<&?Category&>#<@=year@>#<&?Event&>#<@=month@>"
    },

The following examples illustrate how dynamic user tokens are activated via the MediaKeg CLI:

Example 1: No activation

$ mkeg import /Volumes/DCIM

The layout template expands to:

  2019/06

The dynamic tokens are dormant because the import command contains no user tag option values.

Note: The raw template substitution for this example yields #2019##06. The leading hashtag and the adjacent hashtags (##) appear because the Category and Event tokens are dormant. MediaKeg collapses adjacent hashtags into a single hashtag and trims dangling hashtags prior to expanding the file system path separators.

Example 2: Partial activation

$ mkeg import /u:category=Racing /Volumes/DCIM

The layout template expands to:

  RACING/2019/06

Note: This example assumes the default template lettercase settings, which is for token values to expand to uppercase.

Example 3: Full activation

The following example activates both user tokens via the MediaKeg CLI:

$ mkeg import /u:category=Racing /u:event="24 Hours of Le Mans" /Volumes/DCIM

The layout template expands to:

  RACING/2019/24HOURSOFLEMANS/06

Note: This example assumes the default template format settings, which is for expanded token values to contain only alphanumeric characters and no whitespace.
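The default formatting behavior used in these examples can be sketched as follows. The option values alphanumeric and uppercase and the limit of 16 are taken from the example library configuration later in this document; format_value is a hypothetical name, only the alphanumeric format is modeled (keeping ASCII letters and digits), and truncating to maxlen from the right is an assumption.

```python
def format_value(
    value: str,
    fmt: str = "alphanumeric",
    lettercase: str = "uppercase",
    maxlen: int = 16,
) -> str:
    """Format one expanded token value according to template options."""
    if fmt == "alphanumeric":
        # Keep ASCII letters and digits only; spaces and punctuation are dropped.
        value = "".join(ch for ch in value if ch.isascii() and ch.isalnum())
    if lettercase == "uppercase":
        value = value.upper()
    elif lettercase == "lowercase":
        value = value.lower()
    return value[:maxlen]  # keep at most maxlen characters


print(format_value("24 Hours of Le Mans"))  # 24HOURSOFLEMANS
```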

Layout Templates

Layout templates determine how assets are organized or, more precisely, how collection paths are expanded from tokens and static fill. The hashtag (#) is a special character specific to layout templates, which is replaced by the platform-specific path segment separator:

/ on Linux and macOS
\ on Windows

The following table provides layout template examples and expanded collection paths:

Layout Template	Collection Path	Comments
<@=year@>#<@=month@>	2018/02	Default Template
<%?artist%>#<@=year@>#<@=month@>	Ansel Adams/2018/02	Dynamic behavior
<@=year@>#<%=make%>#<@=month@>	2018/Canikon/02	Static behavior
<&?category&>#<@=year@>#<@=month@>	Racing/2018/02	Dynamic behavior

Note: Adjacent or dangling path separators are reduced or trimmed from the expanded template value. This case can occur if one or more substitution values are unavailable for dynamic tokens. For example, if the category information is unavailable in the example provided above, then the template expands to 2018/02 instead of /2018/02.
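The separator handling can be sketched as a post-processing step on the raw substitution, such as the #2019##06 result from the user token examples. collection_path is a hypothetical name, and the sep parameter stands in for the platform separator (os.sep).

```python
import os
import re


def collection_path(raw: str, sep: str = os.sep) -> str:
    """Convert a raw layout expansion into a collection path.

    Runs of # collapse to one, dangling # at either end are trimmed, and each
    remaining # becomes the platform path separator.
    """
    collapsed = re.sub(r"#{2,}", "#", raw).strip("#")
    return collapsed.replace("#", sep)


print(collection_path("#2019##06", sep="/"))         # 2019/06
print(collection_path("RACING#2019##06", sep="\\"))  # RACING\2019\06
```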

Filename Templates

Filename templates determine how assets are named by expanding the declarative part of the filename. The following table provides examples of filename templates and their expanded output:

Filename Template	Filename	Comments
<@=date@><@=time@>	20190315T212520S120000-0SPTS-00.NEF	Default Template
P<@=date@><@=time@>-<%=make%>-<%=model%>	P20190315T212520-NIKON-D850S120000-0SPTS-00.NEF

Note: The filename template only determines the declarative part of the filename. The index immediately follows the declarative part, and the device identifier and subindex parts follow the index, each preceded by a hyphen (-), to form the complete filename.

Static Defaults

Static tokens always expand to a value. If the token substitution information is unavailable, then a default value is substituted when expanding a template. The token default can be defined inline, as part of the template declaration, or using a library configuration setting, as shown below.

Note: Setting defaults is recommended, but not required. If a token default is not set, then Unknown is used.

The token default can be defined inline with the token using the following syntax:

start-tag token:default end-tag

The following table illustrates how to set a token default inline with the template declaration:

Example	Default Value	Comments
<&=category&>	Unknown	Default value not specified.
<&=category:Racing&>	Racing
<%=artist:Ansel Adams%>	Ansel Adams
<&?category:Racing&>	N/A	Default values do not apply to dynamic tokens.
<@=date:20171203@>	N/A	Default values do not apply to timestamp tokens.

Template defaults can alternatively be set as part of the template configuration, as the following example illustrates for a layout template:

{
    "templates": {
        "layout": {
            "template": "<&=category&>#<@=year@>#<@=month@>",
            "defaults": {
                "category": "Racing"
            }
        }
    }
}

Which method to use in setting a default is a matter of personal preference. Using a combination of both methods is also allowed. If a default is assigned to the same token using both methods, the inline default takes precedence.
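The precedence between the two methods and the Unknown fallback can be sketched as follows. static_default is a hypothetical name; matching configured keys case-insensitively (because token names are case-insensitive) and treating an empty inline default as unset are assumptions.

```python
from typing import Mapping, Optional

FALLBACK = "Unknown"  # used when no default is set by either method


def static_default(
    token: str,
    inline_default: Optional[str],
    configured: Mapping[str, str],
) -> str:
    """Pick the default for a static token: inline, then configured, then Unknown."""
    if inline_default:
        return inline_default
    for key, value in configured.items():
        if key.lower() == token.lower():  # token names are case-insensitive
            return value
    return FALLBACK


print(static_default("category", "Racing", {"category": "Rally"}))  # Racing
print(static_default("category", None, {"category": "Rally"}))      # Rally
print(static_default("make", None, {}))                             # Unknown
```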

See Template Options for more information.

Template Options

The following options are available to layout and filename templates:

Defaults
Format
Lettercase
Maximum expanded token length (maxlen)
Template

The options can be set for each template in the library configuration. The following snippet shows how to apply options under the templates section of a library configuration:

    "templates": {
        "layout": {
            "template": "<%=make%>#<%?model%>#<@=*year@>#<@=*month@>",
            "defaults": {
                "make": "Unspecified"
            }
        },
        "filename": {
            "template": "<%=serialnumber:00000%>-<@=*date@><@=*time@>",
            "format": "packed",
            "lettercase": "lower",
            "maxlen": 12
        }
    }

See Library Configuration for a description of each option and their default values. If the default value is acceptable, then there is no need to set it in the configuration file explicitly.

Note: The above settings snippet illustrates the two different methods of setting static token defaults. The layout template uses the defaults option, where the defaults are entered as key-value pairs. The filename template uses the inline method, where a colon (:) delimits the token from its default value.

Library Configuration

A library is any directory containing a .keg configuration file at its root. The configuration file contains a small amount of JSON data conforming to the MediaKeg Library Schema. Except for optional library metadata, the configuration is static and should not be modified after the first import operation. The optional metadata fields can be changed at any time because they do not have a role in import operations.

The schema defines defaults for most properties, so only doctype and identity must be present. The following JSON shows a minimal configuration, which is the default configuration:

{
  "doctype": "https://mkeg.io/schemas/document/library-1-0-0.json",
  "identity": "00000000-0000-0000-0000-000000000000"
}

Note: An actual configuration file must contain a non-empty UUID identity value.
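A minimal configuration with a fresh identity can be produced with Python's uuid module. This sketch is not the MediaKeg CLI's code; minimal_config and write_config are hypothetical names. It uses uuid4, matching the UUID version that MediaKeg is described as using for new configuration files, and it refuses to overwrite an existing .keg file because the configuration should not change after the first import.

```python
import json
import uuid
from pathlib import Path

DOCTYPE = "https://mkeg.io/schemas/document/library-1-0-0.json"


def minimal_config() -> str:
    """Return the JSON text of a minimal library configuration."""
    config = {"doctype": DOCTYPE, "identity": str(uuid.uuid4())}
    return json.dumps(config, indent=2) + "\n"


def write_config(library_root: Path) -> Path:
    """Create .keg at the library root; raises FileExistsError if one exists."""
    path = library_root / ".keg"
    with path.open("x", encoding="utf-8") as handle:  # "x": exclusive create
        handle.write(minimal_config())
    return path


print(minimal_config())
```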

The following JSON shows a more declarative configuration that includes metadata and library settings:

{
  "doctype": "https://mkeg.io/schemas/document/library-1-0-0.json",
  "identity": "00000000-0000-0000-0000-000000000000",
  "metadata": {
    "maker": {
      "app": "MediaKeg CLI (mkeg)",
      "appver": "1.0.0",
      "created": "2019-08-03T23:40:22-07:00",
      "username": "username",
      "hostname": "hostname"
    },
    "name": "Library name",
    "desciption": "Library description",
    "owner": "Library owner name",
    "artist": "Artist name",
    "copyright": "Copyright info"
  },
  "settings": {
    "extension": {
      "lettercase": "uppercase"
    },
    "salt": ""
  },
  "templates": {
    "layout": {
      "template": "<@=*year@>#<@=*month@>",
      "format": "alphanumeric",
      "lettercase": "uppercase",
      "maxlen": 16
    },
    "filename": {
      "template": "<@=*date@><@=*time@>",
      "format": "alphanumeric",
      "lettercase": "uppercase",
      "maxlen": 16
    }
  }
}

The remainder of this section details the library metadata and settings properties, which are identified using JSON dot notation. The following rules apply to required properties:

A required property with a default assumes the default value if not set in the configuration file.
A required property without a default must be set in the configuration file.

doctype

Required Property

Identifies the document as a library configuration file and the schema version. This property must be set to the following value:

https://mkeg.io/schemas/document/library-1-0-0.json

This property has no default value and must be present, else the document will fail to load.

identity

Required Property

Sets the library identity, which is a Universally Unique Identifier (UUID). The UUID must be formatted as follows:

xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx

The 4 bits of digit M indicate the UUID version, and the 1–3 most significant bits of digit N indicate the UUID variant. To help ensure anonymity, MediaKeg uses UUID v4 when making a new configuration file. The UUID version cannot be attested to if the configuration file is made using a different method.

This property has no default value and must be present, else the document will fail to load. The value must also be a non-empty UUID, meaning it cannot contain all zeros.
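The doctype and identity requirements can be checked before a configuration is used, as in the following sketch. check_config is a hypothetical name. A regular expression enforces the hyphenated xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx layout because uuid.UUID alone also accepts other spellings (for example, without hyphens or wrapped in braces); the version digit is deliberately not checked, since a configuration file made by another method may use any UUID version.

```python
import re
import uuid
from typing import Any, List, Mapping

DOCTYPE = "https://mkeg.io/schemas/document/library-1-0-0.json"
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def check_config(config: Mapping[str, Any]) -> List[str]:
    """Return a list of problems with the required properties (empty if none)."""
    problems = []
    if config.get("doctype") != DOCTYPE:
        problems.append("doctype is missing or has an unexpected value")
    identity = config.get("identity")
    if not isinstance(identity, str) or UUID_RE.fullmatch(identity) is None:
        problems.append("identity must be a hyphenated UUID")
    elif uuid.UUID(identity).int == 0:
        problems.append("identity must not be all zeros")
    return problems


print(check_config({"doctype": DOCTYPE, "identity": str(uuid.uuid4())}))  # []
print(check_config({"doctype": DOCTYPE,
                    "identity": "00000000-0000-0000-0000-000000000000"}))
```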

metadata

Optional Object

This object contains several settable properties listed below. They provide descriptive information about the library and have no functional role in the import process. All metadata properties are therefore optional, including the metadata object itself.

MediaKeg uses library metadata to provide richer information about a target library in reporting and library maintenance commands.

metadata.artist

Optional Property

The name of the artist who created the library content.

metadata.copyright

Optional Property

The copyright notice for the library content.

metadata.desciption

Optional Property

The library description.

metadata.maker

Optional Object

This object contains several settable properties, listed below, that describe how the configuration file was made.