AEM has a built-in live backup mechanism that works well for creating an extractable archive of the complete instance, but it can take an unmanageably long time because of how much it archives. By default the backup consists of the original CQ jar file, everything at the same directory level, and everything in the child directories of that path. While this works, it is not ideal as a long-term solution given how large the environment can be expected to grow.

The datastore is the largest part of this backup and consists entirely of binary or text "content" that never changes. Modifications and new files generate new items in the datastore, so nothing is deleted from this path unless datastore garbage collection is run. The datastore is therefore a necessary part of the AEM instance, but it can be backed up separately. We suggest relocating the datastore outside the AEM path structure so that it is no longer included in the AEM online backup archive. Doing so shortens the backup and reduces the space the backup process requires.

This document describes the configuration steps for moving the datastore to a new location and how backups should then be performed.
Moving the Datastore
- To move the datastore, the AEM instance must first be stopped.
Once the instance is stopped simply navigate to
Move the directory named "datastore" to a new location, ideally on a large disk. If you are clustering with a shared datastore, move it to a shared drive. The location itself does not matter to AEM as long as the instance can access it. The easiest option is to place it alongside the AEM directory rather than inside it.
For example, if your AEM path is /opt/AEM/author/crx-quickstart, then your default datastore path is /opt/AEM/author/crx-quickstart/repository/repository/datastore. Moving it would then be:
`mv /opt/AEM/author/crx-quickstart/repository/repository/datastore /opt/AEM/author-datastore`
The name of the new datastore directory is not hugely important, since we will specify it in the configuration. Depending on how the environment is set up, an identifiable naming convention may be necessary when running multiple AEM instances on the same machine.
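The move itself can be wrapped in a small guard so that it fails early instead of nesting the datastore inside an existing directory. This is a minimal sketch: the function name `move_datastore` is hypothetical, and it assumes the AEM instance has already been stopped.

```shell
#!/bin/sh
# Hypothetical helper: move a datastore directory out of the AEM tree.
# Assumes the AEM instance has already been stopped by the operator.
move_datastore() {
    src=$1
    dest=$2
    # The source must be an existing directory.
    if [ ! -d "$src" ]; then
        echo "source datastore not found: $src" >&2
        return 1
    fi
    # If the destination already existed, mv would place the datastore
    # inside it rather than rename it, so refuse in that case.
    if [ -e "$dest" ]; then
        echo "destination already exists: $dest" >&2
        return 1
    fi
    mv "$src" "$dest"
}

# Example, using the paths from the text above:
# move_datastore /opt/AEM/author/crx-quickstart/repository/repository/datastore /opt/AEM/author-datastore
```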
Once the datastore directory has been moved, edit the repository.xml file located in "/crx-quickstart/repository/" so that the datastore path points to the new location. This tells AEM to look there for the datastore items rather than in the default path.
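The configuration change might look like the fragment below. This is a sketch based on the standard Apache Jackrabbit `FileDataStore` configuration, not a copy of any particular AEM installation's file. The `class` attribute and the `minRecordLength` value in your own repository.xml may differ and should be left as they are. The part that matters here is the `path` parameter, set to the example location /opt/AEM/author-datastore used earlier.

```xml
<!-- Assumed shape of the DataStore element; keep your existing class
     attribute and other parameters, and add or change only "path". -->
<DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
    <param name="path" value="/opt/AEM/author-datastore"/>
    <param name="minRecordLength" value="4096"/>
</DataStore>
```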
- Start AEM. The datastore should now be read from its new location.
Repository backups can continue to be performed using the out-of-the-box (OOTB) hot backup mechanism provided in AEM. The only change is that the datastore is no longer included in the resulting backup archive.
- As mentioned, the repository backup procedure remains unchanged. A cron job that executes the necessary curl command is enough to create a consistent and restorable repository backup archive.
A sample command would look like:
`curl -u username:password -X POST "http://localhost:4504/system/console/jmx/com.adobe.granite:type=Repository/op/startBackup/java.lang.String?target=repositoryBackup.zip"`
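For a cron job, it can help to put the command in a small script. The sketch below builds the same URL as the sample command and adds a date to the archive name. The function names, the `AEM_USER` and `AEM_PASSWORD` environment variables, the dated file name, and the script path in the crontab comment are all assumptions made for illustration.

```shell
#!/bin/sh
# Build the JMX startBackup URL used in the sample command above.
backup_url() {
    host=$1
    port=$2
    target=$3
    printf 'http://%s:%s/system/console/jmx/com.adobe.granite:type=Repository/op/startBackup/java.lang.String?target=%s\n' \
        "$host" "$port" "$target"
}

# Hypothetical wrapper: request a backup into a dated archive.
# AEM_USER and AEM_PASSWORD are assumed to be set in the environment.
start_backup() {
    target="repositoryBackup-$(date +%Y%m%d).zip"
    curl -u "$AEM_USER:$AEM_PASSWORD" -X POST \
        "$(backup_url localhost 4504 "$target")"
}

# Example crontab entry (hypothetical script path), daily at 02:00:
# 0 2 * * * /opt/AEM/scripts/aem-backup.sh
```

Keeping the `date` call inside a script rather than in the crontab line avoids having to escape `%`, which cron treats specially in the command field.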
Because the datastore is static content on the file system, it can be backed up like any other set of static files. Its files do not change, so they are always in a consistent state, and any system backup strategy (full or incremental) may be used on a live environment, as long as the datastore backup is taken at some point after the repository backup has completed. This means the datastore backup may contain objects that are not referenced in the archived repository, but that is not a problem: unreferenced objects can be removed later by datastore garbage collection.
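Because existing datastore files never change, an incremental copy only needs to pick up files that are missing from the backup location. The following is a minimal sketch of that idea, with a hypothetical function name and no handling of interrupted copies. It should be run only after the repository backup has completed.

```shell
#!/bin/sh
# Copy only datastore files that are not yet present in the backup location.
# Relies on the property described above: existing datastore files never
# change, so a file already in the backup does not need to be copied again.
copy_new_datastore_files() {
    src=$1
    dest=$2
    [ -d "$src" ] || { echo "datastore not found: $src" >&2; return 1; }
    # List files relative to the datastore root, then copy the missing ones.
    ( cd "$src" && find . -type f ) | while IFS= read -r rel; do
        if [ ! -e "$dest/$rel" ]; then
            mkdir -p "$dest/$(dirname "$rel")"
            cp -p "$src/$rel" "$dest/$rel"
        fi
    done
}

# Example (backup path is hypothetical):
# copy_new_datastore_files /opt/AEM/author-datastore /backup/author-datastore
```

In practice an established tool such as rsync, or the system's existing backup software, is likely a more robust choice. The sketch only illustrates why an incremental strategy is safe for this directory.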