# Tube-Archivist Scripts

A small collection of Bash helpers for preparing offline / archived YouTube videos for import into TubeArchivist. Written for Debian-like systems; they should also work on other Linux distributions that have Bash and the standard GNU utilities.

---
## Goal
Normalize filenames and create accompanying metadata (`.info.json`) so TubeArchivist can ingest local archives (especially those from archive.org or other offline sources).

Example input filename:
`20170311 (5XtCZ1Fa9ag) Terry A Davis Live Stream.mp4`

Resulting filename and sidecar JSON:
- `20170311 Terry A Davis Live Stream [5XtCZ1Fa9ag].mp4`
- `20170311 Terry A Davis Live Stream [5XtCZ1Fa9ag].info.json`

---
## How it works / Usage
1. Put all the scripts in the same directory as your video files (the scripts do not currently recurse into subdirectories).
2. Edit `Example.info.json` and update these lines:
   - `"channel_id": "Change To Channel ID/username",`
   - `"uploader": "Youtube Username",`
   - `"uploader_id": "Change To Channel ID",`
   - `"uploader_url": "https://www.youtube.com/channel/ChangeToChannelID-or-username",`
3. Run the scripts from the directory containing your media, in the order listed below.

Each script performs a single transformation so you can inspect results between steps.

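
Running one script at a time and checking the directory between steps can be sketched as follows. This is a minimal sketch, not part of the repository: `run_steps` is a hypothetical helper name, and it simply assumes each script can be started with `bash` from the media directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not shipped with these scripts):
# run each given script in order, stop at the first failure, and list the
# directory after every step so the result can be inspected.
run_steps() {
  local script
  for script in "$@"; do
    if [[ ! -f $script ]]; then
      echo "missing script: $script" >&2
      return 1
    fi
    echo "== $script =="
    bash "$script" || { echo "$script failed, stopping" >&2; return 1; }
    ls -1
  done
}

# Example call for files named like "20170311 (ID) Title.mp4":
# run_steps 'convert-()-to-[].bash' 'move-[id]-to-end-filename.bash' \
#   'create-json-alongside-each-file.bash' 'insert-id-into-json.bash' \
#   'insert-title-into-json.bash' 'insert-date-into-json.bash'
```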
## Scripts (order and purpose)
1a. `convert-()-to-[].bash`
- Replace parentheses containing an ID with square brackets (e.g. `(ID)` -> `[ID]`) and clean up spacing.
- If the ID is already at the end of the filename, skip to 3.

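
The parenthesis-to-bracket rewrite can be sketched in pure Bash. This is a minimal sketch, not the contents of `convert-()-to-[].bash`: the function name `paren_id_to_brackets` is hypothetical, it assumes the ID is an 11-character YouTube ID made of letters, digits, `_` and `-`, and it leaves spacing untouched.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: turn "(ID)" into "[ID]" inside a filename.
# Assumption: an ID is exactly 11 characters from A-Z, a-z, 0-9, "_" and "-".
paren_id_to_brackets() {
  local name=$1
  local re='^(.*)\(([A-Za-z0-9_-]{11})\)(.*)$'
  if [[ $name =~ $re ]]; then
    # Rebuild the name with the same ID wrapped in square brackets.
    printf '%s[%s]%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
  else
    # No parenthesized ID found: print the name unchanged.
    printf '%s\n' "$name"
  fi
}

# Possible use in a rename loop (prints only, renames nothing):
# for f in *.mp4; do echo "$f -> $(paren_id_to_brackets "$f")"; done
```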

1b. `move-find-id-to-end-filename.bash`
- Split the filename on ` - `, find the bare (unbracketed) ID between the second and third ` - `, add brackets, and move `[id]` to the end of the filename, before the extension.
- If you use this script, skip 1a and 2a and go straight to 3.

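
One way to read the split-and-move step is sketched below. This is a minimal sketch under assumptions, not the contents of `move-find-id-to-end-filename.bash`: the function name `move_dashed_id_to_end` and the sample filename in the comment are hypothetical, and "between the second and third ` - `" is taken to mean the third field after splitting.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: "A - B - ID - C.ext" becomes "A - B - C [ID].ext".
move_dashed_id_to_end() {
  local name=$1 sep=' - '
  local stem=${name%.*} ext=${name##*.}
  local parts=() rest=$stem
  # Split the stem on " - " into an array.
  while [[ $rest == *"$sep"* ]]; do
    parts+=("${rest%%"$sep"*}")
    rest=${rest#*"$sep"}
  done
  parts+=("$rest")
  # The ID can only sit between the second and third separator if there
  # are at least four fields; otherwise leave the name alone.
  if (( ${#parts[@]} < 4 )); then
    printf '%s\n' "$name"
    return 0
  fi
  local id=${parts[2]} out=${parts[0]} i
  for (( i = 1; i < ${#parts[@]}; i++ )); do
    if (( i != 2 )); then
      out+="$sep${parts[i]}"
    fi
  done
  printf '%s [%s].%s\n' "$out" "$id" "$ext"
}

# Example with a made-up filename:
# move_dashed_id_to_end 'Terry A Davis - 20170311 - 5XtCZ1Fa9ag - Live Stream.mp4'
```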

2a. `move-[id]-to-end-filename.bash`
- Ensure the video ID appears at the end of the filename inside square brackets.

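
Moving a bracketed ID to the end of the name can be sketched like this. It is a minimal sketch, not the contents of `move-[id]-to-end-filename.bash`: the function name `move_bracket_id_to_end` is hypothetical and the 11-character ID pattern is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: "DATE [ID] Title.ext" becomes "DATE Title [ID].ext".
# Assumption: an ID is exactly 11 characters from A-Z, a-z, 0-9, "_" and "-".
move_bracket_id_to_end() {
  local name=$1
  local re='^(.*)\[([A-Za-z0-9_-]{11})\](.*)\.([^.]+)$'
  if [[ ! $name =~ $re ]]; then
    printf '%s\n' "$name"   # no bracketed ID found: leave unchanged
    return 0
  fi
  local id=${BASH_REMATCH[2]} ext=${BASH_REMATCH[4]}
  local words
  # Word-splitting via read collapses repeated spaces and trims both ends.
  read -r -a words <<< "${BASH_REMATCH[1]} ${BASH_REMATCH[3]}"
  printf '%s [%s].%s\n' "${words[*]}" "$id" "$ext"
}

# Names that already end in "[ID].ext" come back unchanged, so running the
# step twice is harmless.
```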

3. `create-json-alongside-each-file.bash`
- Create an empty `.info.json` file for each video filename (sidecar).

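
Creating the sidecars can be sketched as a loop over the video files. This is a minimal sketch, not the contents of `create-json-alongside-each-file.bash`: the function name `create_sidecars` is hypothetical, and the list of video extensions is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: create an empty "<name>.info.json" next to each video.
# The function body is a subshell so nullglob and cd do not leak to the caller.
create_sidecars() (
  shopt -s nullglob
  cd "${1:-.}" || exit 1
  for video in *.mp4 *.mkv *.webm; do   # extension list is an assumption
    sidecar=${video%.*}.info.json
    # Only create the file if it does not exist yet, so reruns are safe.
    [[ -e $sidecar ]] || : > "$sidecar"
  done
)

# Usage: create_sidecars /path/to/archive   (defaults to the current directory)
```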

4. `insert-id-into-json.bash`
- Populate the sidecar JSON with the video ID field.

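
Reading the ID back out of a normalized filename can be sketched as below. This is a minimal sketch, not the contents of `insert-id-into-json.bash`: `extract_id` is a hypothetical name, the 11-character pattern is an assumption, and how the value is then written into the JSON is left open.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print the ID from "... [ID].mp4" or "... [ID].info.json".
# Returns non-zero when the name does not end in a bracketed 11-character ID.
extract_id() {
  local re='\[([A-Za-z0-9_-]{11})\](\.[^.]+)+$'
  [[ $1 =~ $re ]] || return 1
  printf '%s\n' "${BASH_REMATCH[1]}"
}

# Example (the "id" key name is an assumption based on yt-dlp-style files):
# printf '"id": "%s"\n' "$(extract_id "$file")"
```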

5. `insert-title-into-json.bash`
- Insert the cleaned title into the sidecar JSON.

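
Deriving a title from the filename and making it safe to place inside a JSON string can be sketched as follows. This is a minimal sketch, not the contents of `insert-title-into-json.bash`: both function names are hypothetical, and it assumes "cleaned" means the filename without the leading date, the trailing `[ID]`, and the extension, which this README does not spell out.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: "YYYYMMDD Title [ID].ext" -> "Title".
# Expects the video filename (not the .info.json name).
extract_title() {
  local stem=${1%.*}   # drop the extension
  local re='^[0-9]{8} (.*) \[[A-Za-z0-9_-]{11}\]$'
  [[ $stem =~ $re ]] || return 1
  printf '%s\n' "${BASH_REMATCH[1]}"
}

# Escape backslashes and double quotes so the text can sit inside "...".
# Control characters are not handled in this sketch.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}

# Example: printf '"title": "%s"\n' "$(json_escape "$(extract_title "$file")")"
```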

6. `insert-date-into-json.bash`
- Insert the date (if available) into the sidecar JSON.

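
The "if available" check can be sketched as a function that only succeeds when the name starts with a plausible date. This is a minimal sketch, not the contents of `insert-date-into-json.bash`: `extract_date` is a hypothetical name, and it assumes the date is a leading `YYYYMMDD` as in the example filenames.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print a leading YYYYMMDD date, or fail if there is none.
extract_date() {
  local re='^([0-9]{4})([0-9]{2})([0-9]{2})[^0-9]'
  [[ $1 =~ $re ]] || return 1
  local y=${BASH_REMATCH[1]} m=${BASH_REMATCH[2]} d=${BASH_REMATCH[3]}
  # Rough range check only; 10# forces base 10 so "08" is not read as octal.
  (( 10#$m >= 1 && 10#$m <= 12 && 10#$d >= 1 && 10#$d <= 31 )) || return 1
  printf '%s%s%s\n' "$y" "$m" "$d"
}

# Example (the "upload_date" key name is an assumption based on yt-dlp-style files):
# if date=$(extract_date "$file"); then printf '"upload_date": "%s"\n' "$date"; fi
```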

---
## Notes and tips
- The scripts do not process subdirectories. Run them from the root directory of each archive.
- Always test on a copy, or run on a subset first, to confirm the behavior.
- If filenames may contain unusual characters, run a quick grep for non-ASCII characters before processing.
- Modify the scripts to add a dry-run mode if you want safer previews.
- Common Elasticsearch commands for updates: https://gitea.rcs1.top/sickprodigy/TubeArchivist-Scripts/src/branch/main/ElasticSearch-Common-Commands.md

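
The non-ASCII check and the dry-run idea from the tips above can be sketched as two small helpers. This is a minimal sketch under assumptions: `list_non_ascii_names`, `safe_mv`, and the `DRY_RUN` variable are hypothetical names and are not part of the existing scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list names in a directory that contain bytes outside
# printable ASCII. LC_ALL=C makes grep compare raw bytes, and [^ -~] matches
# anything that is not between space and tilde.
list_non_ascii_names() {
  ls -1 -- "${1:-.}" | LC_ALL=C grep -- '[^ -~]'
}

# Hypothetical rename wrapper: with DRY_RUN=1 it only prints the planned
# rename; otherwise it renames without overwriting an existing target.
safe_mv() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf 'would rename: %s -> %s\n' "$1" "$2"
  else
    mv -n -- "$1" "$2"
  fi
}

# Preview first, then rename for real:
# DRY_RUN=1 safe_mv "$old" "$new"
# safe_mv "$old" "$new"
```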

---
## Example archive
Archive used for testing:
`https://archive.org/details/TempleOS-TheMissingVideos`

Processed example (after running the full pipeline):
- `20170311 Terry A Davis Live Stream [5XtCZ1Fa9ag].mp4`
- `20170311 Terry A Davis Live Stream [5XtCZ1Fa9ag].info.json`

---