Fylex - Linux File Deduplication Toolkit


Sivaprasad Murali

Smart Linux toolkit for file deduplication and conflict-free file management, powered by Fylex.
Fylex: Linux file deduplication toolkit


Fylex is a production-ready, Linux-tailored file management tool that combines the best of rsync, cp, and Python’s shutil, and goes beyond them with:

  • Smart Copy & Move with hashing (xxhash, blake3, SHA, MD5)
  • Advanced conflict resolution (rename, skip, replace, larger/smaller, newer/older, prompt)
  • Filters: regex, glob, exact filename matches, inclusion/exclusion
  • Safety nets: undo, redo, backup of deprecated files
  • Data integrity: hash verification, SQLite-backed hash cache for deduplication
  • Metadata preservation: permissions, timestamps, xattrs, ACLs (Linux-specific)
  • CLI & Python API for flexible usage

What’s New

  • Bug Fixes

    • In versions ≤ 1.2.2, duplicates within the source directory were not detected when verification was disabled.
    • In versions ≥ 1.2.3, duplicates within the source directory are correctly detected and skipped, preventing them from being copied or moved unnecessarily.
  • Changes

    • JSON files are now stored along with the database files in the .fylex folder inside the user’s home directory.

Feature comparison

| Feature / Tool | Fylex | cp (coreutils) | rsync | shutil (Python stdlib) |
|---|---|---|---|---|
| Primary purpose | Smart local copy/move with safety nets | Basic copy | Fast sync (local/remote) | Library-level file ops |
| Undo / Redo | Yes: built-in JSON journaling | No | No | No |
| Hash verification | Yes: xxhash, blake3, sha256, etc. | No | Partial: checksums optional | No |
| Hash cache (SQLite) | Yes: avoids rehashing unchanged files | No | No | No |
| Duplicate detection (dest) | Yes: size + hash | No | Partial: based on size/checksums | No |
| Conflict resolution | Extensive: rename, replace, skip, newer/older, larger/smaller, prompt | None: overwrite only | Limited: flags like --backup, --suffix | None |
| Metadata preservation | Yes: mtime, perms, xattrs, ACLs on Linux | Partial: -a preserves many | Partial: -a preserves many | Partial: copystat only |
| Atomic writes | Yes: via fylex.tmp | No | Partial: temp options exist | No |
| Logging / audit trail | Yes: JSON logs per process | No | Partial: verbose logs only | No |
| CLI + Python API | Yes: both | CLI only | CLI only (bindings exist) | Python API only |
| Delta transfer (network) | No: local only | No | Yes | No |
| Remote / cloud support | No: local-first | No | Yes: ssh/rsyncd | No |
| Cross-platform | Partial: Linux-first (xattrs/ACL best) | Yes | Yes | Yes |
| Performance (local) | Very good: uses copy_file_range / sendfile | Good | Very good: efficient I/O | Moderate |
| Learning curve | Moderate: many options | Very low | Moderate to high: many options | Low |
| Best fit | Local integrity-critical workflows, reversible ops | Quick one-off copies | Local/remote sync and bandwidth-efficient backups | Small Python scripts |
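
To make the "size + hash" duplicate detection row concrete, here is a minimal standard-library sketch of the general technique. It is not Fylex's code: the helper names `file_digest` and `find_duplicates` are hypothetical, and SHA-256 stands in for xxhash or blake3 so that no extra package is needed.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group regular files under root that share both size and content hash."""
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing
        by_hash = defaultdict(list)
        for p in paths:
            by_hash[file_digest(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Comparing sizes first means most files are never hashed at all; only files that share a size with another file are read.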

Strengths

  • Undo / Redo — Reversible operations by process ID.
  • JSON audit trail — Logs stored per PID for reproducibility.
  • Hash verification + cache — Prevents rehashing unchanged files.
  • Conflict resolution — Multiple real-world strategies (rename, replace, skip, larger/smaller, newer/older, prompt).
  • Linux metadata handling — Preserves xattrs/ACLs.
  • Atomic writes & backups — Prevents partial corruption.
  • Good performance — Uses copy_file_range/sendfile.
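
The idea behind a hash cache can be sketched as follows. This is an illustration only, under stated assumptions: the table layout, the `open_cache` and `cached_digest` helpers, and the choice of size plus mtime as the "unchanged" test are all hypothetical and may differ from what Fylex stores in its SQLite database.

```python
import hashlib
import os
import sqlite3


def open_cache(db_path=":memory:"):
    """Open (or create) a cache table keyed by absolute path."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hashes ("
        "path TEXT PRIMARY KEY, size INTEGER, mtime_ns INTEGER, digest TEXT)"
    )
    return conn


def cached_digest(conn, path):
    """Return (digest, was_cached); rehash only if size or mtime changed."""
    path = os.path.abspath(path)
    st = os.stat(path)
    row = conn.execute(
        "SELECT size, mtime_ns, digest FROM hashes WHERE path = ?", (path,)
    ).fetchone()
    if row and row[0] == st.st_size and row[1] == st.st_mtime_ns:
        return row[2], True  # file looks unchanged: reuse the stored digest

    h = hashlib.sha256()  # stand-in for xxhash/blake3
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    digest = h.hexdigest()
    conn.execute(
        "INSERT OR REPLACE INTO hashes VALUES (?, ?, ?, ?)",
        (path, st.st_size, st.st_mtime_ns, digest),
    )
    conn.commit()
    return digest, False
```

A second call on an untouched file returns the stored digest without reading the file again, which is where the time saving on large trees would come from.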

Safety Nets

  • Undo: Rollback any process (undo(pid)).
  • Redo: Replay exactly (redo(pid)).
  • Backups: Deprecated files → fylex.deprecated/{pid}/.
  • Logs: JSON + JSONL under json/{pid}.json.
  • Verification: Optional hash verification (--verify).
  • Retries: Up to 5 retries on hash mismatch.
  • Protections: Prevents unsafe recursive/self copies.
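
Verification, retries, and atomic writes fit together roughly as in the sketch below. It is a generic pattern, not Fylex's implementation: `verified_copy`, the `.tmp` suffix, and reading "up to 5 retries" as five attempts in total are assumptions made for illustration, and SHA-256 stands in for the configurable hash algorithm.

```python
import hashlib
import os
import shutil


def _digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def verified_copy(src, dest, max_attempts=5):
    """Copy src to dest via a temp file; retry when the hashes differ.

    Returns the number of attempts that were needed.
    """
    expected = _digest(src)
    tmp = os.fspath(dest) + ".tmp"  # hypothetical temp name next to dest
    for attempt in range(1, max_attempts + 1):
        shutil.copyfile(src, tmp)
        if _digest(tmp) == expected:
            os.replace(tmp, dest)  # atomic rename on the same filesystem
            return attempt
        os.remove(tmp)  # discard the bad copy before trying again
    raise OSError(f"hash mismatch after {max_attempts} attempts: {src}")
```

Because the destination name only appears once the temp file has been verified, a reader never sees a half-written or corrupted file at `dest`.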

Installation

pip install fylex

Requires Python 3.8+. Linux recommended (for full xattr/ACL support).


CLI Usage

Copy

fylex copy ~/Downloads ~/Backup --resolve rename --algo xxhash --verify --verbose

Move

fylex move ./data ./archive --resolve newer --match-glob "*.csv"

Undo / Redo

fylex undo 1002
fylex redo 1002

Python API

from fylex import filecopy, filemove, undo, redo

# Copy with conflict resolution
pid = filecopy("photos", "backup/photos", resolve="newer", match_glob="*.png", verify=True)

# Undo
undo(pid)

# Move and undo in one line
undo(filemove("docs", "docs_archive", resolve="rename"))

Function Reference

filecopy(src, dest, ...)

Description: Smartly copies files from src to dest with conflict handling, filters, and safety nets.

| Param | Type | Default | Description |
|---|---|---|---|
| src | str\|Path | required | Source file or directory |
| dest | str\|Path | required | Destination directory |
| resolve | str | "rename" | Conflict strategy: rename, replace, skip, larger, smaller, newer, older, prompt |
| algo | str | "xxhash" | Hash algorithm: xxhash, blake3, md5, sha256, sha512 |
| chunk_size | int | 16 * 1024 * 1024 | Buffer size (bytes) for reading files |
| verbose | bool | True | Log operations to stdout |
| dry_run | bool | False | Simulate actions without making changes |
| summary | str\|Path | None | Path to which the fylex.log summary is copied |
| match_regex | str | None | Regex pattern to include files |
| match_names | list[str] | None | Exact filenames to include |
| match_glob | list[str] | None | Glob patterns to include |
| exclude_regex | str | None | Regex pattern to exclude files |
| exclude_names | list[str] | None | Exact filenames to exclude |
| exclude_glob | list[str] | None | Glob patterns to exclude |
| recursive_check | bool | False | Check for duplicates recursively in dest |
| verify | bool | False | Verify file hashes after copying |
| has_extension | bool | False | Include file extension in deduplication check |
| no_create | bool | False | Do not create dest if it does not exist |
| preserve_meta | bool | True | Preserve timestamps, permissions, xattrs, ACLs |
| backup | str\|Path | "fylex.deprecated" | Folder for deprecated or conflicting files |
| recurse | bool | False | Traverse subdirectories in src |

Example:

filecopy("photos", "photos_backup", resolve="newer", match_glob="*.png", verify=True)
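
One plausible way the include and exclude filters could combine is sketched below. The combination rules are assumptions, not documented Fylex behavior: here any matching include filter selects a file, no include filters means "select everything", and an exclude match always wins. The helper `is_selected` is hypothetical; its keyword names simply mirror the parameters in the table.

```python
import fnmatch
import re


def is_selected(name, match_regex=None, match_names=None, match_glob=None,
                exclude_regex=None, exclude_names=None, exclude_glob=None):
    """Return True if a filename passes the include and exclude filters."""
    def hits(regex, names, globs):
        # True when the name matches the regex, an exact name, or any glob.
        return bool(
            (regex and re.search(regex, name))
            or (names and name in names)
            or (globs and any(fnmatch.fnmatchcase(name, g) for g in globs))
        )

    has_include = any([match_regex, match_names, match_glob])
    included = hits(match_regex, match_names, match_glob) if has_include else True
    excluded = hits(exclude_regex, exclude_names, exclude_glob)
    return included and not excluded
```

For example, `is_selected("a.png", match_glob=["*.png"], exclude_names=["a.png"])` returns `False` under these rules, because the exclusion takes priority over the glob match.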

filemove(src, dest, ...)

Same params as filecopy, but moves files instead. If conflicts exist, originals are moved into deprecated folders within src or dest depending on the origin of the file being deprecated.
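
The comparison-based strategies (newer, older, larger, smaller) come down to a decision about which of two conflicting files survives. The sketch below shows that decision with `os.stat`; it is not Fylex's code. `pick_winner` and `renamed` are hypothetical helpers, the tie-breaking (ties keep the destination) is an assumption, and the `name (1).ext` pattern is an invented naming scheme for the rename strategy.

```python
import os


def pick_winner(src, dest, resolve):
    """Return 'src' if the source should replace dest, else 'dest'."""
    s, d = os.stat(src), os.stat(dest)
    if resolve == "replace":
        return "src"
    if resolve == "skip":
        return "dest"
    if resolve == "newer":
        return "src" if s.st_mtime_ns > d.st_mtime_ns else "dest"
    if resolve == "older":
        return "src" if s.st_mtime_ns < d.st_mtime_ns else "dest"
    if resolve == "larger":
        return "src" if s.st_size > d.st_size else "dest"
    if resolve == "smaller":
        return "src" if s.st_size < d.st_size else "dest"
    raise ValueError(f"unsupported strategy: {resolve}")


def renamed(path):
    """Return an unused sibling path such as 'report (1).txt'."""
    base, ext = os.path.splitext(path)
    n = 1
    while os.path.exists(f"{base} ({n}){ext}"):
        n += 1
    return f"{base} ({n}){ext}"
```

In a full tool, the losing file would then be moved into the deprecated folder rather than deleted, which is what keeps the operation reversible.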

undo(p_id, verbose=True, force=False)

Rollback a process by ID.

| Param | Type | Description |
|---|---|---|
| p_id | str | Process ID (JSON log ID) |
| verbose | bool | Enable logs |
| force | bool | Continue undo even if some entries fail |
| summary | str\|Path | Path to which the fylex.log summary is copied |
| dry_run | bool | Simulate the operations without making changes |

redo(p_id, verbose=True, force=False)

Replay a process by ID. Same parameters as undo.
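
Journal-based undo can be illustrated with a small standard-library sketch. It is a simplified model, not Fylex's journal format: the entry fields (`op`, `src`, `dest`) and the helpers `journaled_copy` and `undo_journal` are hypothetical, and the sketch never overwrites existing files so that undoing is just deleting what was created.

```python
import json
import os
import shutil


def journaled_copy(pairs, journal_path):
    """Copy (src, dest) pairs and record each created file in a JSON journal."""
    entries = []
    for src, dest in pairs:
        if os.path.exists(dest):
            continue  # this sketch never overwrites, so undo stays trivial
        shutil.copy2(src, dest)
        entries.append({"op": "copy", "src": src, "dest": dest})
    with open(journal_path, "w") as f:
        json.dump(entries, f, indent=2)
    return journal_path


def undo_journal(journal_path):
    """Reverse the journal: delete the files it created, newest first."""
    with open(journal_path) as f:
        entries = json.load(f)
    undone = 0
    for entry in reversed(entries):
        if entry["op"] == "copy" and os.path.exists(entry["dest"]):
            os.remove(entry["dest"])
            undone += 1
    return undone
```

Replaying the same journal in forward order would give a redo. A real tool also has to journal moves and replaced files, which is where a backup folder for deprecated files becomes necessary.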


Migration Notes

  • Old behavior:

    success = filecopy("a", "b")  # returns True/False
    
  • New behavior (>=v1.2.0):

    pid = filecopy("a", "b")      # returns process ID
    undo(pid)                     # reversible
    

Update your code to capture process IDs instead of expecting booleans.


Example Workflows

Daily backup with rollback

fylex copy ~/work ~/backup --resolve newer --verify
# Oops
fylex undo 2023

Reproducible replay

fylex redo 2023

Direct chaining in Python

import fylex as fx

fx.undo(fx.filemove("src/data", "archive/data"))

License

MIT © 2025 Sivaprasad Murali


✨ With Fylex, file management on Linux is no longer just copying and moving — it’s safe, verifiable, reversible, and smart.

