How to compute the total size of files included by tar after exclude flags?

2 min read 27-10-2024

When archiving files with tar, it’s common to exclude certain files or directories. Working out the total size of the files that remain is less obvious. In this article, we’ll explore how to compute the total size of files included by tar after applying exclude flags.

Understanding the Problem

The original challenge can be articulated as follows: How do you compute the total size of the files that are archived by tar after excluding certain files using exclude flags?

Original Code Example

Here's a simple example of how you might use tar with exclude flags:

tar -cvf archive.tar --exclude='*.tmp' --exclude='backup/' /path/to/source

In this command, we create an archive named archive.tar, excluding all files with the .tmp extension and the backup directory within /path/to/source.
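If you only need a number and not the archive itself, one option is to let tar write the archive to standard output and count the bytes with wc -c. This is a minimal sketch: the temporary tree is a hypothetical stand-in for /path/to/source, and the result is the size of the tar stream. It therefore includes 512-byte headers and end-of-archive padding, so it is an upper bound on the file data rather than the per-file total computed in the steps that follow. The backup pattern is written without a trailing slash, on the assumption that this form is matched more reliably across tar versions.

```shell
#!/bin/sh
# Sketch: measure how many bytes tar would write, without keeping an archive.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical stand-in for /path/to/source.
mkdir -p "$work/source/backup"
head -c 10000 /dev/zero > "$work/source/file1.txt"
head -c 500   /dev/zero > "$work/source/file2.tmp"
head -c 700   /dev/zero > "$work/source/backup/file3.log"

# "-f -" writes the archive to stdout and wc -c counts its bytes.
# 'backup' has no trailing slash (assumption: some tar versions do not
# match a directory pattern written as 'backup/').
stream_bytes=$(tar -cf - --exclude='*.tmp' --exclude='backup' -C "$work" source \
  | wc -c | tr -d ' ')

# Includes 512-byte headers and end-of-archive padding: an upper bound.
echo "tar stream size: $stream_bytes bytes"
```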

Steps to Compute Total Size of Included Files

To calculate the size of files included in a tar archive after excluding specified files, follow these steps:

  1. List the Archive Contents: Use the tar -tf command to list the contents of your archive. This will give you an overview of the files included.

    tar -tf archive.tar
    
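  2. Sum the Sizes of the Listed Files: Pass the listing to du (disk usage), which measures those files on disk. Three details matter. tar removes the leading / from absolute paths by default, so du must run from the directory the member names are relative to (/ in this example). Directory entries, which tar -tf prints with a trailing /, must be dropped, or du would count everything beneath them, including the excluded files. And all names should go to a single du process, because du -c prints one total per invocation.

    Assuming GNU tar and GNU du, the following pipeline does this:

    tar -tf archive.tar | grep -v '/$' | tr '\n' '\0' | (cd / && du -ch --files0-from=-) | grep 'total$'

    Here, grep -v '/$' removes directory entries, tr turns the newline-separated list into NUL-separated names, and du --files0-from=- reads them from standard input in one process. du -ch then prints a human-readable grand total, and grep 'total$' keeps only that line. A form such as xargs -I {} du -ch {} runs du once per file and therefore prints one total line per file rather than a single sum.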
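The du-based approach can be tried end to end on a throwaway copy of the example tree (a hypothetical stand-in for /path/to/source) without touching real data. This sketch assumes GNU tar and GNU coreutils du (--files0-from and -b are GNU options). It drops directory entries before calling du and passes every name to a single du process, so there is exactly one total line. It uses -b (apparent size in bytes) instead of -h so the result is an exact byte count.

```shell
#!/bin/sh
# Sketch of the du-based total, assuming GNU tar and GNU du.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical stand-in for /path/to/source.
mkdir -p "$work/source/backup"
head -c 10000 /dev/zero > "$work/source/file1.txt"
head -c 500   /dev/zero > "$work/source/file2.tmp"
head -c 700   /dev/zero > "$work/source/backup/file3.log"

# -C makes member names relative to $work, so du must also run from $work.
tar -cf "$work/archive.tar" --exclude='*.tmp' --exclude='backup' -C "$work" source

# Drop directory entries (trailing "/"), NUL-separate the names, and let a
# single du process sum them; -c adds the "total" line, -b counts bytes.
total=$(tar -tf "$work/archive.tar" \
  | grep -v '/$' \
  | tr '\n' '\0' \
  | (cd "$work" && du -cb --files0-from=-) \
  | grep 'total$' \
  | cut -f1)

echo "included files: $total bytes"
```

For a real archive, point tar -tf at your archive and change the cd target to the directory its member names are relative to.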

Analyzing the Output

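  • The tar -tf command lists every member stored in the archive, so the exclusions applied when it was created are already reflected. The list contains directory entries as well as files.
  • The du command measures each listed path as it exists on disk now, so the result is only accurate if the files are still present and unchanged. By default du reports allocated disk space, which is usually slightly larger than the byte count; GNU du's -b option reports apparent size in bytes instead.
  • The grep command keeps only the line ending in total, which du -c prints once per du invocation.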
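Because du reads the filesystem, its answer can drift from what the archive holds if files change after the archive is written. An alternative that never touches the source tree is to add up the sizes tar recorded in the archive headers. This sketch assumes GNU tar, whose tar -tvf listing puts the size in bytes in the third column (BSD tar prints an ls-style listing with the size in a different column). Directory entries show a size of 0, so they do not distort the sum. The temporary tree is again a hypothetical stand-in for /path/to/source.

```shell
#!/bin/sh
# Sum the member sizes recorded in the archive (GNU tar listing format assumed).
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical stand-in for /path/to/source.
mkdir -p "$work/source/backup"
head -c 10000 /dev/zero > "$work/source/file1.txt"
head -c 500   /dev/zero > "$work/source/file2.tmp"
head -c 700   /dev/zero > "$work/source/backup/file3.log"

tar -cf "$work/archive.tar" --exclude='*.tmp' --exclude='backup' -C "$work" source

# GNU tar -tvf columns: mode, owner/group, size, date, time, name.
# Sum column 3 (bytes) over every member.
total=$(tar -tvf "$work/archive.tar" | awk '{ s += $3 } END { print s + 0 }')

echo "bytes recorded in archive: $total"
```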

Example

Let’s say you have the following directory structure:

/path/to/source
├── file1.txt
├── file2.tmp
└── backup
    └── file3.log

When you run the command:

tar -cvf archive.tar --exclude='*.tmp' --exclude='backup/' /path/to/source

You are including file1.txt while excluding file2.tmp and all contents within the backup directory.
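To confirm which paths the exclude patterns actually removed, you can list the archive and check each expected file. The sketch below builds a throwaway copy of the example tree (hypothetical paths under a temporary directory) and reports each file as kept or excluded. The backup pattern omits the trailing slash, on the assumption that this form is matched more reliably across tar versions; it is worth checking on your own system.

```shell
#!/bin/sh
# Check which example paths made it into the archive (hypothetical tree).
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/source/backup"
echo "keep"   > "$work/source/file1.txt"
echo "skip"   > "$work/source/file2.tmp"
echo "backup" > "$work/source/backup/file3.log"

tar -cf "$work/archive.tar" --exclude='*.tmp' --exclude='backup' -C "$work" source

listing=$(tar -tf "$work/archive.tar")
printf '%s\n' "$listing"

# Report each expected path as kept or excluded (exact, whole-line match).
for f in source/file1.txt source/file2.tmp source/backup/file3.log; do
  if printf '%s\n' "$listing" | grep -qxF "$f"; then
    echo "kept:     $f"
  else
    echo "excluded: $f"
  fi
done
```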

After running the pipeline to calculate sizes:

tar -tf archive.tar | grep -v '/$' | tr '\n' '\0' | (cd / && du -ch --files0-from=-) | grep 'total$'

You would get output similar to:

10K total

This indicates that file1.txt, the only regular file left in archive.tar, uses about 10K of disk space.

Additional Considerations

When using the above method, keep in mind:

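  • Special Files: For device files and FIFOs, tar stores only a header with no data, and du reports little or no usage for them, so they add essentially nothing to the total.
  • Symbolic Links: du measures a symlink itself, not its target, which matches what tar stores by default. If the archive was created with -h (--dereference), tar stores the target's contents under the link's name, and a du-based total will undercount.
  • Performance: Starting one du process per file (for example with xargs -I {}) is slow for large archives. Passing all names to a single du process, or reading the sizes tar recorded in the archive with tar -tvf, avoids that overhead.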
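One way to sidestep the symlink and special-file caveats is to count only regular files, using the type character at the start of each GNU tar -tvf line (- for a regular file, l for a symlink, c or b for devices, d for directories, h for hard links to an already-counted file). This sketch assumes GNU tar's listing format and uses a hypothetical throwaway tree containing one symlink. It also reads the archive once with no per-file process, which helps with large archives.

```shell
#!/bin/sh
# Sum only regular-file members (GNU tar -tvf listing format assumed).
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical tree: one regular file and a symlink pointing at it.
mkdir -p "$work/source"
head -c 10000 /dev/zero > "$work/source/file1.txt"
# tar stores the link itself by default, not the target's data.
ln -s file1.txt "$work/source/link-to-file1"

tar -cf "$work/archive.tar" -C "$work" source

# $1 is the mode string; its first character is the entry type.
# Only "-" (regular file) lines contribute their size column ($3).
regular_bytes=$(tar -tvf "$work/archive.tar" \
  | awk 'substr($1, 1, 1) == "-" { s += $3 } END { print s + 0 }')

echo "regular files: $regular_bytes bytes"
```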


Conclusion

Computing the total size of files included in a tar archive after applying exclude flags can be done efficiently with simple command-line tools. By leveraging tar, du, and grep, you can easily obtain a clear understanding of the space your files occupy. Whether you are managing backups or compressing files for distribution, understanding these commands will enhance your command-line proficiency.

Feel free to experiment with the commands and modify them according to your specific needs for file archiving.