BaseNameFolderAssignment

This post and its script consolidate files into folders named after each file's basename. In bioinformatics, I often run programs that create numerous files with different extensions from a single input. When running multiple jobs, these files clutter the working directory and are tedious to sort into individual folders by hand. The script can be found on my GitHub HERE

Basename Folder Creation and Assignment

For example, consider the following directory structure:
  • Directory
    • bar.csv
    • foo.csv
    • bar.tar.gz
    • foo.tar.gz
    • bar.txt
    • foo.txt
The script would reorganize the directory above into one folder per basename:
  • Directory
    • bar
      • bar.csv
      • bar.tar.gz
      • bar.txt
    • foo
      • foo.csv
      • foo.tar.gz
      • foo.txt
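The grouping rule in the example above treats everything before the first dot as the basename, so bar.tar.gz lands in bar rather than bar.tar. A minimal sketch of that rule follows; the helper name group_by_basename is hypothetical and is not part of the original script.

```python
from collections import defaultdict

def group_by_basename(filenames):
    """Map each basename (the text before the first dot) to its files."""
    groups = defaultdict(list)
    for filename in filenames:
        stem = filename.split(".")[0]  # "bar.tar.gz" -> "bar"
        groups[stem].append(filename)
    return dict(groups)

files = ["bar.csv", "foo.csv", "bar.tar.gz", "foo.tar.gz", "bar.txt", "foo.txt"]
for stem, members in sorted(group_by_basename(files).items()):
    print(stem, members)
# bar ['bar.csv', 'bar.tar.gz', 'bar.txt']
# foo ['foo.csv', 'foo.tar.gz', 'foo.txt']
```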
In [1]:
## Import libraries
import glob  ## Match file paths with wildcards
import os    ## File system operations (mkdir, rename)
import sys   ## Used in standalone script for reading user input
import re    ## Used to filter the file list
In [2]:
## Input directory (change this to your own directory; keep the trailing slash)
name = 'DemoData/'

## Save the paths of everything in the input directory as a list
y = glob.glob(name + '*')

## Keep only regular files whose name contains a dot (skips directories and extensionless files)
y = [x for x in y if os.path.isfile(x) and re.search(r'\.', os.path.basename(x))]

## Retain file name w/ extension (i.e. basename.txt)
basename = [os.path.basename(x) for x in y]

## Remove all extensions (everything after the first dot)
removed_extensions = [x.split(".")[0] for x in basename]
directories = list(set(removed_extensions))

## Create new directories (skip any that already exist)
for f in directories:
    try:
        os.mkdir(name + f)
    except FileExistsError:
        print(f + " directory already exists")
In [3]:
## Move each file into the directory named after its basename
for path, stem, filename in zip(y, removed_extensions, basename):
    new_name = name + stem + '/' + filename
    print(new_name)
    os.rename(path, new_name)
DemoData/NTC-lysis_L001_R2_001/NTC-lysis_L001_R2_001.fasta
DemoData/NTC-lysis_L001_R1_001/NTC-lysis_L001_R1_001.txt
DemoData/NTC-lysis_L001_R2_001/NTC-lysis_L001_R2_001.txt
DemoData/NTC-lysis_L001_R1_001/NTC-lysis_L001_R1_001.fasta
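
The notebook cells above can also be folded into a single reusable function. The following is only a sketch under stated assumptions: the name organize_by_basename and the dry_run flag are hypothetical and do not come from the original script, and it swaps glob and os.rename for pathlib and shutil.move. The demo at the bottom runs in a temporary directory so it does not touch real data.

```python
import shutil
import tempfile
from pathlib import Path

def organize_by_basename(directory, dry_run=False):
    """Move each file in `directory` into a subfolder named after its basename.

    Returns a list of (source, destination) pairs. With dry_run=True
    (a hypothetical flag), nothing is created or moved.
    """
    root = Path(directory)
    moves = []
    # sorted() builds the full listing before any folder is created
    for path in sorted(root.iterdir()):
        # Skip subdirectories and files without an extension
        if not path.is_file() or "." not in path.name:
            continue
        stem = path.name.split(".")[0]  # text before the first dot
        if not stem:
            continue  # skip hidden files such as ".gitignore"
        target = root / stem / path.name
        moves.append((path, target))
        if not dry_run:
            target.parent.mkdir(exist_ok=True)
            shutil.move(str(path), str(target))
    return moves

# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for filename in ["foo.csv", "foo.tar.gz", "bar.txt"]:
        (Path(tmp) / filename).touch()
    for src, dst in organize_by_basename(tmp):
        print(src.name, "->", dst.relative_to(tmp))
```

Calling the function with dry_run=True first lists the planned moves without changing anything, which is a cheap check before reorganizing a directory full of results.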

Running the script

To run the script BaseNameFolderAssignment, first make it executable:
chmod +x BaseNameFolderAssignment
Next, run the script with the desired directory as input. This tutorial includes a demo dataset, under the directory DemoData, for trying out the script:
./BaseNameFolderAssignment DemoData/
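
The standalone script is not reproduced in this post; the notebook only notes that sys is used to read user input. One plausible shape for that step is sketched below as an assumption, not as the script's actual code. The function name parse_directory and the usage message are hypothetical. Note also that a Python file invoked as ./BaseNameFolderAssignment needs an interpreter line such as #!/usr/bin/env python3 at the top.

```python
import os
import sys

def parse_directory(argv):
    """Return the target directory from argv, with a trailing separator.

    Hypothetical helper: expects argv to look like
    ["BaseNameFolderAssignment", "DemoData/"].
    """
    if len(argv) != 2:
        raise SystemExit("usage: BaseNameFolderAssignment <directory>")
    directory = argv[1]
    if not os.path.isdir(directory):
        raise SystemExit(directory + " is not a directory")
    # The notebook code builds paths by string concatenation,
    # so make sure the directory ends with a path separator
    return os.path.join(directory, "")

# In the real script this would be parse_directory(sys.argv);
# here the current directory stands in for DemoData/
name = parse_directory(["BaseNameFolderAssignment", "."])
print("Organizing files in " + name)
```

Normalizing the trailing separator means that both DemoData and DemoData/ work as input.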
