I was able to figure out which OTUs from the rep_set file were Curtobacterium:
- Searched the taxonomic assignment file from S. Gibbons for "Microbacteriaceae" (n=2713)
# Print every assignment line classified in the family Microbacteriaceae
with open("rep_set_tax_assignments.txt", "r") as searchfile:
    for line in searchfile:
        if "f__Microbacteriaceae" in line:
            print(line.rstrip())
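The search above only prints the matching lines, while the next step expects a file of sequence IDs, one per line. A minimal sketch of how that ID list could be written, assuming the assignment file is tab-separated with the OTU ID in the first column (the function name write_matching_ids is made up for illustration):

```python
def write_matching_ids(assignments_path, taxon_label, ids_path):
    """Write the ID of every assignment line that mentions taxon_label.

    Assumes a tab-separated file whose first column is the sequence/OTU ID.
    Returns the number of IDs written.
    """
    count = 0
    with open(assignments_path) as src, open(ids_path, "w") as out:
        for line in src:
            if taxon_label in line:
                # Keep only the first tab-separated field (the ID)
                seq_id = line.split("\t", 1)[0].strip()
                if seq_id:
                    out.write(seq_id + "\n")
                    count += 1
    return count
```

Calling write_matching_ids("rep_set_tax_assignments.txt", "f__Microbacteriaceae", "microbacteriaceae-only.txt") would then produce the ID file, and the returned count can be checked against the number of hits from the search.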
- Created a smaller fasta file by pulling the Microbacteriaceae sequences out of the giant 'rep_set.fna' file from S. Gibbons
from Bio import SeqIO

fasta_file = "rep_set.fna"  # input fasta file
wanted_file = "microbacteriaceae-only.txt"  # input interesting sequence IDs, one per line
result_file = "microbacteriaceae-only.fasta"  # output fasta file

# Collect the wanted sequence IDs, skipping blank lines
wanted = set()
with open(wanted_file) as f:
    for line in f:
        line = line.strip()
        if line != "":
            wanted.add(line)

# Copy only the wanted records into the output fasta file
fasta_sequences = SeqIO.parse(fasta_file, "fasta")
count = 0
with open(result_file, "w") as f:
    for seq in fasta_sequences:
        if seq.id in wanted:
            count = count + 1
            SeqIO.write([seq], f, "fasta")
print("Converted %i records" % count)
- Ran QIIME's assign_taxonomy.py on the new 'microbacteriaceae-only.fasta'
- Aligned with GreenGenes core set (same reference as GenBank protocol)
- Repeated the above procedure to generate 'curtobacterium-only.fasta' (n=53)
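For the genus-level pass, the search and the fasta extraction can be folded into one script. This is only a rough sketch using the standard library instead of Biopython. It assumes the new assignment file is tab-separated (ID, then taxonomy string) and that the genus is labelled "g__Curtobacterium" in the same style as "f__Microbacteriaceae". The function names are hypothetical.

```python
def ids_for_taxon(assignments_path, taxon_label):
    """Return the set of IDs whose taxonomy string contains taxon_label."""
    ids = set()
    with open(assignments_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            # fields[0] is the ID, fields[1] the taxonomy string (assumed layout)
            if len(fields) > 1 and taxon_label in fields[1]:
                ids.add(fields[0].strip())
    return ids


def filter_fasta(fasta_path, wanted_ids, result_path):
    """Copy the records whose ID is in wanted_ids; return how many were kept."""
    kept = 0
    keep = False
    with open(fasta_path) as src, open(result_path, "w") as out:
        for line in src:
            if line.startswith(">"):
                # The record ID is the first whitespace-delimited token of the header
                header = line[1:].split()
                keep = bool(header) and header[0] in wanted_ids
                if keep:
                    kept += 1
            if keep:
                out.write(line)
    return kept
```

With the assignment file that assign_taxonomy.py wrote for 'microbacteriaceae-only.fasta' (its exact name depends on the output directory), ids_for_taxon(..., "g__Curtobacterium") followed by filter_fasta("microbacteriaceae-only.fasta", ids, "curtobacterium-only.fasta") should reproduce the last step. The returned count is a quick sanity check against the expected number of sequences.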