Finally got the field experiment set up and deployed in time for the California 'wet' season. Hopefully this elevational study has some interesting things to tell me! Just have to wait and see...
Wednesday, October 21, 2015
Saturday, July 18, 2015
Microorganisms!
I will just preface this by saying, I am not a mycologist.
With that said, I am a microbiologist who characterizes the mechanisms that drive bacterial diversity. Now, studying microbial ecology is a bit different from traditional ecology. You do not get the wonderful experience of capturing your study organism in the field, or of seeing a glorified mating ritual. What you do get is the excitement of seeing a next-gen PCR run come back clean, ready to be sequenced. Or the fascination of processing terabytes of sequence data for downstream analysis. Further, the fieldwork, at least for me, is almost nonexistent. Now, don't get me wrong - I live for this field. I love processing data, and I find it utterly fascinating that microorganisms dictate almost all broad-scale ecological processes. To this day, I am blown away by how large an impact these litter bugs can have.
So, when I first ventured into the field down here in Costa Rica, I was expecting to see insects, the coolest frogs ever, and, most excitingly, large mammals. I did see a lot of insects, but to my surprise they were not the most welcome ones - mosquitoes! In all seriousness, though, these giant primary tropical forests can almost appear desolate. You have this amazing diversity of plant species, but to the untrained eye there is really nothing else to see. You rarely see a howler monkey grace you with its presence in the tree canopy, nor do you ever see the elaborate and beautiful snakes that are emblematic of the tropics. But as you look more closely, an entire world begins to unfold.
At the smaller scale, there is an entire world to be seen. You have colonies of leaf-cutter ants marching through the forest, harvesting leaves for their fungal gardens. There are dung beetles carving out their delicious meals, ever ready to present their glorified treasure to a mate. And as you continue to examine this world, you come across one of the most interesting branches of life: fungi. This clade is responsible for the bulk of decomposition in terrestrial systems. Without fungi operating at the level that they do, life in the tropics would simply not be the same. The soil here is very nutrient poor, and most of the usable nutrients are locked up in living biomass. Because of this, the rapid turnover by fungi (and, of course, bacteria!) that recycles these nutrients is essential. I told you microorganisms dictate everything!
Of course, this information is not new, nor was it new to me before I set foot in Costa Rica. What I didn't expect was the morphological diversity on full display in all its glory. I would never have thought that a decomposing log in the middle of the forest would captivate me and demand my attention. But these logs in particular are the playground for these eukaryotes. Clonal colonies of fungi create a vast, interconnected network of mycelium, culminating in the production of these beautiful mushrooms. Beyond the decomposing logs, you see mushrooms sprouting up from the ground, evidence of just how far these organisms can spread. What few people seem to realize is that most fungi form these mycelial mats, which can grow to enormous size; some fungi are among the largest organisms in the world!
Fungi are not limited to decomposition; they can also have detrimental, even pathogenic, effects on all forms of life. Most interesting is the story of the zombie fungus (Ophiocordyceps unilateralis), as described by Alfred Wallace in the 1800s. I had the privilege of witnessing its effects in all their glory on a trip to La Selva. This fungus infects social insects (in my case, the mighty bullet ant, Paraponera clavata) by using enzymes deposited within the fungal spores to break down the armor that is the exoskeleton. Next, the fungus spreads within the insect, with a truly unique and horrifying effect. Inevitably, the insect becomes a puppet, fully manipulated by the fungal pathogen as it reprograms the ant's entire social behavior. The obedient and systematic social behavior that has developed over eons of evolutionary time is disrupted within just a few days. The ant leaves its nest or foraging trail, abandoning its family, to find a suitable habitat for its newfound master. The ant then climbs a stem and secures its place on the underside of a leaf, clamping down with its giant mandibles to lock itself in place. It is here that the fungal pathogen shuts the ant down altogether: its muscles atrophy and the infamous fungal 'death grip' is in full effect. The mighty ant, capable of lifting many times its own body weight, is left helpless and paralyzed on what will be its final resting place. The hyphae continue to spread throughout the ant, eventually killing a host that, at this point, has served its full purpose. Finally, fruiting bodies grow out of the ant's head, releasing spores from this advantageous position high up on the leaf. These spores disperse and fall onto the next unsuspecting ant brigade, starting this fascinating process all over again.
In conclusion, microorganisms are awesome! The more you learn, the more convinced you will become, I guarantee it. Since I cannot take photos of bacteria (which have just as many amazing stories), I settled for fungi. I could not stop taking photos of the vast diversity of fruiting bodies - some smaller than a pencil point, others as large as a person. Below you will find some of my favorites, most of which I cannot identify at all (all input will be much appreciated!). Again, I am not a mycologist, but that doesn't stop me from enjoying and appreciating the unique stories and beauty each of these little guys has to offer! PURA VIDA
Wednesday, July 8, 2015
La Selva
This summer, I ventured down to Costa Rica to participate in the OTS Tropical Biology course. It has been one of the best experiences of my life. But, in particular, I wanted to share a project that a group of us put together. Now, this wasn't a typical project for this course, which usually consists of a week to plan, execute, present, and write up a research project. This project was geared towards science outreach. With the help of some amazing visiting scientists, filmmakers, and producers, we had the privilege to create a short film in 3 days! Our group decided to concentrate on what makes La Selva Biological Research Station so important. Enjoy!
untitled (La Selva)
Special thanks to Nathan Dappan at Day's Edge Production, Sarah Joseph at National Geographic, and Alex Wild
Here I am with Michel Alejandro (Univ. of Puerto Rico) sitting atop the canopy tower at La Selva Biological Station
Tuesday, March 24, 2015
EMP - Matrix and OTU table
Need to analyze EMP metadata:
Raw data - EMP_10k_merged_mapping_final.txt and full_emp_table_w_tax.biom
Last week I was able to pull the Curto OTUs out of full_emp.biom (see post from 3/12/15) and convert them to a .txt file so I could manipulate them further.
Giant EMP_10k file has 14095 samples. The Curto OTU table only has 2882 samples.
Made a list of the samples that appear in the Curto OTU table. Wrote a rough script, sample-ids-curto.py, to parse the EMP_10k file and extract only the samples in the Curto OTU table.
Creates a new file - curto-samples.csv
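sample-ids-curto.py itself isn't shown here, so this is only a minimal sketch of the same filtering step. It assumes the EMP_10k mapping file is tab-delimited with the sample ID in the first column (the usual QIIME #SampleID layout) and that the Curto sample IDs are already in a Python set; extract_samples and to_csv are hypothetical names.

```python
import csv
import io

def extract_samples(mapping_text, wanted_ids):
    """Keep the header plus every row whose first field (the sample ID) is in wanted_ids.

    Assumes a tab-delimited, QIIME-style mapping file with the sample ID in column 1.
    """
    reader = csv.reader(io.StringIO(mapping_text), delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row and row[0] in wanted_ids]
    return header, rows

def to_csv(header, rows):
    """Serialize the filtered table as CSV text (the content of a file like curto-samples.csv)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()

# Toy example with made-up sample IDs; S9 is requested but absent from the mapping file
mapping = "#SampleID\tenv\nS1\tsoil\nS2\tleaf\nS3\twater\n"
header, rows = extract_samples(mapping, {"S1", "S3", "S9"})
print(to_csv(header, rows), end="")
```

For the real files, the mapping text would come from open("EMP_10k_merged_mapping_final.txt").read() and the output would be written to curto-samples.csv.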
$ wc -l curto-samples.csv
2490
This means roughly 390 samples were missing. So either the code has a bug OR those samples are NOT in the EMP_10k file. Turns out they are not in the EMP_10k file (no idea why).
$ vimdiff file1 file2
Samples not included in further analysis found in samples-not-in-csv.txt
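vimdiff shows the differences visually; a set difference gives the list of missing IDs directly, which could then be written to a file like samples-not-in-csv.txt. A minimal sketch with made-up IDs (missing_samples is a hypothetical name):

```python
def missing_samples(otu_table_ids, mapping_ids):
    """Sample IDs present in the OTU table but absent from the mapping file, sorted."""
    return sorted(set(otu_table_ids) - set(mapping_ids))

# Toy example: S2 and S4 are in the OTU table but have no row in the mapping file
print(missing_samples(["S1", "S2", "S3", "S4"], ["S1", "S3"]))
```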
Need to redefine Curto OTU table - eliminate samples that are not found in the EMP_10k file.
Modified previous code slightly to parse Curto OTU Table and pull out correct samples.
*First had to transpose the OTU table to get it into the correct format - the code parses the first string in each row
OLD FORMAT
Sample1 Sample2 Samplen
OTU1
OTU2
OTUn
Creates a new file - full-emp-curto-only-with-found-samples.csv
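The transpose step (OTUs as rows to samples as rows) can be sketched in pure Python. This assumes a tab-delimited table with no leading comment line; if the file starts with one (biom convert may write a line such as "# Constructed from biom file"), drop it first. If a taxonomy column is present, it becomes a taxonomy row after transposing. transpose_table is a hypothetical name.

```python
import csv
import io

def transpose_table(text, delimiter="\t"):
    """Swap rows and columns of a delimited table (OTUs x samples -> samples x OTUs)."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("ragged table: every row needs the same number of fields")
    # zip(*rows) yields the i-th field of every row, i.e. the i-th column
    return [list(column) for column in zip(*rows)]

otu_table = "#OTU ID\tS1\tS2\nOTU1\t5\t0\nOTU2\t1\t3\n"
for row in transpose_table(otu_table):
    print("\t".join(row))
```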
Compare number of samples to check: both files have 2490 samples
Sort both curto-samples.csv and full-emp-curto-only-with-found-samples.csv
$ sort curto-samples.csv > curto-samples-sorted.csv
# and likewise for the other file
Combine the two files and check that the sample IDs match up *they should, since both were sorted
Creates a new file - combined-samples-otu-table.csv
NEW FORMAT
OTU1 OTU2 OTUn … METADATA
Sample1
Sample2
Samplen
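The combine-and-check step can be sketched so that a mismatch stops the join instead of silently misaligning rows. This assumes both tables are already lists of rows with the sample ID in the first field and sorted the same way; join_on_sample_id is a hypothetical name.

```python
def join_on_sample_id(otu_rows, metadata_rows):
    """Append each sample's metadata fields to its OTU counts.

    Both inputs are lists of rows whose first field is the sample ID, already sorted
    the same way; a mismatch in length or ID order raises instead of silently misaligning.
    """
    if len(otu_rows) != len(metadata_rows):
        raise ValueError(f"row counts differ: {len(otu_rows)} vs {len(metadata_rows)}")
    combined = []
    for otu, meta in zip(otu_rows, metadata_rows):
        if otu[0] != meta[0]:
            raise ValueError(f"sample IDs do not match: {otu[0]!r} vs {meta[0]!r}")
        combined.append(otu + meta[1:])
    return combined

otu_rows = [["S1", "5", "0"], ["S2", "1", "3"]]
metadata_rows = [["S1", "soil"], ["S2", "leaf"]]
print(join_on_sample_id(otu_rows, metadata_rows))
```

Keying a dictionary on sample ID instead would make the join independent of sort order.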
Eliminate all columns in Metadata that contain "na" or "None" for every sample
--> 205 columns were eliminated
--> GRAND TOTAL = combined-samples-otu-table-annotated.xlsx
53 OTUs with 2489 samples with 271 columns of Metadata!
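The metadata clean-up (drop every column that is "na" or "None" for every sample) can be sketched as follows. Matching is case-insensitive here, which is an assumption; drop_empty_columns is a hypothetical name.

```python
MISSING_VALUES = {"na", "none"}  # compared case-insensitively (an assumption)

def drop_empty_columns(header, rows, missing=MISSING_VALUES):
    """Remove every column whose value is missing (e.g. 'na' or 'None') for every sample.

    Returns the new header, the new rows, and how many columns were dropped.
    """
    if not rows:
        return header, rows, 0
    keep = [
        i for i in range(len(header))
        if not all(row[i].strip().lower() in missing for row in rows)
    ]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows, len(header) - len(keep)

header = ["SampleID", "ph", "depth"]
rows = [["S1", "na", "5"], ["S2", "None", "na"]]
print(drop_empty_columns(header, rows))
```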
Tuesday, March 17, 2015
EMP, GreenGenes - Make Local DB and BLAST
Create a reference database from my GreenGenes + 16S strains
I used the rep_seqs that were generated when I created my phyla tree as my database.
Made a new file - curto-db.fasta
*these are aligned rep_set seqs
Two ways to create your own local database:
1. Use the BLAST command line
The sequences need to be in a specific format:
Ex.
>gnl|831711|Microbacteriaceae_Candidatus_Rhodoluna
DNA here
makeblastdb
$ makeblastdb -in curto-db.fasta -dbtype nucl -out curto.db
Find out more details HERE
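Assuming the source headers look like ">831711 Microbacteriaceae Candidatus Rhodoluna" (a hypothetical layout; the original headers are not shown), they can be rewritten into the gnl|...|... form above with a small script. to_gnl_header and reformat_fasta are hypothetical names. If the IDs should also be parsed into the database, makeblastdb has a -parse_seqids option.

```python
def to_gnl_header(header_line):
    """'>831711 Microbacteriaceae Candidatus Rhodoluna' -> '>gnl|831711|Microbacteriaceae_Candidatus_Rhodoluna'."""
    fields = header_line.lstrip(">").split()
    seq_id = fields[0]
    # Join the remaining words with underscores; fall back to the ID if there is no description
    label = "_".join(fields[1:]) or seq_id
    return f">gnl|{seq_id}|{label}"

def reformat_fasta(lines):
    """Yield FASTA lines with headers rewritten and sequence lines passed through unchanged."""
    for line in lines:
        line = line.rstrip("\n")
        yield to_gnl_header(line) if line.startswith(">") else line

example = [">831711 Microbacteriaceae Candidatus Rhodoluna\n", "ACGTACGT\n"]
print("\n".join(reformat_fasta(example)))
```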
2. Use Geneious
Tools -> Sequence Search
In the window that pops up, click "Add/Remove Databases" - select "Add Sequence Database"
Follow the instructions (i.e., select 'nucleotide' and 'custom BLAST')
Perform Sequence Search again, but this time select "Database" and scroll to your new custom database!
___________________________________________________________________________
Next, BLAST the EMP seqs against my local database.
*The EMP seqs came from QIIME assign_taxonomy.py; kept those assigned to Curtobacterium with a quality score greater than 0.67
*The seqs are also extremely short - less than 200 bp
Export the data to a .txt file
Really strange results - EMP seqs hit rep_seqs at equal frequency
Need to look at seqs in Geneious and check alignments!
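One thing worth ruling out before checking alignments: with queries under 200 bp, several reference seqs may tie for the top score, which could make hits look evenly spread. This is a hypothesis to check, not a diagnosis. The sketch assumes the BLAST results were exported as tabular output (blastn -outfmt 6, whose 12 default columns end with evalue and bitscore); best_hits and summarize are hypothetical names.

```python
import csv
import io
from collections import Counter

def best_hits(outfmt6_text):
    """Map each query to the set of subjects that share its highest bitscore (column 12)."""
    top_score = {}
    top_subjects = {}
    for row in csv.reader(io.StringIO(outfmt6_text), delimiter="\t"):
        if not row:
            continue
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in top_score or bitscore > top_score[query]:
            top_score[query] = bitscore
            top_subjects[query] = {subject}
        elif bitscore == top_score[query]:
            top_subjects[query].add(subject)
    return top_subjects

def summarize(best):
    """Count unambiguous best hits per subject, plus the number of queries with tied best hits."""
    unique = Counter(next(iter(s)) for s in best.values() if len(s) == 1)
    ties = sum(1 for s in best.values() if len(s) > 1)
    return unique, ties

# Toy outfmt 6 rows: q1 has one clear best hit, q2 ties between s1 and s2
toy_rows = [
    ["q1", "s1", "100.0", "150", "0", "0", "1", "150", "1", "150", "1e-70", "270"],
    ["q1", "s2", "99.3", "150", "1", "0", "1", "150", "1", "150", "1e-68", "265"],
    ["q2", "s1", "100.0", "150", "0", "0", "1", "150", "1", "150", "1e-70", "270"],
    ["q2", "s2", "100.0", "150", "0", "0", "1", "150", "1", "150", "1e-70", "270"],
]
toy_text = "\n".join("\t".join(r) for r in toy_rows) + "\n"
print(summarize(best_hits(toy_text)))
```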
Thursday, March 12, 2015
EMP - OTU Table
FINALLY FIGURED OUT HOW TO GET OTU TABLE!
$ biom subset-table -i full_emp_table_hdf5.h5 -a observation -s curto-only-ids.txt -o full_emp_table_curto.biom
$ biom convert -i full_emp_table_curto.biom -o full_emp_table_curto.txt --to_tsv --header-key taxonomy
- Remember the EMP Open .biom file was too large (too much memory - crashed Python)
- Converted format to HDF5 file for easier manipulation
- Found this convenient python class
- Which then enables the biom subset-table and biom convert commands above (if biom is installed), but only if the HDF5 file is in the correct format
Monday, March 9, 2015
GreenGenes - Pipeline and Phylogenies
1. Download the entire GreenGenes database
Need:
gg_13_5.fasta
gg_13_5_taxonomy.txt
2. Search for taxonomy of interest - start with Microbacteriaceae
#creates a text file with IDs matching search
$ egrep "f__Microbacteriaceae" gg_13_5_taxonomy.txt | awk '{print $1}' > ./gg-microbacteriaceae.txt
3. micro-only.py
#searches fasta file and creates a new fasta file with only IDs from gg-microbacteriaceae.txt
#found 5707 sequences
4. Combine my 16S reads
$ cat my-16S-reads.fasta gg-micro.fasta > output.fasta
5. QIIME - pick_otus.py - generates 327 OTUs
-m uclust
-s 0.97
-A #optimal search
***Swarm loses OTUs when running due to its algorithm
6. QIIME - pick_rep_set.py
-f gg-all-microbacteriaceae-with-16S.fasta
-r my-16S-reads.fasta
-m longest
6B. fasta-rename.py #renames all seqs with new names on fasta header
7. Align rep_set sequences with SINA
8. Eliminate all OTUs with <20 seqs EXCEPT for Curtobacterium OTUs (also did <50 seqs)
9. jModelTest - computes likelihood scores with PhyML
Base Frequencies +F
Rate Variation +I +G nCat=4
ML Optimized
Base Tree Search = NNI
Best Models:
Models BIC Calculation
TIM1 + G 27589
TrN + G 27593
GTR + G 27608
10. Run TrN+G model on MEGA
-Maximum Likelihood
-Nucleotide Substitution = TrN
-Bootstrap Method = 100 replicates
-Gamma Distributed = 5 discrete categories
-Complete Deletion
-NNI
11. Run GTR+G on RAxML - see RAxML manual for help
$ raxmlHPC -s input_file.phy -n output_name -m GTRGAMMA -# 100 -x 100 -p 2389 -f a -o outgroup_name
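micro-only.py is not shown; this sketch covers what steps 2 and 3 do together (pull IDs whose taxonomy string contains f__Microbacteriaceae, then keep only those records from the FASTA). It assumes GreenGenes-style files where the ID is the first field of each taxonomy line and the first word after ">" in each header; ids_for_taxon and filter_fasta are hypothetical names.

```python
def ids_for_taxon(taxonomy_lines, taxon="f__Microbacteriaceae"):
    """IDs (first field) of taxonomy lines containing taxon, like egrep ... | awk '{print $1}'."""
    return {line.split()[0] for line in taxonomy_lines if taxon in line}

def filter_fasta(fasta_lines, keep_ids):
    """Yield the header and sequence lines of every record whose ID is in keep_ids."""
    keep = False
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in keep_ids
        if keep:
            yield line

taxonomy = [
    "1\tk__Bacteria; f__Microbacteriaceae; g__Curtobacterium",
    "2\tk__Bacteria; f__Bacillaceae; g__Bacillus",
]
fasta = [">1\n", "ACGT\n", "TTGA\n", ">2\n", "GGGG\n"]
kept = list(filter_fasta(fasta, ids_for_taxon(taxonomy)))
print(sum(1 for line in kept if line.startswith(">")), "sequences found")
```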
MEGA - TrN + G with OTUs > 50 seqs
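As a quick check on the jModelTest step, the three listed BIC values can be expressed as differences from the best model and as normalized weights proportional to exp(-deltaBIC/2), a common convention (jModelTest may report its own weights). Only the three listed models are included, so these weights are relative to that subset; bic_weights is a hypothetical name.

```python
import math

def bic_weights(bic):
    """Return {model: (delta_BIC, weight)} with weights proportional to exp(-delta / 2)."""
    best = min(bic.values())
    rel = {model: math.exp(-0.5 * (score - best)) for model, score in bic.items()}
    total = sum(rel.values())
    return {model: (bic[model] - best, rel[model] / total) for model in bic}

scores = {"TIM1+G": 27589, "TrN+G": 27593, "GTR+G": 27608}
for model, (delta, weight) in bic_weights(scores).items():
    print(f"{model}\tdelta={delta}\tweight={weight:.3f}")
```

This shows TIM1+G and TrN+G within 4 BIC units of each other, with GTR+G much further back.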
Tuesday, March 3, 2015
GreenGenes - Phylogenetics Background
Been working on this for a few weeks, but I'll summarize:
Brief overview of Phylogenetics:
Multiple Sequence Alignment
generates a score between pairs of sequences
MUSCLE - multiple alignment software that estimates distances using k-mer counting
Clustalw - takes a set of input sequences and carries out a progressive alignment
--> sequences are aligned in pairs to generate a distance matrix
--> uses the Neighbor-Joining method to produce an unrooted tree, which serves as the guide tree for the multiple alignment
INPUT DATA METHOD
2 - 100 protein seqs MUSCLE
100 - 500 seqs globally aligned
> 500 seqs
small number of large seqs Clustalw
Genetic Distance and Nucleotide Substitution Models
Genetic Distance - evolutionary distance
Rate Heterogeneity among sites - rate of nucleotide substitution can vary substantially for different positions
--> Use Gamma Distribution - expectation 1.0 with variance 1/alpha
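For reference, the gamma model of rate heterogeneity described above has this form, with shape α and the rate parameter also set to α so that the mean relative rate is 1. Small α means strong rate variation among sites; large α means nearly equal rates. Programs usually approximate it with a few discrete rate categories, like the nCat=4 setting used in the jModelTest run.

```latex
f(r;\alpha) = \frac{\alpha^{\alpha}}{\Gamma(\alpha)}\, r^{\alpha-1} e^{-\alpha r}, \qquad r > 0
\\[4pt]
\mathbb{E}[r] = \frac{\alpha}{\alpha} = 1, \qquad \operatorname{Var}(r) = \frac{\alpha}{\alpha^{2}} = \frac{1}{\alpha}
```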
Phylogenetic Inference based on Distance Methods
Try to fit a tree to a matrix of genetic distances
Minimum Evolution (ME) - distance method for constructing additive trees to minimize length of tree
Neighbor-Joining - builds the tree stepwise, at each step joining the pair of neighboring OTUs that minimizes the total tree length
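To make neighbor-joining concrete, here is a minimal pure-Python sketch of the standard algorithm: Q-criterion pair selection, branch lengths to the new node, and distance updates. The toy 5-taxon matrix is a common textbook example, not data from this project; ties in Q are broken by taking the first pair found. neighbor_joining is a hypothetical name.

```python
def neighbor_joining(names, dist):
    """Neighbor-joining on a symmetric distance matrix (list of lists).

    Returns (joins, final_edge): each join is (taxon_i, taxon_j, branch_i, branch_j, new_node),
    and final_edge links the last two nodes with its branch length.
    """
    names = list(names)
    d = [list(row) for row in dist]
    joins = []
    while len(names) > 2:
        n = len(names)
        totals = [sum(row) for row in d]  # R_i = sum over k of d(i, k)
        # Q criterion: choose the pair minimising (n - 2) d(i, j) - R_i - R_j
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to the new internal node u
        branch_i = 0.5 * d[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
        branch_j = d[i][j] - branch_i
        new_node = f"u{len(joins) + 1}"
        joins.append((names[i], names[j], branch_i, branch_j, new_node))
        # Distance from u to every remaining node k: (d(i,k) + d(j,k) - d(i,j)) / 2
        rest = [k for k in range(n) if k not in (i, j)]
        to_new = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in rest]
        d = [[d[a][b] for b in rest] + [to_new[x]] for x, a in enumerate(rest)]
        d.append(to_new + [0.0])
        names = [names[k] for k in rest] + [new_node]
    return joins, (names[0], names[1], d[0][1])

labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
joins, final_edge = neighbor_joining(labels, matrix)
for join in joins:
    print(join)
print(final_edge)
```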
Phylogenetic Inference using Maximum Likelihood (ML) Methods
Highest probability of observed data under a set of parameters
Determines the tree topology, branch lengths, and evolutionary-model parameters that maximize the probability of observing the sequences in a particular arrangement
--> GOAL - to find tree among all possible tree structures that maximizes the global likelihood
However, impossible to compute all possible trees -> need to add heuristics
1. Stepwise Addition
2. Star Decomposition
3. Neighbor-Joining
PHYML - fast distance based method to quickly compute a full initial tree
RAxML - builds a starting tree by maximum parsimony and optimizes it with a variant of subtree rearrangement
Uses Lazy Subtree Rearrangement (LSR) - assigns a maximal distance between the pruning and insertion points for subtree prune-and-regraft (SPR) operations to restrict the size of the neighborhood
Optimizes only the branch that originates at the pruning point
Repeats using the current best tree
Takes the 20 best trees found during LSR to reoptimize ML by adjusting branch lengths
Branch Support - all methods produce a single tree and ML values
Bootstrapping:
1. Pseudo-samples are created by randomly drawing with replacement l columns from the original l column alignment
2. From each pseudo-sample, a tree is reconstructed and a consensus tree is made
Consensus Tree - incorporates branches that occur in the majority of trees
Bootstrap Values used as an indicator for reliability of branches
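Bootstrapping step 1 (drawing l columns with replacement from an l-column alignment) can be sketched with a toy alignment. This shows only the resampling; tree building and consensus are not attempted. The RAxML run above (-# 100 with -x) asks RAxML to do this internally. The seed is arbitrary, and bootstrap_alignment and bootstrap_replicates are hypothetical names.

```python
import random

def bootstrap_alignment(alignment, rng):
    """One pseudo-alignment: draw l column indices with replacement from an l-column alignment."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (n_sites,) = lengths
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    # The same column indices are used for every taxon, so sites stay aligned
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}

def bootstrap_replicates(alignment, n_replicates=100, seed=1):
    """n_replicates pseudo-alignments from a seeded RNG (seed value is arbitrary)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]

toy = {"t1": "ACGT", "t2": "AGGT", "t3": "TCGA"}
for replicate in bootstrap_replicates(toy, n_replicates=3):
    print(replicate)
```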
Thursday, February 26, 2015
BACE - DNA combination and Ship
Some samples still have poor yields, so combine samples and reconcentrate
Followed protocol for Amicon Pro Purification System:
MCBA15 004 - combine 4.1 and 4.2 from 2/25
MCBA15 007 - combine from both extraction days
MCBA15 015 - combine from both extraction days
MCBA15 017 - combine from both extraction days
MCBA15 019 - combine 19.1 and 19.2 from 2/25
MMLR15 020 - combine from both extraction days
Quantified with Qubit on BioTek at 485/530 nm
Sample ID Concentration (ng/uL) Volume (uL)
MCBA15 004 20.8 95
MCBA15 007 7.7 90
MCBA15 015 34.9 100
MCBA15 017* 81.0 100
MCBA15 019 11.0 90
MMLR15 020 4.9 95
*split in two
Shipped out samples on 3/3/15 (delayed due to weather):
Sample ID Total DNA (ng)
MCBA15 004 1976.0
MCBA15 007 693.0
MMLR15 010* 672.0
MMLR15 011* 616.0
MCBA15 015 3490.0
MCBA15 017 3645.0
MCBA15 019 990.0
Samples delivered and received 3/4/15 - email from Michael
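The Total DNA values for MCBA15 004, 007, 015 and 019 match concentration × volume from the Qubit table above (e.g. 20.8 ng/uL × 95 uL = 1976.0 ng). MCBA15 017 was split in two, so its shipped total is not simply 81.0 × 100, and MMLR15 010/011 are not in that Qubit table. A small helper for this arithmetic (total_dna_ng is a hypothetical name):

```python
def total_dna_ng(concentration_ng_per_ul, volume_ul):
    """Total DNA (ng) = concentration (ng/uL) x volume (uL), rounded to 0.1 ng."""
    return round(concentration_ng_per_ul * volume_ul, 1)

# Concentration and volume from the Qubit table for the combined samples
qubit = {
    "MCBA15 004": (20.8, 95),
    "MCBA15 007": (7.7, 90),
    "MCBA15 015": (34.9, 100),
    "MCBA15 019": (11.0, 90),
}
for sample, (conc, vol) in qubit.items():
    print(sample, total_dna_ng(conc, vol))
```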
_____________________________________
Samples not sent out due to poor yields:
MMLR15 018 - slow growing
MCBA15 021 - Frigoribacterium; did not redo
MMLR15 022 - Frigoribacterium; did not redo
Tuesday, February 24, 2015
BACE - DNA Extraction Pt II
Try to extract DNA from the samples that I could not get enough DNA from on 2/17
***for samples with really poor yields from last time, extracted two sets
Need more Lysozyme - 10 mg/mL in 60 uL x 20 samples
TEN Buffer:
40 mM Tris-HCl, pH 7.5
1 mM EDTA, pH 8.0
150 mM NaCl
Stock Solutions:
400 mM Tris-HCl = 6.30 g in 100 mL dH2O
100 mM EDTA = 2.92 g in 100 mL dH2O
300 mM NaCl = 1.75 g in 100 mL dH2O
--> 1500 uL TEN Buffer = 750 uL NaCl + 15 uL EDTA + 150 uL Tris-HCl + 585 uL ddH2O
Add 1 mL TEN Buffer + 10 mg Lysozyme = 10 mg/mL
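The stock masses and the 1500 uL recipe above follow from mass = molarity × volume × molecular weight and C1V1 = C2V2. The molecular weights below are assumptions (the notebook doesn't list them), chosen because they reproduce the masses above; 292.24 g/mol corresponds to EDTA free acid. Function names are hypothetical.

```python
# Molecular weights (g/mol) are assumptions, chosen because they reproduce the masses above
MW_G_PER_MOL = {"Tris-HCl": 157.60, "EDTA": 292.24, "NaCl": 58.44}

def grams_for_stock(conc_mM, volume_mL, mw):
    """Mass (g) of solid needed to make volume_mL of a conc_mM stock."""
    return (conc_mM / 1000) * (volume_mL / 1000) * mw

def stock_volume_uL(stock_mM, final_mM, final_volume_uL):
    """Solve C1 * V1 = C2 * V2 for V1 (volume of stock to add)."""
    return final_mM * final_volume_uL / stock_mM

def ten_buffer_recipe(final_volume_uL=1500):
    """Stock volumes (uL) for TEN buffer: 40 mM Tris-HCl, 1 mM EDTA, 150 mM NaCl."""
    stocks = {"NaCl": (300, 150), "EDTA": (100, 1), "Tris-HCl": (400, 40)}  # (stock mM, final mM)
    recipe = {name: stock_volume_uL(s, f, final_volume_uL) for name, (s, f) in stocks.items()}
    recipe["ddH2O"] = final_volume_uL - sum(recipe.values())
    return recipe

for name, (conc, vol) in {"Tris-HCl": (400, 100), "EDTA": (100, 100), "NaCl": (300, 100)}.items():
    print(f"{name}: {grams_for_stock(conc, vol, MW_G_PER_MOL[name]):.2f} g in {vol} mL")
print(ten_buffer_recipe())
```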
Followed Promega Wizard DNA Purification Kit Protocol for gram-positive bacteria
EXCEPT:
Used 2 mL of liquid-grown culture
Added 60 uL of 10 mg/mL lysozyme + 60 uL ddH2O = 120 uL
Added 60 uL of Rehydration Solution
Quantified with Qubit kit on BioTek at 485/530 nm
Sample ID Concentration (ng/uL)
MCBA15 004.1 7.9
MCBA15 004.2 20.8
MCBA15 007 4.9
MMLR15 010 0.0 - probably lost pellet
MCBA15 015 10.7
MCBA15 017.1 8.7
MCBA15 017.2 0.3
MMLR15 018.1 1.0 - grows slow, not much input
MMLR15 018.2 3.1 - grows slow, not much input
MCBA15 019.1 7.5
MCBA15 019.2 13.5
MMLR15 020 16.6
Tuesday, February 17, 2015
BACE - DNA Extractions and Shipment
Shipped out samples to MIT - Martin Polz and Michael Cutler
Curtobacterium samples (n=11) sent on 2/17/15:
Sample ID Total DNA (ng)
MCBA15 001 981.0
MMLR15 002 1254.2
MCBA15 003 893.8
MCBA15 005 953.9
MMLR15 006 943.6
MCBA15 008 1657.8
MCBA15 009 922.6
MCBA15 012 962.3
MCBA15 013 1240.4
MMLR15 014 1153.8
MCBA15 016 756.7
Samples Received on 2/19/15
Thursday, February 12, 2015
BACE - DNA Extraction
Followed Promega DNA Extraction Kit Protocol
Results were better than Spin Column method, but still not great for some samples.
Wednesday, January 28, 2015
EMP - Align OTUs
Took the rep set of sequences and pulled out Curtobacterium OTUs only (Curtobacterium were assigned by GreenGenes database)
Aligned curto only sequences with SINA
Sequences are really short (~150 bp) - next, see how they fit with the sequence data from BACE litter (align with all sequences that were a hit for Microbacteriaceae)
BioCluster
Gained access to the BioCluster
Login: ssh username@hpc.oit.uci.edu
Help: cat /data/help/cheat-sheet.txt
Guidebook on how to use the BioCluster created by Kevin Thornton:
Example to run jobs on the cluster:
Batch jobs are jobs that contain all of the necessary information and instructions to run inside a script. You create a script with your favorite editor (like emacs) and then submit the script to the scheduler to run.
Some jobs can run for days, weeks, or longer so batch is the way to go for such work. Once you submit a job to the scheduler, you can log off and come back at a later time and check on the results.
Serial batch jobs are usually the simplest to use. Serial jobs run on only one core, which also makes them the slowest.
Consider the following serial job script available from the HPC demo account.
$ cat ~demo/serial.sh
#!/bin/bash
#$ -N TEST
#$ -q free64
#$ -m beas
date > out
Grid Engine directive - What it does
#!/bin/bash - Shell to use (the bash shell)
#$ -N TEST - Our job name is TEST. If output is produced to standard out, you will see a file named TEST.o<jobid>, and TEST.e<jobid> for errors (if any occurred)
#$ -q free64 - Request the free64 queue
#$ -m beas - Email you when the job (b)egins, (e)nds, is (a)borted, or is (s)uspended
The first line #!/bin/bash is the shell to use. Grid Engine (GE) directives start with #$. GE directives are needed in order to tell the scheduler what queue to use, how many cores to use, whether to send email or not, etc.
The last line in our serial.sh script is the program to run. In this example it is the simple date program, writing its output to the file out.
date > out
Now that we have a basic understanding let’s run our first serial batch job on the HPC Cluster. First create a test directory, change to the test directory, copy the demo serial.sh script to our new directory and submit the job.
From your HPC account, do the following:
$ mkdir serial-test
$ cd serial-test
$ cp ~demo/serial.sh .
$ qsub serial.sh
$ qstat -u $USER
After you submit the job (qsub), GE will respond with a job ID:
Your job 1961 ("TEST") has been submitted
and qstat will display something similar to this:
job-ID prior name user state submit/start queue slots
1961 0.00000 TEST jfarran qw 08/16/2012 1
The state of our job is qw (queue wait), meaning the job is sitting in the queue waiting for a compute node. The core count (slots) shows as 1 (the default, one core).
When we run qstat -u $USER again a few seconds later, we see:
job-ID prior name user state submit/start queue slots
1961 0.50659 TEST jfarran r 08/16/2012 free64@compute-7-11 1
The scheduler found compute-7-11 on the free64 queue available with 1 core (slots) and started our job #1961 on it. The job state changed from queue wait (qw) to running (r).
Once you submit your job (qsub), things happen rather quickly, so you may need to type qstat repeatedly and fast to see your job. Or open a new window and run: watch -d "qstat -u $USER"
Once the job completes you will get an email notification and the qstat output will be empty.
Now do an ls and you will see the following files:
out serial.sh
The serial.sh is the batch job we submitted and file out is the output from the date program. To see the output type:
$ cat out
Monday, January 26, 2015
BACE, Curto - Phylogeny
Used ARB-Silva SINA to align sequences and ARB to generate phylogeny
***May take out sequences (see below post; specifically AB_3.17L, AB_3.19L, AB_3.27L) that do not have the full 16S gene - they do not fit well into the phylogeny
EDIT: new phylogeny with scale generated by ARB tree generator
Wednesday, January 21, 2015
BACE, Curto - Sequence Data
Received sequence data from Beckman Genomics Institute
Trimmed sequences:
Quality-trimmed at a 5% error probability limit and trimmed the first 20 bp (length of primers)
Primers used:
Forward Primer - AGAGTTTGATCCTGGCTCAG
Reverse Primer - AAGGAGGTGATCCAGCCGCA
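Assuming the 5% figure is an error-probability limit of the kind Geneious's quality trimming uses, the trimming can be sketched as: drop the first 20 bp (primer), convert Phred scores to error probabilities, and keep the highest-scoring stretch where each base scores limit - P(error). This is a maximum-subarray scan in the spirit of the modified Mott algorithm, not Geneious's exact implementation. The reverse complement of the reverse primer is included because that is the orientation in which its site appears on the forward strand. All function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def error_probability(q):
    """Phred quality score -> probability that the base call is wrong."""
    return 10 ** (-q / 10)

def mott_trim(quals, limit=0.05):
    """Return (start, end) of the highest-scoring stretch, scoring each base as limit - P(error)."""
    best_start = best_end = 0
    best_score = running = 0.0
    start = 0
    for i, q in enumerate(quals):
        running += limit - error_probability(q)
        if running <= 0:
            running, start = 0.0, i + 1  # restart the stretch after this base
        elif running > best_score:
            best_score, best_start, best_end = running, start, i + 1
    return best_start, best_end

def trim_read(seq, quals, primer_len=20, limit=0.05):
    """Drop the first primer_len bases (primer), then quality-trim the rest."""
    seq, quals = seq[primer_len:], quals[primer_len:]
    start, end = mott_trim(quals, limit)
    return seq[start:end]

print(reverse_complement("AAGGAGGTGATCCAGCCGCA"))
print(trim_read("A" * 20 + "GGACGTAA", [40] * 20 + [5, 5, 30, 30, 30, 30, 5, 5]))
```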
Assembled de novo:
Samples with no contig overlap - could not assemble (n = 12) - used either the F or R read for ID
AB 3.02L - gel shows blurry line at 1500 bp
AB 3.04L - multiple bands
AB 3.05L - bright band at 1500 bp
AB 3.12L - bright, smeared band at 1500 bp
AB 3.17L - multiple bands ~1500 bp
AB 3.19L - very faint band at 1500 bp
AB 3.27L - no visible band
AB 3.37L - bright, smeared band at 1500 bp
*AB 3.04B - multiple bands
*AB 3.09B - multiple bands
AB 3.13B - bright band at 1500 bp
*Curto 145 (redo) - no visible band
*low quality read percentage - did not include
Blast samples - blast against nr/nt database