RiboSubstrates

June 2006


1 Software description

1.1 Generalities

RiboSubstrates is written in Perl using the freely available CPAN modules CGI::Application, HTML::Template and Class::DBI (see http://www.perl.org/ and http://cpan.org/). Essentially, RiboSubstrates includes four modules and two configuration files (see Figure 1).

Figure 1: Process flow for RiboSubstrates.

1.2 cDNA database

The first module was developed for new cDNA database integration. To date, databases for human (NCBI version 35.1 at ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/RNA), mouse (version 34.1 at ftp://ftp.ncbi.nih.gov/genomes/M_musculus/RNA), the bacteria Escherichia coli (version 17 at http://bmb.med.miami.edu/EcoGene/EcoWeb/CESSPages/FILES/Sequences/EcoGene17.lib.gz) and Lactococcus lactis (sequence NC_002662.1 at http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=15671982), the yeast Saccharomyces cerevisiae (at ftp://genomeftp.stanford.edu/pub/yeast/data_download/chromosomal_feature/saccharomyces_cerevisiae.gff) and the parasite Leishmania major (version .3.6 at ftp://ftp.sanger.ac.uk/pub/databases/L.major_sequences/CHROMOSOMES/) have been recovered from public libraries and integrated into RiboSubstrates using this module. In the case of L. lactis and Leishmania major, the mRNAs were extracted directly from the annotation of these genomes. Additional cDNA databases can easily be added on request; all that is required is a file in fasta format containing all of the entries. For a full RiboSubstrates integration, the header must be in the following format: >ID|Name|Description. The ID field must be the identifier used by the external source, so that a direct link to the external source description can be built. The description can be anything; the default databases describe the gene function and its location in the genome. If the header is not in the above format, the search result will display the header without any interpretation.
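The header convention described above can be illustrated with a short Python sketch. This is not the RiboSubstrates code (which is written in Perl); the function name parse_header, the returned field names and the sample header are hypothetical and chosen only for illustration.

```python
def parse_header(header_line):
    """Split a fasta header of the form >ID|Name|Description.

    Returns a dict; when the header does not follow the expected
    format, the fields are left empty and only the raw text is kept,
    mirroring the "no interpretation" behaviour described in the text.
    """
    text = header_line.strip()
    if text.startswith(">"):
        text = text[1:]
    # Split on the first two pipes only, so the description may contain "|"
    parts = text.split("|", 2)
    if len(parts) != 3:
        return {"id": None, "name": None, "description": None, "raw": text}
    seq_id, name, description = (p.strip() for p in parts)
    return {"id": seq_id, "name": name, "description": description, "raw": text}


# Hypothetical entry, for illustration only
entry = parse_header(">ID0001|GeneX|example function; chromosome 1")
print(entry["id"], entry["name"])
```

A header without the two separators is returned with empty fields, so a caller can fall back to displaying the raw header.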

1.3 Ribozyme descriptors

The second module interprets a configuration file provided to the application (RiboParticularities.xml). This file describes the signature to be searched for each ribozyme. For example, it includes the length of the recognition domain of a ribozyme and any specific nucleotide requirements within, before or after the binding domain. Position constraints, including Watson-Crick and Wobble pairing, can also be considered.

1.4 Substrates search

The third module performs the substrate search and the score calculation. Initially, the cDNA database is scanned for the substrate sequence signature generated from the user inputs and the ribozyme constraints defined in the configuration file. Possible constraints include a maximum number of either Wobble base pairs or mismatches. This module produces a preliminary result file of potential substrates. A score is then calculated for each potential substrate. Each mismatch encountered increases the score by 100, and each Wobble base pair by 10. If the information for a spacer is specified in RiboParticularities.xml, then the tool score can be increased by the factors specified in the configuration file.
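The scoring rule can be sketched in Python as follows. This is a minimal illustration, not the RiboSubstrates implementation: the function names are hypothetical, the two sequences are assumed to be already aligned so that position i of the ribozyme faces position i of the substrate, and the configuration-dependent factor (for example a spacer score) is assumed to be simply added to the total.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_pair(ribozyme_base, substrate_base):
    """Classify one ribozyme/substrate position as match, wobble or mismatch."""
    pair = (ribozyme_base.upper(), substrate_base.upper())
    if pair in WATSON_CRICK:
        return "match"
    if pair in WOBBLE:
        return "wobble"
    return "mismatch"


def score_hit(ribozyme_bases, substrate_bases, extra=0):
    """Return (score, mismatches, wobbles) for two aligned RNA strings.

    Each mismatch adds 100 and each Wobble base pair adds 10, as stated
    in the text; `extra` stands for any additional configuration-defined
    factor (assumed here to be additive).
    """
    if len(ribozyme_bases) != len(substrate_bases):
        raise ValueError("sequences must be aligned to the same length")
    kinds = [classify_pair(r, s) for r, s in zip(ribozyme_bases, substrate_bases)]
    mismatches = kinds.count("mismatch")
    wobbles = kinds.count("wobble")
    return 100 * mismatches + 10 * wobbles + extra, mismatches, wobbles


# G-U is a Wobble pair, G-C a Watson-Crick pair, C-A a mismatch
print(score_hit("GGC", "UCA"))
```

A perfectly paired site scores 0, which is consistent with the lowest scores being the best matches.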

1.5 Substrates display

The fourth module produces the substrates list and the detailed display. The list display is generated using a second configuration file (RiboSubstParticularities.xml) that describes the substrates list display. Each substrate sequence has different constraints for color mapping based on how its nucleotides bind to the ribozyme. The color mapping is based on position, mismatches and Wobble base pairs. All tables are sorted as a function of the score, the lowest scores being the best matches. The first result table contains all substrates that are perfectly matched for the RNA tool in question (see Figure 2 below). The second table contains all hits that have no mismatches, but that contain Wobble base pairs. The last table contains hits with both mismatches and Wobble base pairs. The information displayed in each table includes the cDNA ID, the cDNA's name, the recognition sequence match and other information specific to each RNA tool.

The ID for each hit is a link to a more detailed view of the potential substrate. This view presents essential information about the targeted substrate, including a description of the targeted mRNA. Depending on the interpretation of the header provided with the cDNA database, this may also include a link to the external source information (e.g. the NCBI detailed view of the mRNA). The detailed view also shows the position in the mRNA sequence targeted by the designed ribozyme. This position is mapped in color on the sequence. This module also permits the user to store substrates in an Excel file.
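The grouping of hits into the three result tables can be sketched as follows. This Python fragment is illustrative only: the function name, the dictionary keys and the sample hits are hypothetical, and it assumes that any hit containing at least one mismatch goes to the last table, whether or not it also contains Wobble base pairs.

```python
def build_tables(hits):
    """Group hits into three tables, each sorted by ascending score."""
    perfect, wobble_only, with_mismatch = [], [], []
    for hit in hits:
        if hit["mismatches"] > 0:
            with_mismatch.append(hit)   # assumption: any mismatch -> last table
        elif hit["wobbles"] > 0:
            wobble_only.append(hit)     # no mismatch, at least one Wobble pair
        else:
            perfect.append(hit)         # perfectly matched substrate

    def by_score(hit):
        return hit["score"]

    return (sorted(perfect, key=by_score),
            sorted(wobble_only, key=by_score),
            sorted(with_mismatch, key=by_score))


# Hypothetical hits, for illustration only
hits = [
    {"id": "hit_c", "score": 110, "mismatches": 1, "wobbles": 1},
    {"id": "hit_a", "score": 0, "mismatches": 0, "wobbles": 0},
    {"id": "hit_d", "score": 20, "mismatches": 0, "wobbles": 2},
    {"id": "hit_b", "score": 10, "mismatches": 0, "wobbles": 1},
]
perfect, wobble_only, with_mismatch = build_tables(hits)
print([h["id"] for h in wobble_only])
```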

2 Using RiboSubstrates

2.1 SOFA-HDV example

Figure 2: Design of RiboSubstrates with the parameters for the SOFA-HDV ribozyme. (A) Secondary structure and nucleotide sequence of the SOFA-HDV ribozyme targeting an mRNA. The P1 stem of the ribozyme and the biosensor (BS) of the SOFA-HDV module are identified. The letter N indicates A, C, G or U residues. The arrow indicates the cleavage site. (B) The RiboSubstrates input with the signature to be searched for the SOFA-HDV ribozyme targeting HCV (SOFA-HDV-135) and the table of substrate display. Mismatched nucleotides and Wobble base pairs are identified in red and blue, respectively. (C) The more detailed tables for both the mRNA coding the HCV genome and the predicted LOC440096, respectively.

2.1.1 SOFA-HDV theory

To illustrate the potential of this software, we used it to define and test targets for the SOFA-HDV ribozyme. The SOFA-HDV ribozymes are an improved generation of ribozymes that possess significant potential in both functional genomics and gene therapy (4,5). These ribozymes recognize their substrates through the formation of two stems: i. the P1 stem, which is composed of one GU Wobble base pair followed by six consecutive Watson-Crick base pairs; and, ii. the biosensor, which is composed of ten to fifteen Watson-Crick base pairs (BS; see Figure 2A).

2.1.2 SOFA-HDV ribozyme targeting the IRES

We investigated the specificity of a RiboSubstrates-selected SOFA-HDV ribozyme targeting the internal ribosome entry site (IRES) of the hepatitis C virus (HCV) (6). Once a configuration file (RiboParticularities.xml) was created, the ribozyme was automatically integrated in RiboSubstrates. This configuration file includes specific information such as the required presence of a guanosine residue at the 3'-end adjacent to the cleavage site so as to permit formation of the essential GU Wobble base pair. Moreover, the options of defining the spacer length as well as the distance between the P1 and the BS stems were included. The sequences of the P1 stem and biosensor (BS) were AUGGCUU and CGGUUCCGCAGA, respectively (SOFA-HDV-HCV-135). These sequences were verified to be unique to the HCV genome. The number of mismatches permitted in the BS was 3 and the spacer length (i.e. the distance between the P1 and BS stems) was allowed to vary between 1 and 10 nucleotides. As this ribozyme is to be used in human cells, an updated version of the human cDNA database, including an entry for an HCV type 1b complete genome, was selected and the search performed. The RiboSubstParticularities.xml files for the score calculation included some settings specific to the SOFA-HDV ribozyme. For example, a score was included for the spacer length according to biochemical data published previously (4,6). A spacer length between 1 and 6 nucleotides has a spacer score of 0, whereas both the lack of a spacer (i.e. the P1 and BS sequences on the substrate are contiguous) and a spacer length between 7 and 9 nucleotides have a spacer score of 4. An unspecified length is automatically attributed a spacer score of 10.
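The spacer scoring described above can be written as a small lookup. The Python sketch below is illustrative only: the function name is hypothetical, and "unspecified length" is interpreted here as any length not covered by the stated ranges (including a missing value), which is an assumption.

```python
def spacer_score(length):
    """Return the spacer score for a P1-to-BS spacer length in nucleotides.

    1 to 6 nucleotides       -> 0
    no spacer (0) or 7 to 9  -> 4
    anything else, or None   -> 10 (assumed meaning of "unspecified")
    """
    if length is None:
        return 10
    if 1 <= length <= 6:
        return 0
    if length == 0 or 7 <= length <= 9:
        return 4
    return 10


for n in (0, 3, 8, 10):
    print(n, spacer_score(n))
```

Under this reading, a spacer of 10 nucleotides, which the search above still accepts, receives the default score of 10.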

2.1.3 SOFA-HDV ribozyme results analysis

The substrate display is illustrated in Figure 2B (lower section). The designed ribozyme has only one perfect hit, i.e. the HCV genome. After this entry, the lowest score was 116 for a predicted mRNA (LOC440096) (Figure 2B and C). This hit exhibits one mismatch and one Wobble base pair with the SOFA-HDV-HCV. Biochemical experiments have shown that the presence of one mismatch causes a significant reduction of the cleavage activity of a SOFA-HDV ribozyme (6). Therefore, this ribozyme should be specific for targeting the HCV IRES. When the RiboSubstrates experiment was repeated using the original human database (i.e. without adding the HCV sequence), no perfect hit was retrieved. More importantly, several experiments have been performed in the test tube as well as in cultured cells (unpublished data, F. Brière and JP Perreault). The SOFA-HDV-HCV-135 exhibited efficient cleavage of the HCV IRES and all data indicated that this ribozyme has a specific action.

2.2 Hammerhead example

Figure 3: Design of RiboSubstrates with the parameters for the hammerhead ribozyme. (A) Secondary structure and nucleotide sequence of the hammerhead ribozyme targeting an mRNA. The two binding domains of the ribozyme are identified. The letter N indicates A, C, G or U residues. The arrow indicates the cleavage site. (B) The RiboSubstrates input with the signature to be searched for the hammerhead ribozyme targeting the human Bcl-2 mRNA (hhRz-Bcl-2-1). Mismatched nucleotides and Wobble base pairs are identified in red and blue, respectively. (C) The table of substrate display retrieved for this ribozyme.

2.2.1 Hammerhead theory

A second example was developed for the hammerhead ribozyme (7). This ribozyme has a substrate recognition domain that includes the stems located on each side of the cleavage site (Figure 3A). Specific RiboParticularities.xml entries describing the hammerhead signature to be searched for and the substrate display were appended to a previously written configuration file.

2.2.2 Hammerhead ribozyme targeting the human apoptosis-associated Bcl2 mRNA

We investigated the specificity of a RiboSubstrates-selected hammerhead ribozyme targeting the human apoptosis-associated B-cell CLL/lymphoma 2 (Bcl2) mRNA (Figure 3B; see Ref. 8). Once a configuration file (RiboParticularities.xml) was created, the ribozyme was automatically integrated in RiboSubstrates. The sequences of stems I and III were GGUCAGGUT and CCACAGGGGU, respectively. These sequences were verified to be unique to the Bcl2 mRNA. The number of mismatches permitted in the stems was set to 0.

2.2.3 Hammerhead ribozyme results analysis

The results display shows that, in addition to targeting the Bcl2 mRNA with a perfect score, this ribozyme would also have the potential to target both another variant of Bcl2 (score of 10) and the G protein-coupled receptor 97 (score of 50). Consequently, we would suggest designing another ribozyme that would be more specific to the Bcl2 mRNA.

3 Databases Notes

RiboSubstrates databases, ordered by computing time (fastest to slowest):

Submix: Small database generated in-house for test purposes
Lactococcus lactis version NC_002662.1: Lactococcus lactis subsp. lactis Il1403, complete genome downloaded at http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nucleotide&val=15671982
Ecogene ecoli k12 strain june 2003: Version 17 of the E. coli strain K12 complete genome downloaded at http://bmb.med.miami.edu/EcoGene/EcoWeb/CESSPages/FILES/Sequences/EcoGene17.lib.gz
SGD sacc. cerevisiae rnas Oct 2005: The yeast S. cerevisiae downloaded at ftp://genomeftp.stanford.edu/pub/yeast/data_download/chromosomal_feature/saccharomyces_cerevisiae.gff
Mouse NCBI build 34.1 mRNAs: Mus musculus complete genome build 34.1 downloaded at ftp://ftp.ncbi.nih.gov/genbank/genomes/M_musculus
Human NCBI build 35.1 mRNAs: Homo sapiens complete genome build 35.1 downloaded at ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/RNA
Human NCBI build 35.1 mRNAs + HCV type 1b (+ / -) complete genome: Homo sapiens complete genome build 35.1 downloaded at ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/RNA + one entry with the complete genome of the HCV type 1b positive and negative strands
Human aceview build Aug05 transcripts: Homo sapiens complete genome provided by Aceview's annotation based on the NCBI build 35.1 downloaded at http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/downloads.v47.html
Human NCBI build 36.2 mRNAs: Homo sapiens complete genome build 36.2 downloaded at ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/RNA
Human NCBI build 36.2 mRNAs + HCV(+/-) genotypes 1 to 6 complete genome: Homo sapiens complete genome build 36.2 downloaded at ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/RNA + some entries with the complete genome of the HCV types 1 to 6 positive and negative strands
Human NCBI build 36.2 mRNAs + HCV(+/-) gt 1 to 6 + Influenza type A(+/-): Homo sapiens complete genome build 36.2 downloaded at ftp://ftp.ncbi.nih.gov/genomes/H_sapiens/RNA + some entries with the complete genome of the HCV types 1 to 6 positive and negative strands + one entry of the Influenza type A positive and negative strands