Automate RestrictionMapper

Introduction

RemoteRestMap is a Perl module that queries RestrictionMapper and parses the resulting reports. With RemoteRestMap, you can write Perl scripts that analyze multiple sequences and compare the results much faster than you could by hand.

Download

Download RemoteRestMap here

Installation

Unix/Linux/OS X:

Unpack the tarball and cd to the new directory.
Type perl Makefile.PL
Type make
Type make test
Type make install

Windows:

Pretty much the same as above, except unzip the zip file and use nmake instead of make. If you don't have nmake, copy RemoteRestMap.pm, the RemoteRestMap folder and test.pl into perl\site\lib and run test.pl from there.

Requirements

RemoteRestMap requires the LWP::UserAgent (ver. 2.0 and up) and HTML::TreeBuilder (ver. 3.0 and up) modules. You also need Perl 5.6.0 or higher.

Usage

Start by creating a new RemoteRestMap object. The constructor needs a URL for sitefind3.pl and a DNA sequence.

     use RemoteRestMap;

     my $url     = 'www.restrictionmapper.org/sitefind3.pl';
     my $dna     = 'aacgagcctttaaggcgcttaaatcaagattccattaggcc';
     my $request = RemoteRestMap->new($url, $dna);

Both the URL and sequence are get/set attributes. They can be accessed as follows:

     $request->url($new_url);
     $request->sequence($new_sequence);

Now you can use the get_map method to get a restriction map of your sequence. get_map takes an optional reference to a hash of selection parameters.

   my %settings = (
      DNAtype       => "circular",
      first         => "site_length",
      second        => "frequency",
      third         => "name",
      enzymetype    => "NEB",
      maxcuts       => "all",
      minlength     => 6,
      isoschizomers => "yes",
      overhang      => ["five_prime", "blunt"],
      enzymelist    => ['BamHI', 'EcoRI']
   );

   my $restriction_map = $request->get_map(\%settings);   # returns a map object

Note that the selection parameters are the same as on the front page of RestrictionMapper. Each parameter is optional; the RestrictionMapper defaults will be applied for missing ones.

Parameters are:

DNAtype      - "linear" or "circular".
   Specifies DNA conformation.
   Default is "linear".

first        - "frequency", "name", "overhang" or "site_length".
   Specifies column for first order sorting of output table.
   Default is "frequency".

second       - "frequency", "name", "overhang" or "site_length".
   Specifies column for second order sorting of output table.
   Default is "overhang".

third        - "frequency", "name", "overhang" or "site_length".
   Specifies column for third order sorting of output table.
   Default is "name".

enzymetype   - "NEB" or "all".
   Specifies whether all commercial enzymes or only New England Biolabs supplied
   enzymes will be returned.
   Default is "all".

maxcuts      - "all", "0", "1", "2", "3", "4", "5", "10", "20", "30" or "40".
   Specifies maximum number of cuts per enzyme to return. Enzymes that cut the
   sequence more than this number will not appear in the table. Be sure to
   use string notation, even for numeric values (sorry about that). Use "0"
   to return noncutters only.
   Default is "all".

minlength     - 4, 5, 6, 7 or 8.
   Specifies the minimum length of the recognition site.
   Enzymes with sites shorter than this
   number will not appear in the output. Use numeric notation.
   Default is 5.

isoschizomers - "yes" or "no".
   Specifies whether to include isoschizomers.
   Default is "no" (prototypes only).

overhang      - An array reference containing any combination of "five_prime",
   "three_prime" and "blunt". Specifies which overhangs to include.
   Default is all three.

enzymelist    - An array reference containing a list of enzymes. Note that this
   list overrides all other selection criteria (overhang, minlength etc.).
   Specifies exactly which enzymes to return.
   Default is no list.
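
The parameter list above can be turned into a small pre-flight check that catches typos before a request is sent. This is only a sketch: check_settings is a hypothetical helper, not part of RemoteRestMap, and the allowed values are copied from the descriptions above rather than read from the server.

```perl
use strict;
use warnings;

# Allowed values, copied from the parameter descriptions above.
my @SORT_COLUMNS = qw(frequency name overhang site_length);
my %ALLOWED = (
    DNAtype       => [qw(linear circular)],
    first         => \@SORT_COLUMNS,
    second        => \@SORT_COLUMNS,
    third         => \@SORT_COLUMNS,
    enzymetype    => [qw(NEB all)],
    maxcuts       => [qw(all 0 1 2 3 4 5 10 20 30 40)],
    minlength     => [4 .. 8],
    isoschizomers => [qw(yes no)],
);
my @OVERHANGS = qw(five_prime three_prime blunt);

# check_settings is a hypothetical helper (not part of RemoteRestMap).
# It returns a list of problem descriptions, empty when nothing looks wrong.
sub check_settings {
    my ($settings) = @_;
    my @problems;

    for my $key (sort keys %$settings) {
        my $value = $settings->{$key};
        $value = '' unless defined $value;

        if (exists $ALLOWED{$key}) {
            # Scalar parameters: the value must be one of the listed choices.
            push @problems, "$key: unexpected value '$value'"
                unless grep { $_ eq $value } @{ $ALLOWED{$key} };
        }
        elsif ($key eq 'overhang') {
            if (ref($value) ne 'ARRAY') {
                push @problems, "overhang: expected an array reference";
                next;
            }
            for my $type (@$value) {
                push @problems, "overhang: unexpected value '$type'"
                    unless grep { $_ eq $type } @OVERHANGS;
            }
        }
        elsif ($key eq 'enzymelist') {
            # Enzyme names are not checked, only the container type.
            push @problems, "enzymelist: expected an array reference"
                unless ref($value) eq 'ARRAY';
        }
        else {
            push @problems, "$key: unknown parameter";
        }
    }
    return @problems;
}
```

Calling check_settings(\%settings) before get_map and printing any returned messages is one way to use it. The comparison is by string, so it does not enforce the string versus numeric notation mentioned for maxcuts and minlength.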

If you want a virtual digest instead of a map, use the get_digest method. get_digest takes only the DNAtype and enzymelist parameters (see above). Note that the enzymelist parameter is required.

     my %settings = (DNAtype => 'circular', enzymelist => ['BamHI', 'EcoRI']);
     my $digest = $request->get_digest(\%settings);

Both get_digest and get_map return result tables as objects. The map object methods are as follows:

get_next_enz - returns the next row in the table as a hash reference or 0 when
   done. The keys are NAME, SITE, OVERHANG, LENGTH, CUTNUMBER and CUTLIST. These
   contain respectively the enzyme name, a DNA "regexp" representing the
   enzyme's recognition sequence, the type of overhang the enzyme produces
   (five_prime, three_prime or blunt), the length of the recognition sequence,
   the number of times the enzyme cuts the sequence, and a reference to an array
   of cut positions.

   while (my $cuts = $restriction_map->get_next_enz) {

      # Name of the enzyme
      my $enz = $cuts->{NAME};

      # A DNA "regexp" of the enzyme's recognition site.
      my $recognition_site = $cuts->{SITE};

      # The number of unique bases in site (doesn't include gaps).
      my $site_length = $cuts->{LENGTH};

      # The number of recognition sites in the sequence 
      my $number_of_cuts = $cuts->{CUTNUMBER};

      # 'five_prime', 'three_prime' or 'blunt'.
      my $overhang_type = $cuts->{OVERHANG};

      # A reference to a list of cut locations.
      my @cut_locations = @{$cuts->{CUTLIST}};
   }

tab_file - returns the entire results table in tab delimited text format.
  my $table = $restriction_map->tab_file();

enzyme_list - returns an alphabetical list of enzymes that cut the sequence.
              DO NOT use this method if you need to preserve the original
              name order. Instead, populate your array one entry at a time
              using get_next_enz().
  my @enzymes = $restriction_map->enzyme_list();

cuts - returns a unique, ordered list of cut positions. Note that this
       method will eliminate duplicate cuts produced by different enzymes.
  my @cuts = $restriction_map->cuts();

enz_number - returns the number of enzymes that cut your sequence.
  my $total = $restriction_map->enz_number();

noncutters - returns a list of names of enzymes that do not cut the sequence.
             Note that these enzymes are subject to the same selection
             criteria as the cutters.
  my @noncutters = $restriction_map->noncutters();
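
As an illustration of working with the rows returned by get_next_enz, the sketch below copies the rows into an array once and then answers two common questions from that copy. The helper names (collect_rows, single_cutters, enzymes_by_position) are hypothetical and not part of RemoteRestMap, and the code assumes CUTNUMBER holds a plain number.

```perl
use strict;
use warnings;

# Drain a map object into an array of row hashes, preserving table order.
# $map is any object that provides get_next_enz as described above.
sub collect_rows {
    my ($map) = @_;
    my @rows;
    while (my $row = $map->get_next_enz) {
        push @rows, $row;
    }
    return @rows;
}

# Names of enzymes that cut exactly once, in table order.
sub single_cutters {
    my (@rows) = @_;
    return map { $_->{NAME} } grep { $_->{CUTNUMBER} == 1 } @rows;
}

# Hash of cut position => array reference of enzyme names cutting there.
sub enzymes_by_position {
    my (@rows) = @_;
    my %by_position;
    for my $row (@rows) {
        for my $position (@{ $row->{CUTLIST} }) {
            push @{ $by_position{$position} }, $row->{NAME};
        }
    }
    return %by_position;
}
```

Typical use would be my @rows = collect_rows($restriction_map); followed by single_cutters(@rows). Keeping the rows in an array avoids relying on whether the iterator can be restarted, which the description above does not say.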

The digest object methods are as follows:

get_next_fragment - returns the next row in the fragments table as a hash
                    reference or 0 when finished. The keys are:
         SEQUENCE   - the DNA sequence of the fragment.
         LENGTH     - the length of the sequence in bases.
         FIVEPRIME  - the name of the enzyme that made the 5' cut.
         STARTPOS   - the position of the 5' cut in the original sequence.
         THREEPRIME - the name of the enzyme that made the 3' cut.
         ENDPOS     - the position of the 3' cut in the original sequence.

   while (my $fragment = $digest->get_next_fragment) {
      # $fragment is a reference to a hash of fragment info
   }

tab_file - same as for the map object: returns the entire results table in
           tab delimited text format.
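
The fragment rows can be summarized in the same way as the map rows. The sketch below lists fragment sizes from largest to smallest and adds them up. The helper names are hypothetical, not part of RemoteRestMap, and the code assumes LENGTH is a plain number of bases.

```perl
use strict;
use warnings;

# Collect every fragment row from a digest object.
# $digest is any object that provides get_next_fragment as described above.
sub collect_fragments {
    my ($digest) = @_;
    my @fragments;
    while (my $fragment = $digest->get_next_fragment) {
        push @fragments, $fragment;
    }
    return @fragments;
}

# Fragment lengths, largest first.
sub band_sizes {
    my (@fragments) = @_;
    my @sizes = sort { $b <=> $a } map { $_->{LENGTH} } @fragments;
    return @sizes;
}

# Sum of all fragment lengths.
sub total_length {
    my (@fragments) = @_;
    my $total = 0;
    $total += $_->{LENGTH} for @fragments;
    return $total;
}
```

If the table lists each fragment of a complete digest once, total_length should match the length of the submitted sequence, which makes it a cheap sanity check on a parsed report.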

Notes

If you are analyzing multiple sequences, it is easier to change the sequence than to create a new request object.
The built-in sorting and filtering features of sitefind3.pl can be useful. For example, if you find yourself looping through the table looking for blunt end cutters, just repeat the request with overhang => ['blunt'].
Stored HTML reports can be analyzed directly with RemoteRestMap::Map and RemoteRestMap::Digest:
     my $map = RemoteRestMap::Map->new($html_report);
The tab_file method is useful if you need to export your table to a spreadsheet.
If you are using RemoteRestMap in a production environment, please install your own copy of sitefind3.pl. This is available here
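
The first note, reusing one request object for several sequences, might look like the sketch below. It relies only on the sequence, get_map and enzyme_list methods described above. map_sequences and shared_cutters are hypothetical helpers, not part of RemoteRestMap, and no handling of failed requests is shown.

```perl
use strict;
use warnings;

# map_sequences is a hypothetical helper (not part of RemoteRestMap).
#   $request   - an object with the sequence and get_map methods
#   $sequences - hash reference of label => DNA sequence
#   $settings  - optional hash reference of selection parameters
# Returns a hash of label => array reference of enzymes that cut.
sub map_sequences {
    my ($request, $sequences, $settings) = @_;
    my %cutters;
    for my $label (sort keys %$sequences) {
        # Swap in the next sequence instead of building a new request.
        $request->sequence($sequences->{$label});
        my $map = $settings ? $request->get_map($settings)
                            : $request->get_map;
        $cutters{$label} = [ $map->enzyme_list ];
    }
    return %cutters;
}

# Enzymes that cut every sequence, in alphabetical order.
sub shared_cutters {
    my (%cutters) = @_;
    my %seen;
    for my $list (values %cutters) {
        my %unique = map { $_ => 1 } @$list;   # count each enzyme once per sequence
        $seen{$_}++ for keys %unique;
    }
    my $sequence_count = keys %cutters;
    my @shared = grep { $seen{$_} == $sequence_count } keys %seen;
    return sort @shared;
}
```

With a real request object the call would be my %cutters = map_sequences($request, \%sequences, \%settings); where %sequences is your own hash of labelled sequences.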

Please email webmaster |AT| restrictionmapper.org with questions or bug reports.