gnFASSource Class Reference

gnFASSource reads and writes FastA files.

#include <gnFASSource.h>

Public Member Functions

gnFASSource * Clone () const
 Returns an exact copy of this object.
gnFileContig * GetContig (const uint32 i) const
uint32 GetContigID (const string &name) const
 Get a contig index by name.
uint32 GetContigListLength () const
 Get the number of sequence contigs in this source.
string GetContigName (const uint32 i) const
 Get the name of the specified contig.
gnSeqI GetContigSeqLength (const uint32 i) const
 Get the total number of base pairs in the specified contig.
gnFileContig * GetFileContig (const uint32 contigI) const
 Returns a pointer to the file contig corresponding to contigI or null if none exists.
gnGenomeSpec * GetSpec () const
 Get the annotated sequence data as a gnGenomeSpec.
 gnFASSource (const gnFASSource &s)
 Copy constructor; copies the specified gnFASSource.
 gnFASSource ()
 Empty Constructor, does nothing.
boolean HasContig (const string &name) const
 Looks for a contig by name.
boolean SeqRead (const gnSeqI start, char *buf, gnSeqI &bufLen, const uint32 contigI=ALL_CONTIGS)
 Gets sequence data from this source.
 ~gnFASSource ()
 Destructor, frees memory.

Static Public Member Functions

boolean Write (gnBaseSource *source, const string &filename)
 Deprecated - do not use.
void Write (gnSequence &sequence, ostream &m_ostream, boolean write_coords=true, boolean enforce_unique_names=true)
 Write the given gnSequence to an ostream.
void Write (gnSequence &sequence, const string &filename, boolean write_coords=true, boolean enforce_unique_names=true)
 Write the given gnSequence to a FastA file.

Private Member Functions

boolean ParseStream (istream &fin)
boolean SeqSeek (const gnSeqI start, const uint32 contigI, uint64 &startPos, uint64 &readableBytes)
boolean SeqStartPos (const gnSeqI start, gnFileContig &contig, uint64 &startPos, uint64 &readableBytes)

Private Attributes

vector< gnFileContig * > m_contigList

Detailed Description

gnFASSource reads and writes FastA files.

gnSourceFactory uses gnFASSource to read FastA files. To write a file in the FastA format, call one of the static Write() overloads, for example gnFASSource::Write( mySequence, "C:\\myFasFile.fas" ); where mySequence is a gnSequence.

Definition at line 32 of file gnFASSource.h.


Constructor & Destructor Documentation

gnFASSource::gnFASSource ()

Empty Constructor, does nothing.

Definition at line 20 of file gnFASSource.cpp.

References DebugMsg(), and gnFilter::fullDNASeqFilter().

Referenced by Clone().

gnFASSource::gnFASSource (const gnFASSource &s)

Copy constructor; copies the specified gnFASSource.

Parameters:
s The gnFASSource to copy.

Definition at line 28 of file gnFASSource.cpp.

References m_contigList.

gnFASSource::~gnFASSource ()

Destructor, frees memory.

Definition at line 36 of file gnFASSource.cpp.

References m_contigList.


Member Function Documentation

gnFASSource * gnFASSource::Clone () const [inline, virtual]

Returns an exact copy of this object.

Implements gnFileSource.

Definition at line 113 of file gnFASSource.h.

References gnFASSource().

gnFileContig * gnFASSource::GetContig (const uint32 i) const

Definition at line 84 of file gnFASSource.cpp.

References m_contigList, and uint32.

Referenced by main().

uint32 gnFASSource::GetContigID (const string &name) const [virtual]

Get a contig index by name.

If the source does not contain a contig by the specified name GetContigID returns UINT32_MAX.

Parameters:
name The name of the contig to look for.
Returns:
The index of the named contig or UINT32_MAX.

Implements gnBaseSource.

Definition at line 56 of file gnFASSource.cpp.

References m_contigList, and uint32.
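
The sentinel-return convention described for GetContigID() can be illustrated with a small standalone sketch. It does not use libGenome: the Contig struct and the findContigId() and hasContig() helpers are hypothetical names, and the linear scan is an assumption rather than a statement about how gnFASSource searches m_contigList.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a contig record; not libGenome's gnFileContig.
struct Contig {
    std::string name;
    std::uint64_t length;
};

// Return the index of the first contig called `name`,
// or UINT32_MAX when no contig has that name.
std::uint32_t findContigId(const std::vector<Contig>& contigs,
                           const std::string& name) {
    // The sentinel must never collide with a valid index.
    assert(contigs.size() < UINT32_MAX);
    for (std::uint32_t i = 0; i < contigs.size(); ++i) {
        if (contigs[i].name == name) {
            return i;
        }
    }
    return UINT32_MAX;
}

// Existence test built on the same sentinel.
bool hasContig(const std::vector<Contig>& contigs, const std::string& name) {
    return findContigId(contigs, name) != UINT32_MAX;
}
```

Callers should compare the result against UINT32_MAX before using it as an index.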

uint32 gnFASSource::GetContigListLength () const [inline, virtual]

Get the number of sequence contigs in this source.

Returns:
The number of contigs in this source.

Implements gnBaseSource.

Definition at line 119 of file gnFASSource.h.

References m_contigList, and uint32.

Referenced by main().

string gnFASSource::GetContigName (const uint32 i) const [virtual]

Get the name of the specified contig.

Returns an empty string if the specified contig is out of range.

Parameters:
i The index of the contig or ALL_CONTIGS.
Returns:
The name of the contig or an empty string.

Implements gnBaseSource.

Definition at line 65 of file gnFASSource.cpp.

References m_contigList, and uint32.

gnSeqI gnFASSource::GetContigSeqLength (const uint32 i) const [virtual]

Get the total number of base pairs in the specified contig.

Parameters:
i The index of the contig or ALL_CONTIGS.
Returns:
The length in base pairs of the specified contig.

Implements gnBaseSource.

Definition at line 72 of file gnFASSource.cpp.

References gnSeqI, GNSEQI_ERROR, m_contigList, and uint32.

gnFileContig * gnFASSource::GetFileContig (const uint32 contigI) const [virtual]

Returns a pointer to the file contig corresponding to contigI or null if none exists.

Implements gnFileSource.

Definition at line 381 of file gnFASSource.cpp.

References m_contigList, and uint32.

gnGenomeSpec * gnFASSource::GetSpec () const [virtual]

Get the annotated sequence data as a gnGenomeSpec.

GetSpec returns a gnGenomeSpec which contains the sequence, header, and feature data contained by this source.

Returns:
The annotated sequence data.

Implements gnBaseSource.

Definition at line 357 of file gnFASSource.cpp.

References gnMultiSpec< SubSpec >::AddHeader(), gnMultiSpec< SubSpec >::AddSpec(), gnContigHeader, m_contigList, gnBaseSpec::SetName(), gnContigSpec::SetSourceName(), gnMultiSpec< SubSpec >::SetSourceName(), and uint32.

boolean gnFASSource::HasContig (const string &name) const [virtual]

Looks for a contig by name.

Returns true if it finds the contig, otherwise false.

Parameters:
name The name of the contig to look for.
Returns:
True if the named contig exists, false otherwise.

Implements gnBaseSource.

Definition at line 47 of file gnFASSource.cpp.

References m_contigList, and uint32.

boolean gnFASSource::ParseStream (istream &fin) [private, virtual]

Implements gnFileSource.

Definition at line 387 of file gnFASSource.cpp.

References gnFileContig::AddToSeqLength(), gnFileSource::DetermineNewlineType(), ErrorMsg(), gnContigHeader, gnContigSequence, isNewLine(), isSpace(), gnFilter::IsValid(), m_contigList, gnFileContig::SetFileEnd(), gnFileContig::SetFileStart(), gnFileContig::SetName(), gnFileContig::SetRepeatGapSize(), gnFileContig::SetRepeatSeqGap(), gnFileContig::SetRepeatSeqSize(), gnFileContig::SetSectEnd(), gnFileContig::SetSectStart(), uint32, and uint64.

boolean gnFASSource::SeqRead (const gnSeqI start, char *buf, gnSeqI &bufLen, const uint32 contigI = ALL_CONTIGS) [virtual]

Gets sequence data from this source.

SeqRead will attempt to read "bufLen" base pairs starting at "start", an offset into the sequence. Reading inside a specific contig can be accomplished by supplying the "contigI" parameter with a valid contig index. SeqRead stores the sequence data in "buf" and returns the actual number of bases read in "bufLen". SeqRead will return false if a serious error occurs.

Parameters:
start The base pair to start reading at.
buf The character array to store base pairs into.
bufLen On input, the number of base pairs to read; on return, the number of base pairs actually read.
contigI The index of the contig to read or ALL_CONTIGS by default.
Returns:
True if the operation was successful.

Implements gnBaseSource.

Definition at line 92 of file gnFASSource.cpp.

References gnSeqC, gnSeqI, gnFilter::IsValid(), m_contigList, SeqSeek(), uint32, and uint64.

Referenced by main().
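
The in/out use of "bufLen" can be sketched without libGenome as follows. Everything here is hypothetical: readSeq() works on in-memory strings instead of a file, kAllContigs stands in for ALL_CONTIGS (whose actual value is not given on this page), and returning true with a zero length when "start" lies past the end is an assumption about the edge case, not documented SeqRead behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sentinel meaning "treat all contigs as one sequence".
const std::uint32_t kAllContigs = UINT32_MAX;

// Copy up to bufLen bases, starting at offset `start`, into buf.
// On return, bufLen holds the number of bases actually copied.
// Returns false when contigI is neither kAllContigs nor a valid index.
bool readSeq(const std::vector<std::string>& contigs, std::uint64_t start,
             char* buf, std::uint64_t& bufLen,
             std::uint32_t contigI = kAllContigs) {
    assert(buf != nullptr);  // the caller must supply a buffer
    std::string seq;
    if (contigI == kAllContigs) {
        for (const std::string& c : contigs) {
            seq += c;  // offsets are relative to the concatenated contigs
        }
    } else if (contigI < contigs.size()) {
        seq = contigs[contigI];  // offsets are relative to this contig
    } else {
        bufLen = 0;
        return false;
    }
    if (start >= seq.size()) {
        bufLen = 0;  // nothing left to read (assumed to be a non-error case)
        return true;
    }
    // std::string::copy clamps the count to the characters available.
    bufLen = seq.copy(buf, static_cast<std::size_t>(bufLen),
                      static_cast<std::size_t>(start));
    return true;
}
```

The caller sizes the buffer for the requested length and afterwards trusts only the first "bufLen" characters.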

boolean gnFASSource::SeqSeek (const gnSeqI start, const uint32 contigI, uint64 &startPos, uint64 &readableBytes) [private]

Definition at line 178 of file gnFASSource.cpp.

References gnSeqI, m_contigList, SeqStartPos(), uint32, and uint64.

Referenced by SeqRead().

boolean gnFASSource::SeqStartPos (const gnSeqI start, gnFileContig &contig, uint64 &startPos, uint64 &readableBytes) [private]

Definition at line 205 of file gnFASSource.cpp.

References ErrorMsg(), gnFileContig::GetRepeatSeqGapSize(), gnFileContig::GetSectStartEnd(), gnContigSequence, gnSeqI, gnFileContig::HasRepeatSeqGap(), gnFilter::IsValid(), uint32, and uint64.

Referenced by SeqSeek().

boolean gnFASSource::Write (gnBaseSource *source, const string &filename) [static]

Deprecated - do not use.

Write the given source to a FastA file.

Parameters:
source The source to write out.
filename The name of the file to write.

Definition at line 258 of file gnFASSource.cpp.

References gnBaseSource::GetContigListLength(), gnBaseSource::GetContigName(), gnBaseSource::GetContigSeqLength(), gnSeqC, gnSeqI, gnBaseSource::SeqRead(), and uint32.

void gnFASSource::Write (gnSequence &sequence, ostream &m_ostream, boolean write_coords = true, boolean enforce_unique_names = true) [static]

Write the given gnSequence to an ostream.

Parameters:
sequence The gnSequence to write out.
m_ostream The output stream to write to.
write_coords If true, each entry's name is followed by the coordinates of the entry in the context of the entire file.
enforce_unique_names If true, entry names are recorded as they are written. Each successive duplicate name has an underscore and a number appended to it, indicating how many entries with that name have already been written. Turning this off yields a slight performance improvement when writing files with a large number of entries (more than 1000).

Definition at line 292 of file gnFASSource.cpp.

References gnSequence::contigLength(), gnSequence::contigListLength(), gnSequence::contigName(), FAS_LINE_WIDTH, gnBaseHeader::GetHeader(), gnMultiSpec< SubSpec >::GetHeader(), gnMultiSpec< SubSpec >::GetSpec(), gnSequence::GetSpec(), gnSeqC, gnSeqI, gnSequence::ToArray(), uint32, and uintToString().
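
The effect of "write_coords" and "enforce_unique_names" can be sketched with a standalone writer. This is not the libGenome implementation: Entry, writeFasta() and toFasta() are hypothetical names, the "(start, end)" coordinate layout is an assumed format, and the default line width of 80 is a placeholder because the value of FAS_LINE_WIDTH is not given on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One FastA entry as (name, sequence); a hypothetical stand-in for a contig.
using Entry = std::pair<std::string, std::string>;

void writeFasta(const std::vector<Entry>& entries, std::ostream& out,
                bool writeCoords = true, bool enforceUniqueNames = true,
                std::size_t lineWidth = 80) {
    assert(lineWidth > 0);
    std::map<std::string, unsigned> seen;  // name -> times already written
    std::size_t offset = 0;                // bases written before this entry
    for (const Entry& e : entries) {
        std::string name = e.first;
        if (enforceUniqueNames) {
            const unsigned count = seen[e.first]++;
            if (count > 0) {
                name += "_" + std::to_string(count);  // seq, seq_1, seq_2, ...
            }
        }
        out << '>' << name;
        if (writeCoords && !e.second.empty()) {
            // Assumed layout: 1-based, inclusive coordinates within the file.
            out << " (" << offset + 1 << ", " << offset + e.second.size() << ')';
        }
        out << '\n';
        // Wrap the sequence at lineWidth characters per line.
        for (std::size_t i = 0; i < e.second.size(); i += lineWidth) {
            out << e.second.substr(i, lineWidth) << '\n';
        }
        offset += e.second.size();
    }
}

// Convenience wrapper that returns the FastA text as a string.
std::string toFasta(const std::vector<Entry>& entries, bool writeCoords = true,
                    bool enforceUniqueNames = true, std::size_t lineWidth = 80) {
    std::ostringstream out;
    writeFasta(entries, out, writeCoords, enforceUniqueNames, lineWidth);
    return out.str();
}
```

The name bookkeeping costs one map lookup per entry, which is consistent with the note that disabling it helps only for files with very many entries.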

void gnFASSource::Write (gnSequence &sequence, const string &filename, boolean write_coords = true, boolean enforce_unique_names = true) [static]

Write the given gnSequence to a FastA file.

Parameters:
sequence The gnSequence to write out.
filename The name of the file to write.
write_coords If true, each entry's name is followed by the coordinates of the entry in the context of the entire file.
enforce_unique_names If true, entry names are recorded as they are written. Each successive duplicate name has an underscore and a number appended to it, indicating how many entries with that name have already been written. Turning this off yields a slight performance improvement when writing files with a large number of entries (more than 1000).
Exceptions:
A FileNotOpened() exception may be thrown.

Definition at line 284 of file gnFASSource.cpp.

References Throw_gnEx.

Referenced by main(), and WriteData().


Member Data Documentation

vector< gnFileContig* > gnFASSource::m_contigList [private]
 

Definition at line 109 of file gnFASSource.h.

Referenced by GetContig(), GetContigID(), GetContigListLength(), GetContigName(), GetContigSeqLength(), GetFileContig(), GetSpec(), gnFASSource(), HasContig(), ParseStream(), SeqRead(), SeqSeek(), and ~gnFASSource().


The documentation for this class was generated from the following files:
gnFASSource.h
gnFASSource.cpp
Generated on Mon Feb 14 19:29:50 2005 for libGenome by doxygen 1.3.8