Change summary from MUSCLE version 3.8.31 to 3.8.31b

Several bug fixes and performance enhancements (speed and memory usage) have been made to MUSCLE. Changes marked with a '*' are considered more important.

Pairwise distance calculations have been accelerated with SSE instructions (when supported) for a 30-50x speedup. The speedup is significant when aligning data sets with thousands of sequences (e.g. 10,000+ flu viruses). Unfortunately, the assembly is GCC specific.

Another improvement packs the traceback pointers of dynamic programming into nibbles, which halves the memory they use. This is significant when aligning sequences longer than 10,000 (e.g. Herpes virus at 167,043). Also, using a "char" type instead of "int" for m_uSortOrder in profile.h saves a significant amount of memory.

Changes have been made to pave the way for dynamic allocation of the struct ProfPos arrays (m_uSortOrder[], m_fcCounts[], m_AAScores[]) in profile.h. All code now accesses these arrays only up to g_AlphaSize, which depends on whether the alignment is DNA or protein. The fixed allocation of length 20 for each of these arrays is more than DNA alignments need, and the excess can add up to gigabytes. A DNA-only version of MUSCLE has been tested by setting the length of these arrays to 4. It was able to align a large data set on a desktop computer, whereas the stock MUSCLE crashed a 128 GB node on the university supercomputer.

Many of the other changes support the above enhancements. For example, only '-' is used internally as the gap character, which improves the performance of pairwise distance calculations from an MSA; the improvement is substantial when the SSE code is used. Gap characters other than '-' are converted when the sequence files are read. Several source files have been lightly touched to improve the profiling of the different stages of MUSCLE.
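The nibble packing described above can be illustrated with a minimal sketch. It is not the code in bittraceback.cpp or nwsmall.cpp: the class name NibbleTraceback and its methods are hypothetical, and it assumes each traceback code fits in 4 bits and previously occupied one byte per dynamic-programming cell.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not MUSCLE's actual code: stores one 4-bit
// traceback code per dynamic-programming cell, two cells per byte.
class NibbleTraceback {
public:
    NibbleTraceback(std::size_t rows, std::size_t cols)
        : cols_(cols), bytes_((rows * cols + 1) / 2, 0) {}

    // Store the low 4 bits of `code` for cell (row, col).
    void set(std::size_t row, std::size_t col, std::uint8_t code) {
        const std::size_t cell = row * cols_ + col;
        std::uint8_t &byte = bytes_[cell / 2];
        if (cell % 2 == 0) {
            // Even cells use the low nibble; keep the high nibble intact.
            byte = static_cast<std::uint8_t>((byte & 0xF0) | (code & 0x0F));
        } else {
            // Odd cells use the high nibble; keep the low nibble intact.
            byte = static_cast<std::uint8_t>((byte & 0x0F) | ((code & 0x0F) << 4));
        }
    }

    // Read back the 4-bit code for cell (row, col).
    std::uint8_t get(std::size_t row, std::size_t col) const {
        const std::size_t cell = row * cols_ + col;
        const std::uint8_t byte = bytes_[cell / 2];
        return static_cast<std::uint8_t>((cell % 2 == 0) ? (byte & 0x0F)
                                                         : (byte >> 4));
    }

    // Number of bytes actually allocated for the packed array.
    std::size_t byteCount() const { return bytes_.size(); }

private:
    std::size_t cols_;
    std::vector<std::uint8_t> bytes_;
};
```

Under those assumptions, a matrix of rows * cols cells needs about half as many bytes as one byte per cell, at the cost of a shift and a mask on every access.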

File: aligngivenpath.cpp
    Use '-' internally as the only gap character
    Initialize PPStart dynamically for SetXXX() functions
    Use g_AlphaSize instead of a constant

File: aligngivenpathsw.cpp
    Use '-' internally as the only gap character

File: aligntwoprofs.cpp
    Remove asserts and dead code

File: alpha.cpp
    Use '-' internally as the only gap character

File: alpha.h
    Use '-' internally as the only gap character

File: bittraceback.cpp
  * Add option to pack the traceback array in nibbles

File: cons.cpp
  * SSE acceleration of GetPctIdentityPair()

File: clust.cpp
    Improve timing and progress display

File: distcalc.cpp
    Remove unnecessary (float) type casts and conversions

File: distfunc.cpp
  * Fix memory leak

File: distpwkimura.cpp
    Progress fixes

File: domuscle.cpp
    Improve timing and progress display
    Calculate Other time in relation to Align time
    Time stages

File: estring.cpp
  * Use a typedef for the estring type

File: estring.h
  * Use a typedef for the estring type

File: fasta2.cpp
    Use '-' internally as the only gap character

File: fastclust.cpp
    Improve timing and progress display
    Time stages

File: fastdistjones.cpp
    Improve timing and progress display

File: fastdistmafft.cpp
    Improve timing and progress display

File: fastdistnuc.cpp
    Improve timing and progress display
  * Improve performance and simplify
  * SSE acceleration of CompareTuples()

File: glbalign.cpp
    Improve timing and progress display

File: globalslinux.cpp
    Use defined(__unix__) and fix some corner cases

File: globalsosx.cpp
    Remove dead code

File: globalsother.cpp
    Use defined(__unix__)

File: Makefile
    Specify Makefile in snapshot
  * Add make depend
  * Complete Makefile

File: makerootmsa.cpp
  * Use a typedef for the estring type

File: msa.cpp
    Delete arrays m_szSeqs and m_szNames

File: msa2.cpp
  * Divide ambiguous nucleotide weight by 4 instead of 20
  * Add ambiguous residue weight to total weight

File: msf.cpp
    Use '-' internally as the only gap character

File: muscle.h
  * Add option to pack the traceback array in nibbles
  * Change type of m_uSortOrder to unsigned char in struct ProfPos

File: nwdasmall.cpp
    Change the traceback array type to unsigned

File: nwsmall.cpp
  * Add option to pack the traceback array in nibbles

File: objscore.cpp
    Improve timing and progress display
  * Objective function score g_ticksObjScore counted twice

File: profile.h
  * Change type of m_uSortOrder to unsigned char in struct ProfPos
  * Use a typedef for the estring type

File: profilefrommsa.cpp
  * Change type of m_uSortOrder to unsigned char in struct ProfPos

File: progalign.cpp
    Improve timing and progress display
    Remove unnecessary argument from ProfileFromMSA()

File: progress.cpp
    Progress fixes
    Improve timing and progress display

File: progressivealign.cpp
    Improve timing and progress display

File: pwpath.cpp
    Remove dead code

File: realigndiffs.cpp
    Improve timing and progress display

File: realigndiffse.cpp
    Improve timing and progress display

File: refinehoriz.cpp
    Improve timing and progress display
  * Remove unused test code that causes a crash

File: refinesubfams.cpp
    Improve timing and progress display

File: refinetree.cpp
    Improve timing and progress display

File: refinetreee.cpp
    Improve timing and progress display

File: scoredist.cpp
    Progress fixes

File: scorehistory.cpp
    Improve timing and progress display

File: seq.cpp
    Use '-' internally as the only gap character

File: seqvect.cpp
    Use '-' internally as the only gap character

File: subfam.cpp
    Improve timing and progress display

File: timing.h
    Add timing options RDTSC, TIMEOFDAY, GETTIME

File: treefrommsa.cpp
    Improve timing and progress display

File: upgma2.cpp
    Improve timing and progress display
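
The memory effect of the struct ProfPos changes listed for muscle.h and profile.h can be estimated with a rough sketch. The template ProfPosSketch and the helper profileBytes() are hypothetical and keep only the three arrays named in the summary; the real struct has further members, and float is assumed here for the count and score element types.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for struct ProfPos. Only the three
// arrays named in the summary are kept; float element types are an assumption.
template <typename SortT, std::size_t AlphaSize>
struct ProfPosSketch {
    SortT m_uSortOrder[AlphaSize];
    float m_fcCounts[AlphaSize];
    float m_AAScores[AlphaSize];
};

// Layout before the change: unsigned sort order, fixed length 20.
using ProfPosOld = ProfPosSketch<unsigned, 20>;
// Layout after the change: unsigned char sort order, fixed length 20.
using ProfPosNew = ProfPosSketch<unsigned char, 20>;
// DNA-only experiment: array length reduced to 4.
using ProfPosDNA = ProfPosSketch<unsigned char, 4>;

// Bytes needed to hold one profile of `columns` positions.
template <typename P>
constexpr std::size_t profileBytes(std::size_t columns) {
    return columns * sizeof(P);
}
```

On a typical platform with 4-byte unsigned and float, these three layouts take 240, 180 and 36 bytes per position, so the same profile needs several times less memory in the DNA-only layout. The actual savings in MUSCLE depend on the members omitted from this sketch.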