论文网
English Papers
万事OK网
发表论文
 
 首页 > IT文章 > 程序设计 >
后缀树-SuffixTree概念

[科技论文网] http://www.scipapers.com    2007-12-01  

    后缀树-SuffixTree概念

    基本概念

    关于 suffix(后缀),suffix tree(后缀树),generalised suffix tree(一般后缀树)以及 suffix link(后缀链接)等等,都可以在如下页面找到明确的定义,不在此一一赘述。

    http://www.answers.com/topics/suffix-tree (实际上就是【维基百科】)

    看两个例子先

    因 suffix link 难于在字符图内表示,故略之。
     
    apple: |--apple$--[0]
           |--e$--[4]
           |--le$--[3]
           |--p--|--le$--[2]
                 |--ple$--[1]
     
    banana: |--a--|--$--[5]
            |     |--na--|--$--[3]
            |     |      |--na$--[1]
            |--banana$--[0]
            |--na--|--$--[4]
            |      |--na$--[2]
     
    后缀树有什么用
     
    同样是上面那个链接,在 Functionality 一栏,可以看到详细的描述。因为当中的条目,是本系列篇什的行动指南,故录之于此。

    A suffix tree for a string S of length n can be built in Θ(n) time, if the alphabet is constant or integer. Otherwise, the construction time depends on the implementation. The costs below are given under the assumption that the alphabet is constant. If it is not, the cost depends on the implementation (see below).
     
    若字符集恒定或为整数,则一个长为 n 的字符串 S,其后缀树可于 Θ(n) 的时间内得以构建。否则,构造时间依实现而定。如下给出的开销数据即基于字符集为恒定的假设。若字符集不为恒定,则开销依实现而定。(偶发现【翻译】是一件及其痛苦和费时的事,决意以后不再翻译!)
     
    Assume that a suffix tree has been built for the string S of length n, or that a generalised suffix tree has been built for the set of strings D = {S1,S2,...,SK} of total length n = | n1 | + | n2 | + ... + | nK | . You can:

    • Search for strings:
      • Check if a string P of length m is a substring in O(m) time.
      • Find all z occurrences of the patterns P1,...,Pq of total length m as substrings in O(m + z) time.
      • Search for a regular expression P in time expected sublinear on n.
      • Find for each suffix of a pattern P, the length of the longest match between a prefix of P[i...m] and a substring in D in Θ(m) time. This is termed the matching statistics for P.
    • Find properties of the strings:
      • Find the longest common substrings of the string Si and Sj in Θ(ni + nj) time.
      • Find all maximal pairs, maximal repeats or supermaximal repeats in Θ(n + z) time.
      • Find the Lempel-Ziv decomposition in Θ(n) time.
      • Find the longest repeated substrings in Θ(n) time.
      • Find the most frequently occurring substrings of a minimum length in Θ(n) time.
      • Find the shortest strings from Σ that do not occur in D, in O(n + z) time, if there are z such strings.
      • Find the shortest substrings occurring only once in Θ(n) time.
      • Find, for each i, the shortest substrings of Si not occurring elsewhere in D in Θ(n) time.

    The suffix tree can be prepared for constant time lowest common ancestor retrieval between nodes in Θ(n) time. You can then also:

    • Find the longest common prefix between the suffixes Si[p..ni] and Sj[q..nj in Θ(1).
    • Search for a pattern P of length m with at most k mismatches in O(kn + z) time, where z is the number of hits.
    • Find all z maximal palindromes in Θ(n), or Θ(gn) time if gaps of length g are allowed, or Θ(kn) if k mismatches are allowed.
    • Find all z tandem repeats in O(nlogn + z), and k-mismatch tandem repeats in O(knlog(n / k) + z).
    • Find the longest substrings common to at least k strings in D for k = 2..K in Θ(n) time.
    下一步
     
    打算用 C++ 或 Python 来实现构建后缀树的 Ukkonen 方法。

        来源:

声明:本文由网友推荐或作者提交,版权归原作者所有,刊登此文仅为传播知识,展示研究成果,提高文章引用率。未经原作者授权,禁止用于任何形式的商业行为。科技论文网倡导尊重知识、尊重劳动、保护原创、知识共享。由于部分论文文章来于网络,文章作者不祥,请相关的原创作者与我们联系,以便加上您的署名。

  
后缀树-SuffixTree概念
下面没有链接了     素数算法
最新论文
·[程序设计]后缀树-SuffixTree概念
·[程序设计]素数算法
·[程序设计]马踏棋盘算法分析
·[程序设计]字典树实现源代码
·[程序设计]Java和C#的Hash算法
·[程序设计]hash表及如何选择hash函数
·[程序设计]一致性哈希(Consistent Hash
·[程序设计]对冒泡算法的改进
·[程序设计]杨辉三角算法源码
·[程序设计] C脚本递归算法-计算八皇后问题
 
 

搜索论文

Google
论文分类

论文网 论文发表网 论文 免费论文网 找论文网 毕业论文 中国论文网 英语论文 百度论文 聘教网 易搜
 免费发布论文    中国论文网 2008版权所有  业务联系:pinjiao@126.com