| 主页 | 频道首页 | 本站地图 | 论坛留言 | 合作联系 | 本站消息 | |
科技动态 技术发展 文化研究 生物生态 人的研究 生命起源 基因工程 科学普及 科学探索 专题其他

生物分子网络及复杂网络中模式结构识别问题的研究

2010-08-05
系统生物学,复杂网络,生物分子网络,社团模块结构,网络对比,模块化指标,活性通路
生物分子网络及复杂网络中模式结构识别问题的研究

论文中英文摘要
  作者姓名:张世华
  论文题目:生物分子网络及复杂网络中模式结构识别问题的研究
  作者简介:张世华,男,1980年5月出生,2003年9月师从于中国科学院数学与系统科学研究院章祥荪研究员,于2008年7月获博士学位。
  
生物分子网络及复杂网络中模式结构识别问题的研究中文摘要
  
  随着高通量生物实验技术的进步, 可以获得大量的生物`组学(Omics)' 数据(如基因组、蛋白组、代谢组等). 而近几年, 复杂网络的研究正成为广泛关注的热点, 网络也成为刻画数据关系的重要的工具. 各种各样的大规模生物分子网络(如蛋白质相互作用网络、基因调控网络、代谢网络等) 也正成为研究生物系统的重要材料. 同时, 为我们系统地分析生物分子网络的拓扑结构、组织形式、演化规律提供了丰富的素材, 也为进一步研究生物的运行机制、稳定性提出了巨大的挑战. 在这些生物分子网络中识别和挖掘各种模块结构, 已成为系统生物学的基本问题. 这一基本问题也引起了生物、应用数学、物理学和计算机科学家的广泛注意. 本文即属于运筹学与应用数学在生物分子网络以及复杂网络中的应用. 这里主要侧重相关网络问题建模与算法的研究.

  研究和分析生物分子网络中的特殊模式结构, 有助于我们更好的理解细胞组织以及细胞功能, 进而系统的考察分子的运行机制. 本文主要从三个方面研究了生物分子网络以及一般复杂网络中的局部模式结构问题. 第一个方面的问题是生物分子网络的比对的问题, 目的是从中识别保守的模块结构; 第二个方面是从网络中直接识别模块结构, 目的是识别拓扑上重要的模块化结构包括模块化结构的识别和评价以及模糊模块结构的识别; 第三个方面是从静态的生物分子网络出发, 结合隐含的包含动态信息的基因表达数据识别特定状态的通路问题, 该问题最终可以归结为在赋权网络中识别权重最大的连通子图问题. 这三个方面的问题, 相互联系又相互区别, 从生物分子网络以及一般复杂网络的不同角度考察了网络的局部模式结构. 图.1展示了本论文的主要研究对象和研究目标.
  
  论文的第一章对生物分子网络研究的具体背景和基本知识进行了简单的介绍, 并对紧密相关的一般复杂网络的研究情况进行了介绍. 并在最后,概括了该论文的主要研究内容和基本结构. 第二章, 系统的总结了生物分子网络相关的研究, 特别强调了在计算的角度从网络水平上揭示生物功能和机制的系统生物学研究, 并最终全面地总结了相关的数据和软件工具.第三、四、五、六章分别介绍了本论文的主要结果。其中包括:
  
  (1) 提出了一种有效的网络比对方法. 通过不同物种或同一物种内部的分子网络比较方法来揭示保守的模式或信号途径逐渐成为越来越重要的问题. 为了找出保守的子结构, 我们设计了一个有效的分子网络比对(network alignment)算法. 该算法是基于整数二次规划(Integer quadratic programming: IQP) 方法,并综合了分子的相似性和网络拓扑的相似性. 这个IQP 模型可以松弛成相应的二次规划(Quadratic programming: QP) 模型, 并保证几乎
  
  图.1: 本论文中的主要研究对象、研究问题以及研究目标的总括.

总是得到整数解.因此, 不需要任何启发式操作就能使生物分子网络比对问题得以求解. 这个新提出的框架非常灵活, 它可以应用于各种分子网络包括赋权的、不赋权的, 有向的、无向的, 甚至有环或者无环的.
  
  (2) 对网络的模块化分解问题, 我们提出了一种新的评价模块划分的定量指标函数, 即模块化密度(modularity density, D). 我们首先证实了这个指标函数比目前广泛使用的模块化指标Q 更优越, 并且证明了它与核k-均值(kernel k-means) 目标函数的等价性. 通过理论和数值验证, 结果显示了优化这个新指标不仅可以从网络中解析出已有方法不能得到的精细模块, 而且可以准确识别网络中包含的模块数目.
  
  (3) 设计了两种从网络中探测模糊模块的算法. 第一种方法受到一般模糊几何聚类的启发, 综合了基于模糊概念推广的模块化指标Q, 一种近似的在欧氏空间的谱松弛映射和模糊c-均值聚类(fuzzy c-means clustering: FCM) 方法. 第二种方法是基于非负矩阵分解(non-negative matrix factorization: NMF) 技术,将其应用于由网络的扩散核(diffusion kernel) 代表的特征空间. 两种方法都可以根据定量的模块化指标函数识别模块的数目, 并使得一些顶点可能属于不只一个模块. 突出特点是前者可以给出一个相对隶属度, 而后者可以量化一个顶点属于一个模块的绝对隶属程度. 在各种测试网络上的数值模拟实验表明, 这两种方法可以用于识别模糊的模块结构.
  
  (4) 研究了一种带约束的最大边权重连通子图问题并给出了算法. 通过整合基因表达数据和蛋白质相互作用网络, 识别具体生物过程相关的活性子图问题可以表示成一个求最大边权重连通子图(maximum edge-weight connected graph: MECG) 的组合优化问题. 给定一个边赋权的网络, 最大边权重的连通子图问题就是在该网络中寻找一个边数一定的连通子图, 使得其边的权重之和最大. 这里我们研究一种特殊情况, 即带约束的最大边权重连通子图(constraint maximum edge weight connected graph: CMECG) 问题, 也就是寻找包括给定的k 条边的最大边权重连通子图, 我们将该问题记作k-CMECG 问题. 当k = 1时, 我们证明了1-CMECG 问题是NP-完全的; 并且给出了精确算法和启发式算法. 我们还针对k-CMECG 问题提出了一个启发式算法. 并且指出可以利用这一特殊情形可以导出一般MECG 问题的解.
  
  最后在第七章, 对全文进行了简明的总结;并讨论了与生物特别意义子图识别问题密切相关的问题、以及复杂网络紧密相关的问题;探讨了进一步的研究方向.


  关键词:系统生物学, 复杂网络, 生物分子网络, 社团/模块结构,网络对比, 模块化指标,活性通路.

The Study of Problems in Detecting Pattern Structures from Biomolecular Networks and Complex Networks
Zhang Shihua
ABSTRACT
  
  With the progress of high-throughput technologies, more and more `omic’ data can be obtained now. Also the study of complex network is becoming a hot research field and networks has become very important tools to describe relationship data. Various large-scale biomolecular networks (e.g. protein interaction networks, gene regulatory networks and metabolic networks) are becoming important research objects. Study on biomolecular networks has become one of the key focuses in systems biology and bioinformatics for demonstrating great potentials to discover basic functions and to reveal essential mechanisms for various biological phenomena, by understanding biological systems not at individual component level but at a system-wide level. Recent studies on networks have created very prolific researches on many aspects of living organisms, but there are still many challenging problems in this field. Detecting and mining various pattern structure in these biomolecular networks have become basic problems in systems biology. These problems have attracted comprehensive attention of different field including biology, applied mathematics, physics and computer scientists. This thesis is just the application of operational research and applied mathematics in biomolecular networks and complex networks.
  
  Studies and analysis of the special pattern structure in bimolecular networks are very helpful to understand the cellular organization and functions, and further explore the biological mechanisms systematically. Here we study the local pattern structure in biomolecular networks and general complex networks from three aspects. The first problem is the network alignment problem of biomolecular networks which aim to find conservative modular structure. The second problem is to detect the community structure or modular structure in networks directly. The goal is to uncover the topologically important modular structure. The studies include the detection and evaluation of this structure as well as the detection of fuzzy community structure. The third problem is to identify the active pathway from static biomolecular networks coupled with gene expression data which imply dynamic information. This problem can be reduced to identify the maximum weighted connected subnets in weighted networks. These problems are related but different, and explore the local pattern structure of biomolecular networks and complex networks from different view. From Figure.1, we illustrate the major research object and research goal.
  
  In the first chapter, we briefly introduce the background and basic knowledge for the study of biomolecular networks. And then we introduce high-related studies about general complex networks. We summarize the major contents and basic structure of this thesis. In the second chapter, we survey the recent developments on topics related to molecular networks in a comprehensive manner and highlight the studies to reveal biological functions and mechanisms on the computational systems biology view. And summarize the representative databases and software tools comprehensively. In the third, fourth, fifth and sixth chapters, we focus the major studies of this thesis. The major results include:
  
  (1)We propose an efficient network alignment method. The discovery of conserved patterns or signaling pathways by comparing various kinds of networks among different species or within a species becomes an increasingly important problem. To find the conserved substructures, we develop an efficient algorithm for aligning molecular networks based on both molecule similarity and architecture similarity, by using integer quadratic programming (IQP). Such an IQP can be relaxed into the corresponding quadratic programming (QP) which almost always ensures an integer solution, thereby making molecular network alignment tractable without any approximation. The

Figure.1: Illustration of the major research objects, problems and goal.

proposed framework is very flexible and can be applied to many kinds of molecular networks including weighted and unweighted, directed and undirected networks with or without loops.
  (2) We propose a novel quantitative function for network partition into communities, i.e. modularity density|D. We demonstrate that this quantitative function is superior to the widely used modularity Q, and also prove its equivalence with the objective function of kernel k-means. Both theoretical and numerical results show that optimizing the new criterion not only can resolve detailed modules that the existing approaches cannot achieve, but also can correctly identify the number of communities.
  
  (3) We develop two novel methods to detect fuzzy/overlapping communities in complex networks. The first one is based on the combination of a new modularity function based on generalizing modularity Q, an approximation mapping of network nodes into Euclidean space and fuzzy c-means clustering. The second one is based on a popular modular function, a proper feature matrix from diffusion kernel and non-negative matrix factorization (NMF) algorithm. The both methods can detect an appropriate number of fuzzy communities in which a node may belong to more than one communities. The distinguished characteristic of the two methods is their capability of quantifying how much a node belongs to a community. The computational results of the two methods on artificial and real networks confirm their ability.
  
  (4) We study the constrained maximum edge-weight connected graph problem and devise new algorithms. The identification of `active' subnets with specific biological processes via integration of the gene expression data and protein-protein interaction networks can be formulated a combinatorial optimization problem for finding maximum edge-weight connected graph (MECG). Given an edge weighted graph, MECG is a connected subgraph with a given number of edges and maximal weight sum. The MECG problem is NP-complete. Here we study a special case, i.e. the Constrained Maximum Edge-Weight Connected Graph problem (CMECG), which is a MECG whose candidate subgraphs must include a given set of k edges, then also called k-CMECG. The special case, 1-CMECG, is proved to be NP-complete, for which an exact algorithm and a heuristic algorithm are given. We also propose a heuristics algorithm for the k-CMECG problem. Moreover, we show that the special case can lead to the solution of the general MECG problem.
  
  Finally, we conclude the thesis briefly in the seventh chapter. And we discuss related problems with detection of biologically significant subnets in biomolecular networks as well as its closely related problems in complex networks. We highlight several further research directions in this field.
  
Key words: Systems Biology, Complex Network, Biomolecular Network, Community/Module Structure, Network Alignment, Modularity Density, Active Pathway.

科学家发现植物会思考记忆
美国科学家研究发现运动可预防衰老机理
我国科学家首次揭示早期羽毛演化的奥秘
科学家首次发现可无氧生存新生物
研究发现类人猿能察觉选择中的错误
盘点5种孕育后代雄性动物:雄性产卵产子
达尔文的“真实谎言”
海豚太过聪明,不适合在主题公园表演
用干细胞让失明者恢复视觉
哥本哈根沉船理论
行为生物学常识
生物学世纪呼唤生物学哲学
地球生命曾两度“狂长个”氧气是主要动力
新发现-蚂蚁大起义
丹麦科学家首次观察到人类活细胞交流
生物大灭绝为什么反复发生
上海科学家发现乙肝病毒复制“开关”
研究发现工蜂欺骗蜂王偷偷繁育后代
科学家发现怪异新生物为逃命准备防御性武器
菲律宾发现最大食肉植物能生吞老鼠
《动物的社会行为》
中国古代生物战争的起源及社会与文化后果
《动物的社会行为》
脐血疑云
神经干细胞不会自动“补脑”
风暴中的生物技术产业
奥巴马9日将签署命令解除干细胞研究限制
生物生态1 生物生态2

本栏目主要介绍生物学和生态学方面,包括生物技术、生物工程、生态经济、生态系统、生态环境、生物分子网络及复杂网络中模式结构识别问题的研究等。特别关注有关人与文化方面的研究。

『科学频道首页』 『本栏页首』 『关闭窗口』