RosettaScript1

2023-06-13 19:15:23


0 XML skeleton

<ROSETTASCRIPTS>
    <SCOREFXNS>
    </SCOREFXNS>
    <RESIDUE_SELECTORS>
    </RESIDUE_SELECTORS>
    <TASKOPERATIONS>
    </TASKOPERATIONS>
    <SIMPLE_METRICS>
    </SIMPLE_METRICS>
    <FILTERS>
    </FILTERS>
    <MOVERS>
    </MOVERS>
    <PROTOCOLS>
    </PROTOCOLS>
    <OUTPUT />
</ROSETTASCRIPTS>

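Handy tip: to get the empty template script above, run the rosetta_scripts application and omit the -parser:protocol flag. If the flag is omitted (that is, no input script is provided), the application prints the template script and exits. This is very useful when you sit down to write a new script.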

1 Example XML file

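The following modern example uses -in:file:native to minimize a CDR loop of a protein, computing various metrics before and after minimization. All metrics are written to the score file, labeled with the given prefix/suffix and the name of each metric.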

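Rosetta executes the operations in the order given in the PROTOCOLS block. An important point is that SimpleMetrics and Filters never change the sequence or conformation of the structure.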

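Movers, on the other hand, do change the pose, and the output file is the result of applying the movers of the PROTOCOLS section in order. The standard scores in the output are carried over from whichever protocol step did the scoring, unless an OUTPUT tag is specified, in which case the corresponding score function from the SCOREFXNS block is used. You can use the name "commandline" as the score function in the OUTPUT tag. Note that this means that if your pose is never scored during the protocol, your output will contain no score information!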
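The ordering rule above can be modeled with a small Python sketch. It is not Rosetta code: ToyPose, mutate_first_to_ala, toy_score and run_protocol are hypothetical stand-ins, and the sketch assumes that a failing filter simply ends the trajectory. It illustrates two points from the text: movers change the pose while filters only inspect it, and a pose that no step ever scores ends up with no score at all.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ToyPose:
    """Hypothetical stand-in for a Rosetta pose: a sequence plus its last score."""
    sequence: str
    score: Optional[float] = None


Mover = Callable[[ToyPose], None]   # may modify the pose in place
Filter = Callable[[ToyPose], bool]  # only inspects the pose


def null_mover(pose: ToyPose) -> None:
    """Does nothing, mirroring the predefined NullMover."""


def true_filter(pose: ToyPose) -> bool:
    """Always passes, mirroring the predefined TrueFilter."""
    return True


def mutate_first_to_ala(pose: ToyPose) -> None:
    """Hypothetical mover: changes the sequence but does not score the pose."""
    pose.sequence = "A" + pose.sequence[1:]


def toy_score(pose: ToyPose) -> None:
    """Hypothetical scoring step: counts alanines, purely for illustration."""
    pose.score = float(pose.sequence.count("A"))


def run_protocol(pose: ToyPose, steps: List[Tuple[Mover, Filter]]) -> bool:
    """Apply (mover, filter) pairs in the listed order, like <Add> lines in PROTOCOLS.

    Assumption of this sketch: a failing filter ends the trajectory.
    """
    for mover, check in steps:
        mover(pose)
        if not check(pose):
            return False
    return True


unscored = ToyPose("MKV")
run_protocol(unscored, [(mutate_first_to_ala, true_filter)])
print(unscored.sequence, unscored.score)  # AKV None

scored = ToyPose("MKV")
run_protocol(scored, [(mutate_first_to_ala, true_filter), (toy_score, true_filter)])
print(scored.sequence, scored.score)  # AKV 1.0
```

The first pose is modified but never scored, so its score stays empty, which is the situation the warning above describes.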

Additional XML script examples, including docking, protein interface design, and prepacking of protein complexes, can be found in the Rosetta/demos/public/rosetta_scripts/ directory.

The following command line runs the protocol above, assuming the protocol file is named ala_scan.xml:

Rosetta/main/source/bin/rosetta_scripts.linuxgccrelease -s <input PDB filename> -use_input_sc -nstruct 20 -ex1 -ex2 -parser:protocol ala_scan.xml -parser:view

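Note that for RosettaScripts to actually honor most command-line options (for example the packer options -ex1 and -ex2 above), you need to use task operations.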

The -parser:view flag works with Rosetta executables compiled with the extras=graphics switch, which you can build as follows (from the Rosetta root directory):

scons mode=release -j3 bin extras=graphics

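When run with -parser:view, a graphical viewer opens and displays many steps of the trajectory. This is very useful for making sure that sampling follows the intended trajectory.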

2 General RosettaScripts conventions

2.1 General Comments

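This document lists the movers and filters recognized by RosettaScripts, together with their defaults, meanings, and uses. It is written in XML format, and as long as the file has the .xml extension, many free viewers (such as vi) will highlight the key XML notation.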

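Whenever an XML statement is shown, the following conventions are used:

<...> defines a branch statement (a statement that has further leaves).
<.../> defines a leaf statement.
"" defines input expected from the user, with & giving the expected type (string, float, etc.).
() defines the default value that the parser uses if the protocol does not provide one.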

2.2 Specifying Residues

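Rosetta uses two residue-numbering schemes: "pose numbering" and "PDB numbering". Pose numbering assigns the value 1 to the first residue of the first chain and numbers sequentially from there, ignoring the start of new chains and any missing residues. PDB numbering uses the chain/residue/insertion code designations present in the input PDB file. In general, a residue identifier given together with a chain is PDB-numbered, and one without a chain is pose-numbered.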

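For example, suppose a PDB file has two chains, with residues 12-62 in chain A and residues 5-20 and 32-70 in chain B. PDB residue 12 of chain A has pose number 1, and PDB residue 62 of chain A has pose number 51. Residue 5 of chain B has pose number 52, and residue 32 of chain B has pose number 68.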
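The pose-numbering rule can be checked on this two-chain example with a short Python sketch. Here a pose number is simply the 1-based position of a residue in file order; the residue list is built by hand rather than read from a real PDB file, and insertion codes are ignored. build_pdb_to_pose is a hypothetical helper, not a Rosetta function.

```python
from typing import Dict, List, Tuple

Residue = Tuple[str, int]  # (chain ID, PDB residue number); insertion codes ignored


def build_pdb_to_pose(residues: List[Residue]) -> Dict[Residue, int]:
    """Map (chain, PDB number) to pose number.

    `residues` lists the residues present in the file, in file order. Pose
    numbers start at 1 and count upward, ignoring chain starts and gaps.
    """
    return {res: i for i, res in enumerate(residues, start=1)}


# The example from the text: chain A 12-62, chain B 5-20 and 32-70.
present = (
    [("A", n) for n in range(12, 63)]
    + [("B", n) for n in range(5, 21)]
    + [("B", n) for n in range(32, 71)]
)
pdb_to_pose = build_pdb_to_pose(present)

print(pdb_to_pose[("A", 12)])  # 1
print(pdb_to_pose[("A", 62)])  # 51
print(pdb_to_pose[("B", 5)])   # 52
print(pdb_to_pose[("B", 32)])  # 68
```

Residues B21-B31 are absent from the file, so they receive no pose number at all, which is why B32 directly follows B20 (pose 67).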

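Many RosettaScripts tags that accept residue identifiers offer a joint option for specifying the residue by either pose number or PDB number, written as res_num/pdb_num or similar. For tags with this option, you can specify res_num= or pdb_num=, but not both. The res_num option takes a pose-numbered residue, while the pdb_num option takes a PDB-numbered specification of the form "42.A" or "42A", where A is the chain and 42 is the PDB residue number. Currently, insertion codes cannot be specified with the pdb_num option.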
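A minimal Python sketch of the res_num/pdb_num convention, assuming a one-letter chain ID and no insertion code. parse_pdb_num and resolve_residue are hypothetical helpers, not Rosetta functions; they only illustrate the two accepted pdb_num forms and the rule that exactly one of the two options may be given.

```python
import re
from typing import Optional, Tuple, Union

# Hypothetical pattern: residue number, optional ".", one-letter chain ID.
# No insertion code is accepted, matching the limitation described above.
_PDB_NUM = re.compile(r"^(-?\d+)\.?([A-Za-z])$")


def parse_pdb_num(spec: str) -> Tuple[int, str]:
    """Parse "42.A" or "42A" into (42, "A")."""
    m = _PDB_NUM.match(spec.strip())
    if m is None:
        raise ValueError(f"not a pdb_num specification: {spec!r}")
    return int(m.group(1)), m.group(2)


def resolve_residue(res_num: Optional[int] = None,
                    pdb_num: Optional[str] = None) -> Tuple[Union[str, int], ...]:
    """Accept exactly one of res_num (pose numbering) or pdb_num (PDB numbering)."""
    if (res_num is None) == (pdb_num is None):
        raise ValueError("specify exactly one of res_num= or pdb_num=")
    if res_num is not None:
        return ("pose", res_num)
    number, chain = parse_pdb_num(pdb_num)
    return ("pdb", number, chain)


print(resolve_residue(pdb_num="42.A"))  # ('pdb', 42, 'A')
print(resolve_residue(pdb_num="42A"))   # ('pdb', 42, 'A')
print(resolve_residue(res_num=7))       # ('pose', 7)
```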

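Care must be taken when using PDB numbering with protocols that change the length of the pose. Inserting residues invalidates the PDB information associated with the pose, which leads to errors when PDB numbers are decoded. In addition, some RosettaScripts objects convert PDB numbers to pose numbers based on the numbering of the input structure, which can cause misalignment if residues are added or removed.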
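The misalignment risk can be illustrated in Python with a hypothetical five-residue chain. The sketch converts PDB residue A15 to a pose number once, as an object might do when the script is parsed, then inserts a residue and shows that the cached pose number now points at a different residue. The labels such as "A15" and the helper to_pose are illustrative only.

```python
from typing import List


def to_pose(labels: List[str], label: str) -> int:
    """Pose number (1-based) of a PDB label in the current residue order."""
    return labels.index(label) + 1


# Hypothetical input structure: chain A residues 12-16.
residues = [f"A{n}" for n in range(12, 17)]

# An object converts PDB residue A15 to a pose number once, from the input structure.
cached = to_pose(residues, "A15")
print(cached)  # 4

# Later, a mover inserts a new residue after pose position 1.
residues.insert(1, "NEW")

# The cached pose number now points at a different residue...
print(residues[cached - 1])  # A14
# ...while looking A15 up again gives its new pose number.
print(to_pose(residues, "A15"))  # 5
```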

2.3 Getting help

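Although this document is intended to be the main user manual for RosettaScripts, help is also available from within the application. To get an empty template script, simply run the rosetta_scripts program without any input flags. For example: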

> ./bin/rosetta_scripts.default.linuxgccrelease

This produces the following output:

core.init: USEFUL TIP: Type -help to get the options for this Rosetta executable.
apps.public.rosetta_scripts.rosetta_scripts: No XML file was specified with the "-parser:protocol <filename>" commandline option. In order for RosettaScripts to do something, it must be provided with a script.
apps.public.rosetta_scripts.rosetta_scripts: The following is an empty (template) RosettaScripts XML file:

<ROSETTASCRIPTS>
    <SCOREFXNS>
    </SCOREFXNS>
    <RESIDUE_SELECTORS>
    </RESIDUE_SELECTORS>
    <TASKOPERATIONS>
    </TASKOPERATIONS>
    <SIMPLE_METRICS>
    </SIMPLE_METRICS>
    <FILTERS>
    </FILTERS>
    <MOVERS>
    </MOVERS>
    <PROTOCOLS>
    </PROTOCOLS>
    <OUTPUT />
</ROSETTASCRIPTS>

At any point in a script, you can include text from another file using <xi:include href="filename.xml" />.
apps.public.rosetta_scripts.rosetta_scripts: Variable substituion is possible from the commandline using the -"parser:script_vars varname=value" flag. Any string of the pattern "%%varname%%" will be replaced with "value" in the script.
apps.public.rosetta_scripts.rosetta_scripts:
apps.public.rosetta_scripts.rosetta_scripts: The rosetta_scripts application will now exit.

You can also get syntax help for any mover, filter, task operation, or residue selector with the -parser:info <name1> <name2> <name3> ... flag. For example, the following command line prints information about the MutateResidue mover and the HbondsToAtom filter:

./bin/rosetta_scripts.default.linuxgccrelease -info MutateResidue HbondsToAtom

The output is as follows:

The rosetta_scripts application was used with the -parser:info flag.
Writing options for the indicated movers/filters/task operations/residue selectors:
--------------------------------------------------------------------------------
INFORMATION ABOUT MOVER "MutateResidue":

DESCRIPTION:

Change a single residue or a given subset of residues to a different type. For instance, mutate Arg31 to an Asp, or mutate all Prolines to Alanine

USAGE:

<MutateResidue target=(string) new_res=(string) mutate_self=(bool,"false") perserve_atom_coords=(bool,"false") update_polymer_bond_dependent=(bool) preserve_atom_coords=(bool) residue_selector=(string) name=(string)>
</MutateResidue>

OPTIONS:

"MutateResidue" tag:

    target (string): The location to mutate. This can be a PDB number (e.g. 31A), a Rosetta index (e.g. 177), or an index in a reference pose or snapshot stored at a point in a protocol before residue numbering changed in some way (e.g. refpose(snapshot1,23)). See the convention on residue indices in the RosettaScripts Conventions documentation for details

    new_res (string): The name of the residue to introduce. This string should correspond to the ResidueType::name() function (eg ASP).

    mutate_self (bool,"false"): If true, will mutate the selected residue to itself, regardless of what new_res is set to (although new_res is still required). This is useful to "clean" residues when there are Rosetta residue incompatibilities (such as terminal residues) with movers and filters.

    perserve_atom_coords (bool,"false"): If true, then atoms in the new residue that have names matching atoms in the old residue will be placed at the coordinates of the atoms in the old residue, with other atoms rebuilt based on ideal coordinates. If false, then only the mainchain heavyatoms are placed based on the old atom's mainchain heavyatoms; the sidechain is built from ideal coordinates, and sidechain torsion values are then set to the sidechain torsion values from the old residue. False if unspecified.

    update_polymer_bond_dependent (bool): Update the coordinates of atoms that depend on polymer bonds

    preserve_atom_coords (bool): Preserve atomic coords as much as possible

    residue_selector (string): name of a residue selector that specifies the subset to be mutated

    name (string): The name given to this instance

--------------------------------------------------------------------------------
INFORMATION ABOUT FILTER "HbondsToAtom":

DESCRIPTION:

This filter counts the number of residues that form sufficiently energetically favorable H-bonds to a selected atom

USAGE:

<HbondsToAtom partners=(int) energy_cutoff=(real,"-0.5") bb_bb=(bool,"0") backbone=(bool,"0") sidechain=(bool,"1") pdb_num=(refpose_enabled_residue_number) atomname=(string) res_num=(int) name=(string) confidence=(real,"1.0")>
</HbondsToAtom>

OPTIONS:

"HbondsToAtom" tag:

    partners (int): H-bonding partner expectation, below which counts as failure

    energy_cutoff (real,"-0.5"): Energy below which a H-bond counts

    bb_bb (bool,"0"): Count backbone-backbone H-bonds

    backbone (bool,"0"): Count backbone H-bonds

    sidechain (bool,"1"): Count sidechain H-bonds

    pdb_num (refpose_enabled_residue_number): Particular residue of interest

    atomname (string): Atom name to which to examine H-bonds

    res_num (int): Residue number in Rosetta numbering (sequentially with the first residue in the pose being 1

    name (string): The name given to this instance

    confidence (real,"1.0"): Probability that the pose will be filtered out if it does not pass this Filter

--------------------------------------------------------------------------------

The rosetta_scripts application will now exit.

3 Options available in the XML protocol file

3.1 Variable substitution

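Occasionally you want to perform a series of runs that differ only slightly in their parameters. Instead of creating many slightly different XML files, you can do this with script variables.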

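If the -parser:script_vars option is set on the command line, then every string of the form "%%variable_name%%" encountered in the XML file is replaced with the corresponding value given on the command line.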

For example, an XML line such as

<AlaScan name="scan" partner1="1" partner2="1" scorefxn="interface" interface_distance_cutoff="%%cutoff%%" repeats="%%repeat%%"/>

can be turned into

<AlaScan name="scan" partner1="1" partner2="1" scorefxn="interface" interface_distance_cutoff="10.0" repeats="5"/>

with the command-line option

-parser:script_vars repeat=5 cutoff=10.0
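Because the substitution works as plain text replacement, the example above can be reproduced with a short Python sketch. parse_script_vars and substitute are hypothetical helpers, not Rosetta's implementation; the sketch also chooses to raise KeyError for a variable that was not supplied on the command line, which may not match Rosetta's own behavior.

```python
import re
from typing import Dict, List

_VAR = re.compile(r"%%(\w+)%%")


def parse_script_vars(args: List[str]) -> Dict[str, str]:
    """Turn ["repeat=5", "cutoff=10.0"] into {"repeat": "5", "cutoff": "10.0"}."""
    values: Dict[str, str] = {}
    for arg in args:
        name, sep, value = arg.partition("=")
        if not sep or not name:
            raise ValueError(f"expected varname=value, got {arg!r}")
        values[name] = value
    return values


def substitute(xml_text: str, script_vars: Dict[str, str]) -> str:
    """Replace every %%name%% with its value; an unknown name raises KeyError."""
    return _VAR.sub(lambda m: script_vars[m.group(1)], xml_text)


line = ('<AlaScan name="scan" partner1="1" partner2="1" scorefxn="interface" '
        'interface_distance_cutoff="%%cutoff%%" repeats="%%repeat%%"/>')
print(substitute(line, parse_script_vars(["repeat=5", "cutoff=10.0"])))
```

The printed line matches the substituted AlaScan tag shown above. Since every match is replaced, repeated occurrences of the same variable are all substituted.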

The values can then be changed freely from run to run, for example:

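-parser:script_vars repeat=5 cutoff=15.0
-parser:script_vars repeat=2 cutoff=10.0
-parser:script_vars repeat=1 cutoff=9.0

Multiple instances of a "%%var%%" string are all replaced, including in the XML files of any subroutines. Note that although script_vars is currently implemented as pure textual macro substitution, this may change in the future, and any use other than substituting tag values may stop working. In particular, using script variables to change the parse structure of the XML file itself is explicitly *not supported*, and you should not even consider doing so.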

3.2 Including XML files

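It is convenient to keep commonly used XML snippets in their own files and have a script load that XML from the existing files, so that users do not need to copy and paste XML code by hand. The XML xi:include command serves this purpose, with "href=filename" specifying the file to include.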

<xi:include href="(&filename_string)" />

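The xi:include block is naively replaced with the contents of the file given by "href=filename". Below is an example of xi:include, in which we assume that the user frequently uses the AlaScan and Ddg filters and wants to keep their settings in a separate file that can be included whenever he or she writes a new RosettaScripts XML file: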

file1.xml:

<ROSETTASCRIPTS>
    <SCOREFXNS>
        <ScoreFunction name="interface" weights="interface"/>
    </SCOREFXNS>
    <FILTERS>
        <xi:include href="file2.xml"/>
        <Sasa name="sasa" confidence="0"/>
    </FILTERS>
    <MOVERS>
        <Docking name="dock" fullatom="1" local_refine="1" score_high="soft_rep"/>
    </MOVERS>
    <PROTOCOLS>
        <Add mover_name="dock" filter_name="scan"/>
        <Add filter_name="ddg"/>
        <Add filter_name="sasa"/>
    </PROTOCOLS>
    <OUTPUT scorefxn="interface"/>
</ROSETTASCRIPTS>

file2.xml:

<AlaScan name="scan" partner1="1" partner2="1" scorefxn="interface" interface_distance_cutoff="10.0" repeats="5"/>
<Ddg name="ddg" confidence="0"/>

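Note that inclusion happens recursively, so included files can themselves include other files. Circular dependencies (for example, file1.xml includes file2.xml, which includes file3.xml, which includes file1.xml) are prohibited and cause an error. However, including the same file more than once is allowed (though rarely desirable). The number of files included this way is limited: the recursion limit is 8, and this value can be changed with the -parser:inclusion_recursion_limit command-line option. In some cases you may want to prevent the recursive search (for example, if the included file is very large); this can be done with the optional "prevent_recursion" parameter in the include tag, as follows: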
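The inclusion rules in this paragraph (recursive expansion, an error on circular inclusion, repeated inclusion allowed, a depth limit of 8 by default, and an opt-out through prevent_recursion) can be sketched in Python. The files live in an in-memory dict instead of on disk, expand_includes is a hypothetical helper, and how exactly Rosetta counts the recursion depth is an assumption here: the sketch counts nesting levels below the top-level file.

```python
import re
from typing import Dict, Tuple

# Matches <xi:include href="..."/> with an optional prevent_recursion="True".
_INCLUDE = re.compile(
    r'<xi:include\s+href="([^"]+)"(\s+prevent_recursion="(?:True|true|1)")?\s*/>'
)


def expand_includes(name: str, files: Dict[str, str], limit: int = 8,
                    _stack: Tuple[str, ...] = ()) -> str:
    """Recursively replace xi:include tags with the included file's text."""
    if name in _stack:  # an ancestor is included again: circular dependency
        raise ValueError("circular inclusion: " + " -> ".join(_stack + (name,)))
    if len(_stack) > limit:  # nested deeper than the recursion limit
        raise ValueError(f"inclusion depth exceeds the limit of {limit}")
    stack = _stack + (name,)

    def replace(m):
        href, no_recursion = m.group(1), m.group(2)
        if no_recursion:
            return files[href]  # paste verbatim; nested includes stay untouched
        return expand_includes(href, files, limit, stack)

    return _INCLUDE.sub(replace, files[name])


files = {
    "file1.xml": '<FILTERS>\n  <xi:include href="file2.xml"/>\n</FILTERS>',
    "file2.xml": '<Ddg name="ddg" confidence="0"/>',
}
# Prints the FILTERS block with the Ddg line pasted in place of the include tag.
print(expand_includes("file1.xml", files))
```

Only ancestors on the current inclusion path count as a cycle, so including the same file twice side by side is accepted, as the text describes.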

<xi:include href="(&filename_string)" prevent_recursion="True"/>

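Variable substitution happens after file inclusion. This means that %%variable%% statements can appear inside included files; however, it also means that an xi:include block itself cannot contain %%variable%% statements.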
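The consequence of doing inclusion before substitution can be shown with a Python sketch that expands a single level of xi:include and then replaces %%name%% variables. include_then_substitute and the in-memory files dict are hypothetical. A variable inside an included file is substituted normally, but an href containing %%name%% is looked up literally during the inclusion step and is therefore not found.

```python
import re
from typing import Dict

_INCLUDE = re.compile(r'<xi:include\s+href="([^"]+)"\s*/>')
_VAR = re.compile(r"%%(\w+)%%")


def include_then_substitute(top: str, files: Dict[str, str],
                            script_vars: Dict[str, str]) -> str:
    # Step 1: file inclusion (one level only, for brevity); hrefs are taken literally.
    expanded = _INCLUDE.sub(lambda m: files[m.group(1)], files[top])
    # Step 2: variable substitution over the already-expanded text.
    return _VAR.sub(lambda m: script_vars[m.group(1)], expanded)


files = {
    "main.xml": '<FILTERS>\n<xi:include href="alascan.xml"/>\n</FILTERS>',
    "alascan.xml": '<AlaScan name="scan" repeats="%%repeat%%"/>',
    "by_var.xml": '<xi:include href="%%which%%.xml"/>',
}
print(include_then_substitute("main.xml", files, {"repeat": "5"}))

try:
    include_then_substitute("by_var.xml", files, {"which": "alascan"})
except KeyError as err:
    print("include failed, href was looked up literally:", err)
```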

4 Predefined RosettaScripts objects

For convenience, certain RosettaScripts objects can be used without a defining tag.

4.1 Predefined movers

The following are defined internally by the parser, and protocols can use them without defining them explicitly.

NullMover

Has an empty apply. Used as the default mover in <PROTOCOLS> if no mover_name is specified. Can be specified explicitly with the name "null".

4.2 Predefined filters

TrueFilter

Always returns true. Useful for defining a mover step that does not use a filter. Can be specified explicitly with the name "true_filter".

FalseFilter

Always returns false. Can be specified explicitly with the name "false_filter".

4.3 Predefined score functions

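talaris2014: the default full-atom score function used for Rosetta structure prediction and design.
talaris2013: the previous version of talaris2014.
score12: the default score function before talaris2013 (requires the -restore_pre_talaris_2013_behavior option on the command line).
score_docking: the high-resolution docking scorefxn (pre_talaris_2013_standard + docking_patch).
score_docking_low: the low-resolution docking scorefxn (interchain_cen).
soft_rep: the soft_rep_design weights.
score4L: the low-resolution score function for loop modeling (chainbreak weight turned on).
commandline: the score function specified by the command-line options^[1] (note: not recommended for general use).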