MTZ2VARIOUS (CCP4: Supported Program)
NAME
mtz2various
- produces an ASCII reflection file for MULTAN, SHELX, TNT, X-PLOR/CNS, MAIN, CIF or a user-defined format.
This may contain amplitudes, intensities or differences.
SYNOPSIS
mtz2various hklin foo_in.mtz hklout foo_out
[Key-worded input file]
DESCRIPTION
This reads an mtz file (assigned to HKLIN) and produces an ASCII file (assigned to
HKLOUT) in a suitable form
for MULTAN, SHELX, TNT, X-PLOR/CNS, MAIN or in a user-defined format.
For SHELX output all quantities are given as intensities, ie F and delF terms are
squared. An mmCIF file can also be produced with all the relevant
information taken from the MTZ header.
There are many options controlled by the assignments on the LABIN line.
The most common requirements are:
Generate a list of h k l F or h k l I. If anomalous data is present,
hkl and -h-k-l will be output on separate lines.
If only FP, SIGFP or IP, SIGIP are assigned on LABIN, hkl FP SIGFP or hkl IP SIGIP is output.
If FP,SIGFP and DP, SIGDP are assigned, then F+ and F- are reconstructed,
and 2 reflections, hkl and -h-k-l, are output (X-PLOR, SHELX and CIF formats only).
If F(+),SIGF(+) and F(-),SIGF(-) or I(+),SIGI(+) and I(-),SIGI(-) are assigned,
then again 2 reflections are output, hkl and -h-k-l .
If FP, SIGFP and FPH, SIGFPH are both assigned, then
hkl |FP-FPH| SIG|FP-FPH| is output (not applicable for USER and CIF). This
can be useful when solving heavy atom positions via direct methods.
If DP, SIGDP are assigned, and FP, SIGFP are NOT assigned, then
hkl |DP| SIGDP is output (not applicable for USER and CIF). This also can be used
to solve for anomalous scatterers using direct methods.
The same result can be obtained by assigning FP to FPH(+) and FPH to FPH(-).
Then hkl |F(+) -F(-)| SIG|F(+) -F(-)| is output.
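The label-dependent rules above can be summarised as a small decision function. This is a hypothetical Python sketch, not the program's own logic: the name `output_mode` and the returned strings are illustrative, the intensity labels follow the LABIN table (I, SIGI), and the order in which combinations are tested is an assumption for cases the text does not cover.

```python
def output_mode(labels):
    """Describe the columns written for a set of assigned LABIN program labels.

    Hypothetical sketch of the rules described above; `labels` is a set such
    as {"FP", "SIGFP"}. The order of the checks is an assumption.
    """
    has = labels.issuperset
    if has({"FP", "SIGFP", "FPH", "SIGFPH"}):
        return "hkl |FP-FPH| SIG|FP-FPH|"
    if has({"FP", "SIGFP", "DP", "SIGDP"}):
        return "hkl F+ and -h-k-l F- (reconstructed)"
    if has({"DP", "SIGDP"}):
        # DP assigned without FP
        return "hkl |DP| SIGDP"
    if has({"F(+)", "SIGF(+)", "F(-)", "SIGF(-)"}) or has({"I(+)", "SIGI(+)", "I(-)", "SIGI(-)"}):
        return "hkl F+/I+ and -h-k-l F-/I-"
    if has({"FP", "SIGFP"}):
        return "hkl FP SIGFP"
    if has({"I", "SIGI"}):
        return "hkl I SIGI"
    raise ValueError("unsupported label combination")
```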
There is no guarantee that the reflection count is completely robust.
Files are sometimes slightly corrupted, e.g. DP is not present but F(+) and F(-)
are. I have TRIED to make sensible decisions in ambiguous cases.
When using OUTPUT USER you should get what you want, no tricks.
KEYWORDED INPUT
The allowed keywords are:
OUTPUT,
LABIN,
END,
FSQUARED,
MONITOR,
RESOLUTION,
SCALE,
INCLUDE,
EXCLUDE,
FREEVAL,
MISS
Compulsory input keywords are OUTPUT and LABIN.
OUTPUT [ MULTAN | SHELX | TNT | CIF | XPLOR | CNS | MAIN | SCAL | USER ]
The output types are as follows:
- MULTAN
The output file has h, k, l, f, imt in FORMAT(3I4,7X,F7.0,I6), where
imt=0 for a good reflection.
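To make the fixed-column layout concrete, here is a minimal Python sketch that mimics FORMAT(3I4,7X,F7.0,I6) for one reflection. The function name `multan_line` is hypothetical. The `#` format flag keeps the trailing decimal point that Fortran F7.0 writes; rounding of exact halves may differ from Fortran.

```python
def multan_line(h, k, l, f, imt=0):
    """Mimic Fortran FORMAT(3I4,7X,F7.0,I6): h, k, l, 7 blanks, F, imt.

    imt = 0 marks a good reflection. The '#' flag keeps the trailing
    decimal point written by Fortran F7.0 (e.g. '   123.').
    """
    return f"{h:4d}{k:4d}{l:4d}{'':7}{f:#7.0f}{imt:6d}"
```

Each line is 4+4+4+7+7+6 = 32 characters wide.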
- SHELX
The output file has the SHELX header followed by all
h, k, l, "I", sigma"I", 1 in FORMAT(3I4,2F8.2,I4).
Reflections previously excluded from refinement for FreeR analysis are
flagged with the word FREE at the end of the line. This means they can be
easily extracted from the SHELX file if desired.
NB: The SHELX programs expect intensities, so even if you assign input F terms the program will
automatically perform the conversion (see the FSQUARED
and SCALE keywords).
SHELX is usually used to find heavy atom sites. If FP and FPH are assigned, the program
calculates the isomorphous difference (Diso) |FP - FPH| and outputs its square, |FP - FPH|^2,
together with an appropriate sigma.
If you wish to use anomalous differences as input, you can EITHER assign FP=FPH(+) and FPH=FPH(-),
which tells the program to output |FPH(+) - FPH(-)|^2, OR assign DP=DPH, in which
case the program will output DPH^2.
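A minimal Python sketch of one SHELX reflection line in FORMAT(3I4,2F8.2,I4), plus the squared isomorphous difference that replaces the intensity when FP and FPH are assigned. The helper names are hypothetical, the sigma for the squared difference is not computed because the text does not give its formula, and the single space before FREE is an assumption about the exact spacing.

```python
def shelx_line(h, k, l, i, sig_i, free=False):
    """Mimic Fortran FORMAT(3I4,2F8.2,I4); the trailing integer is 1.

    Appending ' FREE' for free-R reflections: the spacing is an assumption.
    """
    line = f"{h:4d}{k:4d}{l:4d}{i:8.2f}{sig_i:8.2f}{1:4d}"
    return line + " FREE" if free else line


def diso_squared(fp, fph):
    """Squared isomorphous difference |FP - FPH|**2, written in place of I."""
    return abs(fp - fph) ** 2
```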
- TNT
The output file has 'HKL ', h, k, l, F, sig(F), phase, fom in
format(A4,3I4,3F8.1,F8.4), with phase = 1000, fom = 0 i.e. dummies.
Note that files for TNT must be sorted on h, k, l and certain reflection
zones are required. You may need to run CAD to re-sort your data.
Use keywords
INCLUDE FREER <num> and EXCLUDE FREER <num> to generate files for
R-free calculation.
There is a maximum likelihood version of TNT from Pannu and Read
which requires a free-R flag (in Xplor convention). This column
will be output if you assign the FREE column in LABIN and do
not use the INCLUDE | EXCLUDE FREER options.
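The TNT layout and the sort on h, k, l can be sketched in Python as below. The names `tnt_line` and `tnt_file` are hypothetical; the sketch only sorts and formats, and does not check the reflection-zone requirements mentioned above (use CAD for that).

```python
def tnt_line(h, k, l, f, sig_f):
    """Mimic Fortran format(A4,3I4,3F8.1,F8.4) with dummy phase=1000, fom=0."""
    phase, fom = 1000.0, 0.0
    return f"{'HKL ':4}{h:4d}{k:4d}{l:4d}{f:8.1f}{sig_f:8.1f}{phase:8.1f}{fom:8.4f}"


def tnt_file(reflections):
    """Sort (h, k, l, F, sigF) tuples on h, then k, then l, and format them.

    Does not enforce TNT's reflection-zone requirements.
    """
    return [tnt_line(*r) for r in sorted(reflections, key=lambda r: r[:3])]
```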
- CIF <data block header>
CIF output is invoked, where <data block header> is a maximum of 80 characters
long, and must begin with the characters "data_" (any mixture of upper and
lowercase thereafter). OUTPUT CIF can be used to prepare data (from crystallography
or EM) for deposition to the PDB.
Unlike the other output formats, all the reflections from HKLIN are written
to HKLOUT. Not all column labels are appropriate for CIF output (see Notes
on CIF). Also, only RESO, EXCLUDE SIGP and FREEVAL can be used with OUTPUT CIF.
They are used to flag certain reflections but not to reject them. The others
are ignored.
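The two rules for the CIF data block header can be checked before running the program. This is a hypothetical Python helper, not part of mtz2various; it assumes the "data_" prefix itself is matched in lower case exactly as written above.

```python
def valid_cif_block_header(header):
    """Check the OUTPUT CIF header rules: starts with 'data_', at most 80 chars.

    Assumes the 'data_' prefix is lower case as written; the remainder may be
    any mixture of upper and lower case.
    """
    return header.startswith("data_") and len(header) <= 80
```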
- XPLOR
The output file has FORMAT(A,3I5,A,F10.1,F10.1,A,F10.2,A,I6...). The exact
contents will depend on which labels have been specified by the
LABIN keyword. See the documentation for
FREERFLAG for a table explaining the differences in
free R flag conventions.
- CNS
Similar to XPLOR output. However, free R flags are left unchanged. To select
the correct free R flag in CNS, you will need something like:
{===>} test_flag=0;
- Anomalous data
For CIF, SHELX and XPLOR/CNS ONLY. If the anomalous difference is assigned (see LABIN),
then the amplitudes for reflections h,k,l and -h,-k,-l are generated and
output as separate reflections. In this case, the column ISYM
may also be assigned if it is present: this is a flag from TRUNCATE which is 0
if F comes from both the positive (hkl) and negative (-h-k-l) Bijvoet
reflections, 1 if it comes only from F+, and 2 if only from F-.
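The expansion of one anomalous reflection into separate hkl and -h-k-l rows, and the meaning of the ISYM values, can be sketched in Python. The names `ISYM_MEANING` and `friedel_rows` are hypothetical, and representing a missing Bijvoet mate by `None` (and skipping it) is an assumption made for illustration.

```python
ISYM_MEANING = {
    0: "F from both hkl and -h-k-l Bijvoet mates",
    1: "F from F+ only",
    2: "F from F- only",
}


def friedel_rows(h, k, l, f_plus, sig_plus, f_minus, sig_minus):
    """Expand one anomalous reflection into separate hkl and -h-k-l rows.

    Hypothetical sketch: a missing mate is passed as None and skipped.
    """
    rows = []
    if f_plus is not None:
        rows.append((h, k, l, f_plus, sig_plus))
    if f_minus is not None:
        rows.append((-h, -k, -l, f_minus, sig_minus))
    return rows
```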
- MAIN
This gives output suitable for the MAIN
program. The output file contains H K L FP SIGFP and optionally PHIB and FOM
if they are specified on the LABIN line. Alternatively, if FC is specified
on the LABIN line, then FP and FC are interpreted as the real and imaginary
parts respectively of a calculated F, and output as a "COMPLEX"
field.
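To show what the COMPLEX field represents when FC is assigned, the Python sketch below treats FP and FC as the real and imaginary parts of one calculated F and recovers its amplitude and phase. The function name `main_complex` is hypothetical, and the sketch does not reproduce the exact text layout of the MAIN file.

```python
import cmath
import math


def main_complex(fp, fc):
    """Interpret FP and FC as real and imaginary parts of one calculated F.

    Returns the complex value together with its amplitude and phase (degrees).
    """
    z = complex(fp, fc)
    amplitude, phase_rad = cmath.polar(z)
    return z, amplitude, math.degrees(phase_rad)
```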
- SCAL
This gives pseudo-SCALEPACK output which is needed as input to the SOLVE
package. The output file assigned to HKLOUT is ASCII and contains
H K L I(+) SIGI(+) I(-) SIGI(-) in the format (3I4,4F8.1). The output
may need to be rescaled to fit this format.
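A Python sketch of one pseudo-SCALEPACK line in (3I4,4F8.1), which also detects the case where a value is too wide for F8.1 and the output would need rescaling. The name `scal_line` is hypothetical; it raises an error rather than rescaling, since how the program rescales is not described here.

```python
def scal_line(h, k, l, i_plus, sig_plus, i_minus, sig_minus):
    """Mimic Fortran (3I4,4F8.1) for H K L I(+) SIGI(+) I(-) SIGI(-).

    Raises ValueError when a value is too wide for F8.1, i.e. when the
    data would need rescaling to fit the format.
    """
    fields = [f"{v:8.1f}" for v in (i_plus, sig_plus, i_minus, sig_minus)]
    if any(len(s) > 8 for s in fields):
        raise ValueError("value does not fit F8.1; rescale the data")
    return f"{h:4d}{k:4d}{l:4d}" + "".join(fields)
```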
- USER <format>
The output file is of the form H K L ? ? ... where the user can
specify which columns are to be output, how many, and in what format.
Ten dummy labels (DUM??) are available to assign to any column and are output
as real. There are also ten dummy columns (IDUM??) which are output as
integer. The order of the data in the ASCII file is taken from the order of
the program labels specified on the LABIN card, e.g.
LABIN FP=FP1 DP=DP1 SIGFP=SIG1 SIGDP=SIGDP1 would give the order
H K L FP1 DP1 SIG1 SIGDP1 in the output file. The format must either be a
FORTRAN format whose first three items are integers and whose remaining items
match the columns assigned on the LABIN card, e.g.
LABIN FP=FP DUM1=X IDUM1=Y
OUTPUT USER '(3I4,2F7.1,I4)'
or
OUTPUT USER *
to use free-format output. In that case, however, all columns after H, K and L will be
treated as real numbers.
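The column ordering and typing rules for OUTPUT USER can be sketched in Python. The function `user_row` is hypothetical, the Python format string stands in for the Fortran format `(3I4,2F7.1,I4)` (a translation assumed for illustration), and space-separated output stands in for `OUTPUT USER *`.

```python
def user_row(hkl, labin, row, fmt=None):
    """Build one USER output line (hypothetical sketch).

    labin: (program_label, file_label) pairs in LABIN order.
    row:   dict mapping file labels to values for one reflection.
    fmt:   Python stand-in for a Fortran format; None mimics OUTPUT USER *.
    """
    ordered = [(prog, row[col]) for prog, col in labin]
    if fmt is None:
        # Free format: every column after H K L is written as real.
        return " ".join([str(i) for i in hkl] + [str(float(v)) for _, v in ordered])
    # Fixed format: IDUM?? labels are integers, all other labels are real.
    values = [int(v) if prog.startswith("IDUM") else float(v) for prog, v in ordered]
    return fmt.format(*hkl, *values)
```

For the documented example `LABIN FP=FP DUM1=X IDUM1=Y`, the fixed format `"{:4d}{:4d}{:4d}{:7.1f}{:7.1f}{:4d}"` plays the role of `(3I4,2F7.1,I4)`.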
LABIN <program label>=<file label>
Input labels accepted are:
H, K, L Indices
FP, SIGFP F and Sigma for native
FPH, SIGFPH F and Sigma for derivative
FC, PHIC F and Phase from model
FPART, PHIPART F and Phase from partial structure
DP, SIGDP Anomalous difference and Sigma
I, SIGI I and Sigma
F(+), SIGF(+) F+ and Sigma(F+)
F(-), SIGF(-) F- and Sigma(F-) used for anomalous output
I(+), SIGI(+) I+ and Sigma(I+)
I(-), SIGI(-) I- and Sigma(I-)
FPART_BULK_S, PHIPART_BULK_S
Partial F and Phase for bulk solvent correction
W, FOM Weights
PHIB Best phase (experimental)
HLA,HLB,HLC,HLD Hendrickson-Lattman coefficients
FREE FreeR flag
ISYM (see TRUNCATE)
DUM?? Dummy labels (output as real)
IDUM?? Dummy labels (output as integer)
Not all columns are used in the various output formats, see
Notes on INPUT and OUTPUT. Also, the contents of the columns which are output
may depend on which input columns are assigned by LABIN, see DESCRIPTION
above.
Note: when using the DUM?? and IDUM?? labels, the program
may generate warnings about column type mismatches. This may happen for instance if
an anomalous difference (column type D) is assigned to one of the DUM labels
(which is nominally of type R, i.e. 'any other real'). These warnings should be ignored,
and the output is not affected.
END
End input.
FSQUARED
If this flag is set, the program expects F and SIGF and will output
I and SIGI: I = F*F, SIGI = 2*SIGF*F + SIGF*SIGF. These intensities are not
necessarily the same as the measured intensities (pre-TRUNCATE); it is better
to use the measured values if you have them.
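The FSQUARED conversion, written out exactly as documented above, is shown in the Python sketch below (the function name `f_to_i` is hypothetical). Note that the documented SIGI includes a SIGF*SIGF term in addition to the first-order 2*SIGF*F term.

```python
def f_to_i(f, sig_f):
    """FSQUARED conversion as documented: I = F*F, SIGI = 2*SIGF*F + SIGF*SIGF."""
    return f * f, 2 * sig_f * f + sig_f * sig_f
```

For example, F = 10 and SIGF = 1 give I = 100 and SIGI = 21.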
MONITOR <Nmon>
Followed by an integer <Nmon>.
Every <Nmon>-th reflection within the resolution range is monitored
(printed out).
RESOLUTION <resmin> <resmax>
Followed by 2 real numbers, <resmin>, <resmax>. This can be used to
restrict the output data to the given resolution range.
SCALE <scale>
The F and SIGF (or I and SIGI) are multiplied by <scale> before output.
This may be necessary if you are outputting
F_squared into the fixed SHELX format.
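One way to choose a SCALE value for SHELX output is to reduce the scale by powers of ten until every value fits the F8.2 fields. This Python helper, `shelx_scale`, is hypothetical: the program does not pick the scale for you, and the same factor applies to the sigmas.

```python
def shelx_scale(values):
    """Return a power-of-ten scale so every scaled value fits Fortran F8.2.

    Hypothetical helper; negative values are handled because the width
    check is done on the formatted string, including the minus sign.
    """
    scale = 1.0
    while any(len(f"{v * scale:8.2f}") > 8 for v in values):
        scale /= 10.0
    return scale
```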
INCLUDE <keyword> <value> ...
Each secondary keyword is followed by a number setting the
criterion for including data. The only possible keyword is FREER.
- FREER <num>
Include only reflections with FreeRflag = <num>. This is different from the
FREEVAL keyword which specifies the freeR set. This will only be applicable
if you have assigned the FREE column.
EXCLUDE <keyword> <value> ...
Each secondary keyword is followed by a number setting the
appropriate limit for excluding data. Possible keywords are SIGP,
SIGH, DIFF, FPMAX, FPHMAX, FREER. If DP is assigned without FP,
then the exclusion criterion for DIFF is applied to |DP|.
- SIGP <Nsig1>, SIGH <Nsig2>
Reflections are excluded if FP < <Nsig1>*SIGFP or FPH < <Nsig2>*SIGFPH.
Formerly such reflections were only flagged in MULTAN output and left unaffected
in the other formats; now they are not output in any format.
- DIFF <difference_limit>
Reflections are excluded if |FP-FPH| (or |DP|) > <difference_limit>
- FPMAX <maximum>
Exclude reflections with FP greater than <maximum>.
- FPHMAX <maximum>
Exclude reflections with FPH greater than <maximum>.
- FREER <num>
Omit reflections with FreeRflag = <num>. This is different from the
FREEVAL keyword which specifies the freeR set. This will only be applicable
if you have assigned the FREE column.
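The EXCLUDE criteria above can be combined into a single test per reflection. This Python function, `excluded`, is a hypothetical sketch, not the program's code; treating FPMAX and FPHMAX as upper limits (values above the maximum are rejected) is an assumption.

```python
def excluded(r, sigp=None, sigh=None, diff=None, fpmax=None, fphmax=None, freer=None):
    """Return True if reflection dict r would be rejected by EXCLUDE.

    r may contain FP, SIGFP, FPH, SIGFPH, DP and FREE. FPMAX/FPHMAX are
    treated as upper limits, which is an assumption.
    """
    fp, fph = r.get("FP"), r.get("FPH")
    if sigp is not None and fp is not None and fp < sigp * r["SIGFP"]:
        return True
    if sigh is not None and fph is not None and fph < sigh * r["SIGFPH"]:
        return True
    if diff is not None:
        if fp is not None and fph is not None:
            d = abs(fp - fph)
        elif fp is None and r.get("DP") is not None:
            d = abs(r["DP"])  # DP assigned without FP
        else:
            d = None
        if d is not None and d > diff:
            return True
    if fpmax is not None and fp is not None and fp > fpmax:
        return True
    if fphmax is not None and fph is not None and fph > fphmax:
        return True
    return freer is not None and r.get("FREE") == freer
```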
FREEVAL <num>
The reflections with FreeRflag = <num> are treated as the freeR set: the
default is 0 if FREE is assigned. This is important if you want to include
a free-R test in your XPLOR/CNS or SHELX refinement, or you are using the
Pannu-Read version of TNT. The FREE column must be assigned with LABIN.
MISS <valm>
By default, if any data associated with a reflection are missing,
i.e. are represented in HKLIN by a Missing Number Flag (MNF), then
that reflection will not appear in the output. However, if the keyword
MISS is given then these reflections will be output, but with
the MNFs converted to <valm>. The latter need not be given, and defaults
to 0.0. The other exclusions are still effective.
Also, if MISS is present then when producing isomorphous data, i.e. |FPH-FP|,
if either FPH or FP is a MNF then the difference is set to zero and the sigma
is twice the measured sigma. For example, if FP=MNF, SIGFP=MNF, FPH=100 and SIGFPH=10,
then FPH-FP = 0 and SIG = 20.
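The MISS behaviour can be sketched in Python with `None` standing in for a Missing Number Flag. Both function names are hypothetical. The sigma used when both FP and FPH are present (a quadrature sum) and returning `None` when both are missing are assumptions not stated in the text; only the single-MNF case follows the description above.

```python
import math


def fill_missing(values, valm=0.0):
    """Replace MNFs (None) with <valm>, as MISS does for output."""
    return [valm if v is None else v for v in values]


def iso_difference(fp, sigfp, fph, sigfph):
    """|FPH - FP| and its sigma when MISS is given; None marks an MNF.

    One amplitude missing: difference 0, sigma twice the measured sigma.
    Both present: quadrature sigma (assumption). Both missing: None (assumption).
    """
    if fp is None and fph is None:
        return None
    if fp is None:
        return 0.0, 2.0 * sigfph
    if fph is None:
        return 0.0, 2.0 * sigfp
    return abs(fph - fp), math.hypot(sigfp, sigfph)
```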
Notes on INPUT and OUTPUT
Not all INPUT columns are accepted with a particular OUTPUT format. If one
has OUTPUT <subkw> then the allowed input columns are given below (see
LABIN and OUTPUT) :
- subkw = USER
accepts all input columns. Remember that the format must match the
column assignments, i.e. assignments to IDUM must be output as integers; all
others are treated as real. Warnings about mismatched column types when
using DUM or IDUM labels can be ignored; see the LABIN
keyword.
- subkw = XPLOR [or CNS]
accepts all input columns except DUM1 to DUM10 and IDUM1 to IDUM10 and
I+, SIGI+, I- and SIGI-.
- subkw = SHELX
accepts columns H, K, L, FP, SIGFP, FPH, SIGFPH and FREE (and DP, SIGDP when FP is not assigned).
- subkw = MULTAN
is like SHELX but will only use FREE to include or exclude reflections.
- subkw = TNT
is like SHELX except for the use of FREE: if the INCLUDE FREER or
EXCLUDE FREER keywords are specified then FREE is used to
include or exclude reflections, otherwise the FREE column (if assigned)
is output.
- subkw = MAIN
accepts H, K, L, FP, SIGFP, PHIB, FOM, FC
- subkw = CIF
only H, K, L, FP, SIGFP, DP, SIGDP, FC, PHIC, PHIB, FOM, I+, SIGI+, I-, SIGI-,
FPART_BULK_S, PHIPART_BULK_S and FREE are accepted.
You may still have trouble getting exactly the output you want. You can use
the unix utilities cut(1) or sed(1) to manipulate the mtz2various output.
Notes on CIF
All reflections in the MTZ input file will be output to the CIF
file. However, there are ways to flag certain reflections with the data type
_refln.status. Observed reflections will be flagged with 'o'. Unobserved
reflections, i.e. those flagged as missing, will be flagged as 'x'; these
reflections will not be added to _reflns.number_obs. The 'free' reflections
will be flagged as 'f'. The keyword FREEVAL can be used to indicate this set.
Systematically absent reflections are flagged with '-'.
If the RESO keyword is specified, then reflections at higher or lower
resolution than the limits given will be written with _refln.status 'h'
or 'l' respectively. The limits will be written to the CIF as the values of
_refine.ls_d_res_high and _refine.ls_d_res_low .
If EXCLUDE SIGP is given, then reflections for which F < <value>*sigma(F),
and which satisfy the resolution limits (if given), will be written with
_refln.status '<'. The value of _reflns.number_obs excludes all reflections
which do not satisfy the condition on sigma(F). All other sub-keywords of
EXCLUDE are ignored for CIF output.
N.B. The translation of the RESOLUTION and EXCLUDE SIGP conditions to
_refln.status values does not imply that the use of these conditions is
good crystallographic practice. Be prepared to justify why you have excluded
any data from your final refinement!
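The _refln.status codes described above can be summarised in a Python sketch. The function `refln_status` is hypothetical: it assumes resolution limits are given in Angstrom as (low, high), like `RESOLUTION 10000 2`, and the order in which the tests are applied (in particular free-set versus '<') is an assumption, except that the sigma test only applies within the resolution limits, as stated above.

```python
def refln_status(r, reso=None, sig_cut=None, free_val=0, absent=False):
    """Choose a _refln.status code for one reflection (hypothetical sketch).

    r: dict with FP (None if missing), SIGFP, D (resolution in Angstrom)
       and optional FREE. reso: (low, high) limits in Angstrom.
    The order of the tests is an assumption.
    """
    if absent:
        return "-"  # systematically absent
    if r["FP"] is None:
        return "x"  # missing, not counted in _reflns.number_obs
    if reso is not None:
        low, high = reso
        if r["D"] < high:
            return "h"
        if r["D"] > low:
            return "l"
    if sig_cut is not None and r["FP"] < sig_cut * r["SIGFP"]:
        return "<"
    if r.get("FREE") == free_val:
        return "f"
    return "o"
```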
If DP is assigned, anomalous mode is selected, and reflections for which DP
has been measured are written out as (hkl)/(-h-k-l) pairs. In this case,
if intensities are available, then both I+ and I- must be assigned, or a
warning will be printed, and the output CIF will not contain intensities.
Below is a list of the items output to the CIF file:
_entry.id
_audit.revision_id
_audit.creation_date
_audit.creation_method
_audit.update_record
_cell.entry_id
_cell.length_a
_cell.length_b
_cell.length_c
_cell.angle_alpha
_cell.angle_beta
_cell.angle_gamma
_symmetry.entry_id
_symmetry.Int_Tables_number
_symmetry.space_group_name_H-M
_symmetry_equiv.id
_symmetry_equiv.pos_as_xyz
_reflns.entry_id
_reflns.d_resolution_high
_reflns.d_resolution_low
_reflns.limit_h_max
_reflns.limit_h_min
_reflns.limit_k_max
_reflns.limit_k_min
_reflns.limit_l_max
_reflns.limit_l_min
_reflns.number_all
_reflns.number_obs
_diffrn_radiation_wavelength.id
_exptl_crystal.id
_reflns_scale.group_code
The following items are written for each reflection:
_refln.wavelength_id Always written
_refln.crystal_id Always written
_refln.scale_group_code Always written
_refln.index_h Always written
_refln.index_k Always written
_refln.index_l Always written
_refln.status Always written
_refln.F_meas_au FP
_refln.F_meas_sigma_au SIGFP
_refln.F_calc FC
_refln.phase_calc PHIC
_refln.phase_meas PHIB
_refln.fom FOM
_refln.intensity_meas I+
_refln.intensity_sigma SIGI+
_refln.ebi_F_xplor_bulk_solvent_calc FPART_BULK_S
_refln.ebi_phase_xplor_bulk_solvent_calc PHIPART_BULK_S
mmCIF (at least at version 0.8) makes no provision for the output of
derivative data in the same data block as native data. For more information
about what these mmCIF categories are, check out the
mmCIF dictionary.
EXAMPLES
mtz2various HKLIN nicona HKLOUT dell.hkl << EOF
RESOLUTION 10000 2
OUTPUT XPLOR
EXCLUDE SIGP 0.01 # to exclude unmeasured refl.
LABIN FP=F SIGFP=SIGF FREE=FreeR_flag
END
EOF
mtz2various HKLOUT $CCP4_SCR/toxd.hkl hklin $CEXAM/toxd/toxd <<EOF
LABIN FP=FTOXD3 SIGFP=SIGFTOXD3
OUTPUT SHELX
RESOLUTION 100 4
END
EOF
A runnable unix example script is in $CEXAM/unix/runnable/
A non-runnable unix example script which demonstrates mtz2various used to output
anomalous data is in $CEXAM/unix/non-runnable/
SEE ALSO
mtzdump,
f2mtz,
cut(1), sed(1)
AUTHOR
Eleanor Dodson, York University