@pwc2, here is my current Hail version.
import hail as hl
hl.init()
Running on Apache Spark version 3.1.2
SparkUI available at http://sphinx:4040
Welcome to
     __  __     <>__
    / /_/ /__  __/ /
   / __  / _ `/ / /
  /_/ /_/\_,_/_/_/   version 0.2.82-2ab242915c2c
LOGGING: writing to /mnt/tank/scratch/rskitchenko/projects/haloplex_notch/hail-20220209-2224-0.2.82-2ab242915c2c.log
Here is some information about my data:
mt.count()
2022-02-09 22:28:57 Hail: INFO: reading 2 of 4 data partitions
(2, 90)
mt.describe()
----------------------------------------
Global fields:
    None
----------------------------------------
Column fields:
    's': str
    'sample_qc': struct {
        dp_stats: struct {
            mean: float64,
            stdev: float64,
            min: float64,
            max: float64
        },
        gq_stats: struct {
            mean: float64,
            stdev: float64,
            min: float64,
            max: float64
        },
        call_rate: float64,
        n_called: int64,
        n_not_called: int64,
        n_filtered: int64,
        n_hom_ref: int64,
        n_het: int64,
        n_hom_var: int64,
        n_non_ref: int64,
        n_singleton: int64,
        n_snp: int64,
        n_insertion: int64,
        n_deletion: int64,
        n_transition: int64,
        n_transversion: int64,
        n_star: int64,
        r_ti_tv: float64,
        r_het_hom_var: float64,
        r_insertion_deletion: float64
    }
----------------------------------------
Row fields:
    'locus': locus<GRCh38>
    'alleles': array<str>
    'rsid': str
    'qual': float64
    'filters': set<str>
    'info': struct {
        AC: array<int32>,
        AF: array<float64>,
        AN: int32,
        BaseQRankSum: float64,
        DP: int32,
        END: int32,
        ExcessHet: float64,
        FS: float64,
        InbreedingCoeff: float64,
        MLEAC: array<int32>,
        MLEAF: array<float64>,
        MQ: float64,
        MQRankSum: float64,
        QD: float64,
        RAW_MQandDP: array<int32>,
        ReadPosRankSum: float64,
        SOR: float64,
        CSQ: array<str>
    }
    'a_index': int32
    'was_split': bool
    'variant_qc': struct {
        dp_stats: struct {
            mean: float64,
            stdev: float64,
            min: float64,
            max: float64
        },
        gq_stats: struct {
            mean: float64,
            stdev: float64,
            min: float64,
            max: float64
        },
        AC: array<int32>,
        AF: array<float64>,
        AN: int32,
        homozygote_count: array<int32>,
        call_rate: float64,
        n_called: int64,
        n_not_called: int64,
        n_filtered: int64,
        n_het: int64,
        n_non_ref: int64,
        het_freq_hwe: float64,
        p_value_hwe: float64,
        p_value_excess_het: float64
    }
    'kept_transcripts': array<str>
    'hom_ref': int64
    'hom_var': int64
    'het': int64
    'gene': str
    'consequence': str
----------------------------------------
Entry fields:
    'AD': array<int32>
    'DP': int32
    'GQ': int32
    'GT': call
    'MIN_DP': int32
    'PGT': call
    'PID': str
    'PL': array<int32>
    'PS': int32
    'RGQ': int32
    'SB': array<int32>
    'n_alt_alleles': int32
----------------------------------------
Column key: ['s']
Row key: ['locus', 'alleles']
----------------------------------------
mt.entries().select('n_alt_alleles').count()
2022-02-09 22:35:19 Hail: WARN: entries(): Resulting entries table is sorted by '(row_key, col_key)'.
To preserve row-major matrix table order, first unkey columns with 'key_cols_by()'
2022-02-09 22:35:24 Hail: INFO: reading 2 of 4 data partitions
152
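
For reference, the two counts above can be compared with a little arithmetic. This is only a sketch using the numbers pasted in this post; the variable names are made up for illustration. It shows that the entries table has fewer rows than a dense 2 x 90 matrix would have. One possible explanation, which the output above does not confirm, is that some entries were removed earlier with `filter_entries`, since Hail leaves filtered entries out of `entries()`.

```python
# Numbers copied from the outputs in this post (hypothetical variable names).
n_rows, n_cols = 2, 90       # result of mt.count()
n_entries_rows = 152         # result of mt.entries().select('n_alt_alleles').count()

# A dense matrix would have one entry per (row, column) pair.
n_dense_cells = n_rows * n_cols

# Entries present in the dense grid but absent from the entries table.
n_absent = n_dense_cells - n_entries_rows

print(f"dense cells: {n_dense_cells}, entries rows: {n_entries_rows}, absent: {n_absent}")
```

So 28 of the 180 possible entries do not appear in the entries table.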