Hail on Windows

I am currently using Windows and wanted to use Hail to run a GWAS. I downloaded Python, Java, and Spark and set their paths as well. After importing Hail, I tried to initialize it and got the error below. I have tried everything but failed. Can anyone help, please?

import hail as hl
hl.init()

Error:

Py4JJavaError: An error occurred while calling z:is.hail.backend.spark.SparkBackend.apply.
: java.lang.NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;
at is.hail.backend.spark.SparkBackend$.majorMinor$1(SparkBackend.scala:65)
at is.hail.backend.spark.SparkBackend$.checkSparkCompatibility(SparkBackend.scala:67)
at is.hail.backend.spark.SparkBackend$.createSparkConf(SparkBackend.scala:78)
at is.hail.backend.spark.SparkBackend$.configureAndCreateSparkContext(SparkBackend.scala:127)
at is.hail.backend.spark.SparkBackend$.apply(SparkBackend.scala:203)
at is.hail.backend.spark.SparkBackend.apply(SparkBackend.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Unknown Source)

Hi @Zaki,

I’m sorry you’re having trouble. We don’t officially support Windows. I suspect many things will not work. I can try to help you get hl.init to work.

Are you using the Windows Subsystem for Linux? What is the output of these commands:

python --version
java -version
echo %SPARK_HOME%
echo %PYTHONPATH%
which python
which java
python -m pip show hail
python -m pip show pyspark

Thank you, Dan, for the response.

python --version
Python 3.7.10

java -version
java version "1.8.0_291"
Java(TM) SE Runtime Environment (build 1.8.0_291-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.291-b09, mixed mode)

echo %SPARK_HOME%
C:\Spark\spark-3.1.1-bin-hadoop2.7

echo %PYTHONPATH%
%PYTHONPATH%

It was just returning the literal %PYTHONPATH%, so I tried:

where python
C:\Users\Zaki\anaconda3\python.exe

which python
/c/Users/Zaki/anaconda3/python

which java
/c/Program Files (x86)/Common Files/Oracle/Java/javapath/java

python -m pip show hail
Name: hail
Version: 0.2.64
Summary: Scalable library for exploring and analyzing genomic data.
Home-page: https://hail.is
Author: Hail Team
Author-email: hail@broadinstitute.org
License: UNKNOWN
Location: c:\users\zaki\anaconda3\lib\site-packages
Requires: parsimonious, aiohttp, decorator, asyncinit, requests, hurry.filesize, pandas, tqdm, aiohttp-session, google-cloud-storage, gcsfs, PyJWT, Deprecated, humanize, numpy, bokeh, dill, pyspark, python-json-logger, nest-asyncio, tabulate, scipy
Required-by:

python -m pip show pyspark
Name: pyspark
Version: 2.4.1
Summary: Apache Spark Python API
Home-page: https://github.com/apache/spark/tree/master/python
Author: Spark Developers
Author-email: dev@spark.apache.org
License: http://www.apache.org/licenses/LICENSE-2.0
Location: c:\users\zaki\anaconda3\lib\site-packages
Requires: py4j
Required-by: hail

This is the problem: your SPARK_HOME points at Spark 3.1.1, but your pyspark package is 2.4.1. Those two releases are built against different Scala versions, which is why hl.init() fails with that NoSuchMethodError. How did you install Hail? Can you unset SPARK_HOME?
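If you installed Hail with pip, it already pulls in a matching pyspark (2.4.x for Hail 0.2.64), so the simplest fix is usually to clear SPARK_HOME and let that bundled pyspark be used. A rough sketch, depending on your shell (if you set SPARK_HOME permanently in the Windows environment-variable settings, remove it there too):

set SPARK_HOME=
(or in bash / WSL: unset SPARK_HOME)

python -c "import pyspark; print(pyspark.__version__)"

That last command should print 2.4.1.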

Thank you very much Dan, it's working now.

Sorry to disturb you again, but now I am facing another issue.

hl.plot.output_notebook()
BokehJS 1.4.0 successfully loaded.

hl.utils.get_1kg('data/')
Initializing Hail with default parameters…
Running on Apache Spark version 2.4.1
SparkUI available at http://DESKTOP-LLOVFTK:4040
Welcome to Hail version 0.2.64-1ef70187dc78
LOGGING: writing to C:\Users\Zaki\hail-20210331-1328-0.2.64-1ef70187dc78.log
2021-03-31 13:28:45 Hail: INFO: downloading 1KG VCF …
  Source: https://storage.googleapis.com/hail-tutorial/1kg.vcf.bgz

FatalError: IllegalArgumentException: Wrong FS: file://C:\Users\Zaki\AppData\Local\Temp\tmpq_djf_jt\1kg.vcf.bgz, expected: file:///

Java stack trace:
java.lang.IllegalArgumentException: Wrong FS: file://C:\Users\Zaki\AppData\Local\Temp\tmpq_djf_jt\1kg.vcf.bgz, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:649)
at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:82)
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:606)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.(ChecksumFileSystem.java:142)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:346)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
at is.hail.io.fs.HadoopFS.openNoCompression(HadoopFS.scala:83)
at is.hail.io.fs.FS$class.copy(FS.scala:188)
at is.hail.io.fs.HadoopFS.copy(HadoopFS.scala:70)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Unknown Source)

Hail version: 0.2.64-1ef70187dc78
Error summary: IllegalArgumentException: Wrong FS: file://C:\Users\Zaki\AppData\Local\Temp\tmpq_djf_jt\1kg.vcf.bgz, expected: file:///

I think the only way to make this work is to install and use Hail inside the Windows Subsystem for Linux.
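If you go that route, the setup inside WSL is roughly the standard GNU/Linux install. As a sketch (package names can vary by Ubuntu release):

sudo apt-get install openjdk-8-jre-headless g++ python3.7 python3-pip
python3.7 -m pip install hail
python3.7 -c "import hail as hl; hl.init()"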

So, I installed Ubuntu on my virtual machine and installed Hail and the other dependencies such as Java, g++, etc., and also set the paths for Spark and Java.
When I run this command in a Jupyter notebook:

import hail as hl

I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in <module>
----> 1 import hail as hl

~/anaconda3/lib/python3.8/site-packages/hail/__init__.py in <module>
     32 # F401 '.expr.*' imported but unused
     33 # E402 module level import not at top of file
---> 34 from .table import Table, GroupedTable, asc, desc  # noqa: E402
     35 from .matrixtable import MatrixTable, GroupedMatrixTable  # noqa: E402
     36 from .expr import *  # noqa: F401,F403,E402

~/anaconda3/lib/python3.8/site-packages/hail/table.py in <module>
      2 import itertools
      3 import pandas
----> 4 import pyspark
      5 from typing import Optional, Dict, Callable
      6

~/anaconda3/lib/python3.8/site-packages/pyspark/__init__.py in <module>
     49
     50 from pyspark.conf import SparkConf
---> 51 from pyspark.context import SparkContext
     52 from pyspark.rdd import RDD, RDDBarrier
     53 from pyspark.files import SparkFiles

~/anaconda3/lib/python3.8/site-packages/pyspark/context.py in <module>
     29 from py4j.protocol import Py4JError
     30
---> 31 from pyspark import accumulators
     32 from pyspark.accumulators import Accumulator
     33 from pyspark.broadcast import Broadcast, BroadcastPickleRegistry

~/anaconda3/lib/python3.8/site-packages/pyspark/accumulators.py in <module>
     95 import socketserver as SocketServer
     96 import threading
---> 97 from pyspark.serializers import read_int, PickleSerializer
     98
     99

~/anaconda3/lib/python3.8/site-packages/pyspark/serializers.py in <module>
     69     xrange = range
     70
---> 71 from pyspark import cloudpickle
     72 from pyspark.util import _exception_message
     73

~/anaconda3/lib/python3.8/site-packages/pyspark/cloudpickle.py in <module>
    143
    144
--> 145 _cell_set_template_code = _make_cell_set_template_code()
    146
    147

~/anaconda3/lib/python3.8/site-packages/pyspark/cloudpickle.py in _make_cell_set_template_code()
    124         )
    125     else:
--> 126         return types.CodeType(
    127             co.co_argcount,
    128             co.co_kwonlyargcount,
TypeError: an integer is required (got type bytes)

Hey @Zaki, unfortunately, the currently released version of Hail does not support Python 3.8 because PySpark (a library on which we depend) does not support Python 3.8. Try creating a Python 3.7 environment using Anaconda.
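For example, with conda (the environment name hail37 is just a placeholder):

conda create -n hail37 python=3.7
conda activate hail37
pip install hail
python -c "import hail as hl; hl.init()"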

Thank you @danking, it's working now, but when I run this command I get another error:

hl.utils.get_1kg('/data')

2021-04-05 21:34:58 Hail: INFO: downloading 1KG VCF …
  Source: https://storage.googleapis.com/hail-tutorial/1kg.vcf.bgz
2021-04-05 21:35:00 Hail: INFO: importing VCF and writing to matrix table…
2021-04-05 21:35:07 Hail: INFO: Coerced sorted dataset

---------------------------------------------------------------------------
FatalError                                Traceback (most recent call last)
in <module>
----> 1 hl.utils.get_1kg('/data')

~/anaconda3/lib/python3.7/site-packages/hail/utils/tutorial.py in get_1kg(output_dir, overwrite)
     80     cluster_readable_vcf = _copy_to_tmp(fs, local_path_uri(tmp_vcf), extension='vcf.bgz')
     81     info('importing VCF and writing to matrix table...')
---> 82     hl.import_vcf(cluster_readable_vcf, min_partitions=16).write(matrix_table_path, overwrite=True)
     83
     84     tmp_sample_annot = os.path.join(tmp_dir, '1kg_annotations.txt')

in write(self, output, overwrite, stage_locally, _codec_spec, _partitions)

~/anaconda3/lib/python3.7/site-packages/hail/typecheck/check.py in wrapper(__original_func, *args, **kwargs)
    575     def wrapper(__original_func, *args, **kwargs):
    576         args_, kwargs_ = check_all(__original_func, args, kwargs, checkers, is_method=is_method)
--> 577         return __original_func(*args_, **kwargs_)
    578
    579     return wrapper

~/anaconda3/lib/python3.7/site-packages/hail/matrixtable.py in write(self, output, overwrite, stage_locally, _codec_spec, _partitions)
   2526
   2527         writer = ir.MatrixNativeWriter(output, overwrite, stage_locally, _codec_spec, _partitions, _partitions_type)
-> 2528         Env.backend().execute(ir.MatrixWrite(self._mir, writer))
   2529
   2530 class _Show:

~/anaconda3/lib/python3.7/site-packages/hail/backend/py4j_backend.py in execute(self, ir, timed)
     96                 raise HailUserError(message_and_trace) from None
     97
---> 98             raise e

~/anaconda3/lib/python3.7/site-packages/hail/backend/py4j_backend.py in execute(self, ir, timed)
     72         # print(self._hail_package.expr.ir.Pretty.apply(jir, True, -1))
     73         try:
---> 74             result = json.loads(self._jhc.backend().executeJSON(jir))
     75             value = ir.typ._from_json(result['value'])
     76             timings = result['timings']

~/anaconda3/lib/python3.7/site-packages/py4j/java_gateway.py in __call__(self, *args)
   1255         answer = self.gateway_client.send_command(command)
   1256         return_value = get_return_value(
-> 1257             answer, self.gateway_client, self.target_id, self.name)
   1258
   1259         for temp_arg in temp_args:

~/anaconda3/lib/python3.7/site-packages/hail/backend/py4j_backend.py in deco(*args, **kwargs)
     30             raise FatalError('%s\n\nJava stack trace:\n%s\n'
     31                              'Hail version: %s\n'
---> 32                              'Error summary: %s' % (deepest, full, hail.__version__, deepest), error_id) from None
     33     except pyspark.sql.utils.CapturedException as e:
     34         raise FatalError('%s\n\nJava stack trace:\n%s\n'

FatalError: IOException: Mkdirs failed to create /data/1kg.mt/rows/rows/parts (exists=false, cwd=file:/home/zaki)

Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 16, localhost, executor driver): java.io.IOException: Mkdirs failed to create /data/1kg.mt/rows/rows/parts (exists=false, cwd=file:/home/zaki)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:455)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778)
at is.hail.io.fs.HadoopFS.createNoCompression(HadoopFS.scala:74)
at is.hail.io.fs.FS$class.create(FS.scala:151)
at is.hail.io.fs.HadoopFS.create(HadoopFS.scala:70)
at is.hail.io.RichContextRDDRegionValue$.writeSplitRegion(RichContextRDDRegionValue.scala:106)
at is.hail.rvd.RVD$$anonfun$29.apply(RVD.scala:938)
at is.hail.rvd.RVD$$anonfun$29.apply(RVD.scala:936)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$18.apply(ContextRDD.scala:259)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$18.apply(ContextRDD.scala:259)
at is.hail.utils.richUtils.RichContextRDD$$anonfun$cleanupRegions$1$$anonfun$2.apply(RichContextRDD.scala:62)
at is.hail.utils.richUtils.RichContextRDD$$anonfun$cleanupRegions$1$$anonfun$2.apply(RichContextRDD.scala:62)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at is.hail.utils.richUtils.RichContextRDD$$anonfun$cleanupRegions$1$$anon$1.hasNext(RichContextRDD.scala:71)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1334)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:403)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:409)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2126)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
at is.hail.sparkextras.ContextRDD.collect(ContextRDD.scala:176)
at is.hail.rvd.RVD.writeRowsSplit(RVD.scala:953)
at is.hail.expr.ir.MatrixValue.write(MatrixValue.scala:246)
at is.hail.expr.ir.MatrixNativeWriter.apply(MatrixWriter.scala:62)
at is.hail.expr.ir.WrappedMatrixWriter.apply(MatrixWriter.scala:41)
at is.hail.expr.ir.Interpret$.run(Interpret.scala:819)
at is.hail.expr.ir.Interpret$.alreadyLowered(Interpret.scala:53)
at is.hail.expr.ir.InterpretNonCompilable$.interpretAndCoerce$1(InterpretNonCompilable.scala:16)
at is.hail.expr.ir.InterpretNonCompilable$.is$hail$expr$ir$InterpretNonCompilable$$rewrite$1(InterpretNonCompilable.scala:53)
at is.hail.expr.ir.InterpretNonCompilable$.apply(InterpretNonCompilable.scala:58)
at is.hail.expr.ir.lowering.InterpretNonCompilablePass$.transform(LoweringPass.scala:67)
at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3$$anonfun$1.apply(LoweringPass.scala:15)
at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3$$anonfun$1.apply(LoweringPass.scala:15)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3.apply(LoweringPass.scala:15)
at is.hail.expr.ir.lowering.LoweringPass$$anonfun$apply$3.apply(LoweringPass.scala:13)
at is.hail.utils.ExecutionTimer.time(ExecutionTimer.scala:81)
at is.hail.expr.ir.lowering.LoweringPass$class.apply(LoweringPass.scala:13)
at is.hail.expr.ir.lowering.InterpretNonCompilablePass$.apply(LoweringPass.scala:62)
at is.hail.expr.ir.lowering.LoweringPipeline$$anonfun$apply$1.apply(LoweringPipeline.scala:14)
at is.hail.expr.ir.lowering.LoweringPipeline$$anonfun$apply$1.apply(LoweringPipeline.scala:12)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at is.hail.expr.ir.lowering.LoweringPipeline.apply(LoweringPipeline.scala:12)
at is.hail.expr.ir.CompileAndEvaluate$._apply(CompileAndEvaluate.scala:28)
at is.hail.backend.spark.SparkBackend.is$hail$backend$spark$SparkBackend$$_execute(SparkBackend.scala:362)
at is.hail.backend.spark.SparkBackend$$anonfun$execute$1.apply(SparkBackend.scala:346)
at is.hail.backend.spark.SparkBackend$$anonfun$execute$1.apply(SparkBackend.scala:343)
at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1$$anonfun$apply$1.apply(ExecuteContext.scala:48)
at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1$$anonfun$apply$1.apply(ExecuteContext.scala:48)
at is.hail.utils.package$.using(package.scala:618)
at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:48)
at is.hail.expr.ir.ExecuteContext$$anonfun$scoped$1.apply(ExecuteContext.scala:47)
at is.hail.utils.package$.using(package.scala:618)
at is.hail.annotations.RegionPool$.scoped(RegionPool.scala:13)
at is.hail.expr.ir.ExecuteContext$.scoped(ExecuteContext.scala:47)
at is.hail.backend.spark.SparkBackend.withExecuteContext(SparkBackend.scala:256)
at is.hail.backend.spark.SparkBackend.execute(SparkBackend.scala:343)
at is.hail.backend.spark.SparkBackend$$anonfun$7.apply(SparkBackend.scala:387)
at is.hail.backend.spark.SparkBackend$$anonfun$7.apply(SparkBackend.scala:385)
at is.hail.utils.ExecutionTimer$.time(ExecutionTimer.scala:52)
at is.hail.backend.spark.SparkBackend.executeJSON(SparkBackend.scala:385)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)

java.io.IOException: Mkdirs failed to create /data/1kg.mt/rows/rows/parts (exists=false, cwd=file:/home/zaki)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:455)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:440)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:911)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:892)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:789)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:778)
at is.hail.io.fs.HadoopFS.createNoCompression(HadoopFS.scala:74)
at is.hail.io.fs.FS$class.create(FS.scala:151)
at is.hail.io.fs.HadoopFS.create(HadoopFS.scala:70)
at is.hail.io.RichContextRDDRegionValue$.writeSplitRegion(RichContextRDDRegionValue.scala:106)
at is.hail.rvd.RVD$$anonfun$29.apply(RVD.scala:938)
at is.hail.rvd.RVD$$anonfun$29.apply(RVD.scala:936)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$18.apply(ContextRDD.scala:259)
at is.hail.sparkextras.ContextRDD$$anonfun$cmapPartitionsWithIndex$1$$anonfun$apply$18.apply(ContextRDD.scala:259)
at is.hail.utils.richUtils.RichContextRDD$$anonfun$cleanupRegions$1$$anonfun$2.apply(RichContextRDD.scala:62)
at is.hail.utils.richUtils.RichContextRDD$$anonfun$cleanupRegions$1$$anonfun$2.apply(RichContextRDD.scala:62)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at is.hail.utils.richUtils.RichContextRDD$$anonfun$cleanupRegions$1$$anon$1.hasNext(RichContextRDD.scala:71)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at scala.collection.AbstractIterator.to(Iterator.scala:1334)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1334)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1334)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:945)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:403)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:409)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Hail version: 0.2.64-1ef70187dc78
Error summary: IOException: Mkdirs failed to create /data/1kg.mt/rows/rows/parts (exists=false, cwd=file:/home/zaki)

You’re asking it to download the data to /data, meaning the root of your filesystem. In Windows terms that’s like trying to write to C:\data. Normal users aren’t allowed to do that. Note that the tutorial downloads the data to data/.
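To make the distinction concrete, here is a tiny illustration (the /home/zaki path is just an example of what the relative path would resolve to on your machine):

import os
print(os.path.abspath('data/'))   # relative: lands under your working directory, e.g. /home/zaki/data
print(os.path.abspath('/data'))   # absolute: the filesystem root, which a normal user usually can't write to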

So, what do you suggest I should do?

Try removing the "/" in the command you ran:

hl.utils.get_1kg('data')

Thank you @danking, it's working perfectly now. :+1: