Running test 2 5
=== Created temporary folder : /tmp/script_Imz0jJML2J
=== Copying script to temporary folder
=== DONE
=== Executing Script
+ curl -XDELETE 'localhost:9200/bank?pretty'
+ curl -XDELETE 'localhost:9200/shakespeare?pretty'
+ curl -XDELETE 'localhost:9200/apache-logs-*?pretty'
+ curl -XDELETE 'localhost:9200/swiss-*?pretty'
+ mkdir -p /home/mes/input_data
+ cd /home/mes/input_data
+ [[ ! -f accounts.json ]]
+ set -e
+ curl -XPUT 'localhost:9200/bank?pretty' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "number_of_shards" : 1,
    "number_of_replicas" : 0
  }
}
'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   163  100    80  100    83    263    273 --:--:-- --:--:-- --:--:--   280
{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "bank"
}
+ set +e
+ curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @accounts.json
+ [[ 0 != 0 ]]
+ set +x
Warning: Ignoring non-spark config property: es.nodes.data.only=false
17/09/21 15:02:47 INFO SparkContext: Running Spark version 2.2.0
17/09/21 15:02:48 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/09/21 15:02:48 INFO SparkContext: Submitted application: ESTest_2_5
17/09/21 15:02:48 INFO SecurityManager: Changing view acls to: mes
17/09/21 15:02:48 INFO SecurityManager: Changing modify acls to: mes
17/09/21 15:02:48 INFO SecurityManager: Changing view acls groups to:
17/09/21 15:02:48 INFO SecurityManager: Changing modify acls groups to:
17/09/21 15:02:48 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(mes); groups with view permissions: Set(); users with modify permissions: Set(mes); groups with modify permissions: Set()
17/09/21 15:02:49 INFO Utils: Successfully started service 'sparkDriver' on port 41423.
17/09/21 15:02:49 INFO SparkEnv: Registering MapOutputTracker
17/09/21 15:02:50 INFO SparkEnv: Registering BlockManagerMaster
17/09/21 15:02:50 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/09/21 15:02:50 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/09/21 15:02:50 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-3345ee72-e9d5-4915-9954-29ca1a209db6
17/09/21 15:02:50 INFO MemoryStore: MemoryStore started with capacity 246.9 MB
17/09/21 15:02:50 INFO SparkEnv: Registering OutputCommitCoordinator
17/09/21 15:02:50 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/09/21 15:02:50 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.10.10:4040
17/09/21 15:02:51 INFO SparkContext: Added file file:/tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py at spark://192.168.10.10:41423/files/2_collocation_5_bank_one_shard_repartition.py with timestamp 1506006171527
17/09/21 15:02:51 INFO Utils: Copying /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py to /tmp/spark-824ccf39-4434-43b7-b2c6-32d1efddac86/userFiles-c8ba4e5f-4a12-4271-aaa8-e9b213e790e1/2_collocation_5_bank_one_shard_repartition.py
2017-09-21 15:02:52,465:7590(0x7f00e2121700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2017-09-21 15:02:52,465:7590(0x7f00e2121700):ZOO_INFO@log_env@730: Client environment:host.name=mes_master
2017-09-21 15:02:52,465:7590(0x7f00e2121700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2017-09-21 15:02:52,465:7590(0x7f00e2121700):ZOO_INFO@log_env@738: Client environment:os.arch=4.9.0-3-amd64
2017-09-21 15:02:52,465:7590(0x7f00e2121700):ZOO_INFO@log_env@739: Client environment:os.version=#1 SMP Debian 4.9.30-2+deb9u3 (2017-08-06)
2017-09-21 15:02:52,465:7590(0x7f00e2121700):ZOO_INFO@log_env@747: Client environment:user.name=mes
2017-09-21 15:02:52,465:7590(0x7f00e2121700):ZOO_INFO@log_env@755: Client environment:user.home=/home/mes
2017-09-21 15:02:52,465:7590(0x7f00e2121700):ZOO_INFO@log_env@767: Client environment:user.dir=/tmp/script_Imz0jJML2J
2017-09-21 15:02:52,465:7590(0x7f00e2121700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=192.168.10.10:2181 sessionTimeout=10000 watcher=0x7f00ede6b712 sessionId=0 sessionPasswd=<null> context=0x7f00fc001168 flags=0
I0921 15:02:52.468900 7703 sched.cpp:232] Version: 1.3.0
2017-09-21 15:02:52,485:7590(0x7f00df81b700):ZOO_INFO@check_events@1728: initiated connection to server [192.168.10.10:2181]
2017-09-21 15:02:52,514:7590(0x7f00df81b700):ZOO_INFO@check_events@1775: session establishment complete on server [192.168.10.10:2181], sessionId=0x15ea4e649aa000c, negotiated timeout=10000
I0921 15:02:52.516249 7696 group.cpp:340] Group process (zookeeper-group(1)@192.168.10.10:33679) connected to ZooKeeper
I0921 15:02:52.516530 7696 group.cpp:830] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0921 15:02:52.516568 7696 group.cpp:418] Trying to create path '/mesos' in ZooKeeper
I0921 15:02:52.543126 7696 detector.cpp:152] Detected a new leader: (id='19')
I0921 15:02:52.544447 7696 group.cpp:699] Trying to get '/mesos/json.info_0000000019' in ZooKeeper
I0921 15:02:52.551262 7697 zookeeper.cpp:262] A new leading master (UPID=master@192.168.10.10:5050) is detected
I0921 15:02:52.552189 7697 sched.cpp:336] New master detected at master@192.168.10.10:5050
I0921 15:02:52.556484 7697 sched.cpp:352] No credentials provided. Attempting to register without authentication
I0921 15:02:52.582253 7695 sched.cpp:759] Framework registered with 074830c5-66d9-4eaf-b7cf-a2a021070856-0005
17/09/21 15:02:52 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43809.
17/09/21 15:02:52 INFO NettyBlockTransferService: Server created on 192.168.10.10:43809
17/09/21 15:02:52 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/09/21 15:02:52 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.10.10, 43809, None)
17/09/21 15:02:52 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.10.10:43809 with 246.9 MB RAM, BlockManagerId(driver, 192.168.10.10, 43809, None)
17/09/21 15:02:52 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.10.10, 43809, None)
17/09/21 15:02:52 INFO BlockManager: external shuffle service port = 7337
17/09/21 15:02:52 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.10.10, 43809, None)
17/09/21 15:02:53 INFO EventLoggingListener: Logging events to file:/var/lib/spark/eventlog/074830c5-66d9-4eaf-b7cf-a2a021070856-0005
17/09/21 15:02:53 INFO Utils: Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
17/09/21 15:02:53 INFO MesosCoarseGrainedSchedulerBackend: Capping the total amount of executors to 0
17/09/21 15:02:53 INFO MesosCoarseGrainedSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
17/09/21 15:02:54 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/tmp/script_Imz0jJML2J/spark-warehouse').
17/09/21 15:02:54 INFO SharedState: Warehouse path is 'file:/tmp/script_Imz0jJML2J/spark-warehouse'.
17/09/21 15:02:56 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
17/09/21 15:02:56 INFO Version: Elasticsearch Hadoop v6.0.0-beta2 [66f16fdd93]
17/09/21 15:03:00 INFO SparkSqlParser: Parsing command: gender='F'
17/09/21 15:03:03 INFO CodeGenerator: Code generated in 611.571735 ms
17/09/21 15:03:03 INFO ScalaEsRowRDD: Reading from [bank]
17/09/21 15:03:04 INFO SparkContext: Starting job: foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:25
17/09/21 15:03:04 INFO DAGScheduler: Got job 0 (foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:25) with 1 output partitions
17/09/21 15:03:04 INFO DAGScheduler: Final stage: ResultStage 0 (foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:25)
17/09/21 15:03:04 INFO DAGScheduler: Parents of final stage: List()
17/09/21 15:03:04 INFO DAGScheduler: Missing parents: List()
17/09/21 15:03:04 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[5] at foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:25), which has no missing parents
17/09/21 15:03:04 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 16.1 KB, free 246.9 MB)
17/09/21 15:03:04 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 7.4 KB, free 246.9 MB)
17/09/21 15:03:04 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.10.10:43809 (size: 7.4 KB, free: 246.9 MB)
17/09/21 15:03:04 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/09/21 15:03:04 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (PythonRDD[5] at foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:25) (first 15 tasks are for partitions Vector(0))
17/09/21 15:03:04 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/09/21 15:03:05 INFO MesosCoarseGrainedSchedulerBackend: Capping the total amount of executors to 1
17/09/21 15:03:05 INFO ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
17/09/21 15:03:10 WARN MesosCoarseGrainedSchedulerBackend: Unable to parse into a key:value label for the task.
17/09/21 15:03:10 INFO MesosCoarseGrainedSchedulerBackend: Mesos task 0 is now TASK_RUNNING
17/09/21 15:03:11 INFO TransportClientFactory: Successfully created connection to /192.168.10.12:7337 after 68 ms (0 ms spent in bootstraps)
17/09/21 15:03:11 INFO MesosExternalShuffleClient: Successfully registered app 074830c5-66d9-4eaf-b7cf-a2a021070856-0005 with external shuffle service.
17/09/21 15:03:16 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.10.12:38960) with ID 0
17/09/21 15:03:17 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.10.12:35633 with 366.3 MB RAM, BlockManagerId(0, 192.168.10.12, 35633, None)
17/09/21 15:03:17 INFO ExecutorAllocationManager: New executor 0 has registered (new total is 1)
17/09/21 15:03:17 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.10.12, executor 0, partition 0, NODE_LOCAL, 10194 bytes)
17/09/21 15:03:18 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.10.12:35633 (size: 7.4 KB, free: 366.3 MB)
17/09/21 15:03:23 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 6193 ms on 192.168.10.12 (executor 0) (1/1)
17/09/21 15:03:23 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
17/09/21 15:03:23 INFO DAGScheduler: ResultStage 0 (foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:25) finished in 18.213 s
17/09/21 15:03:23 INFO DAGScheduler: Job 0 finished: foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:25, took 19.263490 s
17/09/21 15:03:23 INFO MesosCoarseGrainedSchedulerBackend: Capping the total amount of executors to 0
17/09/21 15:03:23 INFO ScalaEsRowRDD: Reading from [bank]
17/09/21 15:03:23 INFO SparkContext: Starting job: foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:32
17/09/21 15:03:23 INFO DAGScheduler: Got job 1 (foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:32) with 1 output partitions
17/09/21 15:03:23 INFO DAGScheduler: Final stage: ResultStage 1 (foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:32)
17/09/21 15:03:24 INFO DAGScheduler: Parents of final stage: List()
17/09/21 15:03:24 INFO DAGScheduler: Missing parents: List()
17/09/21 15:03:24 INFO DAGScheduler: Submitting ResultStage 1 (PythonRDD[12] at foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:32), which has no missing parents
17/09/21 15:03:24 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 16.4 KB, free 246.9 MB)
17/09/21 15:03:24 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 7.6 KB, free 246.9 MB)
17/09/21 15:03:24 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.10.10:43809 (size: 7.6 KB, free: 246.9 MB)
17/09/21 15:03:24 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
17/09/21 15:03:24 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (PythonRDD[12] at foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:32) (first 15 tasks are for partitions Vector(0))
17/09/21 15:03:24 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/09/21 15:03:24 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, 192.168.10.12, executor 0, partition 0, NODE_LOCAL, 10470 bytes)
17/09/21 15:03:24 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.10.12:35633 (size: 7.6 KB, free: 366.3 MB)
17/09/21 15:03:24 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 858 ms on 192.168.10.12 (executor 0) (1/1)
17/09/21 15:03:24 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
17/09/21 15:03:24 INFO DAGScheduler: ResultStage 1 (foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:32) finished in 0.863 s
17/09/21 15:03:24 INFO DAGScheduler: Job 1 finished: foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:32, took 0.989830 s
17/09/21 15:03:25 INFO ScalaEsRowRDD: Reading from [bank]
17/09/21 15:03:25 INFO SparkContext: Starting job: foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:39
17/09/21 15:03:25 INFO DAGScheduler: Registering RDD 16 (javaToPython at NativeMethodAccessorImpl.java:0)
17/09/21 15:03:25 INFO DAGScheduler: Got job 2 (foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:39) with 12 output partitions
17/09/21 15:03:25 INFO DAGScheduler: Final stage: ResultStage 3 (foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:39)
17/09/21 15:03:25 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 2)
17/09/21 15:03:25 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 2)
17/09/21 15:03:25 INFO DAGScheduler: Submitting ShuffleMapStage 2 (MapPartitionsRDD[16] at javaToPython at NativeMethodAccessorImpl.java:0), which has no missing parents
17/09/21 15:03:25 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 14.3 KB, free 246.8 MB)
17/09/21 15:03:25 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 6.1 KB, free 246.8 MB)
17/09/21 15:03:25 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.10.10:43809 (size: 6.1 KB, free: 246.9 MB)
17/09/21 15:03:25 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
17/09/21 15:03:25 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 2 (MapPartitionsRDD[16] at javaToPython at NativeMethodAccessorImpl.java:0) (first 15 tasks are for partitions Vector(0))
17/09/21 15:03:25 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
17/09/21 15:03:25 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, 192.168.10.12, executor 0, partition 0, NODE_LOCAL, 10183 bytes)
17/09/21 15:03:26 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.10.12:35633 (size: 6.1 KB, free: 366.3 MB)
17/09/21 15:03:27 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 1257 ms on 192.168.10.12 (executor 0) (1/1)
17/09/21 15:03:27 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
17/09/21 15:03:27 INFO DAGScheduler: ShuffleMapStage 2 (javaToPython at NativeMethodAccessorImpl.java:0) finished in 1.302 s
17/09/21 15:03:27 INFO DAGScheduler: looking for newly runnable stages
17/09/21 15:03:27 INFO DAGScheduler: running: Set()
17/09/21 15:03:27 INFO DAGScheduler: waiting: Set(ResultStage 3)
17/09/21 15:03:27 INFO DAGScheduler: failed: Set()
17/09/21 15:03:27 INFO DAGScheduler: Submitting ResultStage 3 (PythonRDD[20] at foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:39), which has no missing parents
17/09/21 15:03:27 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 8.6 KB, free 246.8 MB)
17/09/21 15:03:27 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 5.1 KB, free 246.8 MB)
17/09/21 15:03:27 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.10.10:43809 (size: 5.1 KB, free: 246.9 MB)
17/09/21 15:03:27 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006
17/09/21 15:03:27 INFO DAGScheduler: Submitting 12 missing tasks from ResultStage 3 (PythonRDD[20] at foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:39) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11))
17/09/21 15:03:27 INFO TaskSchedulerImpl: Adding task set 3.0 with 12 tasks
17/09/21 15:03:27 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 3, 192.168.10.12, executor 0, partition 0, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:27 INFO TaskSetManager: Starting task 1.0 in stage 3.0 (TID 4, 192.168.10.12, executor 0, partition 1, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:27 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.10.12:35633 (size: 5.1 KB, free: 366.3 MB)
17/09/21 15:03:27 INFO MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 192.168.10.12:38960
17/09/21 15:03:27 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 149 bytes
17/09/21 15:03:27 INFO TaskSetManager: Starting task 2.0 in stage 3.0 (TID 5, 192.168.10.12, executor 0, partition 2, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:27 INFO TaskSetManager: Starting task 3.0 in stage 3.0 (TID 6, 192.168.10.12, executor 0, partition 3, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:27 INFO TaskSetManager: Finished task 1.0 in stage 3.0 (TID 4) in 465 ms on 192.168.10.12 (executor 0) (1/12)
17/09/21 15:03:27 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 3) in 526 ms on 192.168.10.12 (executor 0) (2/12)
17/09/21 15:03:27 INFO TaskSetManager: Starting task 4.0 in stage 3.0 (TID 7, 192.168.10.12, executor 0, partition 4, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:27 INFO TaskSetManager: Finished task 2.0 in stage 3.0 (TID 5) in 204 ms on 192.168.10.12 (executor 0) (3/12)
17/09/21 15:03:28 INFO TaskSetManager: Starting task 5.0 in stage 3.0 (TID 8, 192.168.10.12, executor 0, partition 5, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:28 INFO TaskSetManager: Finished task 3.0 in stage 3.0 (TID 6) in 291 ms on 192.168.10.12 (executor 0) (4/12)
17/09/21 15:03:28 INFO TaskSetManager: Finished task 4.0 in stage 3.0 (TID 7) in 163 ms on 192.168.10.12 (executor 0) (5/12)
17/09/21 15:03:28 INFO TaskSetManager: Starting task 6.0 in stage 3.0 (TID 9, 192.168.10.12, executor 0, partition 6, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:28 INFO TaskSetManager: Starting task 7.0 in stage 3.0 (TID 10, 192.168.10.12, executor 0, partition 7, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:28 INFO TaskSetManager: Finished task 5.0 in stage 3.0 (TID 8) in 157 ms on 192.168.10.12 (executor 0) (6/12)
17/09/21 15:03:28 INFO TaskSetManager: Starting task 8.0 in stage 3.0 (TID 11, 192.168.10.12, executor 0, partition 8, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:28 INFO TaskSetManager: Finished task 6.0 in stage 3.0 (TID 9) in 194 ms on 192.168.10.12 (executor 0) (7/12)
17/09/21 15:03:28 INFO MesosCoarseGrainedSchedulerBackend: Capping the total amount of executors to 2
17/09/21 15:03:28 INFO ExecutorAllocationManager: Requesting 2 new executors because tasks are backlogged (new desired total will be 2)
17/09/21 15:03:28 INFO TaskSetManager: Starting task 9.0 in stage 3.0 (TID 12, 192.168.10.12, executor 0, partition 9, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:28 INFO TaskSetManager: Finished task 7.0 in stage 3.0 (TID 10) in 151 ms on 192.168.10.12 (executor 0) (8/12)
17/09/21 15:03:28 WARN MesosCoarseGrainedSchedulerBackend: Unable to parse into a key:value label for the task.
17/09/21 15:03:28 INFO TaskSetManager: Starting task 10.0 in stage 3.0 (TID 13, 192.168.10.12, executor 0, partition 10, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:28 INFO TaskSetManager: Finished task 8.0 in stage 3.0 (TID 11) in 200 ms on 192.168.10.12 (executor 0) (9/12)
17/09/21 15:03:28 INFO TaskSetManager: Finished task 9.0 in stage 3.0 (TID 12) in 182 ms on 192.168.10.12 (executor 0) (10/12)
17/09/21 15:03:28 INFO TaskSetManager: Starting task 11.0 in stage 3.0 (TID 14, 192.168.10.12, executor 0, partition 11, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:28 INFO TaskSetManager: Finished task 10.0 in stage 3.0 (TID 13) in 200 ms on 192.168.10.12 (executor 0) (11/12)
17/09/21 15:03:28 INFO MesosCoarseGrainedSchedulerBackend: Capping the total amount of executors to 1
17/09/21 15:03:28 INFO TaskSetManager: Finished task 11.0 in stage 3.0 (TID 14) in 196 ms on 192.168.10.12 (executor 0) (12/12)
17/09/21 15:03:28 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
17/09/21 15:03:28 INFO DAGScheduler: ResultStage 3 (foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:39) finished in 1.419 s
17/09/21 15:03:28 INFO DAGScheduler: Job 2 finished: foreachPartition at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:39, took 3.082629 s
17/09/21 15:03:28 INFO MesosCoarseGrainedSchedulerBackend: Capping the total amount of executors to 0
17/09/21 15:03:29 INFO SparkContext: Starting job: collect at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:42
17/09/21 15:03:29 INFO MesosCoarseGrainedSchedulerBackend: Mesos task 1 is now TASK_RUNNING
17/09/21 15:03:29 INFO DAGScheduler: Got job 3 (collect at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:42) with 12 output partitions
17/09/21 15:03:29 INFO DAGScheduler: Final stage: ResultStage 5 (collect at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:42)
17/09/21 15:03:29 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 4)
17/09/21 15:03:29 INFO DAGScheduler: Missing parents: List()
17/09/21 15:03:29 INFO DAGScheduler: Submitting ResultStage 5 (MapPartitionsRDD[21] at collect at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:42), which has no missing parents
17/09/21 15:03:29 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 3.8 KB, free 246.8 MB)
17/09/21 15:03:29 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 2.2 KB, free 246.8 MB)
17/09/21 15:03:29 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.10.10:43809 (size: 2.2 KB, free: 246.9 MB)
17/09/21 15:03:29 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1006
17/09/21 15:03:29 INFO DAGScheduler: Submitting 12 missing tasks from ResultStage 5 (MapPartitionsRDD[21] at collect at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:42) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11))
17/09/21 15:03:29 INFO TaskSchedulerImpl: Adding task set 5.0 with 12 tasks
17/09/21 15:03:29 INFO TaskSetManager: Starting task 0.0 in stage 5.0 (TID 15, 192.168.10.12, executor 0, partition 0, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:29 INFO TaskSetManager: Starting task 1.0 in stage 5.0 (TID 16, 192.168.10.12, executor 0, partition 1, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:29 INFO TransportClientFactory: Successfully created connection to /192.168.10.11:7337 after 117 ms (0 ms spent in bootstraps)
17/09/21 15:03:29 INFO MesosExternalShuffleClient: Successfully registered app 074830c5-66d9-4eaf-b7cf-a2a021070856-0005 with external shuffle service.
17/09/21 15:03:29 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.10.12:35633 (size: 2.2 KB, free: 366.3 MB)
17/09/21 15:03:29 INFO TaskSetManager: Starting task 2.0 in stage 5.0 (TID 17, 192.168.10.12, executor 0, partition 2, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:29 INFO TaskSetManager: Starting task 3.0 in stage 5.0 (TID 18, 192.168.10.12, executor 0, partition 3, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:29 INFO TaskSetManager: Finished task 1.0 in stage 5.0 (TID 16) in 580 ms on 192.168.10.12 (executor 0) (1/12)
17/09/21 15:03:29 INFO TaskSetManager: Finished task 0.0 in stage 5.0 (TID 15) in 593 ms on 192.168.10.12 (executor 0) (2/12)
17/09/21 15:03:29 INFO TaskSetManager: Starting task 4.0 in stage 5.0 (TID 19, 192.168.10.12, executor 0, partition 4, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:29 INFO TaskSetManager: Finished task 2.0 in stage 5.0 (TID 17) in 225 ms on 192.168.10.12 (executor 0) (3/12)
17/09/21 15:03:29 INFO TaskSetManager: Starting task 5.0 in stage 5.0 (TID 20, 192.168.10.12, executor 0, partition 5, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:29 INFO TaskSetManager: Finished task 3.0 in stage 5.0 (TID 18) in 250 ms on 192.168.10.12 (executor 0) (4/12)
17/09/21 15:03:30 INFO TaskSetManager: Starting task 6.0 in stage 5.0 (TID 21, 192.168.10.12, executor 0, partition 6, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:30 INFO TaskSetManager: Finished task 5.0 in stage 5.0 (TID 20) in 295 ms on 192.168.10.12 (executor 0) (5/12)
17/09/21 15:03:30 INFO TaskSetManager: Starting task 7.0 in stage 5.0 (TID 22, 192.168.10.12, executor 0, partition 7, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:30 INFO TaskSetManager: Finished task 4.0 in stage 5.0 (TID 19) in 350 ms on 192.168.10.12 (executor 0) (6/12)
17/09/21 15:03:30 INFO TaskSetManager: Starting task 8.0 in stage 5.0 (TID 23, 192.168.10.12, executor 0, partition 8, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:30 INFO TaskSetManager: Finished task 6.0 in stage 5.0 (TID 21) in 134 ms on 192.168.10.12 (executor 0) (7/12)
17/09/21 15:03:30 INFO TaskSetManager: Starting task 9.0 in stage 5.0 (TID 24, 192.168.10.12, executor 0, partition 9, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:30 INFO TaskSetManager: Starting task 10.0 in stage 5.0 (TID 25, 192.168.10.12, executor 0, partition 10, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:30 INFO TaskSetManager: Finished task 7.0 in stage 5.0 (TID 22) in 229 ms on 192.168.10.12 (executor 0) (8/12)
17/09/21 15:03:30 INFO TaskSetManager: Finished task 8.0 in stage 5.0 (TID 23) in 144 ms on 192.168.10.12 (executor 0) (9/12)
17/09/21 15:03:30 INFO TaskSetManager: Starting task 11.0 in stage 5.0 (TID 26, 192.168.10.12, executor 0, partition 11, NODE_LOCAL, 4748 bytes)
17/09/21 15:03:30 INFO TaskSetManager: Finished task 9.0 in stage 5.0 (TID 24) in 110 ms on 192.168.10.12 (executor 0) (10/12)
17/09/21 15:03:30 INFO TaskSetManager: Finished task 11.0 in stage 5.0 (TID 26) in 48 ms on 192.168.10.12 (executor 0) (11/12)
17/09/21 15:03:30 INFO TaskSetManager: Finished task 10.0 in stage 5.0 (TID 25) in 136 ms on 192.168.10.12 (executor 0) (12/12)
17/09/21 15:03:30 INFO DAGScheduler: ResultStage 5 (collect at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:42) finished in 1.460 s
17/09/21 15:03:30 INFO DAGScheduler: Job 3 finished: collect at /tmp/script_Imz0jJML2J/2_collocation_5_bank_one_shard_repartition.py:42, took 1.616514 s
17/09/21 15:03:30 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool
Printing 10 first results
Row(account_number=99, address=u'806 Rockwell Place', age=39, balance=47159, city=u'Shaft', email=u'ratliffheath@zappix.com', employer=u'Zappix', firstname=u'Ratliff', gender=u'F', lastname=u'Heath', state=u'ND')
Row(account_number=190, address=u'636 Diamond Street', age=30, balance=3150, city=u'Crumpler', email=u'blakedavidson@quantasis.com', employer=u'Quantasis', firstname=u'Blake', gender=u'F', lastname=u'Davidson', state=u'KY')
Row(account_number=347, address=u'784 Pulaski Street', age=24, balance=36038, city=u'Goochland', email=u'gouldcarson@mobildata.com', employer=u'Mobildata', firstname=u'Gould', gender=u'F', lastname=u'Carson', state=u'MI')
Row(account_number=498, address=u'649 Columbia Place', age=39, balance=10516, city=u'Crenshaw', email=u'stellahinton@flyboyz.com', employer=u'Flyboyz', firstname=u'Stella', gender=u'F', lastname=u'Hinton', state=u'SC')
Row(account_number=621, address=u'336 Kansas Place', age=26, balance=35480, city=u'Corriganville', email=u'lesliesloan@dancity.com', employer=u'Dancity', firstname=u'Leslie', gender=u'F', lastname=u'Sloan', state=u'AR')
Row(account_number=760, address=u'440 Hubbard Place', age=37, balance=40996, city=u'Stockwell', email=u'rheablair@bicol.com', employer=u'Bicol', firstname=u'Rhea', gender=u'F', lastname=u'Blair', state=u'LA')
Row(account_number=873, address=u'432 Lincoln Road', age=39, balance=43931, city=u'Bluetown', email=u'tishacotton@buzzmaker.com', employer=u'Buzzmaker', firstname=u'Tisha', gender=u'F', lastname=u'Cotton', state=u'GA')
Row(account_number=14, address=u'661 Vista Place', age=39, balance=20480, city=u'Chamizal', email=u'ermakane@stockpost.com', employer=u'Stockpost', firstname=u'Erma', gender=u'F', lastname=u'Kane', state=u'NY')
Row(account_number=115, address=u'537 Clara Street', age=31, balance=18750, city=u'Caron', email=u'nikkidoyle@fossiel.com', employer=u'Fossiel', firstname=u'Nikki', gender=u'F', lastname=u'Doyle', state=u'MS')
Row(account_number=204, address=u'400 Waldane Court', age=39, balance=27714, city=u'Stollings', email=u'mavisdeleon@lotron.com', employer=u'Lotron', firstname=u'Mavis', gender=u'F', lastname=u'Deleon', state=u'LA')
Fetched 493 women accounts (from collected list) 2
17/09/21 15:03:31 INFO SparkContext: Invoking stop() from shutdown hook
17/09/21 15:03:31 INFO SparkUI: Stopped Spark web UI at http://192.168.10.10:4040
17/09/21 15:03:31 INFO MesosCoarseGrainedSchedulerBackend: Shutting down all executors
17/09/21 15:03:31 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
17/09/21 15:03:32 INFO MesosCoarseGrainedSchedulerBackend: Mesos task 0 is now TASK_FINISHED
17/09/21 15:03:34 INFO MesosCoarseGrainedSchedulerBackend: Mesos task 1 is now TASK_FAILED
I0921 15:03:35.044803 7936 sched.cpp:2021] Asked to stop the driver
I0921 15:03:35.053791 7699 sched.cpp:1203] Stopping framework 074830c5-66d9-4eaf-b7cf-a2a021070856-0005
17/09/21 15:03:35 INFO MesosCoarseGrainedSchedulerBackend: driver.run() returned with code DRIVER_STOPPED
17/09/21 15:03:35 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/09/21 15:03:35 INFO MemoryStore: MemoryStore cleared
17/09/21 15:03:35 INFO BlockManager: BlockManager stopped
17/09/21 15:03:35 INFO BlockManagerMaster: BlockManagerMaster stopped
17/09/21 15:03:35 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/09/21 15:03:35 INFO SparkContext: Successfully stopped SparkContext
17/09/21 15:03:35 INFO ShutdownHookManager: Shutdown hook called
17/09/21 15:03:35 INFO ShutdownHookManager: Deleting directory /tmp/spark-824ccf39-4434-43b7-b2c6-32d1efddac86
17/09/21 15:03:35 INFO ShutdownHookManager: Deleting directory /tmp/spark-824ccf39-4434-43b7-b2c6-32d1efddac86/pyspark-80324d32-1299-44c1-9cfc-445b1758f64f
=== DONE
=== Deleting temporary folder
=== DONE
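For context, the shape of the workload can be read off the log: the driver reads the single-shard bank index through the elasticsearch-hadoop connector, filters on gender='F', walks the partitions with foreachPartition before and after a repartition into 12 partitions, then collects the rows, prints the first 10 and reports the 493-row count. The sketch below is only a plausible reconstruction of 2_collocation_5_bank_one_shard_repartition.py under those assumptions; the actual script is not shown in the log, and the connector options, helper names and the purpose of the second foreachPartition call are guesses.

# Hypothetical sketch only; index name, filter, partition count and the
# script line numbers in the comments come from the log above, the rest is assumed.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("ESTest_2_5")
         .getOrCreate())

# Read the single-shard 'bank' index through the elasticsearch-hadoop connector
# (the log shows "ScalaEsRowRDD: Reading from [bank]").
bank = (spark.read
        .format("org.elasticsearch.spark.sql")
        .option("es.nodes", "localhost")     # assumption: same node the curl calls used
        .load("bank/account"))

# The log shows the SQL parser handling the predicate gender='F'.
women = bank.filter("gender='F'")

def consume(partition):
    # No-op traversal; the log only reveals that foreachPartition was called.
    for _ in partition:
        pass

# Jobs 0 and 1: foreachPartition on the unrepartitioned data (1 partition = 1 shard).
women.rdd.foreachPartition(consume)           # script line 25 in the log
women.rdd.foreachPartition(consume)           # script line 32 in the log

# Jobs 2 and 3: shuffle into 12 partitions, traverse them, then collect.
repartitioned = women.repartition(12)
repartitioned.rdd.foreachPartition(consume)   # script line 39 in the log
rows = repartitioned.collect()                # script line 42 in the log

print("Printing 10 first results")
for row in rows[:10]:
    print(row)
print("Fetched %d women accounts (from collected list)" % len(rows))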