Using Object Versioning in Google Cloud Storage


Suppose we have a lot of data in our Cloud Storage bucket and somehow by mistake someone runs

How does Object Versioning Help?

By design, every storage object (file) in Cloud Storage is assigned 2 sequence numbers:

  • generation number
  • metageneration number

We will talk about them in detail later.

Network types in Docker
  • By default, Docker creates 3 networks automatically:
    Bridge, Host, and None.

Bridge Network

  • The private internal network created by default.
  • Every container is attached to this network by default and gets an IP in the range 172.17.*.*
  • Containers can also access each other using this IP if required.
  • To access a container from outside, we need to map the port of the container to the Docker host using the -p flag.
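
To make the address range concrete: 172.17.*.* is the same set of addresses as the CIDR block 172.17.0.0/16. The sketch below uses Python's standard ipaddress module to check whether an address falls inside that block. The helper name and the sample addresses are made up for illustration and are not read from a real Docker host.

```python
import ipaddress

# 172.17.*.* written in CIDR notation.
BRIDGE_RANGE = ipaddress.ip_network("172.17.0.0/16")


def in_bridge_range(ip: str) -> bool:
    """Return True if ip lies inside 172.17.0.0/16."""
    return ipaddress.ip_address(ip) in BRIDGE_RANGE


# Hypothetical addresses, for illustration only.
for candidate in ("172.17.0.2", "192.168.1.10"):
    print(candidate, in_bridge_range(candidate))
```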

By default, Docker uses subnet range.

If you want to work with Airflow and are just starting out with your installation, Google Cloud Composer is a good place to start: it creates all the required services, manages the Kubernetes cluster via GKE, and connects everything for you.

Local Setup via Docker

You might want to test this setup in a local Airflow installation before deploying your DAG to your deployed instance.

A few days back I was trying to work with multiline JSON (aka JSON) on Spark 2.1, and I ran into a very peculiar issue when working with single-line JSON (aka JSONL or JSON Lines) versus multiline JSON files.

JSON Lines vs JSON

Consider an example in which our JSON holds 3 rows, and all rows are enclosed inside a single JSON array.


Now, if we represent the same data as JSON Lines, each row becomes a standalone JSON object on its own line, with no enclosing array.
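
The difference between the two layouts can be sketched in plain Python with the standard json module. The 3 rows below are hypothetical stand-ins, not the original sample data, and the variable names are only for illustration.

```python
import json

# Hypothetical rows, invented for illustration.
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]

# Multiline JSON: one document, all rows inside a single array.
as_json = json.dumps(rows, indent=2)

# JSON Lines: one self-contained JSON object per line, no enclosing array.
as_jsonl = "\n".join(json.dumps(row) for row in rows)

print(as_json)
print(as_jsonl)

# JSON Lines can be parsed one line at a time ...
parsed_lines = [json.loads(line) for line in as_jsonl.splitlines()]
# ... whereas the multiline document has to be parsed as a whole.
parsed_doc = json.loads(as_json)
```

Because every line of a JSON Lines file is a complete record, a reader can process lines independently, whereas a single line taken out of the multiline document is not valid JSON on its own.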

Problem —


At every index (i, j) we have two choices: go right or go down.
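
The full problem statement is not shown here, so as an assumption take the classic variant: count the paths from the top-left cell of a grid to the bottom-right cell when every step goes either right or down. A minimal memoized sketch of that two-choice recursion (the function names are hypothetical):

```python
from functools import lru_cache


def count_paths(rows: int, cols: int) -> int:
    """Count right/down paths from (0, 0) to (rows - 1, cols - 1)."""

    @lru_cache(maxsize=None)
    def paths_from(i: int, j: int) -> int:
        if i == rows - 1 and j == cols - 1:
            return 1  # reached the target cell
        if i >= rows or j >= cols:
            return 0  # stepped outside the grid
        # Two choices at (i, j): go down or go right.
        return paths_from(i + 1, j) + paths_from(i, j + 1)

    return paths_from(0, 0)


print(count_paths(3, 3))  # prints 6
```

Memoization means each cell is solved once, so the work grows with the number of cells rather than with the number of paths.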


In order to understand how your application runs on a cluster, an important thing to know about Dataset/Dataframe transformations is that they fall into two types, narrow and wide, which we will discuss first, before explaining the execution model.

A DataFrame is nothing but a Dataset[Row], so going forward we will generally use Dataset.

Narrow and Wide Transformations

As a review, transformations create a new Dataset from an existing one. Narrow transformations do not have to move data between partitions when creating a new Dataset from an existing one. …
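
The distinction can be illustrated with a toy model in plain Python. This is not Spark's API: partitions are modeled as lists, the function names are hypothetical, and the routing rule is a stand-in for a real partitioner. The narrow step builds each output partition from exactly one input partition, while the wide step (grouping by key) has to pull records from every input partition.

```python
from collections import defaultdict

# Two hypothetical partitions of (key, value) records.
partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
]


def narrow_map(parts, fn):
    """Narrow: each output partition depends on exactly one input partition."""
    return [[fn(record) for record in part] for part in parts]


def wide_group_by_key(parts, num_out):
    """Wide: records with the same key are moved to the same output partition."""
    out = [defaultdict(list) for _ in range(num_out)]
    for part in parts:
        for key, value in part:
            # Stand-in for a partitioner: route each key to one output partition.
            out[ord(key[0]) % num_out][key].append(value)
    return [dict(d) for d in out]


doubled = narrow_map(partitions, lambda kv: (kv[0], kv[1] * 2))
grouped = wide_group_by_key(partitions, num_out=2)
print(doubled)
print(grouped)
```

In the grouped result, the two "a" records that started in different input partitions end up together, which is the data movement a narrow transformation avoids.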

  • Vim is a hell of an editor: it has a very steep learning curve, but it is very efficient once you have climbed it.

Starting Editing in Vim

  • editing from the command line
$ vi filename
  • opening a file from inside vim: first enter vim using the vi command, then use the :e command
$ vi
:e filename

Saving file

  • Run the following command from normal mode.

Exiting Vim

  • exiting after writing the file, or when you haven’t changed anything
  • exiting without saving changes; the changes will be lost

Vim Modes

Vim works on the principle of modality, so it has various modes:

  • Insert mode — want to insert something
  • Normal/command mode — want to run come…

  • If you have been using git-bash for command-line operations and haven’t been able to find some class paths, this blog might help you add all class paths permanently so they can be used from git-bash.

Problem Statement

You use git-bash a lot and want to access every application from there (because you are a Unix (Linux/Mac) freak but your office has Windows).

Solution — Adding Permanent path using .bashrc or .profile for git-bash

As in any other Unix environment (Linux/Mac), we can add our paths to the .bashrc or .profile file and access them in bash after sourcing the file or reloading git-bash.

  1. Go to Home in Bash(either use git-bash directly or type bash in cmd) by typing…

npm audit is a feature introduced with npm@6. It shows all known vulnerabilities in your dependencies (excluding peerDependencies).

Why do we need this?

If you have used GitHub and have a long-running project, you might have seen something like this:

hoek@2.16.3 had a serious security vulnerability, which was resolved in hoek@4.2.1 and later.

step 1 — find where this dependency is used

We can use npm ls hoek here to find out where this dependency is used.

Easy monitoring of resources, especially CPU core temperature and utilization of RAM or CPU cores, can be vital for the health and maintenance of a system.

Psensor can be installed by typing:

sudo apt-get install psensor

Newer versions of psensor can be installed from a PPA:

sudo add-apt-repository ppa:jfi/ppa
sudo apt-get update
sudo apt-get install psensor


Ashish Patel

Big Data Engineer at Walmartlabs, loves Competitive programming, Big Data.
