Using Object Versioning in Google Cloud Storage

Use Case

Suppose we have a lot of data in our Cloud Storage bucket and someone mistakenly runs

gsutil rm gs://my_bucket/*

We would lose all our data and might never be able to recover it.

How does Object Versioning Help?

By design, every storage object (file) in Cloud Storage is assigned two sequence numbers.

We will talk about them in detail later.

In a nutshell, a generation number is assigned each time we replace or modify an object. …


Network types in Docker

Bridge Network

By default, Docker's bridge network uses the 172.17.0.0/16 subnet range.
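To get a feel for what that default range covers, here is a minimal sketch using Python's standard ipaddress module. The helper name on_default_bridge and the sample addresses are assumptions made for illustration; the code does not query Docker itself.

```python
import ipaddress

# Default bridge subnet quoted in the text above.
bridge = ipaddress.ip_network("172.17.0.0/16")

print(bridge.num_addresses)      # total number of addresses in the /16 block
print(bridge.network_address)    # first address of the range
print(bridge.broadcast_address)  # last address of the range


def on_default_bridge(ip: str) -> bool:
    """Return True if `ip` falls inside the default bridge subnet.

    Hypothetical helper: it only does address arithmetic and knows
    nothing about the containers actually running on a host.
    """
    return ipaddress.ip_address(ip) in bridge


# Sample addresses chosen for illustration only.
print(on_default_bridge("172.17.0.2"))
print(on_default_bridge("172.18.0.2"))
```

A check like this can help spot whether an address seen in logs belongs to the default bridge or to some other network.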


If you want to work with Airflow and are just starting your installation, Google Cloud Composer is the best solution, as it creates all the required services, manages the Kubernetes cluster via GKE, and everything connects like magic.

But if you already run Airflow on-prem or on another cloud provider and want to connect it to GCP, you will have to do a couple of things to get everything up and running.

Local Setup via Docker

You might want to test this setup in your local Airflow before deploying your DAG to your deployed instance.

There are 2…


A few days back I was trying to work with multiline JSON (aka JSON) on Spark 2.1, and I faced a very peculiar issue with single-line JSON (aka JSONL or JSON Lines) vs multiline JSON files.

JSON Lines vs JSON

Consider an example where our JSON has 3 rows and all rows are enclosed inside a JSON array.


Now, if we represent the same data as JSON Lines, each row becomes one JSON object on its own line, with no enclosing array.
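To make the difference concrete, here is a minimal sketch in plain Python (not Spark). The three records are hypothetical and invented for illustration; they are not the data from the original example.

```python
import json

# Hypothetical records, invented for illustration only.
rows = [
    {"id": 1, "name": "alpha"},
    {"id": 2, "name": "beta"},
    {"id": 3, "name": "gamma"},
]

# Multiline JSON: a single array document that spans many lines.
json_text = json.dumps(rows, indent=2)

# JSON Lines: one complete JSON object per line, no enclosing array.
jsonl_text = "\n".join(json.dumps(row) for row in rows)

# Multiline JSON has to be parsed as one whole document ...
parsed_json = json.loads(json_text)

# ... while JSON Lines can be parsed independently, line by line.
parsed_jsonl = [json.loads(line) for line in jsonl_text.splitlines()]

print(json_text)
print(jsonl_text)
print(parsed_json == parsed_jsonl)
```

The practical consequence is that a reader which treats every line as a separate record can handle the JSON Lines text, but not the multiline text, where a single line such as the opening bracket is not valid JSON on its own.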


Problem — https://leetcode.com/explore/challenge/card/30-day-leetcoding-challenge/530/week-3/3303/

Intuition

At every index (i, j) we have two choices: go right or go down.

Clearly the problem has optimal substructure.

First implement it using recursion, then convert it to top-down dynamic programming.

findSum(i,j) = grid[i][j] + min( findSum(i,j+1),findSum(i+1,j));
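The recurrence above can be sketched as a memoized (top-down) function. This is an illustrative sketch, not the author's solution: it assumes the linked problem asks for the minimum path sum from the top-left to the bottom-right cell of a non-empty grid, and the names min_path_sum and find_sum are chosen here.

```python
from functools import lru_cache


def min_path_sum(grid):
    """Minimum sum of a path from the top-left to the bottom-right cell,
    moving only right or down. Assumes a non-empty rectangular grid."""
    rows, cols = len(grid), len(grid[0])

    @lru_cache(maxsize=None)  # memoization turns the recursion into top-down DP
    def find_sum(i, j):
        # Stepping outside the grid is never a valid path.
        if i >= rows or j >= cols:
            return float("inf")
        # The destination cell contributes only its own value.
        if i == rows - 1 and j == cols - 1:
            return grid[i][j]
        # Same recurrence as in the text: current cell plus the cheaper
        # of going right (j + 1) or going down (i + 1).
        return grid[i][j] + min(find_sum(i, j + 1), find_sum(i + 1, j))

    return find_sum(0, 0)


print(min_path_sum([[1, 3, 1], [1, 5, 1], [4, 2, 1]]))  # prints 7
```

Each cell is solved once, so the work is proportional to the number of cells. For very large grids the recursion depth may become a concern, and a bottom-up table would avoid that.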

Solution


To understand how your application runs on a cluster, it is important to know that Dataset/DataFrame transformations fall into two types, narrow and wide, which we will discuss first, before explaining the execution model.

A DataFrame is nothing but a Dataset[Row], so going forward we will generally say Dataset.

Narrow and Wide Transformations

As a review, transformations create a new Dataset from an existing one. Narrow transformations do not have to move data between partitions when creating a new Dataset from an existing one. …
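The distinction can be illustrated with a small plain-Python simulation. This is not Spark code: partitions are modeled as lists, and the function names, the sample rows, and the byte-sum partitioner are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical dataset split into two partitions of (key, value) rows.
partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
]


def narrow_map(parts, fn):
    """Narrow: each output partition is built from exactly one input
    partition, so no rows move between partitions."""
    return [[fn(row) for row in part] for part in parts]


def wide_group_sum(parts, num_out):
    """Wide: rows with the same key must end up together, so rows from
    every input partition may be redistributed (a shuffle)."""
    out = [defaultdict(int) for _ in range(num_out)]
    for part in parts:
        for key, value in part:
            # Toy deterministic partitioner (an assumption): byte sum of the key.
            target = sum(key.encode()) % num_out
            out[target][key] += value
    return [dict(p) for p in out]


doubled = narrow_map(partitions, lambda kv: (kv[0], kv[1] * 2))
grouped = wide_group_sum(doubled, num_out=2)
print(doubled)
print(grouped)
```

In the narrow step the partition boundaries stay exactly as they were. In the wide step the two "a" rows start in different partitions and have to be brought into the same one before they can be summed.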


Start Editing in Vim

$ vi filename
:e filename

Saving file

:w

Exiting Vim

:q 
:q!

Vim Modes

Vim works on the principle of modality, so it has various modes,


Problem Statement

You use git-bash a lot and want to access every application from there (because you are a Unix (Linux/Mac) fan but your office uses Windows).

Solution — Adding Permanent path using .bashrc or .profile for git-bash

Like in any other Unix environment (Linux/Mac), we can add our paths to the .bashrc or .profile file and access them in bash after sourcing the file or reloading git-bash.

Steps


npm audit is a feature introduced with npm@6. It shows all vulnerabilities in your dependencies (excluding peerDependencies).

You can disable the warning for single package installations with the ‘--no-audit’ flag.

Why do we need this?

If you have used GitHub and have a long-running project, you might see something like this:

hoek@2.16.3 had a major security vulnerability, which was resolved in hoek@4.2.1 and later.

This is a classic example where npm audit fix can be used effectively.

Step 1: find where this dependency is used

We can use npm ls hoek to find out where this dependency is used.


Easy monitoring of resources, especially CPU core temperature and RAM or CPU utilization, can be vital for the health and maintenance of a system.

Sometimes our system gets too hot or overloaded, which could damage it.

A good indicator for monitoring temperature, fan speeds, and voltage on Linux is psensor. It shows the output of all sensors and draws graphs. Selected outputs can also be placed in the indicator panel.

It can be installed by typing:

sudo apt-get install psensor

Newer versions of psensor can be installed from a PPA:

sudo add-apt-repository ppa:jfi/ppa
sudo apt-get update
sudo apt-get install psensor

It…

Ashish Patel

Big Data Engineer at Walmartlabs, loves Competitive programming, Big Data.
