Suppose we have a lot of data in our Cloud Storage bucket and someone mistakenly runs
gsutil rm gs://my_bucket/*,
we will lose all our data and won't be able to recover it easily, or may never be able to recover it at all.
By design, every storage object (file) in Cloud Storage is assigned 2 sequence numbers;
we will talk about them in detail later.
In a nutshell, a new generation number is assigned each time we replace or modify an object. …
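To build intuition for how generation numbers make overwritten objects recoverable, here is a toy sketch in plain Python (this is NOT the real Cloud Storage API, just a model of the idea: every write gets a new, monotonically increasing generation, and older generations stay addressable when versioning is on):

```python
# Toy model of Cloud Storage object versioning (not the real API):
# each write of an object receives a fresh generation number, so
# earlier versions remain retrievable by generation.
class VersionedBucket:
    def __init__(self):
        self._objects = {}  # name -> list of (generation, data)
        self._clock = 0

    def put(self, name, data):
        self._clock += 1
        self._objects.setdefault(name, []).append((self._clock, data))
        return self._clock  # generation assigned to this write

    def get(self, name, generation=None):
        versions = self._objects[name]
        if generation is None:
            return versions[-1][1]         # live (latest) version
        return dict(versions)[generation]  # a specific older generation

bucket = VersionedBucket()
g1 = bucket.put("report.csv", "v1 contents")
g2 = bucket.put("report.csv", "v2 contents")  # replace -> new generation
print(bucket.get("report.csv"))       # latest version
print(bucket.get("report.csv", g1))   # recover the overwritten version
```

The real service works analogously: with object versioning enabled on a bucket, a delete or overwrite archives the old generation instead of destroying it.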
By default, Docker uses the 172.17.0.0/16 subnet range.
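To get a feel for what that /16 range covers, we can inspect it with Python's `ipaddress` module (the note that 172.17.0.1 is typically taken by the bridge gateway is an assumption about the default setup):

```python
import ipaddress

# Docker's default bridge subnet
bridge = ipaddress.ip_network("172.17.0.0/16")

print(bridge.num_addresses)        # 65536 addresses in a /16
print(bridge.network_address)      # 172.17.0.0
first_host = next(bridge.hosts())  # 172.17.0.1 — typically the bridge gateway
print(first_host)
print(ipaddress.ip_address("172.17.5.20") in bridge)  # containers land in this range
```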
If you want to work with Airflow and are just starting out with your installation, then Google Cloud Composer is the best solution, as it creates all the required services, manages the Kubernetes cluster via GKE, and everything connects like magic.
But if you already have an on-prem Airflow, or Airflow running on some other cloud provider, and want to connect it with GCP, you will have to do a couple of things to get everything up and running.
You might want to test this setup in your local Airflow before deploying your DAG to your deployed instance.
There are 2…
A few days back I was trying to work with multiline JSON (aka plain JSON) on Spark 2.1, and I faced a very peculiar issue while working with single-line JSON (aka JSONL or JSON Lines) vs. multiline JSON files.
Consider an example; our JSON looks like the one below.
Here we can see that we have 3 rows, and all the rows are enclosed inside a JSON array.
Now, if we represent the same data as JSON Lines, it would look something like this:
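Since the article's sample data is not reproduced here, the snippet below uses a hypothetical three-row dataset to show the structural difference in plain Python: standard (multiline) JSON is one array parsed as a whole document, while JSON Lines holds one complete object per line and can be parsed line by line — the same distinction Spark's reader has to deal with:

```python
import json

# Hypothetical sample data (stand-in for the article's JSON).
# Standard (multiline) JSON: one array spanning many lines.
multiline_json = """[
  {"id": 1, "name": "alice"},
  {"id": 2, "name": "bob"},
  {"id": 3, "name": "carol"}
]"""

# JSON Lines: one complete JSON object per line, no enclosing array.
json_lines = "\n".join([
    '{"id": 1, "name": "alice"}',
    '{"id": 2, "name": "bob"}',
    '{"id": 3, "name": "carol"}',
])

rows_a = json.loads(multiline_json)  # parse the whole document at once
rows_b = [json.loads(line) for line in json_lines.splitlines()]  # line by line

print(rows_a == rows_b)  # True — same 3 rows, two different file layouts
```

This is why a record-per-line reader (the default in older Spark versions) chokes on multiline JSON: each physical line is not a valid JSON document on its own.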
At every index (i, j) we have two choices: either go right or go down.
Clearly a problem with optimal substructure.
First try to implement it using recursion, then easily convert it to top-down dynamic programming.
findSum(i,j) = grid[i][j] + min( findSum(i,j+1),findSum(i+1,j));
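The recurrence above translates directly into memoized recursion (top-down DP); the 3×3 grid below is a made-up example, and the base cases handle the last row/column where only one move remains:

```python
from functools import lru_cache

def min_path_sum(grid):
    """Minimum path sum from top-left to bottom-right, moving only right or down."""
    rows, cols = len(grid), len(grid[0])

    @lru_cache(maxsize=None)  # memoization turns the recursion into top-down DP
    def find_sum(i, j):
        if i == rows - 1 and j == cols - 1:  # reached the bottom-right cell
            return grid[i][j]
        if i == rows - 1:                    # last row: can only go right
            return grid[i][j] + find_sum(i, j + 1)
        if j == cols - 1:                    # last column: can only go down
            return grid[i][j] + find_sum(i + 1, j)
        # general case: grid[i][j] + min(right, down), as in the recurrence
        return grid[i][j] + min(find_sum(i, j + 1), find_sum(i + 1, j))

    return find_sum(0, 0)

grid = [[1, 3, 1],
        [1, 5, 1],
        [4, 2, 1]]
print(min_path_sum(grid))  # 7, via the path 1 → 3 → 1 → 1 → 1
```

Without memoization the plain recursion is exponential; with it, each (i, j) is solved once, giving O(rows × cols) time.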
In order to understand how your application runs on a cluster, an important thing to know about Dataset/Dataframe transformations is that they fall into two types, narrow and wide, which we will discuss first, before explaining the execution model.
A Dataframe is nothing but a Dataset[Row], so going forward we will generally use Dataset.
As a review, transformations create a new Dataset from an existing one. Narrow transformations do not have to move data between partitions when creating a new Dataset from an existing one. …
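A rough plain-Python sketch of the distinction (not Spark code — the partitions and records are made up): a narrow transformation such as a filter works inside each partition independently, while a wide one such as grouping by key needs records from other partitions, i.e. a shuffle:

```python
from collections import defaultdict

# Two partitions of a made-up Dataset of (key, value) records.
partitions = [
    [("a", 1), ("b", 2)],           # partition 0
    [("a", 3), ("b", 4), ("a", 5)]  # partition 1
]

# Narrow: each output partition depends on exactly one input partition,
# so the work stays local — no data moves between partitions.
filtered = [[rec for rec in part if rec[1] > 1] for part in partitions]

# Wide: grouping by key needs the records for a key from *all* input
# partitions, so data must first be redistributed (shuffled) by key.
shuffled = defaultdict(list)
for part in partitions:
    for key, value in part:
        shuffled[key].append(value)  # records cross partition boundaries here

print(filtered)        # [[('b', 2)], [('a', 3), ('b', 4), ('a', 5)]]
print(dict(shuffled))  # {'a': [1, 3, 5], 'b': [2, 4]}
```

The shuffle step is exactly why wide transformations are expensive on a cluster: the data movement it models here in memory happens over the network between executors.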
$ vi filename
:e filename   (to open a file from within Vim)
Vim works on the principle of modality, so it has various modes:
You use git-bash a lot and want to access every application from there (because you are a Unix (Linux/Mac) freak but your office runs Windows).
Like in any other Unix environment (Linux/Mac), we can add our paths to the .bashrc or .profile file and get access to them in bash after sourcing the file or reloading git-bash.
npm audit is a new feature introduced with npm@6. It lists all known vulnerabilities in your dependencies (excluding peerDependencies).
You can disable the warning for single package installations with the --no-audit flag.
If you have used GitHub for a long-running project, you might see something like this:
A big security vulnerability was found in an earlier version of a dependency, which was resolved in a later release.
This is a classic example where npm audit fix can be used effectively.
We can use npm ls hoek here to find out where this dependency is used.
Easy monitoring of resources, especially CPU core temperature and the utilization of RAM or CPU cores, can be vital for the health and maintenance of a system.
Sometimes our system gets too hot or overloaded, which could damage it.
A good tool for monitoring temperatures, fan speeds and voltages on Linux is psensor. It shows the output of all sensors and draws graphs; selected outputs can also be placed in the indicator panel.
It can be installed by typing:
sudo apt-get install psensor
Newer versions of psensor can be installed from a PPA:
sudo add-apt-repository ppa:jfi/ppa
sudo apt-get update
sudo apt-get install psensor
Big Data Engineer at WalmartLabs; loves competitive programming and Big Data.