
NoSQL Databases

NoSQL databases come into the picture when traditional relational databases no longer meet our needs.

"NoSQL" stands for "Not Only SQL". It is an alternative to traditional relational databases in the big data world.
A NoSQL database is typically schema-free and horizontally scalable, which makes it well suited to big data challenges.

Properties of NoSQL databases:

  • Schema-free: A SQL database needs a schema to be defined before we can add data to it. A NoSQL database, on the other hand, allows data to be inserted dynamically, i.e. we can insert data into a NoSQL database without any predefined schema.
  • Scalable: SQL databases usually scale up vertically, i.e. when the load is high, the individual server is upgraded. NoSQL databases usually scale out horizontally by adding more nodes, since they are distributed in nature.
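
The two properties above can be sketched in a few lines of Python. This is only an illustration and not how any particular NoSQL product works internally: the list of dicts stands in for a schema-free collection, and the `ShardedStore` class and `shard_for` helper are hypothetical names made up for this example to show data being spread across several nodes.

```python
import hashlib

# Schema-free: records in the same collection can carry different fields.
# A plain list of dicts stands in for a document collection (assumption).
users = []
users.append({"id": 1, "name": "Asha"})
users.append({"id": 2, "name": "Ravi", "email": "ravi@example.com"})  # extra field, no schema change


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard index from a stable hash of the key (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy key-value store; each inner dict stands in for one node."""

    def __init__(self, num_shards: int):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


# Horizontal scaling: keys are distributed over three "nodes".
store = ShardedStore(num_shards=3)
for user in users:
    store.put(f"user:{user['id']}", user)
```

Simple modulo hashing like this moves most keys to a different shard when the number of shards changes, which is why distributed stores often use consistent hashing or a similar partitioning scheme instead.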



Types and examples of NoSQL databases:

Key-Value Stores:                  e.g. Redis and Amazon S3
Column-Family Stores:          e.g. HBase and Cassandra
Document Databases:           e.g. Couchbase and MongoDB
Graph Databases:                  e.g. Neo4j and MeshBase
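
To make the four types more concrete, here is a rough sketch of how similar data might be shaped under each model, using plain Python structures. It does not use the API of any of the products listed above; all names (`key_value`, `column_family`, `documents`, `nodes`, `edges`, `followed_by`) and the sample values are made up for illustration.

```python
import json

# Key-value: an opaque value looked up by a single key.
key_value = {"user:1": '{"name": "Asha", "city": "Pune"}'}

# Column-family: row key -> column family -> column -> value.
column_family = {
    "user:1": {
        "profile": {"name": "Asha", "city": "Pune"},
        "activity": {"last_login": "2024-01-01"},
    }
}

# Document: a nested, self-describing record that can be queried by field.
documents = [{"_id": 1, "name": "Asha", "address": {"city": "Pune"}}]

# Graph: nodes plus labelled edges between them.
nodes = {1: {"name": "Asha"}, 2: {"name": "Ravi"}}
edges = [(1, "FOLLOWS", 2)]


def followed_by(node_id):
    """Return the names of the nodes that node_id follows."""
    return [
        nodes[dst]["name"]
        for src, label, dst in edges
        if src == node_id and label == "FOLLOWS"
    ]


# The key-value store only knows the key; the application decodes the value.
user = json.loads(key_value["user:1"])
```

The main difference is how much of the structure the database itself understands: a key-value store sees only the key, while document and graph databases can query inside the record or along relationships.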



