How do I integrate Hadoop into Spark?

Microsoft SQL Server 2019 integrates Hadoop and Spark

Microsoft presented the first public preview of SQL Server 2019 at its Ignite conference. This makes it clear that the company will not release a SQL Server 2018 and is skipping a year; the two most recent releases were SQL Server 2016 and SQL Server 2017.

The most important new features apparently reflect the fact that very few companies rely on a single database product. Microsoft has given the next edition of its relational database a number of new connectors that let users query other databases such as Oracle, Teradata and MongoDB, generic ODBC data sources, and other SQL Server instances from within SQL Server.
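To make the connector idea more concrete, here is a minimal sketch of what querying an Oracle table from SQL Server 2019 might look like. It assumes the connectors are used through PolyBase external data sources and external tables, which is how Microsoft's SQL Server 2019 documentation describes them. The Python helper `oracle_external_table_ddl` is hypothetical: it only assembles and prints the T-SQL, and every host, credential, schema and table name in it is a placeholder.

```python
from typing import List, Sequence, Tuple


def oracle_external_table_ddl(
    host: str,
    port: int,
    oracle_path: str,
    local_table: str,
    columns: Sequence[Tuple[str, str]],
) -> List[str]:
    """Build T-SQL that exposes one Oracle table inside SQL Server 2019 via PolyBase.

    Hypothetical helper for illustration: all object names are placeholders,
    and the credential values must be replaced before use.
    """
    if not columns:
        raise ValueError("an external table needs at least one column")
    column_list = ",\n    ".join(f"[{name}] {sql_type}" for name, sql_type in columns)
    return [
        # A database master key must exist before a scoped credential can be stored.
        "CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';",
        # Login SQL Server uses when it connects to Oracle (placeholder values).
        "CREATE DATABASE SCOPED CREDENTIAL OracleCredential "
        "WITH IDENTITY = 'user', SECRET = 'password';",
        # The oracle:// scheme in LOCATION selects the Oracle connector.
        "CREATE EXTERNAL DATA SOURCE OracleServer "
        f"WITH (LOCATION = 'oracle://{host}:{port}', CREDENTIAL = OracleCredential);",
        # LOCATION names the remote object, roughly as [instance].[schema].[table].
        f"CREATE EXTERNAL TABLE {local_table} (\n    {column_list}\n) "
        f"WITH (LOCATION = '{oracle_path}', DATA_SOURCE = OracleServer);",
        # From here on the Oracle data is queried like a local table.
        f"SELECT TOP 10 * FROM {local_table};",
    ]


if __name__ == "__main__":
    for stmt in oracle_external_table_ddl(
        host="oracle.example.internal",
        port=1521,
        oracle_path="[XE].[SALES].[ORDERS]",
        local_table="dbo.OracleOrders",
        columns=[("ORDER_ID", "DECIMAL(38)"), ("CUSTOMER", "NVARCHAR(100)")],
    ):
        print(stmt, end="\n\n")
```

Once statements like these have run, the external table can be queried and joined with ordinary SQL Server tables while the data itself stays in Oracle.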

SQL Server can do big data

But that's not all: SQL Server 2019 will also integrate the big data framework Apache Spark and the Hadoop Distributed File System (HDFS). This shows how much more important handling big data workloads has become for the database. Integrating Spark and Hadoop is an obvious choice, since they are arguably the two most important frameworks in the big data world. An article on ZDNet describes the setup as follows: nodes can run as SQL compute and storage nodes or as HDFS data nodes. On HDFS nodes, SQL Server and Apache Spark run together in the same container. This interoperability is made possible by the container orchestration software Kubernetes, and SQL Server 2019's Kubernetes compatibility in turn lets the workloads run on premises or across the various public clouds.

Microsoft has also revised the storage engine and its PolyBase technology. PolyBase can run queries against data stored in Hadoop or Azure Blob Storage, and its support for Azure storage and for both Cloudera and Hortonworks Hadoop clusters has been expanded.
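Querying Hadoop through PolyBase is set up with external objects in T-SQL. The sketch below shows one plausible sequence as a hypothetical Python helper, `hadoop_external_table_ddl`, that only builds and prints the statements. It assumes a cluster reachable at a placeholder NameNode address and delimited text files under a placeholder HDFS directory. The `hadoop connectivity` value depends on the Hadoop distribution and version, so 7 is just one documented setting that should be checked against Microsoft's table.

```python
from typing import List, Sequence, Tuple


def hadoop_external_table_ddl(
    namenode: str,
    hdfs_dir: str,
    local_table: str,
    columns: Sequence[Tuple[str, str]],
    delimiter: str = "|",
) -> List[str]:
    """Build T-SQL that lets SQL Server query delimited text files in HDFS via PolyBase.

    Hypothetical helper for illustration: namenode, hdfs_dir, local_table and
    the object names below are placeholders.
    """
    if not columns:
        raise ValueError("an external table needs at least one column")
    column_list = ",\n    ".join(f"[{name}] {sql_type}" for name, sql_type in columns)
    return [
        # Select the Hadoop distribution PolyBase talks to; the right value depends
        # on the Cloudera/Hortonworks version, and the SQL Server services must be
        # restarted afterwards.
        "EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7;",
        "RECONFIGURE;",
        # TYPE = HADOOP plus an hdfs:// URI points PolyBase at the cluster's NameNode.
        "CREATE EXTERNAL DATA SOURCE HadoopCluster "
        f"WITH (TYPE = HADOOP, LOCATION = 'hdfs://{namenode}');",
        # Describes how the files are laid out so PolyBase can parse them.
        "CREATE EXTERNAL FILE FORMAT DelimitedText "
        "WITH (FORMAT_TYPE = DELIMITEDTEXT, "
        f"FORMAT_OPTIONS (FIELD_TERMINATOR = '{delimiter}'));",
        # LOCATION is a directory (or file) path inside the data source.
        f"CREATE EXTERNAL TABLE {local_table} (\n    {column_list}\n) "
        f"WITH (LOCATION = '{hdfs_dir}', DATA_SOURCE = HadoopCluster, "
        "FILE_FORMAT = DelimitedText);",
    ]


if __name__ == "__main__":
    for stmt in hadoop_external_table_ddl(
        namenode="namenode.example.internal:8020",
        hdfs_dir="/data/sensors/",
        local_table="dbo.SensorReadings",
        columns=[("SensorId", "INT"), ("ReadingTime", "DATETIME2"), ("Value", "FLOAT")],
    ):
        print(stmt, end="\n\n")
```

Keeping the file format separate from the table mirrors how PolyBase itself splits the definitions, so one format can be reused for several HDFS directories.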

Existing features expanded

SQL Server 2019 also extends the SQL Graph architecture introduced in SQL Server 2017. Java code can now be executed directly, using the same infrastructure that runs R and Python code in the database and underpins the product's Machine Learning Services. These services now also run on SQL Server instances on Linux, just as they already do on Windows.

Microsoft also announced the general availability of Azure SQL Database Managed Instance. It offers almost complete compatibility with on-premises SQL Server, but the server instances are managed by Microsoft. In addition, Microsoft presented a new service, Azure SQL Database Hyperscale, as a public preview; it is intended to make working with large amounts of data (up to 100 terabytes) easier.

One last piece of SQL Server news is that SQL Operations Studio has been renamed Azure Data Studio. Along with the new name, Microsoft wants to make the cross-platform front-end tool for SQL Server more modular so that it can also work with data sources other than SQL Server. (ane)
