Global-Scale Apps Using Globally Distributed Autonomous Databases | Oracle DatabaseWorld AI Edition
Summary
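TLDR This video script introduces Oracle's Globally Distributed Autonomous Database. A sharded database transparently distributes data across the globe for scalability, survivability, and data sovereignty. Native database sharding was released in 2017 and now powers critical distributed applications around the world. The database operates as a single logical database with transparent data sovereignty and hyperscalability, and data can be redistributed across data centers and regions, which enables an active-active architecture. As data regulations increase, Oracle's database helps customers ride that wave, removes the complexity of a distributed database, and reduces cost.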
Takeaways
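- 🌐 Oracle's Globally Distributed Autonomous Database transparently distributes data across the globe for scale, survivability, and data sovereignty.
- 📚 A distributed database stores data across multiple physical locations, with each location holding a subset of the data.
- 🔧 The Oracle database became a distributed database in 2017, when native database sharding was released, and it now powers critical distributed applications around the world.
- 💾 The two main use cases for distributed databases are ultimate scalability and survivability, and transparent data sovereignty.
- 🗺️ An Oracle Globally Distributed Database is a single logical database whose data is physically distributed across multiple physical databases called shards.
- 🔄 Oracle redistributes data and automatically rebalances it across shards online, supporting scale-out and scale-in without downtime.
- 🇮🇳 Data sovereignty is mandated in many countries. For payment data in India, for example, the Reserve Bank of India's data localization regulations require payment data to be stored in India if both the payer and the payee are Indian entities.
- 🏦 One of the largest US banks rearchitected its payment database to comply, using Oracle's Globally Distributed Database to meet India's regulations easily.
- 🚀 BlueKai, a leading data platform for digital marketing campaigns, runs on Oracle's Globally Distributed Database and distributes 2.5 petabytes of data across multiple data centers.
- 🛡️ Oracle's Globally Distributed Database is presented as architecturally ahead of other distributed databases, offering a distributed scale-out architecture that supports SQL.
- 🤖 Oracle Globally Distributed Autonomous Database adds autonomous management, eliminating the operational complexity of a distributed database and reducing cost.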
Q & A
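What is Oracle's Globally Distributed Autonomous Database?
-Oracle's Globally Distributed Autonomous Database is a database that transparently distributes data across multiple locations around the world to provide scale, survivability, and data sovereignty.
What are the main use cases for a distributed database?
-The main use cases are ultimate scalability and survivability, and transparent data sovereignty. In the first, data is distributed and replicated for hyperscale and fault tolerance. In the second, each country's data is stored in that country, which helps implement regulatory requirements.
When did the Oracle database become a distributed database?
-The Oracle database became a distributed database in 2017, when native database sharding was released.
What are the main capabilities of Oracle's Globally Distributed Database?
-It functions as a single logical database with data physically distributed across multiple shards. The data in each shard is replicated for survivability, and all shards can process application requests. This enables an active-active architecture, and data can be redistributed while the database is up and running.
What does the talk say about how data sovereignty regulations affect the use of Oracle's Globally Distributed Database?
-Data sovereignty regulations require that a country's data be stored within that country. As a result, global financial services companies that store data from all countries in a single database must rearchitect their databases to comply.
What challenges did BlueKai face, and how does Oracle's database address them?
-BlueKai is a leading data platform for digital marketing campaigns with a hyperscale workload that requires real-time access. With Oracle's Globally Distributed Database, it migrated from its previous NoSQL products to a simpler new architecture and can innovate faster with the power of SQL.
How does Oracle ensure the performance and scalability of its distributed database?
-Oracle has its own distributed scale-out architecture and, unlike other distributed databases that run on top of NoSQL engines, delivers performance suited to SQL workloads. According to the talk, it also offers more varied and flexible methods for data distribution, replication, and shard deployment than other products.
What data distribution methods does Oracle's Globally Distributed Database support?
-Oracle supports multiple data distribution methods: value-based, system-managed, composite, user-defined, duplicate, and partitioned data distribution. These let data be distributed across shards in the way that best fits the workload and managed efficiently.
What replication methods does Oracle's database offer?
-Oracle provides several replication methods designed to cope with the unreliability of real-world networks: synchronous replication, asynchronous replication, adaptive synchronous replication, and a combination of local synchronous and remote asynchronous replication.
What operations are shown in the demo of Oracle's Globally Distributed Autonomous Database?
-The demo walks through the creation flow for a Globally Distributed Autonomous Database in the Oracle Cloud Infrastructure console, including configuring the data distribution method, shard locations, shard shape and size, and replication.
What new features are planned for Oracle Database 23ai?
-Oracle Database 23ai is expected to add Raft-based replication for extreme survivability, along with AI Vector Search and Retrieval Augmented Generation, further extending the distributed database's capabilities.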
Outlines
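🌐 Introduction to Oracle's distributed database
Shailesh Dwivedi explains Oracle's Globally Distributed Autonomous Database. Native database sharding was released in 2017, making the Oracle database a distributed database. Data is stored in multiple physical locations; the distribution is hidden from applications, and requests are automatically routed to the right location. A distributed database distributes and replicates data across multiple databases for scalability and survivability, or for data sovereignty, where each country's data is stored in that country to meet regulatory requirements. An Oracle Globally Distributed Database functions as a single logical database, with data physically distributed across multiple databases called shards. Data in each shard is replicated, and all shards can process application requests. Data can be redistributed across shards dynamically, and the redistribution is hidden from applications.
🚀 BlueKai's use of a hyperscale distributed database
BlueKai is a leading data platform for digital marketing campaigns and runs a hyperscale workload that requires real-time access. It runs on Oracle's Globally Distributed Database on 104 commodity servers with a combined 5,408 CPU cores and 77.4 TB of memory. It migrated from NoSQL products and uses the power of SQL and the simplicity of the new architecture to innovate faster. The total data volume is 2.5 PB, distributed across multiple data centers; each shard in a data center has a replica in a different data center, so the system can survive a data center outage. Compared with other distributed databases, Oracle is described as architecturally ahead, offering a scale-out cluster with full SQL support.
🛡️ Fault tolerance and deployment architectures of Oracle's distributed database
Replication in Oracle's distributed database can be configured to suit each application's scalability and survivability needs. It supports redo-level replication through Active Data Guard and SQL-level replication through GoldenGate. It also offers a wider range of shard deployment architectures than other products, including independent servers, fault-tolerant clusters, Exadata scale-out clusters, on-premises, cloud, and multicloud. Oracle's Globally Distributed Database is suited to modern applications and supports multiple data technologies in a single database.
🔐 Demonstration and new features of Oracle Globally Distributed Autonomous Database
The demo shows the process of creating a Globally Distributed Autonomous Database from the Oracle Cloud Infrastructure console. The data distribution method, number of shards, locations, shapes, sizes, and replication are configured, and the database can be created in a few clicks. The demo then reviews an already provisioned database, which applications connect to through a single logical endpoint. Natural language queries are also supported, using AI to translate natural language questions into SQL. Upcoming features include Raft-based replication and AI Vector Search.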
Keywords
💡Distributed database
💡Sharding
💡Data sovereignty
💡Active-active architecture
💡Hyperscale
💡Multitenant
💡Query coordinator
💡Data localization
💡Hyperscale workload
💡Autonomous database
Highlights
Oracle's Globally Distributed Autonomous Database transparently distributes data across the globe for scale, survivability, and data sovereignty.
A distributed database stores data across multiple physical locations instead of one, with the physical distribution hidden from applications.
Oracle database introduced native database sharding in 2017, enabling it to become a distributed database.
Distributed databases serve two main use cases: ultimate scalability and survivability, and transparent data sovereignty for regulatory compliance.
Oracle's Globally Distributed Database is a single logical database with data physically distributed across multiple databases called shards.
Data in each shard is replicated for survivability, and all shards can process application requests, enabling an active-active architecture.
Data can be redistributed across shards, data centers, and regions while the database is up and running, with distribution hidden from applications.
Oracle's Globally Distributed Database allows for massively parallel analytics by parallelizing queries within and across shards.
Data sovereignty is becoming mandatory in many countries, impacting global financial services companies.
Oracle enabled a US bank to comply with India's data localization regulations by creating new shards in India with minimal application changes.
BlueKai, a leading data platform for digital marketing, runs on Oracle's Globally Distributed Database, processing 1 million transactions per second.
Oracle's architecture is ahead of others, with Oracle RAC designed from the start to deliver full SQL in a scale-out cluster.
Oracle supports more data distribution methods than any other distributed database, including value-based, system managed, composite, user-defined, duplicate, and partitioned distribution.
Oracle has more replication methods designed for real-world networks, including synchronous, asynchronous, adaptive synchronous, and a combination of local synchronous with remote asynchronous replication.
Oracle offers more shard deployment architectures, including independent servers, fault-tolerant scalable clusters, and across on-premises, in-cloud, and multiple clouds.
Oracle's Converged Database Architecture simplifies distributed databases by supporting all modern data types, workloads, and development styles in one database.
Oracle Globally Distributed Autonomous Database adds autonomous management to eliminate operational complexity and reduce cost.
Oracle brings natural language query to the Globally Distributed Database using autonomous database Select AI, translating natural language questions into SQL.
Upcoming features in Oracle Database 23ai include Raft-based replication for extreme survivability, AI Vector Search, and Retrieval Augmented Generation.
Oracle's Globally Distributed Database is the most fully featured, providing more distribution, replication, and deployment methods, with autonomous capabilities that remove complexity and reduce cost.
Transcripts
Hello everyone. My name is Shailesh Dwivedi.
I'm responsible for product management
and cloud engineering
for Oracle's Globally Distributed Autonomous Database.
Oracle's Globally Distributed Autonomous Database
transparently distributes data across the globe for scale,
survivability, and data sovereignty.
What is a distributed database?
It's a database that stores data across
multiple physical locations instead of one location.
Each location stores a subset of data.
The physical distribution
of data is hidden from the applications
and it transparently gets routed to the right location.
The Oracle database became a distributed database in 2017
when native database sharding was released.
Today, it powers many critical
distributed applications around the world.
There are two main use cases for distributed databases,
ultimate scalability and survivability,
where data is distributed
and replicated across databases for hyperscale
and fault tolerance.
The second use case is transparent data sovereignty,
where each country's data is stored in country
to help implement regulatory requirements.
An Oracle Globally Distributed Database
is a single logical database.
Data is physically distributed across multiple databases,
which are called shards.
Data in each shard is replicated for survivability.
All shards can process application requests.
This enables an active-active architecture.
Data can be redistributed across shards, data centers,
and regions while the database is up and running.
The distribution of data is hidden from the applications.
Application requests
that only need data from a single shard
are transparently sent directly to that shard.
Requests that need data from multiple shards
are automatically split
into multiple requests that are sent
to the appropriate shards
and committed atomically.
Queries can be parallelized both within and across shards
to implement massively parallel analytics.
This is the next generation
of big data done right.
Distributed databases distribute data
across multiple physical databases.
This allows data to be placed
in multiple geographic locations.
It avoids single-database
scalability and survivability limits
since databases are independent of each other.
Requests that need data from multiple shards
are automatically routed through a query coordinator.
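
As a rough illustration of the routing just described, the Python sketch below sends a request straight to one shard when all of its data lives there and hands it to a coordinator otherwise. It is a toy model, not Oracle's implementation; the names `SHARD_OF_CUSTOMER` and `route_request` and the key-to-shard mapping are hypothetical.

```python
# Hypothetical mapping from sharding key (customer ID) to the shard that holds it.
SHARD_OF_CUSTOMER = {123: "shard2", 456: "shard3", 789: "shard1"}


def route_request(customer_ids):
    """Decide how a request touching the given customer IDs is routed.

    Returns ("direct", [shard]) when a single shard holds all the data,
    or ("coordinator", shards) when several shards must be consulted.
    """
    shards = sorted({SHARD_OF_CUSTOMER[c] for c in customer_ids})
    if len(shards) == 1:
        return ("direct", shards)
    return ("coordinator", shards)
```

In this model, a request for customer 123 alone goes directly to one shard, while a request spanning customers 123 and 456 is split across two shards by the coordinator.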
Let's take a look at two customer examples.
Data sovereignty is becoming mandatory in many countries
like India for payment data.
The Reserve Bank of India data localization regulations
state that payment data must reside in India
if both the payer and the payee are Indian entities.
This is a show-stopper
for global financial services companies
that store data from all countries in a single database.
The key observations here are
that the data must be physically stored in India,
but it can be accessed from anywhere.
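
The residency rule as paraphrased in this talk can be written as a one-line predicate. This is only a sketch of the speaker's summary, not a statement of the regulation itself; the function name and the use of ISO country codes are assumptions made for illustration.

```python
def must_reside_in_india(payer_country, payee_country):
    """Return True when, per the talk's summary, payment data must be stored in India.

    Country values are assumed to be ISO 3166 alpha-2 codes such as "IN" or "US".
    """
    return payer_country == "IN" and payee_country == "IN"
```

A predicate like this only decides where a record is stored; as the speaker notes, where it is accessed from is a separate question.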
One of the largest US banks
had to rearchitect their payment database
to satisfy this regulation.
Initially, all worldwide payment data was stored
in the US on Exadata systems.
The application and the database tier were replicated
across regions inside the US for disaster recovery.
Oracle's Globally Distributed Database
enabled the bank to easily comply with India's regulations.
New shards were created in India to hold India data.
The new database architecture required
minimal application changes.
The highly complex application tier architecture
did not need to be redundantly deployed in India.
One thing is guaranteed,
data regulation will only increase.
In fact, a tidal wave of data regulations is coming
as more countries roll out their own regulation
and more industries become regulated.
Oracle's Globally Distributed Database implements features
that can help you surf this tidal wave
instead of being drowned by it.
Let's take a look at another customer example.
BlueKai is a leading data platform
for digital marketing campaigns.
The data is accessed in real time by hundreds of millions
of consumers as they surf the internet.
BlueKai is a hyperscale workload on a multi-petabyte
database that requires near instant response time.
It processes 1 million transactions per second,
and it invokes 30 billion APIs per day
with a response time of 1.6 milliseconds.
BlueKai now runs on Oracle's Globally Distributed Database.
It runs on 104 commodity servers,
which cumulatively have 5,408 CPU cores
and 77.4 terabytes of memory.
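
For a sense of scale, the figures quoted above can be turned into per-server and per-second averages. The sketch assumes the servers are identical and that API calls are spread evenly over the day, neither of which the talk states.

```python
# Figures quoted in the talk.
servers = 104
cpu_cores = 5_408
memory_tb = 77.4
api_calls_per_day = 30_000_000_000

# Averages, assuming uniform servers and an evenly spread load (assumptions).
cores_per_server = cpu_cores / servers
memory_per_server_tb = memory_tb / servers
api_calls_per_second = api_calls_per_day / (24 * 60 * 60)

print(f"{cores_per_server:.0f} cores and {memory_per_server_tb:.2f} TB per server")
print(f"about {api_calls_per_second:,.0f} API calls per second on average")
```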
It was migrated from a combination
of NoSQL products like Aerospike, Cassandra, ScyllaDB.
The simplicity of new architecture
plus the power of SQL enables BlueKai
to innovate many times faster than before.
The total data volume is 2.5 petabytes
and is distributed across multiple data centers.
Each of the shards in a data center has a replica
in a different data center,
hence enabling BlueKai
to survive an outage of an entire data center.
Oracle's Globally Distributed Database
is architecturally ahead of others.
Oracle RAC was architected from day one
to deliver full SQL in a scale-out cluster.
Oracle has extended this mature parallel cluster SQL engine
to support distributed scale-out architecture.
Other distributed databases run on top of NoSQL engines.
Some are slowly adding SQL on top,
which would require decades of work.
NoSQL engines allowed them to get to market quickly,
but now they suffer from poor performance
since NoSQL engines are not designed for SQL workloads.
This is especially bad for reporting
and analytics queries.
Oracle has more data distribution methods
than any other distributed database.
The data across multiple tables
is partitioned, and the data for a given sharding key
is co-located on one of the shards.
For example, we have three tables, customer, order
and line items, and the data for Mary
across all three tables whose customer ID is 123
is co-located on shard number two.
Likewise, John's data, which is in green, across all tables,
is co-located on shard number three.
And Peter's data, which is in blue,
is co-located on shard number one.
This allows us to execute joins in a very optimal manner.
All the joins are satisfied locally on a given shard.
Additionally, all the constraints
are also satisfied on a given shard.
If we have certain tables which are more like
dimension tables or reference tables,
like a product catalog, the data for those tables
is automatically replicated across all shards.
Such tables are called duplicate tables.
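
The co-location idea can be sketched in a few lines of Python: rows from related tables that share a sharding key land on the same shard, and a small reference table is copied to every shard. The modulo placement rule and the names `shard_for` and `place_rows` are illustrative assumptions, not how Oracle assigns data.

```python
from collections import defaultdict

NUM_SHARDS = 3


def shard_for(customer_id):
    # Illustrative placement rule only: modulo on the sharding key.
    return customer_id % NUM_SHARDS


def place_rows(customers, orders, products):
    """Place sharded rows by customer_id and copy the product table to every shard."""
    shards = [defaultdict(list) for _ in range(NUM_SHARDS)]
    for row in customers:
        shards[shard_for(row["customer_id"])]["customers"].append(row)
    for row in orders:
        # Orders follow their customer, so customer/order joins stay local.
        shards[shard_for(row["customer_id"])]["orders"].append(row)
    for shard in shards:
        shard["products"] = list(products)  # duplicated (reference) table
    return shards
```

Because a customer's rows and a full copy of the product catalog sit on the same shard, a join across all three can be answered without leaving that shard.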
Oracle supports value-based data distribution.
This distributes data by a value, for example,
a country code or a product ID
or distributes data by a range of values.
For example, ranges of phone numbers.
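
Value-based distribution amounts to a lookup: either an exact value maps to a shard, or a value falls into a range that maps to a shard. The sketch below shows both forms; the shard names, country list, and range boundaries are made up for illustration.

```python
import bisect

# Hypothetical list-based mapping: one country code per shard.
COUNTRY_TO_SHARD = {"IN": "shard_india", "US": "shard_us", "GB": "shard_uk"}

# Hypothetical range-based mapping: each shard owns phone numbers below its
# exclusive upper bound and at or above the previous bound.
RANGE_UPPER_BOUNDS = [3_000_000_000, 6_000_000_000, 10_000_000_000]
RANGE_SHARDS = ["shard_a", "shard_b", "shard_c"]


def shard_by_list(country_code):
    """Pick a shard from an exact value such as a country code."""
    return COUNTRY_TO_SHARD[country_code]


def shard_by_range(phone_number):
    """Pick a shard from the range that contains the phone number."""
    idx = bisect.bisect_right(RANGE_UPPER_BOUNDS, phone_number)
    if idx >= len(RANGE_SHARDS):
        raise ValueError("phone number is outside every configured range")
    return RANGE_SHARDS[idx]
```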
Oracle supports system managed data distribution.
It uses a consistent hash algorithm
to evenly distribute data across shards
for scalability and parallelism.
For example, to distribute data by customer ID,
device ID, or action ID.
Consistent hash enables online addition of shards
with minimal data movement.
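
To show why consistent hashing keeps data movement small when a shard is added, here is a generic hash-ring sketch with virtual nodes. It is a textbook version of the technique, not Oracle's internal algorithm; the class name `HashRing`, the MD5-based hash, and the number of virtual nodes are all assumptions.

```python
import bisect
import hashlib


def _hash(value):
    # Stable hash so that placement does not change between runs.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, shards, vnodes=64):
        self._vnodes = vnodes
        self._points = []  # sorted list of (hash, shard) ring positions
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard):
        for i in range(self._vnodes):
            bisect.insort(self._points, (_hash(f"{shard}#{i}"), shard))

    def shard_for(self, key):
        # A key belongs to the first ring position at or after its hash.
        idx = bisect.bisect_left(self._points, (_hash(str(key)), ""))
        if idx == len(self._points):
            idx = 0  # wrap around the ring
        return self._points[idx][1]


ring = HashRing(["shard1", "shard2", "shard3"])
keys = [f"customer-{i}" for i in range(1000)]
before = {k: ring.shard_for(k) for k in keys}
ring.add_shard("shard4")  # scale-out in this toy model
moved = [k for k in keys if ring.shard_for(k) != before[k]]
# Every key that moved now lives on the new shard; all other keys stayed put.
assert all(ring.shard_for(k) == "shard4" for k in moved)
```

Keys never shuffle between the existing shards in this scheme, which is the property that makes adding a shard online practical.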
Oracle also supports composite data distribution.
This is two levels of sharding
with two different sharding keys
and two different sharding methods.
Data is first distributed by value.
For example, a country code or a range,
for example, a phone number.
Then data is distributed evenly across data centers,
for example, using a customer ID
as a second sharding key,
using consistent hashing as a second sharding method.
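
The two-level scheme can be sketched as a value lookup followed by a hash. For brevity, the second level here uses a plain hash-modulo as a stand-in for consistent hashing; the shard names and the `SHARDSPACES` mapping are hypothetical.

```python
import hashlib

# Level 1 (by value): each country code owns its own group of shards.
SHARDSPACES = {
    "IN": ["in_shard1", "in_shard2"],
    "US": ["us_shard1", "us_shard2", "us_shard3"],
}


def composite_shard(country_code, customer_id):
    """Pick a shard by country first, then by hashing the customer ID."""
    shards = SHARDSPACES[country_code]
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).digest()
    # Level 2 (by hash): simplified modulo placement within the country's shards.
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]
```

Whatever the customer ID, an Indian customer's rows can only land on one of the shards assigned to India, which is what keeps the data in country.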
Oracle also supports user-defined data distribution.
This is useful when data requires special handling,
such as skewed data.
For example, we can store Taylor Swift's
and Beatles data in their own shard
and combine smaller artist data together
in a separate shard.
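
User-defined distribution is essentially an explicit mapping chosen by the designer. A minimal sketch of the artist example follows; the shard names are invented for illustration.

```python
# Hypothetical explicit placement: heavily accessed artists get dedicated shards.
DEDICATED_SHARDS = {
    "Taylor Swift": "shard_taylor_swift",
    "The Beatles": "shard_beatles",
}
SHARED_SHARD = "shard_other_artists"


def shard_for_artist(artist):
    """Return the dedicated shard for a skewed key, or the shared shard otherwise."""
    return DEDICATED_SHARDS.get(artist, SHARED_SHARD)
```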
Oracle supports duplicate data distribution.
This is useful for small, read-only tables,
which can be duplicated across all shards.
This is used to avoid cross-shard queries
and cross-shard referential integrity checks.
Oracle supports partitioned data distribution
within each shard.
For example, the data in any shard can further
be partitioned by data values such as date range,
and then further partitioned by another data value or hash.
This enables faster queries and joins within a shard.
Shards can also be added without incurring any downtime,
and the data automatically gets redistributed across shards.
This is to enable scale-out or scale-in for seasonality.
Data across shards is automatically rebalanced
with minimum data movement.
This is an online operation. It does not incur any downtime.
Oracle has more replication methods
than any other distributed database.
Oracle replication is designed for real-world networks.
Real-world distributed networks are flaky.
They incur long latencies, intermittent slowdowns,
and unpredictable stalls
that can wreak havoc on the application
response time and availability.
Oracle distributed database replication
can be configured to satisfy the needs of each application.
For example, it can be configured
in a synchronous replication manner,
asynchronous replication manner,
adaptive synchronous replication,
or a combination of local synchronous
and remote asynchronous replication.
Oracle replication can meet any scaling
or survivability needs.
It supports redo-level replication using
Active Data Guard, which provides the fastest performance,
the most comprehensive SQL functionality, readable replicas,
and the simplest operation.
It also supports SQL-level replication
using GoldenGate, which provides the fastest failover
and fully writeable replicas,
including conflict avoidance and resolution.
Oracle has more shard deployment architectures
than any other distributed database.
Oracle can be deployed on independent servers.
This is a simple low cost deployment option
that is very popular.
Shards can run on standalone commodity servers.
Oracle can deploy shards on fault-tolerant
scalable clusters as well.
Oracle can uniquely scale performance within a shard
or a region using an ultrafast Exadata scale-out cluster.
This enables data to be accessed within a region
without slow cross-shard access and coordination.
Failure or maintenance of servers
or databases in a cluster does not require
disruptive application failover to a replica.
Oracle can deploy shards across on premises, in cloud,
and multiple clouds.
You can choose the deployment option independently
for each shard or country.
This can be optimized for scaling needs
and cloud availability of each region.
Oracle can deploy shards within a fault-tolerant cluster.
This is useful for customers who want
the highest transparency and simplicity
and don't need distributed data sovereignty
or ultimate scale-out.
Sharding is used to route SQL to a specific node
of a cluster for increased access locality
and hot spot avoidance.
Cross-shard SQL within a cluster is super fast
unlike the distributed shards.
Oracle's Globally Distributed Database
is designed for modern applications.
Modern applications use multiple data technologies.
For example, they use new types
of data like relational, JSON, spatial, graph,
new types of analytics like SQL, machine learning,
new workload types like AI vectors,
internet of things, blockchain.
One approach to deploying rich modern apps is
to use a specialized database for each application need.
The specialized database approach
makes data sovereignty nearly impossible.
Each database has a different data sovereignty architecture.
It has different APIs, different capabilities,
and a unique set of limitations.
Application developers and IT
strive to make every database comply
with data sovereignty regulations and to make all data flows
between databases compliant.
Oracle's Converged Database Architecture
simplifies distributed databases.
It provides a complete
and a simple support for all modern data types, workloads,
and development styles in one database
with complete and simple consistency,
scalability, and availability.
It provides unified data sovereignty
for all data types and workloads.
Oracle recently released
Oracle Globally Distributed Autonomous Database.
It adds autonomous management
to eliminate the operational complexity
of a distributed database and reduces cost.
It's a combination of Globally Distributed Database
and Oracle Autonomous Database,
which enables a Globally Distributed Autonomous Database,
which is the simplest, most functional,
most mission critical
cloud-native distributed database service.
Now let's take a look at the demo
of Oracle Globally Distributed Autonomous Database.
I'm logged into the Oracle Cloud Infrastructure console.
From there, we'll quickly go through a flow
of creating a Globally Distributed Autonomous Database.
We provide a few options which we can leave
at their defaults if we don't want to change them.
Subsequently, we can configure how we want
to distribute the data, whether we want to use automated
or user-defined data distribution,
and we can provide the number
of shards we need in each of the locations.
Optionally, we can use a map to pick the locations
where we want to deploy shards.
For example, I want to deploy some shards in Phoenix,
some in London, some in Mumbai, some in Tokyo.
Then we can configure each of these shard locations
and specify how many shards we want in each location.
For example, three shards in Phoenix,
and I leave one in other regions.
We can then specify the shape
and the size for each of the shards
and whether we need a replica or not.
Here, I'm gonna select my shard one in Phoenix,
which I want to deploy in availability domain one.
We start with two ECPUs.
We want to enable auto-scaling.
This allows us to vertically scale each of the shards
as the workload increases
and conversely scale down if the workload decreases.
We want to configure replication
for the shard in a different region.
I want to keep my US data within US,
and I'll pick the US east,
which is Ashburn as my replica location,
and I want to put my replica in availability domain three.
Likewise, I can configure the rest of the shapes and sizes
and replication for the rest of the shards,
and with a few clicks of a button,
I will be able to create
a Globally Distributed Autonomous Database.
I'm gonna cancel out of this workflow and go back
and look at a Globally Distributed Autonomous Database
which has already been provisioned.
For example, this database spans multiple locations
including Hyderabad in India, Mumbai, Phoenix,
Ashburn, et cetera.
It's a single logical database
and the application connects to it using
a single logical endpoint.
Additionally, if you want to use auto wallet encryption,
we can download the wallet and configure TLS.
We can also visualize all the shards across the globe
as a single logical database,
and we can look at some of the performance metrics.
Additionally, we have the ability to add a shard later,
start the entire Globally Distributed Database,
stop it, or terminate it, so on and so forth.
Let's go back and look at some other concepts.
Oracle brings natural language query
to Globally Distributed Database
using Autonomous Database Select AI
that can translate natural language questions into SQL
using an AI large language model.
The SQL query is automatically routed
to the appropriate country
or shard by the Globally Distributed Database.
For example, if I want to ask a question,
how many total streams
for each Tom Cruise movie were viewed in India this month?
In this case, the large language model generates the SQL
which already has country code as India,
and hence, Oracle's Globally Distributed Database
routes this query only to the shard in India.
There are some exciting features coming soon
in Oracle Database 23ai.
One of them is Raft-based replication
for extreme survivability.
This is a new replication method that uses
the popular Raft quorum-based replication protocol.
This provides automatic failover
to a replica in under three seconds.
It implements an active-active symmetric configuration.
Each shard accepts writes and reads for a subset of data.
It also delivers zero data loss using a high performance
synchronous replication across shards.
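
The quorum rule at the heart of Raft can be shown in a few lines: a write counts as committed once a majority of replicas have acknowledged it. This is the generic majority rule from the Raft protocol, not a description of Oracle's specific implementation; the function names are illustrative.

```python
def quorum_size(replicas):
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def is_committed(acks, replicas):
    """A write is committed once a majority of replicas have acknowledged it."""
    return acks >= quorum_size(replicas)


def tolerated_failures(replicas):
    """Replicas that can be lost while a majority can still be formed."""
    return replicas - quorum_size(replicas)
```

With three replicas, two acknowledgements commit a write and one replica can fail; with five replicas, three acknowledgements are needed and two replicas can fail.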
AI Vector Search and Retrieval Augmented Generation
are also available
with Oracle's Globally Distributed Database 23ai.
Oracle's distributed database will add hyperscale
and data sovereignty to Oracle Database 23ai
Vector Search.
Customers will be able
to combine similarity search using AI vectors
with search on business data about customers and products
in a single distributed query.
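
Combining a similarity search with a filter on business data can be sketched in plain Python: filter rows by a business attribute, then rank what remains by cosine similarity to a query vector. This is a conceptual illustration only and does not use Oracle's vector search API; the record layout and the `search` function are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search(products, query_vector, country, k=2):
    """Filter by a business attribute, then rank by vector similarity."""
    candidates = [p for p in products if p["country"] == country]
    candidates.sort(
        key=lambda p: cosine_similarity(p["embedding"], query_vector),
        reverse=True,
    )
    return [p["name"] for p in candidates[:k]]
```

In a sharded setting, a filter such as the country could also decide which shard receives the query, so the similarity ranking would run only where the matching rows are stored.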
Here are the key takeaways.
Oracle's Globally Distributed Database
is the most fully featured distributed database.
It provides more data distribution, replication,
and deployment methods than any other database.
Converged database architecture
makes data sovereignty easy for modern applications
that use multiple data types and workloads.
The new 23ai Raft-based replication
provides fast quorum-based failover.
It supports leading-edge AI such as Vector Search,
and its autonomous capabilities remove complexity
and reduce cost.
It provides all the benefits
of a distributed database without the compromises,
so why settle for less?