-
Notifications
You must be signed in to change notification settings - Fork 0
/
index.html
331 lines (197 loc) · 15.1 KB
/
index.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Hexo</title>
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta property="og:type" content="website">
<meta property="og:title" content="Hexo">
<meta property="og:url" content="http://example.com/index.html">
<meta property="og:site_name" content="Hexo">
<meta property="og:locale" content="en_US">
<meta property="article:author" content="John Doe">
<meta name="twitter:card" content="summary">
<link rel="alternate" href="/atom.xml" title="Hexo" type="application/atom+xml">
<link rel="shortcut icon" href="/favicon.png">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/index.min.css">
<link rel="stylesheet" href="/css/style.css">
<link rel="stylesheet" href="/fancybox/jquery.fancybox.min.css">
<meta name="generator" content="Hexo 6.3.0"></head>
<body>
<div id="container">
<div id="wrap">
<header id="header">
<div id="banner"></div>
<div id="header-outer" class="outer">
<div id="header-title" class="inner">
<h1 id="logo-wrap">
<a href="/" id="logo">Hexo</a>
</h1>
</div>
<div id="header-inner" class="inner">
<nav id="main-nav">
<a id="main-nav-toggle" class="nav-icon"><span class="fa fa-bars"></span></a>
<a class="main-nav-link" href="/">Home</a>
<a class="main-nav-link" href="/archives">Archives</a>
</nav>
<nav id="sub-nav">
<a class="nav-icon" href="/atom.xml" title="RSS Feed"><span class="fa fa-rss"></span></a>
<a class="nav-icon nav-search-btn" title="Search"><span class="fa fa-search"></span></a>
</nav>
<div id="search-form-wrap">
<form action="//google.com/search" method="get" accept-charset="UTF-8" class="search-form"><input type="search" name="q" class="search-form-input" placeholder="Search"><button type="submit" class="search-form-submit"></button><input type="hidden" name="sitesearch" value="http://example.com"></form>
</div>
</div>
</div>
</header>
<div class="outer">
<section id="main">
<article id="post-GremlinParallelization" class="h-entry article article-type-post" itemprop="blogPost" itemscope itemtype="https://schema.org/BlogPosting">
<div class="article-meta">
<a href="/2023/11/03/GremlinParallelization/" class="article-date">
<time class="dt-published" datetime="2023-11-03T09:02:05.000Z" itemprop="datePublished">2023-11-03</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="p-name article-title" href="/2023/11/03/GremlinParallelization/">GremlinParallelization</a>
</h1>
</header>
<div class="e-content article-entry" itemprop="articleBody">
<h1 id="Gremlin-算子并行化实现论文"><a href="#Gremlin-算子并行化实现论文" class="headerlink" title="Gremlin 算子并行化实现论文"></a>Gremlin 算子并行化实现论文</h1><p><a target="_blank" rel="noopener" href="https://www.usenix.org/system/files/nsdi21-qian.pdf">Gremlin-Parallelization-Paper</a> </p>
<h2 id="0-论文的结构:「是什么,为什么,怎么做」"><a href="#0-论文的结构:「是什么,为什么,怎么做」" class="headerlink" title="0 论文的结构:「是什么,为什么,怎么做」"></a>0 论文的结构:「是什么,为什么,怎么做」</h2><p>目标:高性能,易用的分布式OLAP图分析系统</p>
<h2 id="0-摘要"><a href="#0-摘要" class="headerlink" title="0 摘要"></a>0 摘要</h2><p> 这篇论文讲述的系统主要是一个支持 Gremlin 查询的分布式 OLAP 图分析系统,此系统已经在Ali 生产环境中应用</p>
<h2 id="1-介绍"><a href="#1-介绍" class="headerlink" title="1. 介绍"></a>1. 介绍</h2><p> 主要讲解了解决大图交互式查询的性能问题,提出了核心技术:<br> 1. DataFlow graph、<br> 2. bounded-memory executionand<br> 3. early-stop optimization<br> 这几个核心概念解决性能以及内存使用问题</p>
<h2 id="2-系统架构"><a href="#2-系统架构" class="headerlink" title="2.系统架构"></a>2.系统架构</h2><p>架构图<br>本文主要实现了分布式算子执行,为了用户方便使用,集成了Gremlin Server</p>
<h2 id="3-编程模型-GAIA"><a href="#3-编程模型-GAIA" class="headerlink" title="3. 编程模型 GAIA"></a>3. 编程模型 GAIA</h2><p> 重点介绍数据模型 + 核心概念<br> DataModel: Property Graph<br> Processing: Traversal 集成各种算子,只不过此处的算子在工程实现上模仿了分布式实现</p>
<p> 第四五章是论文重点,讲解核心概念以及详细设计</p>
<h2 id="4-Gremlin编译汇编-【Compilation-of-Gremlin】"><a href="#4-Gremlin编译汇编-【Compilation-of-Gremlin】" class="headerlink" title="4. Gremlin编译汇编 【Compilation of Gremlin】"></a>4. Gremlin编译汇编 【Compilation of Gremlin】</h2><p> 核心实现概念 Data DataFlow Graph<br> 有个横向对比针对Q2<br> Q2: g.V(2).out().out().count()<br> gremlin-core 实现的执行算子<br> GraphScope GAIA 实现的执行计划</p>
<p>结合图模型 + 执行计划 讲解了 Dataflow graph and execution for query Q2.【附图】</p>
<p>Data Flow执行中核心概念 Enter, Exit, and GoTo</p>
<p>通过几个Query Case 没有理解分布式执行的理念: 需要进一步结合代码 debug 下看看</p>
<p>第五章讲解具体工程实现思路?</p>
<h2 id="5-分布式执行【Distributed-Execution】"><a href="#5-分布式执行【Distributed-Execution】" class="headerlink" title="5. 分布式执行【Distributed Execution】"></a>5. 分布式执行【Distributed Execution】</h2><p>DOP 的具体实现<br>核心点:算子并行化后,是否能够存储层裁剪?如果不能,可能还不如串行执行效率高</p>
<p>Bounded-Memory Execution: 通过混合策略(BFS、DFS)以及调度策略控制内存使用,防止OOM<br>Early-Stop Optimization:并行化执行 也就是和串行中一样尽量做到「恰到好处的Stop」 因为底层存储进行RPC通信是代价最大的<br>这个策略的性能表现在第六章中有测试结果</p>
<p>例子:<br>Q5: g.V(2).repeat(out().simplePath())<br>.times(4).path()<br>.limit(k)</p>
<h2 id="6-测试与评估-《Evaluation》"><a href="#6-测试与评估-《Evaluation》" class="headerlink" title="6.测试与评估 《Evaluation》"></a>6.测试与评估 《Evaluation》</h2><p>6.1 测试用例<br>数据集<br><img src="/DataSet-6.png" alt="DataSet" title="Magic Gardens"><br>测试Query 14个 从哪里能拿到?<br>配置<br>6.2 可扩展性:测试结果<br><img src="/Scale-6.png" alt="Scale"></p>
<p>6.3 核心技术设计:测试结果 </p>
<ol>
<li>内存大小与Latency 之间的关系</li>
<li>Traversal Strategy:BFS、DFS 内存/latency 之间的关系 结论:DFS 效果很好</li>
<li>EarlyStop 性能表现:enables 12× improved performance and 1GB memory saving when the limit number is 10. 当limit(10) 有12倍的性能提升</li>
<li>Compare with big engines.:采用提出的新技术,性能以及内存方面有卓越的表现:内存限制,混合调度,Early-Stop(Limit)</li>
</ol>
<p><img src="/KeyDesign-6.png" alt="KeyDesign"></p>
<p>6.4 和其他图数据库比较查询性能</p>
<ol>
<li>小数据规模G1 数据集中,整个表现比不上单独的数据库 GAIA has an average relative per-formance of just around 1.8 using single thread, and of 0.73 using 16 threads, among all LDBC queries.</li>
<li>大数据规模G100 数据集中,与JnasGraph 对比,性能整体上比数据优秀,不过也是10^3 ms 级别,JN很多查询OT<br><img src="/CompareDB.png" alt="CompareDB"></li>
</ol>
<h2 id="7-相关工作"><a href="#7-相关工作" class="headerlink" title="7. 相关工作"></a>7. 相关工作</h2><p> 核心点不是替换 GraphDB 场景,主要是面向用户的Graph Processing Systems ,降低用户使用门槛儿,并且降低Latency<br> 目前支持了Gremlin + Cypher 以及基于Langchain 可以提供Gremlin编写服务【21年到现在发展的样子】 </p>
<h2 id="8-结论"><a href="#8-结论" class="headerlink" title="8. 结论"></a>8. 结论</h2><p> 这个系统在内部提供了大图分析的能力</p>
</div>
<footer class="article-footer">
<a data-url="http://example.com/2023/11/03/GremlinParallelization/" data-id="cloij3nvm0001ec614gnqc85i" data-title="GremlinParallelization" class="article-share-link"><span class="fa fa-share">Share</span></a>
</footer>
</div>
</article>
<article id="post-ForDataInfra" class="h-entry article article-type-post" itemprop="blogPost" itemscope itemtype="https://schema.org/BlogPosting">
<div class="article-meta">
<a href="/2023/11/02/ForDataInfra/" class="article-date">
<time class="dt-published" datetime="2023-11-02T11:42:57.000Z" itemprop="datePublished">2023-11-02</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="p-name article-title" href="/2023/11/02/ForDataInfra/">ForDataInfra</a>
</h1>
</header>
<div class="e-content article-entry" itemprop="articleBody">
<h1 id="前言"><a href="#前言" class="headerlink" title="前言"></a>前言</h1><p>这个博客是17年11月30日第一次提交,到今天差不多已经6年过去了,这六年在博客中没有留下任何足迹,只是在对应的仓库中逐渐成长,<br>从今天起,用这个博客持续记录工作技术中的点点滴滴,争取在 Data/Infra 方向持续创作输出。</p>
<h1 id="涉及组件"><a href="#涉及组件" class="headerlink" title="涉及组件"></a>涉及组件</h1><ol>
<li>HBase </li>
<li>Druid</li>
<li>HugeGraph<br>核心技术:tinkerPop:GremlinServer,RestFulServer</li>
<li>GraphScope<br> 语言栈:<br> Java<br> CPP<br> Rust<br> Python</li>
</ol>
</div>
<footer class="article-footer">
<a data-url="http://example.com/2023/11/02/ForDataInfra/" data-id="cloij3nvh0000ec61dlv9eyn6" data-title="ForDataInfra" class="article-share-link"><span class="fa fa-share">Share</span></a>
</footer>
</div>
</article>
<article id="post-hello-world" class="h-entry article article-type-post" itemprop="blogPost" itemscope itemtype="https://schema.org/BlogPosting">
<div class="article-meta">
<a href="/2023/11/02/hello-world/" class="article-date">
<time class="dt-published" datetime="2023-11-02T11:33:14.578Z" itemprop="datePublished">2023-11-02</time>
</a>
</div>
<div class="article-inner">
<header class="article-header">
<h1 itemprop="name">
<a class="p-name article-title" href="/2023/11/02/hello-world/">Hello World</a>
</h1>
</header>
<div class="e-content article-entry" itemprop="articleBody">
<p>Welcome to <a target="_blank" rel="noopener" href="https://hexo.io/">Hexo</a>! This is your very first post. Check <a target="_blank" rel="noopener" href="https://hexo.io/docs/">documentation</a> for more info. If you get any problems when using Hexo, you can find the answer in <a target="_blank" rel="noopener" href="https://hexo.io/docs/troubleshooting.html">troubleshooting</a> or you can ask me on <a target="_blank" rel="noopener" href="https://github.com/hexojs/hexo/issues">GitHub</a>.</p>
<h2 id="Quick-Start"><a href="#Quick-Start" class="headerlink" title="Quick Start"></a>Quick Start</h2><h3 id="Create-a-new-post"><a href="#Create-a-new-post" class="headerlink" title="Create a new post"></a>Create a new post</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ hexo new <span class="string">"My New Post"</span></span><br></pre></td></tr></table></figure>
<p>More info: <a target="_blank" rel="noopener" href="https://hexo.io/docs/writing.html">Writing</a></p>
<h3 id="Run-server"><a href="#Run-server" class="headerlink" title="Run server"></a>Run server</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ hexo server</span><br></pre></td></tr></table></figure>
<p>More info: <a target="_blank" rel="noopener" href="https://hexo.io/docs/server.html">Server</a></p>
<h3 id="Generate-static-files"><a href="#Generate-static-files" class="headerlink" title="Generate static files"></a>Generate static files</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ hexo generate</span><br></pre></td></tr></table></figure>
<p>More info: <a target="_blank" rel="noopener" href="https://hexo.io/docs/generating.html">Generating</a></p>
<h3 id="Deploy-to-remote-sites"><a href="#Deploy-to-remote-sites" class="headerlink" title="Deploy to remote sites"></a>Deploy to remote sites</h3><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">$ hexo deploy</span><br></pre></td></tr></table></figure>
<p>More info: <a target="_blank" rel="noopener" href="https://hexo.io/docs/one-command-deployment.html">Deployment</a></p>
</div>
<footer class="article-footer">
<a data-url="http://example.com/2023/11/02/hello-world/" data-id="cloij3nvo0002ec619tjdaji8" data-title="Hello World" class="article-share-link"><span class="fa fa-share">Share</span></a>
</footer>
</div>
</article>
</section>
<aside id="sidebar">
<div class="widget-wrap">
<h3 class="widget-title">Archives</h3>
<div class="widget">
<ul class="archive-list"><li class="archive-list-item"><a class="archive-list-link" href="/archives/2023/11/">November 2023</a></li></ul>
</div>
</div>
<div class="widget-wrap">
<h3 class="widget-title">Recent Posts</h3>
<div class="widget">
<ul>
<li>
<a href="/2023/11/03/GremlinParallelization/">GremlinParallelization</a>
</li>
<li>
<a href="/2023/11/02/ForDataInfra/">ForDataInfra</a>
</li>
<li>
<a href="/2023/11/02/hello-world/">Hello World</a>
</li>
</ul>
</div>
</div>
</aside>
</div>
<footer id="footer">
<div class="outer">
<div id="footer-info" class="inner">
© 2023 John Doe<br>
Powered by <a href="https://hexo.io/" target="_blank">Hexo</a>
</div>
</div>
</footer>
</div>
<nav id="mobile-nav">
<a href="/" class="mobile-nav-link">Home</a>
<a href="/archives" class="mobile-nav-link">Archives</a>
</nav>
<script src="/js/jquery-3.6.4.min.js"></script>
<script src="/fancybox/jquery.fancybox.min.js"></script>
<script src="/js/script.js"></script>
</div>
</body>
</html>