Atom Feed

myersguo's blog 2018-02-07T03:15:35+00:00 myersguo Tracking down a Python HTTP request encoding error 2018-02-06T00:00:00+00:00 /2018/02/06/python-unicode-error <p>Background: <br /> A file-upload service occasionally fails with: <br /> <code class="highlighter-rouge">'utf8' codec can't decode byte 0xce in position 17: invalid continuation byte</code></p> <p>To track it down, let's print the stack trace:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import traceback
traceback.print_exc()
</code></pre></div></div> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  File "/usr/lib/python2.7/httplib.py", line 1039, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 1073, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 1035, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 877, in _send_output
    msg += message_body
UnicodeDecodeError: 'utf8' codec can't decode byte 0xce in position 17: invalid continuation byte
</code></pre></div></div> <p>OK, so the error comes from string concatenation. This is the classic Python 2 <code class="highlighter-rouge">unicode</code> vs. <code class="highlighter-rouge">str</code> problem. Let's start with the basics and an example:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>["{:02x}".format(ord(c)) for c in '我']
['e6', '88', '91']
["{:02x}".format(ord(c)) for c in u'我']
['6211']
[ord(c) for c in u'我']
[25105]
</code></pre></div></div> <p>A <code class="highlighter-rouge">str</code> is a sequence of bytes (here the three UTF-8 bytes of '我'), while a <code class="highlighter-rouge">unicode</code> string is a sequence of code points (U+6211). A character like '我' can only be stored in some encoded form, so whenever Python 2 has to combine the two types it implicitly decodes the <code class="highlighter-rouge">str</code> with a default encoding. That default is <code class="highlighter-rouge">ascii</code>, and it can be changed with <code class="highlighter-rouge">sys.setdefaultencoding('UTF8')</code> (only available after <code class="highlighter-rouge">reload(sys)</code>). So, with the default encoding:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>'我' + u'我'
</code></pre></div></div> <p>is equivalent to:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>'我'.decode('ascii') + u'我'
</code></pre></div></div> <p>which of course raises <code class="highlighter-rouge">UnicodeDecodeError</code>. If the default encoding has been switched to UTF-8, the implicit decode uses UTF-8 instead, and it still fails whenever the byte string is not valid UTF-8. That is why our error mentions the <code class="highlighter-rouge">'utf8'</code> codec.</p> <p>Back to the original problem. Here is the relevant <code class="highlighter-rouge">httplib</code> code:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def request(self, method, url, body=None, headers={}):
    """Send a complete request to the server."""
    self._send_request(method, url, body, headers)

def _send_request(self, method, url, body, headers):
    # Honor explicitly requested Host: and Accept-Encoding: headers.
    header_names = dict.fromkeys([k.lower() for k in headers])
    skips = {}
    if 'host' in header_names:
        skips['skip_host'] = 1
    if 'accept-encoding' in header_names:
        skips['skip_accept_encoding'] = 1

    self.putrequest(method, url, **skips)

    if body is not None and 'content-length' not in header_names:
        self._set_content_length(body)
    for hdr, value in headers.iteritems():
        self.putheader(hdr, value)
    self.endheaders(body)

def _send_output(self, message_body=None):
    """Send the currently buffered request and clear the buffer.

    Appends an extra \\r\\n to the buffer.
    A message_body may be specified, to be appended to the request.
    """
    self._buffer.extend(("", ""))
    msg = "\r\n".join(self._buffer)
    del self._buffer[:]
    # If msg and message_body are sent in a single send() call,
    # it will avoid performance problems caused by the interaction
    # between delayed ack and the Nagle algorithm.
    if isinstance(message_body, str):
        msg += message_body
        message_body = None
    self.send(msg)
    if message_body is not None:
        #message_body was not a string (i.e. it is a file) and
        #we must run the risk of Nagle
        self.send(message_body)
</code></pre></div></div> <p>Note <code class="highlighter-rouge">msg += message_body</code>: <code class="highlighter-rouge">msg</code> holds the HTTP request line and headers, and <code class="highlighter-rouge">message_body</code> is the request body.</p> <p>If any piece of the request line or headers (for example the URL) is a <code class="highlighter-rouge">unicode</code> object, <code class="highlighter-rouge">msg</code> becomes <code class="highlighter-rouge">unicode</code> too, and the concatenation implicitly decodes the byte-string body. So everything that goes into the request line and headers must be a plain ASCII byte string.</p> <p>Here is a small demo that reproduces the problem:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># coding: utf8
import sys
reload(sys)
sys.setdefaultencoding('UTF8')

from httplib import HTTPConnection

# body: the UTF-8 bytes of '我', a stray 0xce byte, then '我' again (not valid UTF-8)
conn = HTTPConnection('www.baidu.com')
conn.request('GET', '/s=xxxx', '我\xce\xe6\x88\x91')    # str URL: works
print conn.getresponse().read()

conn = HTTPConnection('www.baidu.com')
conn.request('GET', u'/s=xxxx', '我\xce\xe6\x88\x91')   # unicode URL: UnicodeDecodeError
print conn.getresponse().read()
</code></pre></div></div> <p>The first request works. The second one fails immediately because the URL is spliced into the request headers and was passed as <code class="highlighter-rouge">unicode</code> instead of an ASCII byte string.</p> <p>(done)</p> Hello, React 2018-01-29T00:00:00+00:00 /2018/01/29/hello-react <h3 id="认知">Impressions</h3> <p>Going from <code class="highlighter-rouge">jquery</code> to <code class="highlighter-rouge">angularjs</code> and then to <code class="highlighter-rouge">react</code>, you can feel a shift in "engineering style". With the full React stack, writing code feels more like drawing from a "mold" (it feels much more like engineering). Some new things I learned about React:</p> <p><code class="highlighter-rouge">Components and the DOM tree</code>: the browser parses HTML into the DOM tree (and CSS into the CSSOM). React likewise represents the HTML structure as a tree (the virtual DOM), but a much more lightweight one. Changes to each node of the tree are managed by React. <br /> <code class="highlighter-rouge">JSX</code>: the syntax React code is written in. It mixes HTML and JS: anything starting with <code class="highlighter-rouge">&lt;</code> is parsed as HTML, and a <code class="highlighter-rouge">{</code> starts a JS expression.</p> <h3 id="渲染与事件">Rendering and events</h3> <p><a href="https://reactjs.org/docs/components-and-props.html">props</a>: the attributes passed into a component, like attributes on a DOM element; <br /> state: the component's own model, holding its current state;</p> <p>A component goes through mount (rendered into the DOM tree) -&gt; update -&gt; unmount.</p> <p>componentWillMount: called before render, for preprocessing; usually not much use.</p> <blockquote> <p>The componentDidMount() hook runs after the component output has been rendered to the DOM.</p> </blockquote> <p>render: the render function.</p> <p>componentDidMount: called after render:</p> <blockquote> <p>componentDidMount() is invoked immediately after a component is mounted.
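Initialization that requires DOM nodes should go here. If you need to load data from a remote endpoint, this is a good place to instantiate the network request.</p> </blockquote> <p>componentWillReceiveProps:</p> <blockquote> <p>componentWillReceiveProps() is invoked before a mounted component receives new props. If you need to update the state in response to prop changes (for example, to reset it), you may compare this.props and nextProps and perform state transitions using this.setState() in this method.</p> </blockquote> <p>setState:</p> <blockquote> <p>setState() enqueues changes to the component state and tells React that this component and its children need to be re-rendered with the updated state. This is the primary method you use to update the user interface in response to event handlers and server responses.</p> </blockquote> <h3 id="redux">redux</h3> <p>store: holds the shared state; components typically receive it through their props. <br /> state: the data held in the store. <br /> action: a notification describing an operation in the view. It is how state gets changed, and it is sent with <code class="highlighter-rouge">store.dispatch</code>. <br /> reducer: handles an action and produces the new state.</p> <h3 id="参考资料">References</h3> <p><a href="https://developers.google.com/web/fundamentals/performance/critical-rendering-path/constructing-the-object-model?hl=zh-cn">1 Browser rendering: constructing the object model</a></p> 2017 2018-01-16T00:00:00+00:00 /2018/01/16/2017-2018 <p>Another year, and I'm thirty. Another year with too many regrets, a little growth, and above all a sense of having let people down.</p> <p>Looking back at the flags (goals) I set in 2016:<br /> Running on weekends. Didn't keep it up (I changed jobs early in the year and only stuck with it for two months at the new job);<br /> Improve my driving. Bought a car, and my driving has improved somewhat;<br /> Take one long trip. Failed again; I can't put it off any longer in 2018. <br /> Learn to trade stocks. Entered the stock market in February 2017; currently breaking even. <br /> Spend real time with friends. There is still time to see my old friends clearly; new friends never had a chance to show up. <br /> Be loving with my family. Still not patient enough with Momo, and made her cry many times; not caring enough toward my wife, who is thoroughly fed up with me; short-tempered with my family... <br /> Finish the books on Duokan. Read only a little...</p> <p>Work: I changed jobs mid-year, and the way I work changed a lot. At first it was hard to adjust; now I'm gradually getting used to it. Do my own job well, earn my salary, keep my curiosity and thirst for knowledge, and then there's nothing to regret, right?</p> <p>I'm a year older. Momo is a year older too. Seeing her healthy puts my mind at ease. <br /> I remember, more than half a year ago, sitting in my rented room in Qinghe and thinking how dull life was: work, sleep, one desk and one chair, instead of working, coming home to play with my child, and then going to sleep. Half a year later I hardly think about these things anymore. Have I gone numb?</p> <p>My new flags for 2018: <br /> Running. Run 10 km+ at least once; lose 10 kg (20 jin); <br /> Travel. A few weekend trips nearby and one long family trip;<br /> Self-control. Use my phone less (no phone while walking or on the subway); don't talk so "harshly"; slow down and look at the world; <br /> Reading. Finish the books on Duokan &amp; the ones I bought. <br /> Family. Be more patient with my child; spend real time with my family and child;<br /> Other. Keep writing. Make money in stocks. Drive much better.</p> golang json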
handling 2017-12-19T00:00:00+00:00 /2017/12/19/golang-json-sort <h3 id="用法">Usage</h3> <p>[TODO]</p> <h3 id="疑问">Question</h3> <p>Surprisingly, after a round trip through a <code class="highlighter-rouge">map</code>, golang's JSON output has its <code class="highlighter-rouge">key</code>s sorted in ascending alphabetical order. <code class="highlighter-rouge">json.Marshal</code> emits struct fields in declaration order but always sorts map keys:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">package</span> <span class="n">main</span>

<span class="n">import</span> <span class="p">(</span>
    <span class="s2">"encoding/json"</span>
    <span class="s2">"fmt"</span>
<span class="p">)</span>

<span class="n">type</span> <span class="n">User</span> <span class="n">struct</span> <span class="p">{</span>
    <span class="n">Name</span> <span class="k">string</span> <span class="p">`</span><span class="n">json</span><span class="p">:</span><span class="s2">"name"</span><span class="p">`</span>
    <span class="n">Id</span> <span class="n">int</span> <span class="p">`</span><span class="n">json</span><span class="p">:</span><span class="s2">"id"</span><span class="p">`</span>
<span class="p">}</span>

<span class="n">func</span> <span class="n">getJson</span><span class="p">(</span><span class="n">v</span> <span class="n">interface</span><span class="p">{})</span> <span class="p">(</span><span class="k">string</span><span class="p">,</span> <span class="n">error</span><span class="p">)</span> <span class="p">{</span>
    <span class="n">s</span><span class="p">,</span> <span class="n">err</span> <span class="p">:=</span> <span class="n">json</span><span class="p">.</span><span class="n">Marshal</span><span class="p">(</span><span class="n">v</span><span class="p">)</span>
    <span class="n">fmt</span><span class="p">.</span><span class="n">Println</span><span class="p">(</span><span class="s2">"before:"</span><span class="p">,</span> <span class="k">string</span><span class="p">(</span><span class="n">s</span><span class="p">))</span>
    <span class="k">if</span> <span class="n">err</span> <span class="c1">!= nil {</span>
        <span class="n">return</span> <span class="s2">""</span><span class="p">,</span> <span class="n">err</span>
    <span class="p">}</span>
    <span class="n">m</span> <span class="p">:=</span> <span class="n">make</span><span class="p">(</span><span class="n">map</span><span class="p">[</span><span class="k">string</span><span class="p">]</span><span class="n">interface</span><span class="p">{})</span>
    <span class="n">json</span><span class="p">.</span><span class="n">Unmarshal</span><span class="p">(</span><span class="n">s</span><span class="p">,</span> <span class="p">&amp;</span><span class="n">m</span><span class="p">)</span>
    <span class="n">r</span><span class="p">,</span> <span class="n">_</span> <span class="p">:=</span> <span class="n">json</span><span class="p">.</span><span class="n">Marshal</span><span class="p">(</span><span class="n">m</span><span class="p">)</span>
    <span class="n">return</span> <span class="k">string</span><span class="p">(</span><span class="n">r</span><span class="p">),</span> <span class="n">nil</span>
<span class="p">}</span>

<span class="n">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="n">var</span> <span class="n">t1</span> <span class="p">=</span> <span class="n">User</span><span class="p">{</span>
        <span class="n">Name</span><span class="p">:</span> <span class="s2">"hello"</span><span class="p">,</span>
        <span class="n">Id</span><span class="p">:</span> <span class="m">1</span><span class="p">,</span>
    <span class="p">}</span>
    <span class="n">v</span><span class="p">,</span> <span class="n">_</span> <span class="p">:=</span> <span class="n">getJson</span><span class="p">(&amp;</span><span class="n">t1</span><span class="p">)</span>
    <span class="n">fmt</span><span class="p">.</span><span class="n">Println</span><span class="p">(</span><span class="s2">"after:"</span><span class="p">,</span> <span class="n">v</span><span class="p">)</span>
<span class="p">}</span>
</code></pre></div></div> <p>Output: <br /> <code class="highlighter-rouge">before:
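{"name":"hello","id":1} after: {"id":1,"name":"hello"}</code></p> <p>So if another language has to produce output identical to <code class="highlighter-rouge">golang</code>'s (for example, when computing a signature), it has to do the same. In Python:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>json.dumps(v, sort_keys=True, separators=(',', ':'))
</code></pre></div></div> <p>Also, golang writes non-ASCII characters as raw UTF-8, while Python's <code class="highlighter-rouge">json.dumps</code> escapes them as <code class="highlighter-rouge">\uXXXX</code> by default (<code class="highlighter-rouge">ensure_ascii=True</code>). So Python needs:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>json.dumps(v, sort_keys=True, separators=(',', ':'), ensure_ascii=False)
</code></pre></div></div> nginx service degradation 2017-12-12T00:00:00+00:00 /2017/12/12/nginx-cache <p>If an <code class="highlighter-rouge">http</code> service needs to be degraded under load, <code class="highlighter-rouge">nginx</code> is a good place to start:</p> <p>1) Add a cache in nginx</p> <p>nginx can cache whole API responses. If a service returns the same result for a period of time, you can cache it directly at the nginx layer. The cache key is usually <code class="highlighter-rouge">host+uri+arg</code>. If the args contain volatile values such as a timestamp, they need special handling (strip the timestamp parameter). For example:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>proxy_cache_path /tmp/nginx/cache levels=1:2 keys_zone=cache:10m max_size=512m inactive=7d;

server {
    listen 1234;

    # strip the volatile timestamp parameter from the args used in the cache key
    set $new_args $args;
    if ($new_args ~ (.*)(?:&amp;|^)_timestamp=[^&amp;]*(.*)) {
        set $new_args $1$2;
    }

    location / {
        proxy_connect_timeout 1s;
        proxy_send_timeout 6s;
        proxy_read_timeout 10s;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_http_version 1.1;          # both lines are needed for upstream keepalive
        proxy_set_header Connection "";

        proxy_cache cache;
        proxy_cache_key $host$uri$is_args$new_args;
        proxy_ignore_headers X-Accel-Expires Expires Cache-Control Set-Cookie;  # gotcha, see below
        proxy_cache_valid 200 10m;  # cache 200 responses for 10 minutes
        proxy_pass http://myapi;
    }
}

upstream myapi {
    server 127.0.0.1:9090 max_fails=3 fail_timeout=5s;
    keepalive 32;
}
</code></pre></div></div> <p>When I first set this up, the cache did not take effect. The reason is that nginx checks the upstream response headers. <code class="highlighter-rouge">Cache-Control</code> or <code class="highlighter-rouge">Expires</code> can prevent caching, and by default a response carrying <code class="highlighter-rouge">Set-Cookie</code> is not cached at all. If those headers can safely be ignored, setting <code class="highlighter-rouge">proxy_ignore_headers X-Accel-Expires Expires Cache-Control Set-Cookie;</code> makes nginx disregard them.</p> <p>OK, let's benchmark it:</p> <div class="highlighter-rouge"><div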
class="highlight"><pre class="highlight"><code>with cache: 2808.09 [#/sec] (mean)
no cache:   111.65 [#/sec] (mean)
</code></pre></div></div> <p>Roughly a 25x throughput improvement (2808.09 vs. 111.65 requests per second)!</p> <p>Gotcha: if every user's request parameters are different, the cache is pointless. The cache key must be built only from the parameters that actually matter. For example, to cache by the last two digits of uid:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set_by_lua $uid_check "
    local uid = tonumber(ngx.var.arg_uid) or 0
    return tostring(uid % 100)
";
# use the computed $uid_check ($arg_uid_check would read a query arg named uid_check)
set $cache_key $uid_check$arg_other;
</code></pre></div></div> <p>2) Degradation</p> <p>If we can't handle the traffic during some period, we reject 50% of it under certain conditions:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set $old_uri $uri;

location / {
    rewrite_by_lua '
        -- time window: 2018-01-01 00:00 to 2018-01-02 00:00 (UTC+8)
        if ngx.now() &gt; 1514736000 and ngx.now() &lt; 1514822400
           and ngx.var.arg_debug ~= "1" and ngx.var.arg_uid == "1" then
            local x = math.random(100)   -- 1..100
            if x &lt;= 50 then
                ngx.req.set_uri("/pass", true);
            else
                ngx.req.set_uri("/deny", true);
            end
        else
            ngx.req.set_uri("/pass", true);
        end';
}

location = /deny {
    default_type application/json;
    return 200 '{"message": "success", "data":{}}';
}

location = /pass {
    proxy_next_upstream error http_502;
    proxy_connect_timeout 1s;
    proxy_send_timeout 6s;
    proxy_read_timeout 10s;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_cache cache;
    proxy_cache_key $host$old_uri$is_args$new_args;
    proxy_ignore_headers X-Accel-Expires Expires Cache-Control Set-Cookie;
    proxy_cache_valid 200 10m;
    rewrite ^/pass $old_uri break;   # restore the original URI before proxying
    proxy_pass http://myapi;
}
</code></pre></div></div> Python Multiple Thread 2017-11-26T00:00:00+00:00 /2017/11/26/python-multiple-thread <h3 id="线程的定义">Defining threads</h3> <ul> <li>start_new_thread</li> </ul> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import thread                  # Python 2 (renamed _thread in Python 3)
thread.start_new_thread(f, ())  # the args tuple is required
</code></pre></div></div> <ul> <li>Using the Thread class:</li> </ul> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import threading

t = threading.Thread(target=f)
t.start()
t.join()
</code></pre></div></div>
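<p>Here is a slightly fuller sketch of the <code class="highlighter-rouge">threading.Thread</code> approach. It is written for Python 3 and also runs on 2.7. <code class="highlighter-rouge">args</code> passes arguments to <code class="highlighter-rouge">target</code>, and <code class="highlighter-rouge">join()</code> blocks until each thread has returned. That blocking is what makes it safe to read the shared list afterwards. The names <code class="highlighter-rouge">worker</code> and <code class="highlighter-rouge">results</code> are illustrative and do not come from the original post.</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import threading

results = []  # shared list; list.append is atomic under CPython's GIL


def worker(n):
    # Runs in the new thread: the default Thread.run() calls target(*args).
    results.append(n * n)


threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until this thread's target has returned

print(sorted(results))  # [0, 1, 4, 9]
</code></pre></div></div> <p>The <code class="highlighter-rouge">sorted()</code> call matters because the threads may finish in any order.</p>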
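<p>t.join:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># excerpt from Thread.join() in Python 2.7 threading.py (no-timeout case)
self.__block.acquire()       # acquire the condition's lock
while not self.__stopped:    # the thread has not finished yet: keep waiting
    self.__block.wait()      # wait() releases the lock until notified
</code></pre></div></div> <p>Internally, a thread executes its <code class="highlighter-rouge">run</code> method, and the default <code class="highlighter-rouge">run</code> calls <code class="highlighter-rouge">target</code>.</p> <ul> <li>Subclassing Thread:</li> </ul> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import threading

class MyThread(threading.Thread):
    def __init__(self):
        # equivalent to threading.Thread.__init__(self)
        super(MyThread, self).__init__()

    def run(self):
        print('thread running here')
</code></pre></div></div> <h3 id="线程的同步">Thread synchronization</h3> <p>Basics: locks. <br /> Synchronization primitives built on locks: <code class="highlighter-rouge">Condition</code>, <code class="highlighter-rouge">Semaphore</code> (built on a Condition), <code class="highlighter-rouge">Event</code> (built on a Condition), <code class="highlighter-rouge">Queue</code></p> <p>By default, <code class="highlighter-rouge">Condition</code> is backed by a reentrant lock (RLock).</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import threading

ready = []  # shared state, guarded by the condition's lock

def producer(cv):
    with cv:                  # cv.acquire() ... cv.release()
        print('produce')
        ready.append(True)    # change the shared state first
        cv.notify_all()       # then wake up every waiting consumer

def consumer(cv):
    with cv:                  # cv.acquire() ... cv.release()
        while not ready:      # re-check: the notify may have happened before we waited
            cv.wait()         # releases the lock while waiting, reacquires it on wakeup
        print('consume')
</code></pre></div></div> <p><code class="highlighter-rouge">Event</code>:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>event.set()    # set the internal flag: wakes up all threads waiting on it
event.clear()  # reset the flag to false
event.wait()   # block until the flag is set
</code></pre></div></div> About FastText 2017-11-23T00:00:00+00:00 /2017/11/23/about-fasttext <h3 id="基础知识">Basics</h3> <h4 id="precision_and_recall">Precision and recall</h4> <p>This is explained very well on <a href="https://en.wikipedia.org/wiki/Precision_and_recall">Wikipedia</a>; here is my translation:</p> <blockquote> <p>Suppose a set of pictures contains 12 dogs, and the rest are other animals. An automatic recognition program labels 8 pictures as dogs and the rest as other animals.<br /> If 5 of those 8 are actually dogs, the precision is 5/8, and the recall is 5/12 (of the 12 dogs, only 5 were recognized).</p> </blockquote> <p>precision ranges from 0 to 1, higher is better; <br /> recall ranges from 0 to 1, higher is better. <br /> In general, precision = true positives / everything predicted positive, and recall = true positives / everything actually positive.</p> <h4 id="文本单词的数学表示">Mathematical representation of text and words</h4>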
<p>The popular approach in machine learning today is to represent text as vectors.</p> <p>For example:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>wget https://s3-us-west-1.amazonaws.com/fasttext-vectors/cooking.stackexchange.tar.gz &amp;&amp; tar xvzf cooking.stackexchange.tar.gz
head -n 12404 cooking.stackexchange.txt &gt; cooking.train
tail -n 3000 cooking.stackexchange.txt &gt; cooking.valid
</code></pre></div></div> <p>The downloaded data is already labeled. The commands above split it into a training set (<code class="highlighter-rouge">cooking.train</code>) and a validation set (<code class="highlighter-rouge">cooking.valid</code>). Each line has the form:</p> <p>labels (prefixed with <code class="highlighter-rouge">__label__</code>), then the text</p> <p>The <strong>steps</strong> of <strong>text classification</strong> are:</p> <p>training set -&gt; preprocessing -&gt; feature extraction (usually as vectors) -&gt; modeling -&gt; classifying with the model</p> <h3 id="fasttext">fasttext</h3> <p>Here is a walkthrough of the official example:</p> <p>Train a classifier model:</p> <blockquote> <p>./fasttext supervised -input cooking.train -output model_cooking -epoch 25</p> </blockquote> <p>Test the classifier interactively:</p> <blockquote> <p>./fasttext predict model_cooking.bin -</p> </blockquote> <p>Evaluate it on the validation samples:</p> <blockquote> <p>./fasttext test model_cooking.bin cooking.valid</p> </blockquote> <p>The test output reports precision and recall (<code class="highlighter-rouge">P@1</code> and <code class="highlighter-rouge">R@1</code>).</p> <p>The <strong>foundation</strong> of <strong>text training</strong> is samples: without training data, nothing else matters. Some public text-classification datasets:</p> <p>Wikipedia dumps: <a href="https://dumps.wikimedia.org/">https://dumps.wikimedia.org/</a>; for example, the Chinese dump: <a href="https://dumps.wikimedia.org/zhwiki/latest/zhwiki-latest-pages-articles.xml.bz2">https://dumps.wikimedia.org/zhwiki/latest/zhwiki-latest-pages-articles.xml.bz2</a> <br /> dbpedia/yelp_review_full/amazon_review_full/sogou_news: <a href="https://drive.google.com/drive/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M?spm=5176.100239.blogcont128589.13.L2tfdg">https://drive.google.com/drive/folders/0Bz8a_Dbh9Qhbfll6bVpmNUtUcFdjYmF2SEpmZUZUcVNiMUw1TWN6RDV3a0JHT3kxLVhVR2M?spm=5176.100239.blogcont128589.13.L2tfdg</a> <br /> Sogou Labs corpora: <a href="http://www.sogou.com/labs/resource/list_yuliao.php">http://www.sogou.com/labs/resource/list_yuliao.php</a> <br /> <a href="http://thuctc.thunlp.org/">THUCTC</a> <br /> …</p> <h3 id="原理篇">How it works</h3> <h3 id="参考资料">References</h3> <p><a
href="http://www.jianshu.com/p/b7ede4e842f1">fasttext vs word2vec</a><br /> <a href="https://fasttext.cc/docs/en/supervised-tutorial.html#content">fasttext</a><br /> <a href="https://en.wikipedia.org/wiki/Bayes%27_theorem">Bayes_theorem</a><br /> <a href="https://en.wikipedia.org/wiki/Positive_and_negative_predictive_values">Positive_and_negative_predictive_values</a> <br /> <a href="https://en.wikipedia.org/wiki/Binary_classification">Binary_classification</a> <br /> <a href="https://en.wikipedia.org/wiki/Sensitivity_and_specificity">Sensitivity_and_specificity</a> <br /> <a href="https://en.wikipedia.org/wiki/Precision_and_recall">Precision_and_recall</a> <br /> <a href="http://www.52nlp.cn/%E4%B8%AD%E8%8B%B1%E6%96%87%E7%BB%B4%E5%9F%BA%E7%99%BE%E7%A7%91%E8%AF%AD%E6%96%99%E4%B8%8A%E7%9A%84word2vec%E5%AE%9E%E9%AA%8C">Word2Vec experiments on Chinese and English Wikipedia corpora</a></p> thinking in feed timeline 2017-11-22T00:00:00+00:00 /2017/11/22/how-to-realize-feed <h3 id="timeline">timeline</h3> <p>There are two kinds: the user timeline (a user's own profile page) and the home timeline (a merged feed of posts from everyone the user follows).</p> <p>Implementation approaches:</p> <h3 id="推push">Push</h3> <p>When a user posts, the post is pushed to all of their followers, which updates each follower's home timeline.</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import redis

r = redis.Redis(decode_responses=True)  # assumes redis-py 3.5+ and a local Redis server

def get_followers(uid):
    return r.smembers('followers:%s' % uid)

def write_post(uid, data):
    post_id = r.incr('post:uuid:')                 # generate a post id
    r.hset('posts:%s' % post_id, mapping=data)     # store the post body in a hash
    for follower in get_followers(uid):            # fan out to every follower
        # score = post id, so newer posts have higher scores
        r.zadd('home_timeline:%s' % follower, {post_id: post_id})
</code></pre></div></div> <p>Reading the timeline:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def get_timeline(uid):
    # newest 30 post ids, then fetch each post's content (uses r from above)
    post_ids = r.zrevrange('home_timeline:%s' % uid, 0, 29)
    return [r.hgetall('posts:%s' % post_id) for post_id in post_ids]
</code></pre></div></div> <p>Here the push writes post ids into a sorted set. A list works too: <code class="highlighter-rouge">lpush</code> the id onto each follower's timeline on write, and <code class="highlighter-rouge">lrange</code> it on read.</p> <h3 id="拉">Pull</h3> <div
class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># write
def write_post(uid, data):
    post_id = r.incr('post:uuid:')                       # generate a post id
    r.hset('posts:%s' % post_id, mapping=data)           # store the post body in a hash
    r.zadd('posts:user:%s' % uid, {post_id: post_id})    # the author's own post list

def get_following(uid):
    return r.smembers('following:%s' % uid)

# read
def get_timeline(uid):
    keys = ['posts:user:%s' % f for f in get_following(uid)]  # post lists of followed users
    if not keys:
        return []
    dest = 'home_timeline:%s' % uid
    r.zunionstore(dest, keys)           # union of all followed users' posts
    return r.zrevrange(dest, 0, 29)     # newest 30 post ids
</code></pre></div></div> <h3 id="问题">Problems</h3> <p>Problems with push: <br /> If a user has 10 million followers, a single post has to be written into 10 million timelines, which takes a very long time. Mitigations: <br /> * Sort followers by activity: push to active followers first, and to the rest asynchronously with a delay<br /> * When the follower count is too large, don't push at all; instead, each follower's timeline fetches the latest posts from the big ("big V") accounts they follow and merges them in <br /> Problems with pull: <br /> Every read is a large union operation, whereas with push a single read returns the whole message list. Mitigation: <br /> * Sort followed users by activity and only fetch posts from a limited number of active ones</p> <h3 id="参考资料">References</h3> <p><a href="http://highscalability.com/blog/2013/7/8/the-architecture-twitter-uses-to-deal-with-150m-active-users.html">The Architecture Twitter Uses To Deal With 150M Active Users, 300K QPS, A 22 MB/S Firehose, And Send Tweets In Under 5 Seconds</a> <br /> <a href="https://blog.twitter.com/engineering/en_us/topics/infrastructure/2017/the-infrastructure-behind-twitter-scale.html">The Infrastructure Behind Twitter: Scale</a> <br /> <a href="https://segmentfault.com/a/1190000004650279">Redis timeline</a></p> django celery transaction error 2017-11-12T00:00:00+00:00 /2017/11/12/django-celery-transaction <p>Background: a submitted task is processed asynchronously.</p> <p>Async processing: <code class="highlighter-rouge">celery</code></p> <p>Problem: <br /> When the <code class="highlighter-rouge">celery</code> task runs, it uses the task id to <code class="highlighter-rouge">do something</code>, but occasionally it fails with:</p> <p><code class="highlighter-rouge"> matching query does not exist </code></p> <p>Checking the database, the data really does exist. So why the error? The only explanation is that celery processes the task before the DB transaction commits. From the references:</p> <blockquote> <p>The data will only be externally accessible when the view finishes its execution, and the transaction is committed.
This usually will happen <strong>after</strong> Celery executes the task.</p> </blockquote> <p>Django defaults to <code class="highlighter-rouge">autocommit</code>, but here the commit happens after celery's <code class="highlighter-rouge">delay()</code> has already queued the task. This is presumably because the view runs inside a transaction, e.g. <code class="highlighter-rouge">ATOMIC_REQUESTS</code> or <code class="highlighter-rouge">transaction.atomic</code>. As a result, the worker occasionally looks the row up before it is visible. <strong>How do we fix it?</strong></p> <p>On Django 1.9+, use <code class="highlighter-rouge">transaction.on_commit</code>: <code class="highlighter-rouge">transaction.on_commit(lambda: do_stuff.delay(my_data.pk))</code></p> <p>On Django &lt;1.9, use <a href="https://github.com/carljm/django-transaction-hooks">django-transaction-hooks</a></p> <p>Another option (still verifying):<br /> commit manually with <code class="highlighter-rouge">transaction.commit()</code> before calling <code class="highlighter-rouge">delay()</code>. Note that Django does not allow <code class="highlighter-rouge">transaction.commit()</code> inside an <code class="highlighter-rouge">atomic</code> block.</p> <h3 id="参考资料">References</h3> <p><a href="https://www.hypertrack.com/blog/2016/10/08/dealing-with-database-transactions-in-django-celery/">Dealing with database transactions in Django + Celery</a> <br /> <a href="https://www.vinta.com.br/blog/2016/database-concurrency-in-django-the-right-way/">Database concurrency in Django the right way</a> <br /> <a href="https://django-transaction-hooks.readthedocs.io/en/latest/">django-transaction-hooks</a></p> ironic python agent source code: a quick look 2017-11-09T00:00:00+00:00 /2017/11/09/ironic-agent-study <p>Entry point, cmd/run:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from ironic_python_agent import agent

def run():
    agent.IronicPythonAgent(api_url, agent.Host,....).run()
</code></pre></div></div> <p>agent:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># extensions/base.py
class BaseAgentExtension(object):
    def __init__(self, agent=None):
        super(BaseAgentExtension, self).__init__()
        self.agent = agent
        # map each method tagged with a command_name to that public name
        self.command_map = dict(
            (v.command_name, v)
            for k, v in inspect.getmembers(self)
            if hasattr(v, 'command_name')
        )

    def execute(self, command_name, **kwargs):
        cmd = self.command_map.get(command_name)
        if cmd is None:
            raise  # (elided) an "invalid command" error is raised here
        return cmd(**kwargs)


class ExecuteCommandMixin(object):
    def __init__(self):
        self.command_lock = threading.Lock()
        self.command_results = collections.OrderedDict()
        self.ext_mgr = None

    # command names have the form &lt;extension&gt;.&lt;name&gt;
    def split_command(self, command_name):
        command_parts = command_name.split('.', 1)
        return (command_parts[0], command_parts[1])

    def get_extension(self, extension_name):
        ext = self.ext_mgr[extension_name].obj
        ext.ext_mgr = self.ext_mgr
        return ext

    def execute_command(self, command_name, **kwargs):
        with self.command_lock:
            extension_part, command_part = self.split_command(command_name)
            try:
                ext = self.get_extension(extension_part)
                result = ext.execute(command_part, **kwargs)
            except KeyError:
                ...
            self.command_results[result.id] = result
            return result


# agent.py
class IronicPythonAgent(base.ExecuteCommandMixin):
    def __init__(self, api_url, advertise_address,...):
        super(IronicPythonAgent, self).__init__()
        self.ext_mgr = extensions.ExtensionManager(
            namespace='ironic_python_agent.extensions',
            invoke_on_load=True,
            invoke_kwds={'agent': self},
        )
        if self.api_url:
            self.api_client = ironic_api_client.APIClient(self.api_url)
        # heartbeat
        self.heartbeater = IronicPythonAgentHeartbeater(self)

    def run(self):
        """Start the agent."""
        self.started_at = _time()
        ...
        wsgi = simple_server.make_server(
            self.listen_address.hostname,
            self.listen_address.port,
            self.api,
            server_class=simple_server.WSGIServer)
        if not self.standalone and self.api_url:
            # heartbeat to the server
            self.heartbeater.start()
        try:
            wsgi.serve_forever()
        except:
            ...
</code></pre></div></div> <p>application/controller:</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code># api/controllers/v1/command.py
class CommandController(rest.RestController):

    @wsme_pecan.wsexpose(CommandResult, types.text, body=Command)
    def post(self, wait=None, command=None):
        """Post a command for the agent to run."""
        if command is None:
            command = Command()
        agent = pecan.request.agent
        result = agent.execute_command(command.name, **command.params)
        if wait and wait.lower() == 'true':
            result.join()   # block until the command has finished
        return result
</code></pre></div></div> <p>Commands are executed through plugins (extensions). The agent starts an API server to receive commands and also starts a heartbeat. Each incoming REST request to this controller runs a command through <code class="highlighter-rouge">execute_command</code>.</p>
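<p>The dispatch pattern described above can be boiled down to a few lines. This is a simplified sketch, not the agent's real code. The <code class="highlighter-rouge">command</code> decorator, <code class="highlighter-rouge">EchoExtension</code>, <code class="highlighter-rouge">CommandDispatcher</code> and the <code class="highlighter-rouge">demo</code> extension name are all hypothetical, and a plain dict stands in for the stevedore <code class="highlighter-rouge">ExtensionManager</code>. The sketch mirrors the two steps in the excerpt. First, each extension collects its tagged methods into <code class="highlighter-rouge">command_map</code> with <code class="highlighter-rouge">inspect.getmembers</code>. Second, a command name such as <code class="highlighter-rouge">demo.echo</code> is split into its extension part and its command part.</p> <div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import inspect
import threading


def command(name):
    """Hypothetical stand-in for the agent's command decorators:
    tag a method with the public name it is dispatched under."""
    def decorator(func):
        func.command_name = name
        return func
    return decorator


class BaseExtension(object):
    def __init__(self):
        # Bound methods expose their function's attributes, so hasattr()
        # finds the command_name set by the decorator.
        self.command_map = dict(
            (member.command_name, member)
            for _, member in inspect.getmembers(self)
            if hasattr(member, 'command_name')
        )

    def execute(self, command_name, **kwargs):
        cmd = self.command_map.get(command_name)
        if cmd is None:
            raise KeyError('unknown command: %s' % command_name)
        return cmd(**kwargs)


class EchoExtension(BaseExtension):
    @command('echo')
    def echo(self, message):
        return message


class CommandDispatcher(object):
    def __init__(self, extensions):
        self.extensions = extensions  # maps extension name to extension instance
        self.command_lock = threading.Lock()

    def execute_command(self, command_name, **kwargs):
        # Split "extension.command" at the first dot.
        extension_part, command_part = command_name.split('.', 1)
        with self.command_lock:  # one command at a time, as in the excerpt
            return self.extensions[extension_part].execute(command_part, **kwargs)


dispatcher = CommandDispatcher({'demo': EchoExtension()})
print(dispatcher.execute_command('demo.echo', message='hi'))  # hi
</code></pre></div></div> <p>In this sketch, adding a command only means decorating another method; the dispatcher itself never changes.</p>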