mongodb的mapreduce用法及php示例代码

php中文网
发布: 2016-08-08 09:30:30
原创
1418人浏览过
MongoDB虽然不像我们常用的mysql,sqlserver,oracle等关系型数据库有group by函数那样方便分组,但是MongoDB要实现分组也有3个办法: * Mongodb三种分组方式: * 1、group(先筛选再分组,不支持分片,对数据量有所限制,效率不高) * 2、mapreduce(基于js引擎,单线程执行,效率较低,适合用做后台统计等) * 3、aggregate(推荐) (如果你的PHP的mongodb驱动版本需>=1.3.0,推荐你使用aggregate,性能要高很多,并且使用上要简单些,不过1.3的目前还不支持账户认证模式,可以通过 http://pecl.php.net/package/mongo 查看更新日志和Bug)下面就来看下mapreduce方式:Mongodb官网对MapReduce介绍:Map/reduce in MongoDB is useful for batch processing of data and aggregation operations. It is similar in spirit to using something like Hadoop with all input coming from a collection and output going to a collection. Often, in a situation where you would have used GROUP BY in SQL, map/reduce is the right tool in MongoDB.大致意思是: Mongodb中的Map/reduce主要是用来对数据进行批量处理和聚合操作,有点类似于使用Hadoop对集合数据进行处理,所有输入数据都是从集合中获取,而MapReduce后输出的数据也都会写入到集合中。通常类似于我们在SQL中使用Group By语句一样。使用MapReduce要实现两个函数:Map和Reduce。Map函数调用emit(key,value)遍历集合中所有的记录,将key与value传给Reduce函数进行处理。Map函数和Reduce函数是使用Javascript编写的,并可以通过db.runCommand或mapreduce命令来执行MapReduce操作。 
MapReduce命令如下:
db.runCommand(
{ mapreduce : <span><span><<span>collection</span>></span>,
   map : <span><<span>mapfunction</span>></span>,
   reduce : <span><<span>reducefunction</span>></span>
   [, query : <span><<span>query</span><span>filter</span><span>object</span>></span>]
   [, sort : <span><<span>sort</span><span>the</span><span>query.</span><span>useful</span><span>for</span><span>optimization</span>></span>]
   [, limit : <span><<span>number</span><span>of</span><span>objects</span><span>to</span><span>return</span><span>from</span><span>collection</span>></span>]
   [, out : <span><<span>output-collection</span><span>name</span>></span>]
   [, keeptemp: <span><<span>true|false</span>></span>]
   [, finalize : <span><<span>finalizefunction</span>></span>]
   [, scope : <span><<span>object</span><span>where</span><span>fields</span><span>go</span><span>into</span><span>javascript</span><span>global</span><span>scope</span> ></span>]
   [, verbose : true]
 }
);</span>
登录后复制

参数说明:

mapreduce:要操作的目标集合

map:映射函数(生成键值对序列,作为Reduce函数的参数) 

reduce:统计函数

query:目标记录过滤

立即学习PHP免费学习笔记(深入)”;

sort:对目标记录排序

limit:限制目标记录数量

代码小浣熊
代码小浣熊

代码小浣熊是基于商汤大语言模型的软件智能研发助手,覆盖软件需求分析、架构设计、代码编写、软件测试等环节

代码小浣熊 51
查看详情 代码小浣熊

out:统计结果存放集合(如果不指定则使用临时集合,在客户端断开后自动删除)

keeptemp:是否保留临时集合

finalize:最终处理函数(对reduce返回结果执行最终整理后存入结果集合)

scope:向map、reduce、finalize导入外部变量

verbose:显示详细的时间统计信息


map函数 
map函数调用当前对象,并处里对象的属性,传值给reduce,map方法使用this来操作当前对象,最少调用一次emit(key,value)方法来向reduce提供参数,其中emit的key为最终数据的id。 
reduce函数 
接收一个值和数组,根据需要对数组进行合并分组等处理,reduce的key就是emit(key,value)的key,value_array是同个key对应的多个value数组。 
Finalize函数 
此函数为可选函数,可在执行完map和reduce后执行,对最后的数据进行统一处理。 
看完基本介绍,我们再来看一个实例:已知集合feed,测试数据如下:
{
   "<span>_id</span>": <span>ObjectId(<span>"50ccb3f91e937e2927000004"</span>)</span>,
   "<span>feed_type</span>": <span><span>1</span></span>,
   "<span>to_user</span>": <span><span>234</span></span>,
   "<span>time_line</span>": <span><span>"2012-12-16 01:26:00"</span>
}</span>{
   "<span>_id</span>": <span>ObjectId(<span>"50ccb3ef1e937e0727000004"</span>)</span>,
   "<span>feed_type</span>": <span><span>8</span></span>,
   "<span>to_user</span>": <span><span>123</span></span>,
   "<span>time_line</span>": <span><span>"2012-12-16 01:26:00"</span>
}</span>{
   "<span>_id</span>": <span>ObjectId(<span>"50ccb3e31e937e0a27000003"</span>)</span>,
   "<span>feed_type</span>": <span><span>1</span></span>,
   "<span>to_user</span>": <span><span>123</span></span>,
   "<span>time_line</span>": <span><span>"2012-12-16 01:26:00"</span>
}</span>{
   "<span>_id</span>": <span>ObjectId(<span>"50ccb3d31e937e0927000001"</span>)</span>,
   "<span>feed_type</span>": <span><span>1</span></span>,
   "<span>to_user</span>": <span><span>123</span></span>,
   "<span>time_line</span>": <span><span>"2012-12-16 01:26:00"</span>
}</span>
登录后复制

我们按动态类型feed_type和用户to_user进行分组统计,实现结果:
feed_type to_user cout
1 234 1
8 123 1
1 123 2







实现代码:
<span>//编写map函数</span><span>$map</span> = <span>'
     function() {
<span></span>  var key = {to_user:this.to_user,feed_type:this.feed_type};
<span></span>  var value = {count:1};
<span></span>  emit(key,value);
    } '</span>; 

<span>//reduce 函数</span><span>$reduce</span> = <span>'
     function(key, values) {
         var ret = {count:0};
<span></span> for(var i in values) {
<span></span>      ret.count &#43;= 1;
<span></span>  }
<span></span>  return ret;
      }'</span>;

<span>//查询条件</span><span>$query</span> = <span>null</span>;  <span>//本实例中没有查询条件,设置为null</span>
登录后复制
<span>$mongo</span> = <span>new</span> Mongo(<span>'mongodb://root:root@127.0.0.1: 28017/'</span>); <span>//链接mongodb,账号和密码为root,root</span><span>$instance</span> = <span>$mongo</span>->selectDB(<span>"testdb"</span>);

<span>//执行此命令后,会创建feed_temp_res的临时集合,并将统计后的数据放在该集合中</span><span>$cmd</span> = <span>$instance</span>->command(<span>array</span>(
<span></span><span></span><span>'mapreduce'</span> => <span>'feed'</span>,
<span></span><span></span><span>'map'</span>       => <span>$map</span>,
<span></span><span></span><span>'reduce'</span>    => <span>$reduce</span>,
<span></span><span></span><span>'query'</span>	=> <span>$query</span>,
<span></span><span></span><span>'out'</span> => <span>'feed_temp_res'</span>
));

<span>//查询临时集合中的统计数据,验证统计结果是否和预期结果一致</span><span>$cursor</span> = <span>$instance</span>->selectCollection(<span>'feed_temp_res'</span>)->find();
<span>$result</span> = <span>array</span>();
<span>try</span> {
<span></span><span>while</span> (<span>$cursor</span>->hasNext())
<span></span>{
<span></span><span></span><span>$result</span>[] = <span>$cursor</span>->getNext();
<span></span>}
}
<span>catch</span> (MongoConnectionException <span>$e</span>)
{
<span></span><span>echo</span><span>$e</span>->getMessage();
}
<span>catch</span> (MongoCursorTimeoutException <span>$e</span>)
{
<span></span><span>echo</span><span>$e</span>->getMessage();
}
<span>catch</span>(<span>Exception</span><span>$e</span>){
<span></span><span>echo</span><span>$e</span>->getMessage();
}

<span>//test</span>
var_dump(<span>$result</span>);
登录后复制

下面是输出的结果,和预期结果一致
{
   "<span>_id</span>": <span>{
     "<span>to_user</span>": <span><span>234</span></span>,
     "<span>feed_type</span>": <span><span>1</span>  }</span></span>,
   "<span>value</span>": <span>{
     "<span>count</span>": <span><span>1</span>  }</span>}</span>{
   "<span>_id</span>": <span>{
     "<span>to_user</span>": <span><span>123</span></span>,
     "<span>feed_type</span>": <span><span>8</span>  }</span></span>,
   "<span>value</span>": <span>{
     "<span>count</span>": <span><span>1</span>  }</span>}</span>{
   "<span>_id</span>": <span>{
     "<span>to_user</span>": <span><span>123</span></span>,
     "<span>feed_type</span>": <span><span>1</span>  }</span></span>,
   "<span>value</span>": <span>{
     "<span>count</span>": <span><span>2</span>  }</span>}</span>
登录后复制

以上只是简单的统计实现,你可以实现复杂的条件统计编写复杂的reduce函数,可以增加查询条件,排序等等。附上mapReduce数据库处理函数(简单封装)
/<span>**</span>
     * mapReduce分组
     * 
     * <span>@param</span> string <span>$table_name</span> 表名(要操作的目标集合名)
     * <span>@param</span> string <span>$map</span> 映射函数(生成键&#20540;对序列,作为 reduce 函数参数) 
     * <span>@param</span> string <span>$reduce</span> 统计处理函数
     * <span>@param</span> array  <span>$query</span> 过滤条件 如:array(<span>'uid'</span>=><span>123</span>)
     * <span>@param</span> array  <span>$sort</span> 排序
     * <span>@param</span> number <span>$limit</span> 限制的目标记录数
     * <span>@param</span> string <span>$out</span> 统计结果存放集合 (不指定则使用tmp_mr_res<span>_</span><span>$table_name</span>, <span>1.8</span>以上版本需指定)
     * <span>@param</span> bool   <span>$keeptemp</span> 是否保留临时集合
     * <span>@param</span> string <span>$finalize</span> 最终处理函数 (对reduce返回结果进行最终整理后存入结果集合)
     * <span>@param</span> string <span>$scope</span> 向 <span>map</span>、reduce、finalize 导入外部js变量
     * <span>@param</span> bool   <span>$jsMode</span> 是否减少执行过程中BSON和JS的转换,默认true(注:false时 BSON-->JS-->map-->BSON-->JS-->reduce-->BSON,可处理非常大的mapreduce,<span>//true</span>时BSON-->js-->map-->reduce-->BSON)
     * <span>@param</span> bool   <span>$verbose</span> 是否产生更加详细的服务器日志
     * <span>@param</span> bool   <span>$returnresult</span> 是否返回新的结果集
     * <span>@param</span> array	 &<span>$cmdresult</span> 返回mp命令执行结果 array(<span>"errmsg"</span>=><span>""</span>,<span>"code"</span>=><span>13606</span>,<span>"ok"</span>=><span>0</span>) ok=<span>1</span>表示执行命令成功
     * <span>@return</span><span>*/</span>
    function mapReduce(<span>$table_name</span>,<span>$map</span>,<span>$reduce</span>,<span>$query</span>=null,<span>$sort</span>=null,<span>$limit</span>=<span>0</span>,<span>$out</span>=<span>''</span>,<span>$keeptemp</span>=true,<span>$finalize</span>=null,<span>$scope</span>=null,<span>$jsMode</span>=true,<span>$verbose</span>=true,<span>$returnresult</span>=true,&<span>$cmdresult</span>){
    	<span>if</span>(empty(<span>$table_name</span>) || empty(<span>$map</span>) || empty(<span>$reduce</span>)){
    		<span>return</span> null;
    	}
    	<span>$map</span> = new MongoCode(<span>$map</span>);
    	<span>$reduce</span> = new MongoCode(<span>$reduce</span>);
    	<span>if</span>(empty(<span>$out</span>)){
    		<span>$out</span> = <span>'tmp_mr_res_'</span>.<span>$table_name</span>;
    	}
    	<span>$cmd</span> = array(
    			<span>'mapreduce'</span> => <span>$table_name</span>,
    			<span>'map'</span>       => <span>$map</span>,
    			<span>'reduce'</span>    => <span>$reduce</span>,
    			<span>'out'</span>		=><span>$out</span>
    	);
    	<span>if</span>(!empty(<span>$query</span>) && is_array(<span>$query</span>)){
    		array_push(<span>$cmd</span>, array(<span>'query'</span>=><span>$query</span>));
    	}
    	<span>if</span>(!empty(<span>$sort</span>) && is_array(<span>$sort</span>)){
    		array_push(<span>$cmd</span>, array(<span>'sort'</span>=><span>$query</span>));
    	}
    	<span>if</span>(!empty(<span>$limit</span>) && is_int(<span>$limit</span>) && <span>$limit</span>><span>0</span>){
    		array_push(<span>$cmd</span>, array(<span>'limit'</span>=><span>$limit</span>));
    	}
    	<span>if</span>(!empty(<span>$keeptemp</span>) && is_bool(<span>$keeptemp</span>)){
    		array_push(<span>$cmd</span>, array(<span>'keeptemp'</span>=><span>$keeptemp</span>));
    	}
    	<span>if</span>(!empty(<span>$finalize</span>)){
    		<span>$finalize</span> = new Mongocode(<span>$finalize</span>);
    		array_push(<span>$cmd</span>, array(<span>'finalize'</span>=><span>$finalize</span>));
    	}
    	<span>if</span>(!empty(<span>$scope</span>)){
    		array_push(<span>$cmd</span>, array(<span>'scope'</span>=><span>$scope</span>));
    	}
    	<span>if</span>(!empty(<span>$jsMode</span>) && is_bool(<span>$jsMode</span>)){
    		array_push(<span>$cmd</span>, array(<span>'jsMode'</span>=><span>$jsMode</span>));
    	}
    	<span>if</span>(!empty(<span>$verbose</span>) && is_bool(<span>$verbose</span>)){
    		array_push(<span>$cmd</span>, array(<span>'verbose'</span>=><span>$verbose</span>));
    	}
    	<span>$dbname</span> = <span>$this</span>->curr_db_name;
    	<span>$cmdresult</span> = <span>$this</span>->mongo-><span>$dbname</span>->command(<span>$cmd</span>);
    	<span>if</span>(<span>$returnresult</span>){
    		<span>if</span>(<span>$cmdresult</span> && <span>$cmdresult</span>[<span>'ok'</span>]==<span>1</span>){
    			<span>$result</span> = <span>$this</span>->find(<span>$out</span>, array());
    		}
    	}
    	<span>if</span>(<span>$keeptemp</span>==false){
    		<span>//</span>删除集合
    		<span>$this</span>->mongo-><span>$dbname</span>->dropCollection(<span>$out</span>);
    	}
    	<span>return</span><span>$result</span>;
    }
登录后复制

MongoDB官方网站介绍:MapReduce介绍  http://docs.mongodb.org/manual/core/map-reduce/Aggregation介绍  http://docs.mongodb.org/manual/aggregation/ 

以上就介绍了mongodb的mapreduce用法及php示例代码,包括了方面的内容,希望对PHP教程有兴趣的朋友有所帮助。

PHP速学教程(入门到精通)
PHP速学教程(入门到精通)

PHP怎么学习?PHP怎么入门?PHP在哪学?PHP怎么学才快?不用担心,这里为大家提供了PHP速学教程(入门到精通),有需要的小伙伴保存下载就能学习啦!

下载
来源:php中文网
本文内容由网友自发贡献,版权归原作者所有,本站不承担相应法律责任。如您发现有涉嫌抄袭侵权的内容,请联系admin@php.cn
最新问题
开源免费商场系统广告
热门教程
更多>
最新下载
更多>
网站特效
网站源码
网站素材
前端模板
关于我们 免责申明 举报中心 意见反馈 讲师合作 广告合作 最新更新 English
php中文网:公益在线php培训,帮助PHP学习者快速成长!
关注服务号 技术交流群
PHP中文网订阅号
每天精选资源文章推送
PHP中文网APP
随时随地碎片化学习

Copyright 2014-2025 https://www.php.cn/ All Rights Reserved | php.cn | 湘ICP备2023035733号