Coreseek®  

Confused about how group by works; please share if you have experience with it

 
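terrygyd
Member
#1 | Posted: 2010-08-25 17:30
The problem I am running into:
Without grouping by 'tid' the result set looks normal. My goal is to deduplicate on tid so that each tid appears only once.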
Array
(
    [error] =>
    [warning] =>
    [status] => 0
    [fields] => Array
        (
            [0] => subject
            [1] => content
        )

    [attrs] => Array
        (
            [tid] => 1
            [fid] => 1
            [authorid] => 1
            [postdate] => 2
        )

    [matches] => Array
        (
            [0] => Array
                (
                    [id] => 442194
                    [weight] => 2
                    [attrs] => Array
                        (
                            [tid] => 47220
                            [fid] => 64
                            [authorid] => 6557
                            [postdate] => 1282093487
                        )

                )

            [1] => Array
                (
                    [id] => 436380
                    [weight] => 2
                    [attrs] => Array
                        (
                            [tid] => 45968
                            [fid] => 24
                            [authorid] => 19694
                            [postdate] => 1281936645
                        )

                )
            [2] => Array
                (
                    [id] => 427360
                    [weight] => 2
                    [attrs] => Array
                        (
                            [tid] => 45968
                            [fid] => 24
                            [authorid] => 2400
                            [postdate] => 1281582950
                        )

                )
         ............


After grouping by tid, the result set always contains only one record:

Array
(
    [error] =>
    [warning] =>
    [status] => 0
    [fields] => Array
        (
            [0] => subject
            [1] => content
        )

    [attrs] => Array
        (
            [tid] => 1
            [fid] => 1
            [authorid] => 1
            [postdate] => 2
            [@groupby] => 1
            [@count] => 1
        )

    [matches] => Array
        (
            [0] => Array
                (
                    [id] => 442194
                    [weight] => 2
                    [attrs] => Array
                        (
                            [tid] => 47220
                            [fid] => 64
                            [authorid] => 6557
                            [postdate] => 1282093487
                            [@groupby] => 19700101
                            [@count] => 675
                        )

                )

        )
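According to the manual, Sphinx's group by is equivalent to MySQL's GROUP BY, so why does the result come out like this?
HonestQiao
Member
#2 | Posted: 2010-08-25 17:46
Please post the actual code you are running.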
terrygyd
Member
#3 | Posted: 2010-08-25 18:36
The following code handles the search logic:
$sphinx = $this->_getSphinx();
list($host,$port) = $this->_getConfig();
$sphinx->SetServer ( $host, $port );
$sphinx->SetConnectTimeout ( 1 );
$sphinx->SetMatchMode ( $this->_getMode($method,$q) );
$digest && $sphinx->SetFilter ('digest',array(1,2));
$fid && $sphinx->SetFilter ('fid',$fid,$exclude);
$authorids && $sphinx->SetFilter ('authorid',$authorids);
if($sch_timemin && $sch_timemax){
      $sphinx->SetFilterRange('postdate',$sch_timemin,$sch_timemax);
}
$groupby && $sphinx->SetGroupBy ( $groupby, $this->getGroup(), "@group desc" );
$sortby && $sphinx->SetSortMode ( ($asc=='DESC' ? SPH_SORT_ATTR_DESC : SPH_SORT_ATTR_ASC), $sortby );
$page = isset($_GET['page']) ? $_GET['page'] : 1;
$page = max(1, intval($page));
$start_limit = intval(($page - 1) * $db_perpage);
$sphinx->SetLimits ( $start_limit, intval($db_perpage), ( $db_perpage>1000 ) ? $db_perpage : 1000 );
$sphinx->SetRankingMode ( $this->getRanking() );
$sphinx->SetArrayResult ( true );
$index = $this->getIndex($index);
$result = $sphinx->Query ( str_replace('|',' ',$q), $index );
I traced into the core Query function and from there into AddQuery. At this line the values all look fine:
$req .= pack ( "NN", $this->_groupfunc, strlen($this->_groupby) ) . $this->_groupby;
$this->_groupfunc = SPH_GROUPBY_ATTR
$this->_groupby = tid
and $this->_groupsort = @group desc
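One possible reading of the `[@groupby] => 19700101` value in post #1, offered as a guess rather than a confirmed diagnosis: the value has the YYYYMMDD shape that day grouping (SPH_GROUPBY_DAY) produces when it treats the attribute as a UNIX timestamp. In the stock sphinxapi.php, SPH_GROUPBY_DAY is defined as 0 and SPH_GROUPBY_ATTR as 4, so if the value returned by getGroup() reached the server as 0, every small tid would fall on the same day in 1970 and collapse into a single group. The sketch below only reproduces that arithmetic. The helper name dayGroupKey is hypothetical, and it uses UTC for determinism, which is an assumption about how the key is derived.

```php
<?php
// Hypothetical helper: mimics the YYYYMMDD key that day grouping yields
// for a value interpreted as a UNIX timestamp (UTC is assumed here).
function dayGroupKey(int $timestamp): string
{
    return gmdate('Ymd', $timestamp);
}

// tid values taken from the ungrouped result set in post #1.
$tids = [47220, 45968, 45968];

// Each tid is only a few hours after the epoch, so all keys are identical.
$keys = array_map('dayGroupKey', $tids);
print_r($keys);

// A real timestamp, such as the first postdate in post #1, gives a 2010 date.
echo dayGroupKey(1282093487), "\n";
```

If this is what happened, printing the integer that getGroup() returns (rather than the constant's name) would show it directly.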
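terrygyd
Member
#4 | Posted: 2010-08-25 18:46
Digging any further would mean going into the core. Can Sphinx's group by really be used this way, or have I misunderstood what it is for? If anyone has used it successfully, could you post your working code, and ideally your conf as well?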
HonestQiao
Member
#5 | Posted: 2010-08-25 20:37
1. Without grouping, does the returned result have a total_found attribute?
2. With grouping, is it still there?
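terrygyd
Member
#6 | Posted: 2010-08-26 20:39
total_found is present both with and without grouping,
but with grouping its value becomes 1.
Without grouping it appears to be the total number of matching records.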
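HonestQiao
Member
#7 | Posted: 2010-08-27 11:56
I suggest starting with the simplest possible code and stripping out all of your other logic that might be causing the error.

Remove sorting, paging, and limits; keep only the query and the grouping.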
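fl_dream
Member
#8 | Posted: 2010-09-14 16:14
$this->sphinx->SetGroupBy( "class", SPH_GROUPBY_ATTR, "@group desc" );
$res = $this->sphinx->Query($this->query, 'all');
print_r($res);

My query is just this simple, and I see the same behavior after grouping. Could someone explain why? I have read the manual more times than I can count.

Same as described above:
total_found is present both with and without grouping,
but after calling SetGroupBy its value becomes 1.
Without grouping it appears to be the total number of matching records.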
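HonestQiao
Member
#9 | Posted: 2010-09-14 17:05
Using the demo program available from our downloads, the actual test result with SetGroupBy commented out is as follows: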
//$cl->SetGroupBy( "author_id", SPH_GROUPBY_ATTR, "@group desc" );
$res = $cl->Query ( '一个', "*" );
//print_r($cl);
print_r($res);

    [matches] => Array
        (
            [0] => Array
                (
                    [id] => 3
                    [weight] => 1319
                    [attrs] => Array
                        (
                            [published] => 1270094460
                            [author_id] => 2
                        )

                )

            [1] => Array
                (
                    [id] => 1
                    [weight] => 1252
                    [attrs] => Array
                        (
                            [published] => 1270131607
                            [author_id] => 1
                        )

                )

            [2] => Array
                (
                    [id] => 2
                    [weight] => 1252
                    [attrs] => Array
                        (
                            [published] => 1270135548
                            [author_id] => 1
                        )

                )

        )

    [total] => 3
    [total_found] => 3
    [time] => 0.001
    [words] => Array
        (
            [一个] => Array
                (
                    [docs] => 3
                    [hits] => 5
                )

        )
HonestQiao
Member
#10 | Posted: 2010-09-14 17:06
Using the demo program available from our downloads, the actual test result with SetGroupBy enabled is as follows:
$cl->SetGroupBy( "author_id", SPH_GROUPBY_ATTR, "@group desc" );
$res = $cl->Query ( '一个', "*" );
//print_r($cl);
print_r($res);

    [matches] => Array
        (
            [0] => Array
                (
                    [id] => 3
                    [weight] => 1319
                    [attrs] => Array
                        (
                            [published] => 1270094460
                            [author_id] => 2
                            [@groupby] => 2
                            [@count] => 1
                        )

                )

            [1] => Array
                (
                    [id] => 1
                    [weight] => 1252
                    [attrs] => Array
                        (
                            [published] => 1270131607
                            [author_id] => 1
                            [@groupby] => 1
                            [@count] => 2
                        )

                )

        )

    [total] => 2
    [total_found] => 2
    [time] => 0.002
    [words] => Array
        (
            [一个] => Array
                (
                    [docs] => 3
                    [hits] => 5
                )

        )
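To make the difference between the two demo results concrete, here is a small pure-PHP sketch that collapses the three ungrouped matches from post #9 by author_id, keeping the first match seen in each group and counting its members. It only illustrates the semantics and is not how searchd implements grouping; the function name groupMatches is made up for this example, and the krsort call is a simple stand-in for "@group desc".

```php
<?php
// Hypothetical helper: collapse matches by one attribute, keeping the first
// match seen in each group and recording the group size as @count.
function groupMatches(array $matches, string $attr): array
{
    $groups = [];
    foreach ($matches as $match) {
        $key = $match['attrs'][$attr];
        if (!isset($groups[$key])) {
            $match['attrs']['@groupby'] = $key;
            $match['attrs']['@count'] = 0;
            $groups[$key] = $match;
        }
        $groups[$key]['attrs']['@count']++;
    }
    krsort($groups); // order groups by key, descending
    return array_values($groups);
}

// The three ungrouped matches from post #9.
$matches = [
    ['id' => 3, 'weight' => 1319, 'attrs' => ['published' => 1270094460, 'author_id' => 2]],
    ['id' => 1, 'weight' => 1252, 'attrs' => ['published' => 1270131607, 'author_id' => 1]],
    ['id' => 2, 'weight' => 1252, 'attrs' => ['published' => 1270135548, 'author_id' => 1]],
];

$grouped = groupMatches($matches, 'author_id');
print_r($grouped);

// With grouping, the totals count groups rather than documents.
echo 'total_found: ', count($grouped), "\n";
```

Run against the post #9 data, this yields two groups with @count 1 and 2, the same shape as the grouped result in post #10, where total_found drops from 3 to 2.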
ghostwwl
Member
#11 | Posted: 2010-11-02 16:40
$result = array();
// Word segmentation result and per-word hit info (document and hit counts)
$result['words'] = $res['words'];
if (isset($res['matches']) && is_array($res['matches']))
{
    // Number of documents in the search result
    $result['total'] = intval($res['total']);
    $result['total_found'] = intval($res['total_found']);
    // Search time
    $result['time'] = $res['time'];
    // Search results: document id => document weight
    $result['result'] = array();
    // Grouping results: group value => number of documents in the group
    $result['groupinfo'] = array();
    foreach ( $res['matches'] as $docinfo )
    {
        // Document id and weight (array keys are quoted strings)
        $result['result'][$docinfo['id']] = $docinfo['weight'];
        // Group value and group size
        $result['groupinfo'][$docinfo['attrs']['@groupby']] = $docinfo['attrs']['@count'];
    }
    return $result;
}
else
{
    // No results were found
    $result['total'] = 0;
    return $result;
}
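
The fragment above uses return, so it presumably sits inside a function. A self-contained sketch of the same idea follows, with the hypothetical name summarizeResult and sample data trimmed from the grouped result in post #10. It assumes array-result mode (SetArrayResult(true)), where each match carries its own id, and it skips the group fields when grouping was not requested.

```php
<?php
// Hypothetical wrapper around the fragment above; $res is a result array
// in the shape shown in post #10 (array-result mode).
function summarizeResult(array $res): array
{
    $summary = [
        'words' => $res['words'] ?? [],
        'total' => 0,
        'total_found' => 0,
        'result' => [],     // document id => weight
        'groupinfo' => [],  // group value => number of documents in the group
    ];
    if (empty($res['matches']) || !is_array($res['matches'])) {
        return $summary; // nothing matched
    }
    $summary['total'] = (int) $res['total'];
    $summary['total_found'] = (int) $res['total_found'];
    foreach ($res['matches'] as $doc) {
        $summary['result'][$doc['id']] = $doc['weight'];
        // @groupby and @count are only present when grouping was requested.
        if (isset($doc['attrs']['@groupby'], $doc['attrs']['@count'])) {
            $summary['groupinfo'][$doc['attrs']['@groupby']] = $doc['attrs']['@count'];
        }
    }
    return $summary;
}

// Trimmed copy of the grouped result from post #10.
$res = [
    'matches' => [
        ['id' => 3, 'weight' => 1319, 'attrs' => ['author_id' => 2, '@groupby' => 2, '@count' => 1]],
        ['id' => 1, 'weight' => 1252, 'attrs' => ['author_id' => 1, '@groupby' => 1, '@count' => 2]],
    ],
    'total' => 2,
    'total_found' => 2,
    'words' => ['一个' => ['docs' => 3, 'hits' => 5]],
];

$summary = summarizeResult($res);
print_r($summary['groupinfo']);
```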
 
回复
Bold Style  Italic Style  Image 链接  URL 链接 
发帖注意:
  • 网址中请去掉http://开头,例如:您需要输入www.coreseek.cn,而不是http://www.coreseek.cn
  • 咨询问题,请贴出详细的操作系统版本、Coreseek版本(Linux环境请给出编译参数)
  • 请仔细查看中文手册和本站安装指南,确认操作正确
  • 请仔细查看常见问题解答,也许你的问题已经有解决方法

» 帐号  » 密码 
发帖前请登陆, 或者 注册 .