
My PHP , Wordpress and Linux Lab

Archive

Category: Search Engines

Full text search has always been the one part that worried me in any project holding a reasonable amount of data under decent traffic. Not anymore!

I have been using Sphinx for a while and I have been really impressed. Easy deployment, easy integration, and it's pretty flexible as well. I can't say that I have explored everything it has to offer, but whatever I needed it to do, it did.

Among other features, you can use it to rank, filter, group, sort and query targeted columns. It even supports “sounds like” searches ... wonderful! With several different matching and ranking modes, one has got to be right for you.

The PHP client is a pure PHP implementation, so no deployment headaches. The PHP client API is included in the download package; hook it up with your DB wrapper and you are ready to battle those full text searches. What it does is return the ranked record IDs along with their weights, and then you can fetch the full record for each ID directly from MySQL. Response times from Sphinx are amazing: so far I haven’t seen it take more than 0.7 seconds to return the result set. Best of all, it supports distributed searching, so you can scale out as you need.
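To make that two-step flow concrete, here is a rough sketch in Python rather than PHP, so it runs without a Sphinx install. Everything in it is an assumption for illustration: `sphinx_matches` is a hand-written stand-in for the ranked IDs and weights the search daemon would return, the `articles` table and the `fetch_ranked_rows` helper are hypothetical, and an in-memory SQLite database stands in for MySQL.

```python
import sqlite3

# Hypothetical stand-in for a search result: (document ID, weight) pairs,
# already sorted best-first, as the post describes.
sphinx_matches = [(42, 2731), (7, 1904), (19, 1650)]


def fetch_ranked_rows(conn, matches):
    """Fetch full rows for the matched IDs and return them in ranked order."""
    if not matches:
        return []
    ids = [doc_id for doc_id, _ in matches]
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, title FROM articles WHERE id IN ({placeholders})", ids
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # The database returns rows in no guaranteed order, so restore the ranking.
    # IDs that are no longer in the table (stale index) are skipped.
    return [
        {"id": doc_id, "title": by_id[doc_id][1], "weight": weight}
        for doc_id, weight in matches
        if doc_id in by_id
    ]


# In-memory SQLite stands in for the MySQL table here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles (id, title) VALUES (?, ?)",
    [(7, "Linux tips"), (19, "WordPress hooks"), (42, "PHP full text search")],
)

results = fetch_ranked_rows(conn, sphinx_matches)
for r in results:
    print(r["weight"], r["id"], r["title"])
```

The one detail worth noticing is the re-ordering step: a plain `WHERE id IN (...)` query does not preserve the search ranking, so the rows are put back in the order the search engine gave.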

Here are the key features, taken directly from their site:

  • high indexing speed (up to 10 MB/sec on modern CPUs)
  • high search speed (avg query is under 0.1 sec on 2-4 GB text collections)
  • high scalability (up to 100 GB of text, up to 100 M documents on a single CPU)
  • supports distributed searching (since v.0.9.6)
  • supports MySQL natively (MyISAM and InnoDB tables are both supported)
  • supports phrase searching
  • supports phrase proximity ranking, providing good relevance
  • supports English and Russian stemming
  • supports any number of document fields (weights can be changed on the fly)
  • supports document groups
  • supports stopwords
  • supports different search modes (“match all”, “match phrase” and “match any” as of v.0.9.5)
  • generic XML interface which greatly simplifies custom integration
  • pure-PHP (ie. NO module compiling etc) search client API
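For anyone wondering what the three search modes in that list mean in practice, here is a toy Python illustration. It is only a sketch of the semantics under simple assumptions (lowercase, whitespace-split words, no stemming or ranking) and is not how Sphinx implements them; the function names are made up for this example.

```python
def tokens(text):
    """Split text into lowercase words (a deliberately naive tokenizer)."""
    return text.lower().split()


def match_all(query, doc):
    """True if every query word appears somewhere in the document."""
    doc_words = set(tokens(doc))
    return all(word in doc_words for word in tokens(query))


def match_any(query, doc):
    """True if at least one query word appears in the document."""
    doc_words = set(tokens(doc))
    return any(word in doc_words for word in tokens(query))


def match_phrase(query, doc):
    """True if the query words appear in the document adjacent and in order."""
    q, d = tokens(query), tokens(doc)
    n = len(q)
    return n > 0 and any(d[i:i + n] == q for i in range(len(d) - n + 1))


doc = "sphinx is a full text search engine"
print(match_all("search full", doc))     # both words present, order ignored
print(match_phrase("search full", doc))  # words present but not as a phrase
print(match_any("mysql search", doc))    # one of the two words is enough
```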

Amazing stuff! So if you are in the business of developing high-traffic and/or data-heavy sites ... you have to check it out for sure! It's available for both Linux and Windows!

2 Thumbs Up for these guys!

Google, who is the loser?

Mar 1
Posted by : Sabeen Malik in Search Engines, Web World

Over the years I have seen Google emerge out of nothing and rule the planet when it comes to search. But I have also seen lots of junk in my searches, and I have seen thousands of requests for Search Engine Optimization and ads by Search Engine Optimizers claiming to put your site on top of search results for X amount of dollars in Y number of days.

Search Engine Optimization is now probably one of the biggest industries on the internet. My understanding is that probably the BEST content on the internet comes from small-time personal sites. For instance, take a look at my blog: no bullshit, pure content, BUT I will never be able to score high on search engines. Why? I DON'T KNOW!

Isn't it all about content? Honestly, NO! It's more about what optimization techniques you follow on your site and what keywords you target. Keyword targeting is earning a lot of people a lot of money, and a lot of crap sites are getting loads of hits just because of it, while a lot of great sites with far more relevant information on that keyword probably end up on page 10, 11 or wherever.

So what does that mean? It means that if you want a real internet presence, your intentions are all good, and you are a guy with principles who doesn’t want to use black hat methods, I have bad news for you. That ain't gonna happen! At least not the way things are going.

Take for instance the Google Bomb concept: it works and it works well. Page cloaking is another well-known technique. Getting these methods to work for you will cost you $$$$ and some SEO expert is going to get rich. There are also magic ebooks out there which tell you techniques to get on top of search engines at a ‘small’ price. OK, hold on ... I thought the internet was supposed to be an equal opportunity domain, where everyone is equal. BUT while Google has helped us A LOT, it has also created a division in internet society. Even on the internet, the rich guys with pots of money for Search Engine Optimization and Marketing get to the top and the not-so-rich are always in the shadows. Google probably supports the ‘secret’ black hat community, because they don't seem to be doing much about it. Let's take the example of Sponsored Ads in Google: have they ever filtered what is actually shown in that right column? For some searches I see the same ad on the right side, something like “Looking for XYZ?”, and it's always the same ad ... with eBay selling me PHP and God knows what.

The point is that Google has, knowingly or unknowingly, put all the good webmasters in a fix: “What should we do to be at least marginally visible on Google?” Instead of concentrating on content, products or services, they are more worried about whether it's even worth the effort. Google should probably rethink its strategy and decide whether it wants to focus on the 1% of powerful people on the internet or the 99% of not-so-powerful people out there.

Probably one of the best things that happened on the internet, indexing-wise, was www.dmoz.org: “hand pick the sites that should be included”. Now I am not saying Google should hand pick everything, BUT at least they should be checking what actually shows up for the top searched keywords. I also have another suggestion: there are so many keywords where they have nothing to show in that money column of theirs. What if they started to show some good sites in those spots for free? They could probably handpick, let's say, 1000 sites each month and rotate them through those spots. But anyway, the point is: where are we going with Google? By WE I mean the not-so-powerful lot on the internet, the honest webmasters! We are the losers in the game because we believe that our content, service or product will get us to the top, but that's dreamland. I hope Google can squeeze us in there somewhere on their money pages.