
a playground of art, photos, videos, writing, music, life

 





Blog Posts for "programming"


Seek

 

I finished my blog's search engine. It synthesizes both posts and comments into an integrated search, plus allows filtering by media (pictures, video, and sound files) and a search by commenter.

Personally, I like the search results. If there's a picture in the post, it grabs it and shows that along with the initial text in the post.

You can check it out here.

 

0 Comments
Read the whole story of "Building My Own Blogger"
Tags: programming
by Brett Rogers, 6/12/2007 6:49:24 PM


   

And Now, the Weather Report...

 

Blogging is going to be light in the next few weeks as work is heating up for me. In addition to moving from being an ASP developer to an ASP.NET developer, the scale of the projects in which I'll be involved is rather large. Sadly, I have little time to paint, but I'll continue my lunch break painting.

And for you geeks out there: I'm a very fast ASP developer. This site is a good example of that. I spent less than 40 hours developing the blogging backbone of beatcanvas. But I'm scratching my head a bit about .NET. Where's the promised speed increase? I understand that the DataGrid offers some benefits, but I find it more limiting than freeing, particularly when there's a need to display data from more than one recordset/dataset. And the validation controls seem very clumsy as well. For example, tabbing out of a textbox that has a Required validation tied to it but has no text in it doesn't force an error to appear in the Summary control until I press the submit button. That's just dumb. And only one form per page? I'm not a fan of that.

I understand the desire for a single language and the desire for server-side, but I feel more like my hands are tied than they are freed by .NET because of the limitations inherent to the environment.

Everything has pros and cons, I guess. I just expected something more robust.

 

3 Comments
Tags: programming
by Brett Rogers, 6/2/2005 8:21:46 PM

   

Fixing the Game

 

I played Masterword (the BeatCanvas.com word game linked on the left) at lunch today and saw that it was broken by a recent change that I made in it. Zoiks! But I've now fixed that glitch.

I also know that if you've bookmarked the game page, the bookmark saves the page in such a way that you play the same word over and over again. I'm working on a fix for that, but in the meantime, the way around it is to come straight to the main beatcanvas.com page and click on the game link. That starts a fresh game.

 

1 Comment
Tags: programming | masterword | word game
by Brett Rogers, 4/22/2005 9:33:16 PM


If You Liked That Bit Of Fun...

 

I was surprised by the response to my de-pluralizer. A few people said that they enjoy that sort of thing, so I've created an online word game for you here at beatcanvas.com. Here's how it works...

beatcanvas.com has a list of 3,777 5-letter words from which to choose. It chooses one of them. You have to guess it by using other 5-letter words. The web site will compare your word against the beatcanvas.com word and then score it for you, by telling you how many letters you have guessed right, and of those, how many are in the right place within the word.

So for example, if the beatcanvas.com word was "LLAMA" and you guessed "LEMON," then you would have 2 letters right (L-M) and 1 in the right place, or 2-1.
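
For the geeks, here is a rough sketch of how that scoring could be written in Python. It is not the code behind the game; the function name `score_guess` is made up for illustration, and counting a repeated letter only as many times as it appears in both words is my assumption about how duplicates should be handled.

```python
from collections import Counter

def score_guess(secret: str, guess: str) -> tuple[int, int]:
    """Return (letters_right, letters_in_place) for a guess."""
    secret, guess = secret.upper(), guess.upper()
    # Letters right: size of the multiset overlap, so a repeated letter
    # counts only as often as it appears in both words (an assumption).
    letters_right = sum((Counter(secret) & Counter(guess)).values())
    # Letters in place: same letter at the same position.
    in_place = sum(s == g for s, g in zip(secret, guess))
    return letters_right, in_place

print("%d-%d" % score_guess("LLAMA", "LEMON"))  # prints 2-1
```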

Want to play? Go here... you'll be able to go right to it in the future by clicking on "Play a Word Game" in the menu at left.

It's little things like these that are the reason I wrote my own blogger instead of using Blogger itself or Typepad.

 

4 Comments
Tags: masterword | word game | programming
by Brett Rogers, 4/16/2005 5:55:26 PM


Alright You Wordsmiths...

 

I wrote my de-pluralization routine today, and I've decided to put it online here. See if you can beat it. Give the engine a plural word and see if it can come back with the right singular form. If it can't, let me know in the comments - and then gloat a lot.

Here's the engine.

 

9 Comments
Tags: programming
by Brett Rogers, 4/15/2005 2:28:59 AM


Building a Search Engine

 

I mentioned the other day that I've been tasked at work to build an Intranet search engine. We have a lot of departments in Wells Fargo, where I work as a consultant, and I build database-driven web sites for them. A department came to my group recently and said that they needed a search engine for the nearly 100 PDFs that they have on their Intranet site. Small task, but I knew that if we built it right, other groups would ask for the same thing. We delivered the prototype and others are lining up. Very cool.

Lots of thought goes into a project like this. Given a cool puzzle, I find my head turning it around in my off-hours as well. What goes into a search engine?

First, because we're talking about a great many large PDFs, I need to convert them to text. The PDF file itself looks like this:

ÈÍj?ãû&ÈçÇCOÕ5Õë¹Aá;
It's not searchable unless you're in Adobe Acrobat and viewing the document. I found a little conversion tool at VeryPDF. It does batch conversions, so first problem solved.

With the text in hand, it's time to consider how people search. Take this sentence, for example:

The Home Mortgage Assessment Program(TM), developed by Judy Edwards, addresses the needs of all borrowers and co-borrowers alike.
The engine can't be built to just store all of the text and then allow for phrase searches only. If it were, and the user knew it was a mortgage program of some sort and typed in "mortgage program," they'd miss it because the actual phrase is "Mortgage Assessment Program." So it has to be by word.

Simple search engines on some web sites just store all of the words in a document into a big table of words and then return everything with either "mortgage" or "program" in it - not necessarily demanding that both are in it. So the user could be browsing documents with only the word "mortgage" in them and not "program." That's annoying, and wasteful of the user's time.
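
To make the difference concrete, here is a minimal sketch of a word index that only returns documents containing every search word. It is an illustration, not the engine I'm building at work; the names `build_index` and `search_all` and the sample documents are hypothetical, and a real version would sit in a database table rather than a Python dictionary.

```python
from collections import defaultdict

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search_all(index: dict[str, set[str]], query: str) -> set[str]:
    """Return only the documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())  # intersect, so all words are required
    return results

# Hypothetical sample documents.
docs = {
    "a.txt": "home mortgage assessment program",
    "b.txt": "mortgage rates this week",
}
index = build_index(docs)
print(search_all(index, "mortgage program"))  # {'a.txt'}
```

The search for "mortgage program" finds a.txt even though the two words aren't adjacent there, and it skips b.txt because "program" is missing.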

Next, the words themselves. Take the word "Program" in my example sentence. It's not just "Program" - it's "Program(TM)." So the words have to be analyzed and reduced to their true intent.
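
Here is one way that reduction might look, as a sketch only. The list of markers to strip and the function name `normalize` are my own assumptions for illustration; the real rules would grow as odd cases turn up in the documents.

```python
import re

# Trademark-style markers to strip; this list is an illustrative assumption.
MARKERS = re.compile(r"\((?:tm|r|c)\)|[™®©]", re.IGNORECASE)

def normalize(token: str) -> str:
    """Reduce a raw token such as 'Program(TM),' to 'program'."""
    token = MARKERS.sub("", token)
    # Trim punctuation from both ends but keep inner hyphens and apostrophes.
    token = re.sub(r"^\W+|\W+$", "", token)
    return token.lower()

sentence = ("The Home Mortgage Assessment Program(TM), developed by Judy "
            "Edwards, addresses the needs of all borrowers and co-borrowers alike.")
print([normalize(t) for t in sentence.split()])
```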

Further, the database will return results faster with exact match searches and not wildcard searches. English is a good mountain to climb with that goal in mind. Let's say that a user types "short-term loans" into the search engine. Will all documents have "short-term" hyphenated? Probably not. And even though they typed in "loans," they'll want to find hits for "loan" as well. Today, I find myself researching the grammar rules for pluralization and for possessive nouns so that the engine can "de-plural" words and users can get the best search results possible.

Rule #1
The plural of nouns is usually formed by adding -s to a singular noun.

Rule #2
Nouns ending in s, z, x, sh, and ch form the plural by adding -es.

Rule #3
Nouns ending in -y preceded by a consonant form the plural by changing -y to -ies.

Rule #4
Nouns ending in -y preceded by a vowel form the plural by adding -s.

Rule #5
Most nouns ending in -o preceded by a consonant form the plural by adding -es.

Rule #6
Some nouns ending in -f or -fe form the plural by changing -f or -fe to -ves.
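
Run backwards, the six rules above give a first cut at a de-pluralizer. The sketch below is not my finished routine. The function name `singularize` and the small `VES_PLURALS` table are placeholders I made up, and Rule #6 is handled with a lookup because "-ves" alone can't tell "knives" from "loves."

```python
# Illustrative subset of "-ves" plurals; a real list would be much longer.
VES_PLURALS = {
    "halves": "half", "knives": "knife", "leaves": "leaf", "lives": "life",
    "loaves": "loaf", "shelves": "shelf", "wives": "wife", "wolves": "wolf",
}

def singularize(word: str) -> str:
    """Best-effort singular form of an English noun, returned in lowercase."""
    w = word.lower()
    if w in VES_PLURALS:                          # Rule 6: -f/-fe -> -ves
        return VES_PLURALS[w]
    if w.endswith("ies") and len(w) > 4:          # Rule 3: -y -> -ies
        return w[:-3] + "y"
    if w.endswith(("ses", "zes", "xes", "shes", "ches")):  # Rule 2: + -es
        return w[:-2]
    if w.endswith("oes"):                         # Rule 5: -o -> -oes
        return w[:-2]
    if w.endswith("s") and not w.endswith("ss"):  # Rules 1 and 4: + -s
        return w[:-1]
    return w

for plural in ["loans", "boxes", "churches", "policies", "boys", "potatoes", "knives"]:
    print(plural, "->", singularize(plural))
```

Simply reversing the rules gets words like "houses" and "movies" wrong (this sketch returns "hous" and "movy"), so an exception list has to go with it.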

In my geeky way, I love this project.

On Tuesday, I researched the 1,000 most common words in English. The search engine won't index prepositions, adverbs, articles, conjunctions, and words like those.
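
Skipping those words at index time could be as simple as the sketch below. The `STOP_WORDS` set here is a tiny made-up sample, not my researched list, and `indexable_words` is a hypothetical name.

```python
# A tiny illustrative stop list; the real one would be far longer.
STOP_WORDS = {"a", "an", "and", "but", "by", "for", "in", "of", "or", "the", "to", "very"}

def indexable_words(text: str) -> list[str]:
    """Return the lowercased words of text that are worth indexing."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

print(indexable_words("addresses the needs of borrowers"))
# ['addresses', 'needs', 'borrowers']
```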

And at the moment, I'm studying names... such as "Wells," like in "Wells Fargo." Here are some common names:

SMITH
JOHNSON
WILLIAMS
JONES
BROWN
DAVIS
MILLER
WILSON
MOORE
TAYLOR
ANDERSON
THOMAS
JACKSON
WHITE
HARRIS
MARTIN
THOMPSON
GARCIA
MARTINEZ
If I de-plural normal words, I have to be careful not to de-plural names, such as Williams or Davis or Thomas or Jones, which end in -s.
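
One simple guard, sketched below, is to check a list of protected names before de-pluralizing. Everything here is illustrative: `PROTECTED_NAMES` holds only a handful of names from this post, `naive_singular` is a stand-in for the real de-pluralizer, and `index_form` is a hypothetical name.

```python
# Names ending in -s mentioned in this post; a real engine would load a
# much larger name file.
PROTECTED_NAMES = {"williams", "jones", "davis", "thomas", "harris", "wells"}

def naive_singular(word: str) -> str:
    """Stand-in for a real de-pluralizer: just drops a trailing 's'."""
    return word[:-1] if word.endswith("s") else word

def index_form(word: str) -> str:
    """Return the form of a word to store in the index."""
    lowered = word.lower()
    if lowered in PROTECTED_NAMES:
        return lowered            # leave names alone
    return naive_singular(lowered)

print(index_form("Williams"), index_form("loans"))  # williams loan
```

The trade-off is that an ordinary plural that happens to match a name, such as "wells" in the water sense, is left alone too.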

Once the words are broken down and indexed, it's time to create the search engine results given to the user. How do you decide which document gets listed at the top of the results?

If I have a document in front of me, I know what's most important to it by what appears first. The main idea is generally given first and then repeated throughout the document. So I'm giving a weight to the words in a document in their order of appearance. If the word "loan" appears 8 times in the upper half of a document, but appears 12 times in only the last part of another document, the first document will get listed first.
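
As a sketch of that idea, each occurrence of a word can add a weight that shrinks the later it appears. The straight-line decay below is an assumption for illustration, not the weighting I've settled on, and `position_score` and the two sample documents are made up.

```python
def position_score(words: list[str], term: str) -> float:
    """Sum a weight for each occurrence; earlier occurrences weigh more."""
    total = len(words)
    # Linear decay from 1.0 at the first word toward 0 at the end
    # (an assumed weighting).
    return sum(1 - i / total for i, w in enumerate(words) if w == term)

# Two hypothetical 100-word documents.
docs = {
    "early": ["loan"] * 8 + ["filler"] * 92,   # 8 hits, all near the top
    "late": ["filler"] * 88 + ["loan"] * 12,   # 12 hits, all at the end
}
ranked = sorted(docs, key=lambda name: position_score(docs[name], "loan"), reverse=True)
print(ranked)  # ['early', 'late']
```

With these weights, the document with 8 early hits outranks the one with 12 late hits.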

Further, people will want to see the context of their search words in the matching documents. So I plan to grab 50 words to the right and 50 words to the left of the first occurrence of the word and display that, much like Google does.
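
Grabbing that window of context might look like the sketch below. The function name `snippet` and the punctuation it strips before comparing words are my assumptions for illustration.

```python
def snippet(text: str, term: str, radius: int = 50) -> str:
    """Return up to `radius` words on each side of the first match of term."""
    words = text.split()
    # Compare on a lowercased copy with edge punctuation removed, but
    # return the original words so the excerpt reads naturally.
    lowered = [w.lower().strip(".,;:!?\"'()") for w in words]
    if term.lower() not in lowered:
        return ""
    hit = lowered.index(term.lower())
    start = max(0, hit - radius)
    return " ".join(words[start:hit + radius + 1])

print(snippet("We offer a short-term loan, with low rates today.", "loan", radius=2))
# a short-term loan, with low
```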

I love puzzles like these. I released a prototype yesterday that had everything but the de-pluralization and users are beating up on it. Good response thus far. We'll see how the finished product turns out.

ETC: Found a cool word search engine that does pattern matching, so if you like crossword puzzles and a word is giving you a fit, go here. Do the "Common words only" search.

I was using it to find every plural "*ves" word that came from a singular "*f" word. Found them all!
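
With a plain word list in hand, the same hunt can be done in a few lines. This is only a sketch; `ves_pairs` is a hypothetical name and the sample set stands in for a full dictionary file.

```python
def ves_pairs(words: set[str]) -> dict[str, str]:
    """Map each '-ves' plural to its '-f' or '-fe' singular, if both are listed."""
    pairs = {}
    for plural in words:
        if not plural.endswith("ves"):
            continue
        stem = plural[:-3]
        for singular in (stem + "f", stem + "fe"):
            if singular in words:
                pairs[plural] = singular
    return pairs

# Hypothetical sample; "loves" has no "lof" or "lofe", so it is skipped.
words = {"leaf", "leaves", "knife", "knives", "love", "loves", "wolf", "wolves"}
print(sorted(ves_pairs(words).items()))
# [('knives', 'knife'), ('leaves', 'leaf'), ('wolves', 'wolf')]
```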

 

0 Comments
Tags: programming
by Brett Rogers, 4/14/2005 2:58:09 PM