Searching is more than googling
surfraw is a tool for building great "scrapers". A scraper extracts content from the web automatically. That gets tricky with modern web apps that embed content dynamically, but what follows is "just" searching. The art of searching (see Fravia's presentation at 22C3) isn't widely known. I think people in IT especially should train that ability, or be taught it, because the job constantly requires searching for specific information.
Every day there's something you need to google, bing, yahoo, or use $search_engine for. The thing is: these search engines often don't work the way you might expect. They're all commercial. They're all censored. And they all rank the results that pay ahead of those that don't, regardless of content or context.
De-Crapification
The web is full of crap. To efficiently use it you need to filter.
These days it's like an abstract gold mine, a digitized Wild West straight out of the good old western movies. The better you are at digging, the richer you'll get. You can find everything! Information gathering can even support social engineering, but that's not what this article is about. It's just about finding stuff in no time. With surfraw.
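Let's stay on the command line. I don't know which shell you use, but I'm on zsh. If you're into bash, just swap the shebang; "$1" means the same thing in both shells:
- #!/bin/zsh
- # dump the rendered text of the URL given as first argument to stdout
- lynx -dump "$1"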
Lynx has an interesting feature: it can dump the rendered text of a page to stdout. Save that snippet as an executable at /usr/bin/dlynx or anywhere else in your PATH.
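If you'd rather not create the file by hand, here is a minimal sketch of a helper that writes the wrapper into a directory and marks it executable. The name install_dlynx and the ~/bin default are assumptions made for this sketch, not something surfraw or Lynx ship with, and the generated wrapper uses /bin/sh so that it doesn't depend on zsh being installed.

```shell
# Hypothetical helper (not part of surfraw or Lynx):
# write the dlynx wrapper into a directory and make it executable.
install_dlynx() {
    # Target directory; "$HOME/bin" is only an assumed default.
    target_dir=${1:-"$HOME/bin"}
    mkdir -p "$target_dir" || return 1

    # The quoted 'EOF' keeps "$1" literal inside the generated script.
    cat > "$target_dir/dlynx" <<'EOF'
#!/bin/sh
# Dump the rendered text of the URL given as first argument to stdout.
exec lynx -dump "$1"
EOF

    chmod +x "$target_dir/dlynx"
}
```

Call it as `install_dlynx ~/bin` and make sure that directory is in your PATH, otherwise surfraw won't find dlynx.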
- surfraw -browser="dlynx" google surfraw | head -n 50
Now run that and you'll know what surfraw is about :).
2. [19]Surfraw - Wikipedia, the free encyclopedia
According to its creator Julian Assange: "Surfraw provides a fast
unix command line interface to a variety of popular WWW search
engines and other artifacts ...
Not too bad!
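The dump still carries plenty of noise around the hits. As a rough sketch, a small sed filter can keep only the numbered result titles. It assumes result lines look like the "2. [19]Surfraw ..." line above, which depends entirely on how the search engine's page happens to render in Lynx; result_titles is a made-up name for this example.

```shell
# Hypothetical filter: read a lynx dump on stdin and print only the
# numbered result titles, without the [NN] link reference.
result_titles() {
    # Matches e.g. "   2. [19]Surfraw - Wikipedia, ..." and rewrites it
    # to "2. Surfraw - Wikipedia, ..."; every other line is dropped.
    sed -n -E 's/^[[:space:]]*([0-9]+)\.[[:space:]]*\[[0-9]+\](.*)$/\1. \2/p'
}
```

Plugged into the command from above it would look like this: `surfraw -browser="dlynx" google surfraw | result_titles`. If the page layout changes, the pattern needs adjusting.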
Surfraw supports a great number of indexes: from NASA technical reports to TPB, from Springer to OpenSearch. It's a very powerful tool. Of course you can pass a different "-browser" option. Dillo is very neat and works very well on X11, by the way.
Or, on OS X, use "open" for your default browser of choice, which hopefully has an ad blocker and script controls.
- surfraw -browser="dlynx" google -country=DE surfraw | head -n 50
The -country=DE option keeps it German.
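Typing the -browser option and the head pipe every time gets old quickly. A small shell function can wrap both; this is only a sketch, and the names srd and SRD_LINES are invented here, not anything surfraw knows about.

```shell
# Hypothetical convenience wrapper: srd ELVI [OPTIONS...] QUERY...
# Runs surfraw through the dlynx wrapper and shows only the top of the dump.
srd() {
    # SRD_LINES is an assumed variable name; 50 lines matches the examples above.
    surfraw -browser="dlynx" "$@" | head -n "${SRD_LINES:-50}"
}
```

With that in your shell startup file, `srd google -country=DE surfraw` does the same as the long command above, and `SRD_LINES=20 srd google surfraw` shortens the output.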
Use them all!
There's a large number of elvi, as surfraw calls its search scripts. Take this one for example:
- surfraw -browser="dlynx" freedb "pornophonique"
Take a look at the -help:
- Local options:
-   -artists          Search artists
-                     Environment: SURFRAW_cddb_artists
-   -albums           Search albums
-                     Environment: SURFRAW_cddb_albums
-   -songs            Search songs
-                     Environment: SURFRAW_cddb_songs
-   -rest             Search the rest of the data
-                     Environment: SURFRAW_cddb_rest
-   -all              Search all fields
-                     Environment: SURFRAW_cddb_all
-                     Default: search artists and albums
-   -id               Search by CDDB ID.
-   -bycat            Sort results by category
-   -cat=CATEGORY     Category to search, repeat as needed
-                     Options:
-                       all
-                       blues
-                       classical
-                       country
-                       data
-                       folk
-                       jazz
-                       misc
-                       newage
-                       reggae
-                       rock
-                       soundtrack
-                     Default: all
-   -page=PAGENUM     Start at page PAGENUM
-                     Default: 1
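The local options listed above can be combined on one command line. Here is a hedged sketch that searches song titles within a single category, using only the -songs and -cat flags from that help text; the function name freedb_songs is an assumption for illustration.

```shell
# Hypothetical helper: freedb_songs CATEGORY QUERY...
# Searches song titles in one freedb category via the dlynx wrapper.
freedb_songs() {
    # First argument is the category (e.g. rock, jazz, misc), the rest is the query.
    category=$1
    shift
    surfraw -browser="dlynx" freedb -songs -cat="$category" "$@"
}
```

For example: `freedb_songs rock "i want to be a machine" | head -n 50`. Whether that turns up anything depends on what the index actually holds.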
So now I'm playing "pornophonique & procacci - i want to be a machine". I think you get what this is about and will find some inspiration of your own, without having to cope with all the strange web interfaces, ads and stuff.
Good luck finding stuff,
wishi


