GSiteCrawler for Google Sitemap files

Google Sitemaps allows the webmaster to help Google index their pages optimally. The GSiteCrawler will help you generate the best Google Sitemap file for your website. The GSiteCrawler uses different ways to find all the pages in your website and can generate all sorts of files, statistics and more. The sitemap file format has recently been adopted by Yahoo! as well, and even MSN/Live.com has pledged its support.
Making sitemap files has never been so easy!
The GSiteCrawler is available for free and runs under Windows – all you need is an internet connection and the desire to make the most out of your website!

Do you only need a quick sitemap file? Just follow the steps in the integrated wizard and you’ll have the sitemap file on your server in no time!
Are you looking for more than just a sitemap?
The program also offers tons of options, settings, tweaks, and more, in case you want to do more than generate a simple sitemap file. How about a urllist file for Yahoo!? An RSS feed? A ROR file? An HTML sitemap page? It’s all possible with the GSiteCrawler!

Take a look around the site. We use Google Groups to discuss the program and possible extensions. If you run across something that you feel is missing, feel free to post in the Google Groups or just send me a short note directly.
GSiteCrawler Features
In general, the GSiteCrawler will take a listing of your website’s URLs, let you edit the settings and generate Google Sitemap files. However, the GSiteCrawler is very flexible and allows you to do a whole lot more than "just" that!
Capture URLs for your site using
    * a normal website crawl – emulating a Googlebot, looking for all links and pages within your website
    * an import of an existing Google Sitemap file
    * an import of a server log file
    * an import of any text file with URLs in it
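To give a rough idea of what the two import options involve, here is a minimal Python sketch. It is not the GSiteCrawler's own code; the function names `urls_from_sitemap` and `urls_from_text` are made up for illustration, and the URL pattern is a deliberately simple assumption.

```python
import re
import xml.etree.ElementTree as ET


def urls_from_sitemap(xml_text):
    """Return the <loc> values of a sitemap document, in document order."""
    root = ET.fromstring(xml_text)
    urls = []
    for element in root.iter():
        # Compare local tag names only, so any sitemap namespace version matches.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text:
            urls.append(element.text.strip())
    return urls


def urls_from_text(text):
    """Pull anything that looks like an http(s) URL out of arbitrary text."""
    # Assumption: a URL ends at whitespace, a quote or an angle bracket.
    return re.findall(r"https?://[^\s\"'<>]+", text)
```

Both functions return plain lists of URLs, which could then be merged with the results of a normal crawl.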
The Crawler
    * does a text-based crawl of each page, even finding URLs in JavaScript
    * respects your robots.txt file
    * respects robots meta tags for index / follow
    * can run up to 15 times in parallel
    * can be throttled with a user defined wait-time between URLs
    * can be controlled with filters, bans, automatic URL modifications
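As an illustration of what "respecting robots.txt" means during a crawl, here is a small Python sketch using only the standard library. It is an assumption about how such a check could look, not the GSiteCrawler's implementation: it only follows ordinary `<a href>` links (the real crawler also finds URLs in JavaScript), and the names `LinkCollector`, `allowed_internal_links` and the user agent `ExampleCrawler` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collects absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def allowed_internal_links(page_url, html, robots_txt, user_agent="ExampleCrawler"):
    """Return links on the same host that robots.txt allows, without repeats."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())

    collector = LinkCollector(page_url)
    collector.feed(html)

    host = urlparse(page_url).netloc
    result = []
    for link in collector.links:
        same_host = urlparse(link).netloc == host
        if same_host and robots.can_fetch(user_agent, link) and link not in result:
            result.append(link)
    return result
```

A user-defined wait time between URLs could be added by pausing (for example with `time.sleep`) before each download; that part is left out here to keep the sketch short.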
With each page, it
    * checks date (from the server or using a date meta-tag) and size of the page
    * checks title, description and keyword tags
    * keeps track of the time required to download and crawl the page
Once the pages are in the database, you can
    * modify Google Sitemap settings like "priority" and "change frequency"
    * search for pages by URL parts, title, description or keyword tags
    * filter pages based on custom criteria – adjust their settings globally
    * edit, add and delete pages manually
And once you have everything the way you want it, you can export it as
    * a Google Sitemap file in XML format (of course :-)) – with or without the optional attributes like "change date", "priority" or "change frequency"
    * a text URL listing for other programs (or for use as a UrlList for Yahoo!)
    * a simple RSS feed
    * Excel / CSV files with URLs, settings and attributes like title, description, keywords
    * a Google Base Bulk-Import file
    * a ROR (Resources of Resources) XML file
    * a static HTML sitemap file (with relative or absolute paths)
    * a new robots.txt file based on your chosen filters
    * … or almost any type of file you want – the export function uses a user-adjustable text-based template-system
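For readers who have not seen one, a sitemap file is a short XML document with one `<url>` entry per page. The following Python sketch shows roughly what an export with or without the optional attributes could look like. It is an illustration only and says nothing about the GSiteCrawler's template system; the function names and the dictionary layout of `pages` are assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages, include_optional=True):
    """Build a sitemap XML string from a list of page dictionaries.

    Each page needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional and only written when include_optional is True.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the URL for us.
        ET.SubElement(url, "loc").text = page["loc"]
        if include_optional:
            for key in ("lastmod", "changefreq", "priority"):
                if key in page:
                    ET.SubElement(url, key).text = str(page[key])
    return ET.tostring(urlset, encoding="unicode")


def build_url_list(pages):
    """Build a plain text listing with one URL per line."""
    return "\n".join(page["loc"] for page in pages)
```

For example, `build_sitemap([{"loc": "http://example.com/", "priority": 0.8}])` produces a `<urlset>` with a single `<url>` entry containing `<loc>` and `<priority>`.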
For more information, it also generates
    * a general site overview with the number of URLs (total, crawlable, still in queue), oldest URLs, etc
    * a listing of all broken URLs linked in your site (or URLs that were otherwise inaccessible during the crawl)
    * an overview of your site’s speed with the largest pages, slowest pages by total download time or download speed (unusually server-intensive pages), and those with the most processing time (many links)
    * an overview of URLs leading to "duplicate content" – with the option of automatically disabling those pages for the Google Sitemap file
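One simple way to spot "duplicate content" is to compare a fingerprint of each page's text. The sketch below is a guess at the general idea, not a description of how the GSiteCrawler actually decides; it treats two pages as duplicates only when their text is identical after whitespace is normalized. The function name `find_duplicates` is hypothetical.

```python
import hashlib
from collections import defaultdict


def find_duplicates(pages):
    """Group URLs whose page text is identical after whitespace cleanup.

    pages maps each URL to its page text. The result is a list of URL
    groups, each with two or more URLs that share the same content.
    """
    by_digest = defaultdict(list)
    for url, body in pages.items():
        # Collapse runs of whitespace so trivial formatting changes do not matter.
        normalized = " ".join(body.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        by_digest[digest].append(url)
    return [urls for urls in by_digest.values() if len(urls) > 1]
```

In each group, all but one URL would be candidates for being disabled in the sitemap file.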
Additionally …
    * It can run on just about any Windows version from Windows 95b on up (tested on Windows Vista beta 1 and all server versions).
    * It can use local MS-Access databases for re-use with other tools
    * It can also use SQL-Server or MSDE databases for larger sites (requires a separate installation file).
    * It can be run in a network environment, splitting crawlers over multiple computers – sharing the same database (for both Access and SQL-Server).
    * It can be run automated, either locally on the server or on a remote workstation with automatic FTP upload of the sitemap file.
    * It tests for and recognizes non-standard file-not-found pages (without HTTP result code 404).
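The last point deserves a short illustration. Some servers answer requests for missing pages with a normal "200 OK" page instead of a 404. A common way to detect this is to request a URL that almost certainly does not exist and remember what comes back. The Python sketch below shows that idea under stated assumptions: it is not taken from the GSiteCrawler, the `fetch` parameter is a hypothetical function you supply that returns a `(status_code, body)` pair, and comparing bodies for exact equality is a simplification.

```python
import uuid
from urllib.parse import urljoin


def detect_soft_404(site_root, fetch):
    """Return the body of the site's fake not-found page, or None.

    fetch(url) must return a (status_code, body) tuple.
    """
    # A random file name is very unlikely to exist on the site.
    probe_url = urljoin(site_root, "/" + uuid.uuid4().hex + ".html")
    status, body = fetch(probe_url)
    if status == 200:
        # The server claims success for a missing page: keep its body as a signature.
        return body
    return None


def is_not_found(status, body, soft_404_body):
    """Decide whether a crawled page is really a file-not-found page."""
    if status == 404:
        return True
    return soft_404_body is not None and body == soft_404_body
```

A crawler would call `detect_soft_404` once per site and then pass the result to `is_not_found` for every page it downloads.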

About Jaggi
love technology, always updated on the latest and current happenings, seminars, tech.Ed, virtual days! Be Yourself!
