Archiving the Web

It’s very puzzling that countries that recognize the importance of national print archives to their heritage don’t invest in preserving digital information. Ron Miller at TechCrunch laments the near-total lack of attention that governments and companies pay to this.

The Internet Is Failing The Website Preservation Test, TechCrunch (Aug 27)

The exception is the Internet Archive, which does a heroic job of saving copies of web pages and has been a godsend for thousands of researchers. But it is not complete, is very difficult to search, and can’t possibly serve as a “system of record”.
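
For readers who want to check whether the Internet Archive holds a copy of a page, here is a minimal sketch in Python. It assumes the Wayback Machine's public availability endpoint at archive.org/wayback/available and the JSON shape it is documented to return. The helper names and the sample response are illustrative, not real data.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint: the Wayback Machine availability API
WAYBACK_API = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL that asks for the closest archived copy of a page."""
    params = {"url": page_url}
    if timestamp:
        # Optional YYYYMMDD (or longer) timestamp to search around
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Return the URL of the closest archived copy, or None if none is listed."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


# Illustrative response in the assumed shape (made-up values)
sample = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20150827000000/http://example.com/",
            "timestamp": "20150827000000",
            "status": "200",
        }
    }
})
```

To run it against the live service, fetch `availability_url("example.com")` with `urllib.request.urlopen` and pass the decoded body to `closest_snapshot`. A `None` result only means no capture was reported for that address, which illustrates the point that the archive is not complete.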

Miller calls for a more embracing system – though this will never happen until there is universal recognition of the importance of preserving the digital record.

“While all of these are game attempts, what we really need is an organization backed by the internet community with funding and tools to formalize the archiving and preservation process. This would allow every website that wants to participate to easily back up their sites to a central repository with a few lines of code.”

Emoticons for punctuation

Emoticons – and now emojis – have a long history, as Reid Goldsborough explains in this LinkUp article — The How’s and Why’s of Emoticons (Sept).

The Emojipedia tells us that “Emoji originated in Japan. The word emoji means “picture letter” in Japanese. Each character has an official name, defined as part of the unicode standard.” We are most familiar with the round, yellow people faces with various expressions.
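
The official names mentioned in that quote can be looked up directly. A small sketch using Python's standard `unicodedata` module follows; the `describe` helper is a name made up for this example.

```python
import unicodedata


def describe(text):
    """Return (code point, official Unicode name) for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in text
    ]


# U+1F600 is the familiar round yellow grinning face;
# U+263A is an older smiley that predates emoji keyboards.
faces = describe("\U0001F600\u263A")
```

Characters newer than the Unicode version bundled with a given Python release fall back to "UNKNOWN" here rather than raising an error.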

All of these, along with the emoticons of the early Internet days, are still used in email, chat, texting – anything text-based.

Of interest is that they predate the Internet — “Telegraph operators in the mid-19th century used acronyms such as IMHO (in my humble opinion) and FWIW (for what it’s worth) when communicating among themselves, according to the book The Victorian Internet by Tom Standage. Later, teletype operators used emoticons when chatting. In both cases it was to save time.”

Canadian Politics – Historical research

Good thing someone is archiving the words and actions of Canadian politicians. The University of Waterloo has made political pages from the past 10 years searchable at webarchives.ca — “political parties and political interest groups” (though when I tried the server was down).

Digital archive of political parties digs deep for Election 2015 (Aug 26)

“WebArchives.ca pulls from collections that the University of Toronto Library has been collecting for a decade. Professor Milligan and his research team at Waterloo, as well as project collaborators from York University and Western University made the data searchable and accessible, drawing on code that staff at the British Library developed.”

Thanks to ResearchBuzz for the lead.

Also see Waterloo professor restores deleted political platforms, promises, CBC (Aug 27)

Canadians wishing to distinguish between truth and falsehood in the statements of our campaigning politicians will also get great value from FactsCan.ca – “Canada’s political fact-checker. Independent. Transparent. Non-partisan”

Best browser today

Browsers are not all the same. This PCWorld article puts current browsers (Microsoft Edge, Internet Explorer 11, Chrome, Firefox, Opera) through tests for speed, resource usage, and function.

The best web browser of 2015: Firefox, Chrome, Edge, IE, and Opera compared, Mark Hachman (Aug 21)

A key difference is in consumption of memory and CPU. Both usage figures go up as more tabs are added. Chrome is noted for “sucking up” memory. Having Flash enabled can worsen usage, especially in Firefox. Opera was the most efficient. All browsers pass the benchmark tests. To note: Flash will slow down a browser, so I hope more sites convert to HTML5. All provide ample function for viewing web pages, but Firefox has the lead in number of plugins available.

Of interest: “Firefox includes a Firefox-to-Firefox videoconferencing service called Firefox Hello that works right in your browser, and you can save webpages to a Pocket service for later reading. And this is where Edge shines—its digital assistant, Cortana, is built right in, and there are Reading View options and a service to mark up webpages, called Web Notes. Cortana does an excellent job supplying context, and it’s certainly one of the reasons to give Edge a try.”

Check the article to learn which browser was considered best.

Searching out Capitalization

Learn more about search strategies from Dan Russell in his description of approaches taken to Answer: Why all the crazy capital letters? (Aug 25)

Fascinating reading, especially if you have wondered, as I have, why writers in the 1930s and earlier capitalized so many words in documents and correspondence. Uppercase adds importance, especially to topics and to anything related to religion, positions, or titles. Today, however, the rule is to limit capitalization.

New from Mary Ellen Bates

Searchers will be interested in the two new slideshows Mary Ellen Bates has posted at SlideShare in advance of sessions at Web Search University in September 2015. Excellent.

Social Media Gains Respectability – primer on the value, how to use, how to search, and how to protect privacy.

Competitive Intelligence for non CIers – what it is, how to do it. Has strategies and tools.

DMOZ shows in search results

Can the Open Directory (aka Dmoz) make a comeback? Very few people use subject directories anymore. But this one has hung on, and John E Lincoln at Ignite sees signs of life: IS DMOZ MAKING A COMEBACK FOR SEO? NEW SIGHTING (Aug 24)

Of interest: “Now we know that Wikipedia has been losing traffic and Google is looking for other sites to fill the void. Perhaps Google is looking to other properties for information and authority. Either way, this new development has SEO professionals looking at DMOZ again.” And so, perhaps, should web searchers.

Images in the Knowledge Graph

There is more than meets the eye in Google’s algorithms for choosing images to show in the Knowledge Graph. Bill Slawski gives a summary of the patent.

How Google Decides Which Images To Show For Entities In Knowledge Panels, SEO by the Sea (Aug 16)

“The combination of image scores and quality scores for web pages that contain images of entities might be used to generate an image authority score.”

More new domain suffixes in use

The new personalized top-level domain names are taking hold. It seems more than 6 million domain names with new suffixes have been registered – .vegas, .jobs, .bike – could be anything.

Who needs .com? Domains like .vegas, .pr, .nyc are trending, Joyce Rosenberg, AP via Seattle Times (Aug 19)

The original set of suffixes had meaning – .com, .org, .edu – that searchers could exploit. The new ones have no pattern. But it might also be true that these cute names will be used only by smaller operators.
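
One way searchers exploit the older suffixes is the site: operator, which the major search engines accept as a way to restrict results to a domain or suffix. A minimal sketch in Python follows. The `site_query` helper is hypothetical, and how a given engine treats a bare suffix such as `site:.edu` is an assumption worth testing on the engine you use.

```python
from urllib.parse import quote_plus


def site_query(terms, suffix):
    """Append a site: restriction for a domain suffix to the search terms."""
    suffix = suffix.lstrip(".")  # accept ".edu" or "edu"
    return f"{terms} site:.{suffix}"


# Restrict a search on web archiving to educational institutions
query = site_query("web archiving", ".edu")

# URL-encoded form, ready to place in a search engine's query parameter
encoded = quote_plus(query)
```

The same call with a suffix like `.vegas` is valid syntax, but as noted above it says little about what kind of organization is behind the site.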