ScienceSeeker Seeks Science-Savvy Editors

ScienceSeeker is now one year old, and we’ve made some great strides in the past year. But in the next year, we’re planning even more. We’re about to get a whole lot more interactive, and we need people to manage all that interactivity.

With over 900 blogs and hundreds of posts indexed each day, ScienceSeeker can sometimes be rather overwhelming. To help visitors sort through all that information, we’ll be relying on both our readers and specialized volunteer editors. We expect this new functionality to be ready within a few weeks, so to prepare for it, we need to sign up editors now.

The primary job of the new editors will be to share their favorite ScienceSeeker posts. They’ll select five posts a week from blogs covering their areas of expertise, and readers will be able to view those posts on our site or subscribe to a feed of just the recommended posts. We expect this to take just a few minutes each day.

Qualifications:

  • An active online presence on a blog and/or social networking sites like Twitter, Google+, and Facebook
  • We don’t require that our editors have PhDs, but the candidates we select will have demonstrated expertise via their blogging or other publication record.
  • Enthusiasm for science

If you’re interested in the job, please email dsmunger@gmail.com with a paragraph or two about why you would like to be an editor, and provide links to two or three of your own online posts about science that you feel are especially good. Link your social media feeds and other relevant sites, if any. You may attach or link to a curriculum vitae.

Editors will be permanently listed on ScienceSeeker.org, so this position will make a great addition to your CV. We will select the new editors by February 1, 2012.

18 Jan 2012: Updated to indicate the editorships are volunteer positions.

Share ScienceSeeker with our new badges

If you like ScienceSeeker, I hope you’ll add one of these badges to your blog. It’s a great way to let your readers know about ScienceSeeker, which, in turn, should point more readers your way. To use the badges, just copy the code beneath the badge you like best and paste it into your blog’s sidebar (for most blogs, using a “text” widget will work fine).

I'm on ScienceSeeker-DNA

<a href="http://scienceseeker.org/"><img src="http://scienceseeker.org/wp-content/uploads/2011/02/sciseekdna.gif" alt="I&#039;m on ScienceSeeker-DNA" title="sciseekdna" width="146" height="44" /></a>

I'm on ScienceSeeker-Microscope

<a href="http://scienceseeker.org/"><img src="http://scienceseeker.org/wp-content/uploads/2011/02/sciseekmicro.gif" alt="I&#039;m on ScienceSeeker-Microscope" title="sciseekmicro" width="146" height="44" /></a>

I'm on ScienceSeeker-Telescope

<a href="http://scienceseeker.org"><img src="http://scienceseeker.org/wp-content/uploads/2011/02/sciseekscope.gif" alt="I&#039;m on ScienceSeeker-Telescope" title="sciseekscope" width="146" height="44" /></a>

If you have any other ideas for badges, or other ways you’d like to share ScienceSeeker with your readers, let us know in the comments.

Questions about ScienceSeeker? (Open forum)

Following the ScienceSeeker launch, I saw various questions about the site on Twitter. If you have questions, ask them here, and we will answer!

Some questions I can answer now: yes, there is a rudimentary API to let people access posts directly; I will post more about that later. Yes, we do want to make the code for the site re-usable by other communities, so it is not limited to science topics in theory (see the job listings — we are looking for someone to help usher the code through an open source release so that others can use it to set up similar sites about other topics).

What else do you want to know? Ask here.

Introducing ScienceSeeker

We’re pleased to announce the unveiling of the product of six months of planning and work by some very dedicated volunteers. ScienceSeeker (at, naturally, Scienceseeker.org) is a beta-level site: a work in progress, but we think it’s a very useful one even as it now stands. The project began as an extension of Science Blogging Aggregated, but quickly grew into an independent site.

The basic concept is simple: Find as many sources of regularly updated science information as possible, and collect them all in one place. We believe that science blogs are currently the most robust and diverse source of science news, discussion, and commentary. They can offer a measured response to the myriad self-promotional press releases that clutter newspapers and inboxes. Unfortunately, they are spread all about the internet, in dozens of blogging networks and hundreds, if not thousands, of independent science blogs. These blogs and networks aren’t organized by topic, which makes it difficult for someone looking for the latest posts on, say, chemistry.

ScienceSeeker already catalogs over 400 blogs, and is set up so that anyone can add more blogs. Our editors will review any submission to make sure it’s really about science (and not spam), then approve it within 24 hours. Our aim is to be the most comprehensive and useful aggregator of science news, discussion, and commentary anywhere.

Take a look at the site and put it through its paces. We think you’ll agree that it’s one of the most useful and engaging science sites you’ve ever seen.

Click here to visit ScienceSeeker.

ScienceSeeker is an all-volunteer effort, and we intend to make it a formal, open-source project, allowing anyone to contribute enhancements. We have lots of ideas of what to do next, but we want to hear yours too. Feel free to offer suggestions in the comments.

ScienceSeeker needs YOU!

ScienceSeeker is an all-volunteer project, and it’s not finished yet! We are planning big things, but we can only do them with community support. We need both technical help to create the site, and editorial support to maintain quality.

If you’re interested in being an editor, please indicate your interest below or use the contact form to email us.

Our technical needs are more specific; the positions we need to fill are listed below. If you have technical questions, feel free to email jphekman@arborius.net (Jessica Hekman). She will be online during the conference and will try to answer all emails within a few hours (often within minutes).

* DB/MySQL geek

Review the SQL queries in our code base and tighten them up for speed and efficiency. Add indexes to the database to make it run faster. In general, be in charge of the efficiency of the (MySQL) database.

* PHP programmer

Write PHP (mostly WordPress plugins) to add new functionality to ScienceSeeker, and to fix bugs. We can use multiple people in this position! Some knowledge of XML is useful but not essential.

* Release engineer

Set up a source code repository (probably Subversion). Design and set up a better development environment (beta and live versions of the site). Chaperone an open source release of the code base.

* XML/XSL geek

Much of the behind-the-scenes work in ScienceSeeker is done with XML documents, and many of them are transformed to HTML via XSLT. Maintain the XSLT stylesheets and write new ones as needed. Extend the XML schemas (currently informal, but may be formalized in RELAX NG or XSD if you prefer) to support more functionality as ScienceSeeker expands. This job may or may not overlap with the PHP programming job.

Crowdsourcing request: Help us create a list of blogs for v. 2.0

Update: Thanks to everyone for your help! We’ve finished updating the database. Look here for news on our launch on Saturday, January 15.

In just under three weeks, we’ll be unveiling the beta version of the next generation of this site.

The new site will work very differently from this one; it is a custom-created database that collects information from hundreds, and ultimately thousands, of blogs. Users will easily be able to select just the topics they want, instead of seeing posts based on what network they are on. We want the beta site to be usable from day one, but to do that, we need some help.

I’ve created a Google Docs Spreadsheet for this purpose. Anyone can access the spreadsheet and make modifications. What we need are the name, URL, RSS address, and topic of each blog. What we have, in most cases, is just the URL. If everyone pitches in and visits 10 to 20 blogs, then we should be able to generate this information in a matter of days, if not hours.

Most of the blogs are listed on the Master Blog List (the first tab at the bottom of the spreadsheet). To start helping, just fill in the information in the space provided. If you figure out an automated way of doing this, you can reserve a block of blogs by typing your name in the designated column; then no one will duplicate your efforts.

The reason we need humans to do this is that we want the blogs to be classified by topic. We’ve generated a list of topics (on the last tab in the spreadsheet). When you visit a blog, figure out what topic from our list best describes the blog, and enter it in the space provided (most web browsers will display a drop-down menu to make this easy for you).

The other tabs are for blog networks that are a little more difficult to suss out; either there was no easy way for us to find a list of blogs, or there are non-science blogs mixed in with science blogs. So, we’ve given specific directions for what to do in each case.

FAQs

  • My blog isn’t listed!
    Don’t worry! Either we’ve already got all the info we need (in the case of some blog networks) or you’re an independent blogger and you’ll be able to register your blog when the site launches. If you don’t think you’re in either of those camps, let us know in the comments below.
  • None of the official topics apply to this blog
    Just pick the closest match. You can get more specific in the secondary topic.
  • I don’t agree with your list of topics
    We had to start somewhere. The list will be easily modifiable in the future.
  • One of the listed blogs is not scientific
    Explain your objection in the Notes section on the spreadsheet.
  • Someone has reserved a block of blogs for hours
    You can use File -> See revision history to see how recently an update was made. If it’s been more than an hour, feel free to delete their name, substitute yours, and work on that entry.
  • There’s no drop-down menu of topics
    Try using a different browser. I’ve tested it on Safari and Firefox, but I can confirm it doesn’t work on Chrome for Mac.
  • What’s in it for me?
    Our eternal gratitude? Plus, if we see you at a conference, we’ll buy you a beer.

Thanks again. Let us know if you have any other questions in the comments.

Here’s another link to the Google Docs Spreadsheet

Next steps for the new site

We’re now beginning work on the new version of the Science Blogging Aggregated site.

We’d like to have a working prototype of the site ready for the ScienceOnline conference in January.

Realistically, by then we’ll probably be able to implement the following features:

  • Users log in and register blogs
  • Some sort of administrative check-off on registration, with anti-spam measures
  • Aggregator compiles entries from registered blogs, displays on home page
  • No tagging of individual posts, but blogs are categorized by user-specified “themes”
  • Visitors can filter posts appearing on home page by theme

We may also add a language filter allowing users to specify their preferred languages. (This may be difficult to implement because it would require having curators in each language we support.) Over the long term, we would like a multi-lingual interface, so all users can experience the site fully in their native language.

We are leaning towards a dense, information-rich layout for the home page, much like the existing home page, but with additional tools for users to filter posts, log in, register, and so on.

In order to maximize the site’s utility, we are thinking about pre-populating the database. This would probably be a manual process, based on the existing feeds for ScienceBlogging.org. This would require an additional feature so that users could “claim” their blog and personalize their account. However, we’re not sure that’s doable by the January deadline. If readers can suggest models for how claiming a blog could work, with a minimum of fuss, we’d appreciate suggestions.

We are also considering a new domain name for the site. We’d like it to be a truly notable name, one that’s memorable, says something about the site, and isn’t easily confused with some of the other science sites currently out there.

So here’s our plan for the next steps. We’ll keep you up to date as we continue to work on the project:

  1. Develop a schema for a database that can handle the trimmed-down version of the site that we’re planning for January, but is flexible enough to meet our long-term goals
  2. Arrange for site hosting. We can work on our existing personal server space for now but we’ll need a permanent home, and the sooner we find it the better.
  3. Wireframe the first (limited-feature) version of the site: Create a template that developers can use to build the system, indicating what information will go on each page. Again, we may want to do this in anticipation of the higher-functionality site to come, so we don’t have to constantly reinvent the wheel.
  4. Explore the process of creating a non-profit organization. This may be a larger non-profit that also includes ScienceOnline.
  5. Create a schedule for the process of developing the site up through the conference.
  6. Recruit additional help. We’re really short on programmers and designers. Any volunteers?

An outline for version 2.0 of the site

A few weeks ago I wrote up a tentative outline for the next generation of Science Blogging Aggregated. I’ve been sharing bits and pieces of it with you over the past week, but now I’d like to share the whole thing. It’s still a work in progress, a Google Doc that reflects our current thinking on the project—but of course, something that will continue to be refined as we move forward with the project.

Click here to view the document.

I’ve already tried to incorporate as many comments from readers as possible as I’ve shared the plans with you, but of course we continue to be open to additional suggestions. I think this is enough for us to use to get started, but there’s obviously much work still to be done. If you’d like to help out, you can either email us directly at contact@scienceblogging.org, or add a comment below and we’ll get in touch with you via the (hidden) email link you provide in the comment form. Particularly useful at this stage are people with CSS / web design experience, developers, and sysadmins.

We’re hoping to present a working prototype of the site at Science Online 2011. I’ve suggested a session on the conference wiki here.

We’ll continue to keep you posted and ask for your advice and suggestions as work progresses.

Tagging strategies

Dave’s earlier posts sparked some good conversation about tagging. Here is my proposal for how tagging could work on the new version of the site. This proposal isn’t necessarily what we will do; I’m putting it out there to get feedback from the community about whether it’s the right approach.

First, an overview. There are two ways to approach tagging:

  • Folksonomy: all the users use their own tagging schemes. There are tools to let users discover tags already in use.
  • Ontology: the owners of the site describe exactly what tags people can use, and expect people to use them.

Our goals are also twofold:

  • To help readers of science blogs more easily find the content they are looking for, and
  • To do so without imposing constraints on the authors of science blogs

I believe that folksonomies are the best solution to the above dilemma: they impose no constraints on authors; and, if things are done right, hopefully many of the tags will start to come together. My suspicion is that if we specified a strict list of tags, users would not want to use them.

But how to make the folksonomy chaos into something useful? We will maintain a database of tags. Each tag’s entry in the database will have, at a minimum (this can be expanded later):

  • Name of tag (e.g., “tamarin”)
  • List of synonymous tags (“Saguinus”, maybe “tamarind” if we want to support common mistakes)
  • List of children tags (“cotton top tamarin”, “cotton top”, “Saguinus oedipus”, etc.; may be very long)
  • List of parent tags (“New World monkeys”; may be multiple)

Bloggers may tag with any of the synonymous tags. Let’s say we do decide to support mistakes. Someone may tag “tamarin” or “tamarind”. Those are different tags, but our system understands that they are synonymous.

Someone searches for “tamarin.” They get a list of posts tagged with either “tamarin” or any of the synonymous tags (so “tamarind” or “Saguinus”).
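To make the idea concrete, here is a minimal sketch of how a synonym-aware search could work. It is only an illustration, not how the site is or will be implemented: the Tag class, the build_synonym_index and search functions, and the sample posts are all hypothetical names invented for this example, and a real version would sit on top of the database rather than in-memory lists.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """One entry in the hypothetical tag database."""
    name: str
    synonyms: set[str] = field(default_factory=set)
    children: set[str] = field(default_factory=set)
    parents: set[str] = field(default_factory=set)


def build_synonym_index(tags):
    """Map every tag name and synonym (lowercased) to its canonical tag name."""
    index = {}
    for tag in tags:
        for label in {tag.name, *tag.synonyms}:
            index[label.lower()] = tag.name
    return index


def search(query, posts, index):
    """Return titles of posts carrying the queried tag or any synonym of it."""
    # Tags that are not in the database simply stand for themselves.
    canonical = index.get(query.lower(), query.lower())
    return [
        title
        for title, post_tags in posts
        if any(index.get(t.lower(), t.lower()) == canonical for t in post_tags)
    ]


# Sample data (made up for illustration).
tamarin = Tag(
    name="tamarin",
    synonyms={"Saguinus", "tamarind"},
    children={"cotton top", "Saguinus oedipus"},
    parents={"New World monkeys"},
)
index = build_synonym_index([tamarin])

posts = [
    ("Post A", ["tamarin"]),
    ("Post B", ["tamarind"]),
    ("Post C", ["Saguinus"]),
    ("Post D", ["charm"]),
]
results = search("tamarin", posts, index)
```

Here a search for “tamarin” returns Posts A, B, and C, because all three tags resolve to the same database entry, while Post D is left out.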

So what are some problems which might arise?

What if one tag is used for two entirely separate things?

A physics blogger uses “charm” to describe a kind of quark. An anthropologist uses “charm” to describe something used medicinally by a tribe of primitive people. A user searching for “charm” will get both.

I submit that this isn’t a huge problem. It isn’t going to happen all that often. When it does, in almost all cases, the user will be able to refine their search to say “I am only interested in ‘charm’ tags used on blogs with a ‘physics’ theme.” It will be annoying to the people who want to see what the parent/children tags are for “charm,” because they’ll get a weird mix of physics and anthropology subjects. But I think it is not going to happen often enough to really be annoying (and it is better than the alternative of trying too hard to control things).
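As a rough sketch of that refinement, assuming each post records the theme of the blog it came from (the field names blog_theme and tags and the search_by_tag function are hypothetical, chosen just for this example):

```python
def search_by_tag(posts, tag, theme=None):
    """Return titles of posts with `tag`, optionally limited to one blog theme."""
    return [
        post["title"]
        for post in posts
        if tag in post["tags"] and (theme is None or post["blog_theme"] == theme)
    ]


# Sample data (made up for illustration).
posts = [
    {"title": "Charm quark decays", "blog_theme": "physics", "tags": ["charm", "quark"]},
    {"title": "Healing charms", "blog_theme": "anthropology", "tags": ["charm"]},
]

everything = search_by_tag(posts, "charm")
physics_only = search_by_tag(posts, "charm", theme="physics")
```

The unrefined search returns both posts; adding the theme narrows the results to the physics post only.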

Sounds like a lot of work to input parent/children/synonym relationships!

Yes. We will have to start with no relationships at all — just a big flat list of tags. Eventually, each subject area will have one or more curators who help manage it. Part of their jobs may be to input relationships for tags in their areas. We will have to make a user interface to make this very easy. Perhaps we will build a user interface to allow users to suggest the addition of new relationships, as well.

The point is that we can do this very gradually. The system will start working immediately, and then be improved with time.

What about brand new tags (“pepsi-gate” vs “pepsigate”)? How can curators possibly keep up with that?

In that case, I believe that the crowd will start to converge, if a) we provide incentives to use the same tags — “if you use the most popular tags, your post will be more discoverable and you’ll get more readers” — and b) we make it very easy for bloggers to find out what the relevant tags are.

Of course, we will provide a list of available tags, organized for readability once we have parent/child relationships. Additionally, we will need a tool to provide tagging suggestions to bloggers while they are writing blog posts. Again, that can be something to do a little ways down the road.

We can also provide a page on the site which offers lists of the currently most popular tags, maybe even the most popular new tags. If it’s clear to someone that they are about to browse “pepsigate” posts, then if they want to write a followup, they are likely to remember that that’s the tag they are responding to, and tag their post appropriately.

Won’t this list of tags become so long that any tool which auto-suggests tags to users will become too slow to use?

This problem can be at least partly alleviated by letting users specify that they are only interested in tag suggestions from particular categories. Once parent/child relationships are in place in the tag database, tag suggestions can be filtered that way. We can also learn from other tools that offer auto-complete over large spaces to see how they solve this problem.
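One possible shape for category-filtered suggestions is sketched below. It assumes the parent/child relationships are available as a simple mapping from a tag to its children; the descendants and suggest functions and the sample tags are hypothetical, and a production tool would likely need a proper index instead of scanning every tag.

```python
def descendants(category, children):
    """Collect every tag reachable from `category` by following child links."""
    seen = set()
    stack = [category]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def suggest(prefix, all_tags, children, category=None, limit=5):
    """Suggest tags starting with `prefix`, optionally within one category."""
    pool = all_tags if category is None else descendants(category, children)
    matches = sorted(t for t in pool if t.lower().startswith(prefix.lower()))
    return matches[:limit]


# Sample parent -> children relationships (made up for illustration).
children = {
    "primates": ["New World monkeys"],
    "New World monkeys": ["tamarin", "marmoset"],
    "tamarin": ["cotton top"],
    "physics": ["quark"],
    "quark": ["charm", "top quark"],
}
all_tags = {"primates", "physics"} | {t for kids in children.values() for t in kids}

unfiltered = suggest("t", all_tags, children)
primates_only = suggest("t", all_tags, children, category="primates")
```

Without a category, typing “t” suggests both “tamarin” and “top quark”; restricting suggestions to the “primates” category leaves only “tamarin”, so the pool the tool has to search stays small.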

Have folksonomies been successfully used in the past? What are good examples?

Obviously, Flickr is the best example of a site which has completely user-generated tagging. Their mission is somewhat different from ours, though! Do you have examples of folksonomies that work or that have failed?

This post is intended to start discussion, so please, weigh in! What do you think about this approach to handling the huge number and variety of tags in use on science blogs? Is it clear, and do you have questions?

Building a better network: Identifying trends/posts of interest

When you build a network of blogging networks, the problem quickly escalates from “how do I collect as much data as possible?” to “how do I manage all this data?”

Take a look at the Science Blogging Aggregated home page. There’s lots of great stuff there — too much for the typical reader to handle. Even if you visit several times a day, the information rushes by too quickly to discern any trends, and it’s hard to know which posts are really well thought out and which are just one-off posts that hardly merit your attention at all.

We talked yesterday about one way of sorting through the data — tags. However, this method alone probably won’t satisfy all users. A person might be interested in all posts tagged “psychology,” but they might just want to see the highlights of what’s going on in other fields, and tagging won’t help them identify the most interesting, thoughtful posts.

We see at least four possible ways of sifting through the posts to find the most interesting ones.

1. Crowd-sourced ranking. Users rate or recommend posts they like, so others can sort by rating or number of recommendations to find the posts they want to see. An advantage is that there is no central authority telling readers what to like. A disadvantage is that blogs that are already very popular are perhaps most likely to be recommended, so this system might not help users identify up-and-coming blogs that are very high quality.

2. Self-promotion. Bloggers could promote a small number of their posts, indicating these are their best work (one per week? one per month?). This overcomes the “up-and-comer” problem, but a blogger whose work is mediocre could exploit the system by promoting posts that aren’t very interesting or useful to others.

3. Active curation. Editors could be chosen for each field (physics, biology, etc.) and actively promote one or two posts each day. That way readers would know that an expert has read all the posts on a topic and selected the most interesting or relevant. Advantages are that editors may be able to identify trends that more automated systems don’t catch, and that editors may be less swayed by the most popular blogs. Disadvantages include possible bias of editors, and variable editor quality. It would also require coming up with a system for selecting editors. Would a central person be in charge of that, or would we need to create some sort of a system for nominating/voting for editors?

4. Social networking. We could create a truly social network where users are only shown the “likes” of their friends. However, this requires a significant programming effort, and people are reluctant to join new social networks when they already participate actively in one or more networks. I think we might be better off using the social features of other networks, rather than building our own. If we could make it really easy for people to post their “likes” to Twitter and Facebook, then we could leverage those networks to perform the social function.

There is, of course, no reason that we shouldn’t do all of these things over the long run. But we have limited resources. Which of these approaches is most useful? Are there any other approaches that would work better? Do you have any specific suggestions for how to implement any of these ideas? Let us know in the comments.