Super Stock

updated by Ruhunaa

Screen Printing

Screen printing

1. A printing technique that uses a woven mesh to support an ink-blocking stencil. The attached stencil forms open areas of mesh that transfer ink as a sharp-edged image onto a substrate. A roller or squeegee is moved across the screen stencil, forcing or pumping ink past the threads of the woven mesh in the open areas. 2. A stencil method of printmaking in which a design is imposed on a screen of silk or other fine mesh, with blank areas coated with an impermeable substance, and ink is forced through the mesh onto the printing surface. Also known as Silk Screening or Serigraphy.

Taxonomy

There is considerable taxonomic, etymological, and semantic discussion about the process and techniques known as Screen Printing, Screen-printing, Screenprinting, silkscreen, and serigraph. Much of the current confusion is based on the popular traditional reference to the process of screen printing as silkscreen printing. Almost no professional businesses, artists, or trade organizations refer to silk, because of the general abandonment of silk as a viable mesh material after the 1960s.

Traditionally, silk was used for screen printing, which is where the name silk screening comes from. Today, polyester mesh is used instead.

Reference works, encyclopedias, and trade publications also use an array of spellings for this process. The two most often encountered English spellings are screenprinting, written as a single undivided word, and the more popular two-word form screen printing, without hyphenation.[1][2][3]


How to digitize a text

Part 1: Data entry

Typing

Although most texts are digitized using scanning and OCR, you can hand-type a text if you prefer. This is generally slower, but you might prefer a low-tech approach or might not have access to scanning and OCR.

The glossary to Bright’s An Anglo-Saxon Reader was entered by typing. Back in 1999, I organized volunteers thru the ANSAX-L mailing list. I xeroxed the glossary and mailed six of the xeroxed pages to each volunteer thru the U.S. Postal Service. I asked the volunteers to email the typed text back to me. I joined the pages into one document and regularized the character encoding.

At this writing (2004), I would only consider typing in a case where OCR is impractical, as in the case of handwritten notes. It’s true that OCR usually gives degraded results on old yellowed books typeset in archaic typefaces, especially when there are lots of special non-English characters; but I still find it to be much less effort to correct the errors than to type the whole thing. But, if you prefer to type, go ahead.

Scanning

An ordinary inexpensive flatbed scanner works just fine when scanning for OCR, but it is slow (you might be able to scan 1 or 2 pages per minute). I use a networked office copier which can scan as quickly as it can make copies (if I put sheets in the document feeder, it can scan around 35 pages a minute; but of course it’s slower if I have to turn the pages of a book).

I usually scan both the right and left pages together as one image, except in rare cases where the book is so unusually large that the two-page spread won’t fit on the glass. I use Photoshop to split the picture into two after I scan, as described below.

I recommend at least 300 dpi (dots per inch) as the scanning resolution; the higher the resolution, the better the results from an OCR program. It is really worth it to spend some time experimenting with the different scanning settings; I’ve found that despeckling in particular gives greatly improved OCR results. TIFF is the most commonly used file format for scanning and OCR.

Renaming files

After you have done the scanning, I strongly recommend renaming your image files so that the filenames match the page numbers. This takes some time, but it will save you many headaches later, and is also a good way to find out which pages you have missed before you return the library book which took two weeks to get thru Interlibrary Loan.

Another page in this About section describes a scheme for naming files so that the pages stay in the right order, which is very helpful.

Post-processing the images

Altho it is not strictly necessary, I recommend doing some post-processing on your images. One thing you can do is to divide the left and right pages into separate files. I usually make two little batch-mode macros in Photoshop to do this, one macro for the left page and one for the right. I leave an ample margin when setting the crop area, since each page is always positioned a little differently when you scan.

When I split the left and right pages into separate files using Photoshop, I put the right-side (odd-numbered) pages in a separate directory. Then I run a little script I wrote which renames each file by adding one to the number in the filename, so that b0004.tiff becomes b0005.tiff.
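
The renaming script itself isn’t reproduced here, but the idea only takes a few lines of Perl. The following is a minimal sketch under the assumptions of the example (the b####.tiff naming scheme, run inside the directory of right-hand pages); it bumps the number in every filename by one:

#!/usr/bin/perl
# Minimal sketch (not the original script): add one to the number in
# each filename, so that b0004.tiff becomes b0005.tiff. Assumes the
# b####.tiff naming scheme.
use strict;
use warnings;

# Rename the highest numbers first so a renamed file never collides
# with one that hasn't been renamed yet.
for my $old (reverse sort glob("b*.tiff")) {
    if ($old =~ /^b(\d+)\.tiff$/) {
        my $new = sprintf("b%04d.tiff", $1 + 1);
        rename($old, $new) or warn "could not rename $old: $!\n";
    }
}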

Then you can hand-process each image to un-tilt the text and to crop away the dark margins, which takes me a few hours for an average book (I am sure that a fully automated image processing solution must be possible; if you know of one, please let me know). Getting rid of the margins makes the files smaller, which is a consideration if you’re going to be publicly posting the page images. The way I do this myself is to use the drag tool to drag the whole image so that the upper left corner of the text is lined up with the upper left corner of the image border. Then I run a batch macro to crop away the right and bottom portions of each image.

Un-tilting the text isn’t quite as important since most OCR programs do this automatically, but there have been cases where I have found it useful. For example, I wrote a series of scripts which printed the image of each entry in the Torp dictionary right above the online version of the text, so that I could compare the two and mark corrections with a pen while riding the train. The necessary image processing was much easier to do with un-tilted images.

OCR

Getting good results from an OCR program is as much art as science. It is really worth it to experiment with the settings and spend some time seeing what improves the results, because even a small gain will save a lot of time on the corrections. Some OCR programs will let you train the program on a sample page, so that it can learn the typefaces being used. Some will let you say what characters you want to allow, a big consideration since many texts on pre-modern languages contain unusual characters.

No OCR program lets me save the files in the exact format I want. What I usually do is output each page as a separate HTML file. Then I write a script which strips off the HTML preamble and coda, which I don’t need. This leaves only the text itself, containing the HTML entities and markup tags for boldface and italics. Then I join all the files together into one big file, using a little script to add a page break marker at the top of each page:

<PAGE NUM="b0005"/>

(Page marker for the beginning of page 5 of the main text; I get the page number from the filename which I already made match the page number as described above.)
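
The exact script isn’t given above, but a minimal sketch of this join step might look like the following. It assumes one OCR output file per page, named after the page (b0005.html and so on); it keeps only what lies between the body tags and prepends the page marker taken from the filename:

#!/usr/bin/perl
# Sketch of the join step: strip the HTML preamble and coda from each
# per-page OCR file, add a <PAGE> marker derived from the filename,
# and write everything to standard output as one big file.
use strict;
use warnings;

for my $file (sort glob("b*.html")) {
    (my $page = $file) =~ s/\.html$//;   # e.g. "b0005"

    open my $in, '<', $file or die "can't open $file: $!";
    my $text = do { local $/; <$in> };   # slurp the whole file
    close $in;

    # Keep only the text between <body> and </body>.
    $text =~ s/.*<body[^>]*>//si;
    $text =~ s/<\/body>.*//si;

    print qq{<PAGE NUM="$page"/>\n$text\n};
}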

I recommend that you don’t throw away your image files when you’re done with scanning. It’s good to post them on the web so that other people can immediately look to see whether an apparent error is in the original or not (you’ll always miss some errors when you correct). Also, you may need to repeat the OCR at some point. In the case of my own projects, I’ve sometimes worked out elaborate systems for printing the original scanned image on the same page as the online text for easy comparison.

Part 2: Correction

Even after I carefully train the OCR on sample pages, the output text is often so degraded that some people would probably rather give up and enter the data by hand instead. However, there is usually some regularity to the errors; the same errors tend to happen over and over. We can take advantage of this partial regularity to greatly reduce the number of hand-corrections which must be made.

Global corrections

Before I start hand corrections, I do some global cleanup of the file.

A typical sort of error is that þ (Germanic thorn) often comes out as P, p, or b, but you can’t simply change P to þ since P is sometimes correct (sometimes the text really does contain a capital P). What you might do is to change P to þ except when the P is at the beginning of a word. This will usually give a correct substitution. It won’t work for b or p, since these letters can occur in the middle of a word; but it’s still worth doing for P.
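
Expressed as a regular expression, a rule like this is a one-liner. The filter below is only a sketch of the P case (the lookbehind is the “not at the beginning of a word” test); the exact patterns always need tuning against your own file:

#!/usr/bin/perl
# Sketch of the rule described above: change P to þ only when it is
# NOT at the beginning of a word, i.e. only when a word character
# immediately precedes it.
use strict;
use warnings;
use utf8;
use open ':std', ':encoding(UTF-8)';

while (my $line = <>) {
    $line =~ s/(?<=\w)P/þ/g;
    print $line;
}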

Here are a few of the many global substitutions I did on Bosworth/Toller:

  • Remove the <SUB> and <SUP> tags, since these are never correct (there were a few dozen instances of each)
  • If a comma comes immediately before a letter without a space in between, insert a space.
  • Replace <I>-</I> (an italicised hyphen) with - (a plain hyphen), since a hyphen alone should never be italic.
  • Replace ].</I> with </I>]. (A square bracket should not be italicized even if the immediately preceding text is italic.)
  • etc.

There is no easy way to determine which global changes will be beneficial. You simply have to study your file and notice what is repeatedly going wrong, and figure out what changes you can make without introducing a lot of new errors. This is something that you get better at as you do it more.

In practice, I usually do some global corrections, then hand correct a few pages. Then I go back and do another round of global corrections because I notice more possible strategies while doing the hand corrections.

Writing scripts

If you know how to program, you can often come up with various schemes to automatically detect and correct errors.

For example, in German texts, the characters ä, ö, ü are all often misrecognized as a 6. A program can readily detect that a word such as sch6n must contain an error, because 6 cannot occur in the middle of a word which begins with letters. But which of the possible corrections is the right one: schän, schön, or schün? A human who speaks German readily knows that it must be schön, but how can a computer tell this?

What your program can do is to try every substitution and check which of the resulting words actually exist in a German dictionary. If only schön exists, we can reasonably assume that this is the correction, and automatically make the substitution.

I produced the list of possible substitutions with a recursive algorithm which reads the characters left-to-right; at each recursion, the routine examines the present character and calls itself recursively for each possible substitution, so that we branch thru all of the possibilities and assemble a list (this strategy works, but it is rather slow; if you know of a faster algorithm, please let me know).
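
The original scripts aren’t reproduced here, but the core of the idea fits in a short Perl program. This sketch assumes a plain word list in german-words.txt, one word per line; that file and its name are assumptions, not something the text above specifies:

#!/usr/bin/perl
# Sketch of the dictionary-check correction: expand every 6 into each
# of ä, ö, ü, and accept the correction only when exactly one of the
# resulting words exists in the word list.
use strict;
use warnings;
use utf8;
use open ':std', ':encoding(UTF-8)';

# Load the word list into a hash for fast lookup.
my %is_word;
open my $dict, '<:encoding(UTF-8)', 'german-words.txt' or die $!;
while (my $w = <$dict>) {
    chomp $w;
    $is_word{lc $w} = 1;
}
close $dict;

# Recursively branch on each 6, collecting all possible spellings.
sub variants {
    my ($word) = @_;
    return ($word) unless $word =~ /6/;
    my @out;
    for my $c ('ä', 'ö', 'ü') {
        (my $try = $word) =~ s/6/$c/;    # replace the leftmost 6 only
        push @out, variants($try);
    }
    return @out;
}

# Return the correction if it is unambiguous, otherwise leave the word alone.
sub correct {
    my ($word) = @_;
    my @real = grep { $is_word{lc $_} } variants($word);
    return @real == 1 ? $real[0] : $word;
}

print correct('sch6n'), "\n";   # prints "schön" if only that form is in the list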

This technique will occasionally introduce new errors, but on the whole, it is enormously faster than making the same corrections by hand. Over 6000 corrections were made to the Torp dictionary at a single pass.

Hand corrections

Global and automated corrections can get you a fair distance toward a corrected text. In the end, however, there is no avoiding checking and correcting the text by hand. This is by far the most time-consuming portion of the entire project, but it is unavoidable.

One way to correct your text is to set the original book beside your computer, and compare the book with your online text by looking back and forth between the book and your screen. You can do this, but I find it extremely tedious and hard on the eyes.

What I prefer to do is to print out the online text and original page image, set them side by side, and mark my corrections with a red pen. This takes advantage of the fact that humans catch a substantially higher percentage of errors on paper than on the screen. It also has the advantage that you don’t need to be near a computer to mark the errors (for example, during a long road trip from Philadelphia to Denver, I marked up the errors in the online version of the readings in Gordon/Taylor’s textbook of Old Icelandic).

I usually need to represent some characters with entities such as &e-long-acute; or &hw;. This is obviously hard to read, so it’s better if I can print out the text in a more human-readable form to compare against the original page image. My way of doing this is to write a Perl script which typesets my working file by producing a TeX file (TeX is a typesetting system of unparalleled quality and flexibility; it has been around for decades and is available for free).
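
The typesetting script itself isn’t shown, but the heart of it is just a table of replacements. The sketch below is only illustrative: the entity names are taken from the text above, while the TeX replacements are stand-ins chosen here, not the mapping actually used:

#!/usr/bin/perl
# Sketch: replace working-file entities with TeX input so the text can
# be proofread in a readable typeset form. The replacements below are
# illustrative guesses, not the author's actual mapping.
use strict;
use warnings;

my %tex_for = (
    'thorn'        => '\\th{}',       # þ (LaTeX text command)
    'e-long-acute' => "\\'{\\=e}",    # one possible rendering of long e with acute
    'hw'           => 'hw',           # just spell the digraph out
);

while (my $line = <>) {
    # Replace known entities; leave unknown ones untouched so they stand out.
    $line =~ s/&([A-Za-z-]+);/exists $tex_for{$1} ? $tex_for{$1} : "&$1;"/ge;
    print $line;
}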

If you’re not a programmer, you can often produce a more readable form of your document by doing a bunch of appropriate search-and-replace operations on a copy of your file before printing it.

A convenience I worked out for the Fick/Falk/Torp corrections was a macro in the Emacs text editor. While editing the dictionary file, I could hit a key to pop up a window displaying the scanned image of the dictionary entry corresponding to the current location of the cursor in the Emacs window. This was very handy.

Part 3: Markup and validation

I mark up my base document in XML (a markup scheme). Then I derive other kinds of documents (HTML, PDF, and so on) from the XML base document.

Whenever additional corrections need to be made, you make them in the base document, and then automatically regenerate the derived formats.

This method has so many advantages that it has come to be generally accepted as best practice among professionals who manage and archive text and present it in multiple presentation formats.

What XML is, and why it is a really good idea

In HTML, you can mark up text for attributes such as boldface or italics, as in this entry from Bosworth/Toller:

a-berstan; p. -bearst, pl. -burston; pp. -borsten; [a, berstan] To burst, break, to be broken; perfringi. v. for-berstan.

Here is how this entry might be marked up with HTML tags:

<p>
<b>a-berstan;</b> <i>p.</i> -bearst, <i>pl.</i> -burston; <i>pp.</i> -borsten; [a, berstan] <i>To burst, break, to be broken;</i> perfringi. v. for-berstan.
</p>

Notice that italics are used to represent two different kinds of text: part of speech tags such as pl. and pp., and the modern English gloss (To burst, break, to be broken). If you’re only concerned with making a web page look like the original paper page, then the tags <i> and <b> are enough. However, there are very good reasons for not using the same tag to denote these two different kinds of information. For example:

  • If you are building a search system, you might want to allow the user to limit the search to just the glosses or just the part-of-speech fields. If you used separate tags for the two, your search system can tell which is which.
  • You might change your mind later and decide to visually represent only one of the two fields with italics. If you have used the <i> tag for both, there is no straightforward way to tell which is which.
  • If you mark up the part-of-speech tags separately, you can validate them (i.e., check them for errors) by comparing those fields against a list of known tags.
  • If you mark up the modern English glosses separately, you can validate them by extracting them and running them thru a standard spell checker.

For these reasons and others, it is better if you use semantic tags (tags which tell what kind the information is) rather than presentation tags (tags which describe how to visually render the text).

XML is a markup system similar to HTML. A major difference is that in XML, you get to make up your own tags for whatever purpose you need. Following is a possible representation of the entry above in XML:

<entry>
<headword>a-berstan;</headword>
<pos>p.</pos> <inflected-form>-bearst,</inflected-form>
<pos>pl.</pos> <inflected-form>-burston;</inflected-form>
<pos>pp.</pos> <inflected-form>-borsten;</inflected-form>
<etym>[a, berstan]</etym>
<gloss lang="en">To burst, break, to be broken;</gloss>
<gloss lang="latin">perfringi.</gloss>
<xref>v. for-berstan.</xref>
</entry>

Notice that the headword is marked up with <headword>, the part-of-speech tags with <pos>, the etymology with <etym>, and so on.

This isn’t necessarily the best way to represent this particular entry in XML, but the important point is that the data is marked up to show what kind it is, not how to display it. The display details are properly the concern of the programs which convert the abstract XML document into concrete presentation forms such as PDF, PostScript, HTML, or RTF.

So if you want to mark up a plain-text document, I think that the best thing would be for you to use semantic tags. You can study some of the other examples on this site to see what to do. I also recommend reading up on the TEI guidelines, which define a standardized set of tags to use in projects of this kind.

However, if you don’t completely understand what I mean by semantic tags or if you find it awkward to use them, it’s certainly better than nothing if you mark up fields with attributes such as italic, bold, etc. If you’ve been consistent, a good programmer can often partially or fully recover the semantics of the fields based on context.

Scripts for markup

If you wanted, you could do all of your XML markup by hand, but this would be extremely tedious. I usually write a Perl script to do most of the markup automatically, and then make corrections to the markup by hand. I’m not going for elegance in parser design here; I just exploit whatever convenient properties the glossary has to get the job done.

For example, here’s a typical line from Wright’s Gothic glossary:

alls, aj. all, every, much, 227, 390, 427, 430 OE. eall, OHG. al.

My parser for this glossary basically peels off one layer at a time. First, it scans each word going left-to-right, looking for keywords like OE or OHG which start the etymological section; upon finding such a keyword, the script splits off that word and everything to the right. Next, it works right-to-left, removing the number list one item at a time, until it hits a word which is not a number. Then it takes the headword off of the beginning, and then removes the part-of-speech abbreviations. Whatever is left is taken to be the definition. Then the script just spews all of these lists to the output, adding the appropriate tags. (The whole process is slightly more complicated than this, but this is the basic idea.)

This is hardly an example of elegant programming, but it’s quick to code, and it gets nearly all of the entries right. I correct the few remaining problematic lines by hand.
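
The actual parser isn’t reproduced here, but a rough sketch of the “peel off one layer at a time” approach for a line like the Wright example might look like this. The output tag names are placeholders, and the keyword and part-of-speech patterns are assumptions that would need tuning:

#!/usr/bin/perl
# Rough sketch of a layer-peeling parser for lines like the Wright
# example above. Deliberately simple-minded and tuned to that one
# format; the tag names and patterns are placeholders.
use strict;
use warnings;

while (my $line = <>) {
    chomp $line;
    my @words = split /\s+/, $line;
    next unless @words;

    # 1. Everything from the first OE./OHG./etc. keyword onward is etymology.
    my @etym;
    for my $i (0 .. $#words) {
        if ($words[$i] =~ /^(OE|OHG|OS|ON|Goth)\.?$/) {
            @etym = splice @words, $i;
            last;
        }
    }

    # 2. Working right-to-left, peel off the paragraph numbers.
    my @nums;
    unshift @nums, pop @words while @words && $words[-1] =~ /^\d+,?$/;

    # 3. The headword comes first, then short part-of-speech abbreviations;
    #    whatever remains is taken to be the definition.
    my $head = shift @words;
    my @pos;
    push @pos, shift @words while @words && $words[0] =~ /^[a-z]{1,4}\.$/;

    print "<entry>\n";
    print "  <headword>$head</headword>\n";
    print "  <pos>@pos</pos>\n"    if @pos;
    print "  <def>@words</def>\n"  if @words;
    print "  <refs>@nums</refs>\n" if @nums;
    print "  <etym>@etym</etym>\n" if @etym;
    print "</entry>\n";
}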

Validation

Even with all of your hand checking, there will always still be some errors, both in the markup and in the data themselves. You can make many kinds of errors reveal themselves by means of validation tests.

The type of validation depends on the type of data. Here are some tests I typically use:

  • I make a histogram of all of the characters in the file (e.g. the letter “a” occurred 7866 times, the letter “b” occurred 2410 times, etc.). This is a good way to find certain kinds of mistakes. For example, it is very unlikely that the yen ¥ symbol belongs in a text about Old Icelandic written in 1874. (A minimal sketch of this check appears after this list.)
  • I make sure that my XML file is in valid format by using rxp, a Unix command line program. Among other things, this ensures that all of my entities (such as &thorn;) and tags (such as <pos> or <etym>) are ones which I’ve declared to be legal. It makes sure that all of the tags balance (every <pos> is followed by a </pos>). It makes sure that every entity is closed with a semicolon, etc.
  • I write a little program to extract the headwords and confirm that they are in alphabetical order; if they aren’t, there’s either a data entry error or an error in the original text.
  • I make a histogram of the part-of-speech abbreviations in the part-of-speech fields. Sporadic errors tend to be at the bottom of the list and are usually readily apparent.
  • I spell check the glosses in modern English or German against a dictionary of words in the respective language. (You can extract the gloss fields from your file, open the file containing the extracted fields in any word processor, and use the spell checker.)
  • If I have a corpus of the pre-modern language (such as the Project Ulfila corpus of Gothic or the Toronto corpus of Old English), I check words in those languages against the corpus as a kind of spell checking. (I just use sort -u to produce a word list, make a big hash in Perl, and check words against it). There are various complications; for example, an Old English glossary might standardize the headwords to West Saxon orthography, even in cases where the word in question is not attested in West Saxon and doesn’t appear in the corpus in the listed form. So some hand work is still necessary, but this kind of automated check is better than nothing.
  • I’ll take advantage of whatever useful properties a glossary has. For example, notice in the Wright example above that the paragraph references are in ascending numeric order (e.g. 7, 15, 109, 113, 207). It’s easy to automatically confirm that the numbers are ascending; it must be an error if the value ever descends (and I did shake out some errors this way).
  • I look for characters in unexpected positions. For example, the word fa1l has a number 1 in place of one of the lower-case L’s. This is easy to miss in the hand-checking because the two symbols are so much alike. I might do this by searching for words where a letter is immediately followed by a number, which is usually wrong.
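
As one concrete example, the character histogram mentioned in the first item above takes only a few lines of Perl. This is a minimal sketch, not one of the specific scripts from the projects described:

#!/usr/bin/perl
# Minimal sketch of the character histogram check: count every
# character in the input and print the counts, rarest first, so that
# oddballs like ¥ stand out at the top of the list.
use strict;
use warnings;
use open ':std', ':encoding(UTF-8)';

my %count;
while (my $line = <>) {
    $count{$_}++ for split //, $line;
}

for my $char (sort { $count{$a} <=> $count{$b} } keys %count) {
    next if $char eq "\n";
    printf "%s\t%d\n", $char, $count{$char};
}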

Other file formats

There are two important considerations to keep in mind when considering what file formats to use:

1. Your digitized data will still be useful 10 years from now. However, it’s very unlikely that very many people will still be using today’s word processor formats 10 years from now. It might be difficult to find software which can even open the files.

2. It’s often useful for data to be presented in more than one format. For example, one might want both an HTML and a PDF version of the same document. Conversions between presentation formats can be problematic; for example, it would be very difficult to convert a PDF file to an HTML file. Also, exporting from proprietary word processors is often not as straightforward as it seems.

It is because of these considerations that many professionals who have to manage data have adopted a model in which an abstract base document (often encoded in XML) can be automatically converted to any desired presentation format. XML has a simple and transparent structure which a programmer can easily work with; therefore data in XML will remain usable indefinitely.

Here are some comments on certain file formats:

PDF and PostScript: these file formats allow you near complete control over the way the data appear when the user views them. However, the data are in a frozen form, meaning that it is basically impossible to extract the data from the file and use it for some other purpose. The user might want to reprocess and reuse the data in all kinds of useful ways which you might not have envisioned. So it’s OK to use the formats as a supplementary presentation format, but I strongly recommend against making them the only way you publish your data.

Microsoft Word and other proprietary word processors: Word has the advantage of being widely used by Windows and Macintosh users, at least at present. However, there are some significant disadvantages:

  • The files are in a binary-barf format, which makes it very difficult for a programmer to manipulate the data. You can export the data to another format, but word processors rarely export the data cleanly; there are usually anomalies in the output format which the programmer has to fix by hand. (If you are using Word for your base document, this means that this cleanup will need to happen every time you make further corrections and re-export the data, which is obviously not a good solution.)
  • Special characters, and text attributes such as bold and italic, might look good when you open the file with the word processor you created it in. However, since each word processor has its own quirky ways of storing this kind of information, these characters and attributes tend to get mangled or lost when you try to do anything else with the data.
  • Most importantly, word processor formats tend to go obsolete fairly quickly (for example, I have hundreds of MacWrite II files from the early 1990s which I now have no straightforward way of opening). Microsoft changes the Word file format every few years. Think ten years down the road: your data will probably still be of value, but they will be useless if they are locked up in a word processor format which nobody uses any more.

Of course, a big consideration is the tools you have available to you and are familiar with. The biggest thing is digitizing and correcting the text, so go ahead and digitize even if you don’t act on all of the recommendations made here.


SEO Tips

10 SEO Tips

It is not hard to be barking up the wrong tree when it comes to getting search engine traffic, because there is so much out-of-date information being circulated.

That is why it is always best to also learn many other traffic generation methods like viral marketing for example.

Not only is there out-of-date or invalid SEO advice getting around, there is also information which, if acted upon, can result in your pages being banned.

The SEO tips below should assist the reader in forming a basic understanding of how to create human friendly web pages which are easily understood by the most popular search engines.

Know this. There are thousands of search engines, but only two of them will bring you most of the traffic: Google and Yahoo. Another search engine that brings me a little traffic is MSN, but I do not focus too much on tactics for that engine.

Focus your attention on the engines that will bring you the most visitors first and work your way down.

Basic SEO

1. Insert keywords within the title tag so that search engine robots will know what your page is about. The title tag is located right at the top of your document within the head tags. Inserting a keyword or key phrase will greatly improve your chances of bringing targeted traffic to your site.

Make sure that the title tag contains text which a human can relate to. The text within the title tag is what shows up in a search result. Treat it like a headline.

2. Use the same keywords as anchor text to link to the page from different pages on your site. This is especially useful if your site contains many pages. The more keywords that link to a specific page the better.

3. Make sure that the text within the title tag is also within the body of the page. It is unwise to have keywords in the title tag which are not contained within the body of the page.

Adding the exact same text for your h1 tag will tell the reader who clicks on your page from a search engine result that they have clicked on the correct link and have arrived at the page where they intended to visit. Robots like this too because now there is a relation between the title of your page and the headline.

Also, sprinkle your keywords throughout your article. The most important keywords can be bolded or colored in red. A good place to do this is once or twice in the body at the top of your article and in the sub-headings.

4. Do not use the exact same title tag on every page on your website. Search engine robots might determine that all your pages are the same if all your title tags are the same. If this happens, your pages might not get indexed.

I always use the headline of my pages as the title tag to help the robots know exactly what my page is about. A good place to insert the headline is within the h1 tag. So the headline is the same as the title tag text.

5. Do not spam the description or keyword meta tag by stuffing meaningless keywords or even spend too much time on this tag. SEO pros all agree that these tags are not as important today as they once were. I just place my headline once within the keywords and description tags.

6. Do not link to link-farms or other search engine unfriendly neighborhoods.

7. Do not use doorway pages. Doorway pages are designed for robots only, not humans. Search engines like to index human friendly pages which contain content which is relevant to the search.

8. Title attributes for text links. Insert a title attribute within the HTML of your text link to add weight to the link and the page where the link resides. This works much like the alt attribute for images.

My site contains navigation menus on the left and right of the page. The menu consists of links not images. When you hover over the link with your mouse, the title of the link appears. View the source of this page to see how to add this tag to your links.

9. Describe your images with the use of the alt tag. This will help search engines that index images to find your pages and will also help readers who use text only web browsers.

10. Submit to the search engines yourself. Do not use a submission service or submission software. Doing so could get your site penalized or even banned.


Digitizing

Digitizing Tablets


An input device that enables you to enter drawings and sketches into a computer. A digitizing tablet consists of an electronic tablet and a cursor or pen. A cursor (also called a puck) is similar to a mouse, except that it has a window with cross hairs for pinpoint placement, and it can have as many as 16 buttons. A pen (also called a stylus) looks like a simple ballpoint pen but uses an electronic head instead of ink. The tablet contains electronics that enable it to detect movement of the cursor or pen and translate the movements into digital signals that it sends to the computer.

For digitizing tablets, each point on the tablet represents a point on the display screen in a fixed manner. This differs from mice, in which all movement is relative to the current cursor position. The static nature of digitizing tablets makes them particularly effective for tracing drawings. Most modern digitizing tablets also support a mouse emulation mode, in which the pen or cursor acts like a mouse.

Digitizing tablets are also called digitizers, graphics tablets, touch tablets, or simply tablets.

What is Digitizing

Any design can be embroidered, but first it needs to be converted into a format that is accepted and readable by the embroidery machine. That conversion process is called “digitizing.” The digitized file tells the embroidery machine where, how and which colors to use while stitching. We provide that service, focusing on “sewability.” We provide quality professional digitizing, completed quickly at the best price available… just $20 for up to 10,000 stitches.

Getting a Quote for digitizing

Did you know that most designs are under 10,000 stitches? Send us your artwork file today and let us quote you the price. If it’s over 10,000 stitches add just $1.00 per each additional 1,000 stitches! Use our convenient Quick Quote section on this page and upload your file for digitizing.

100% Accuracy digitizing

We understand the importance of having the exact design embroidered to satisfy your customers’ demands. That’s why we promise superior quality digitizing by our team of experienced professionals at the best price, guaranteed! Your satisfaction is our #1 priority!

Artwork File Formats Accepted for Digitizing

Reproducing a logo with accuracy and detail can best be done from line-art or vector files such as Illustrator (eps) or CorelDraw (cdr). Some other files also work well: PDF, TIFF, JPG, etc. High-resolution art works well, but may be difficult to transmit via e-mail; in that case, it is best to upload it directly to our web site.

If you have a poor resolution file, the digitizer must interpret and redraw the design. If vector art is not available, send what you are provided with and we’ll advise you before we begin digitizing.

We currently accept the following graphical formats:
EPS, CDR, AI, PDF, JPEG, HPGL, TIFF, DXF, AutoCad

Digitizing Hot Lines

U.S. Toll Free : 866-626-4455

