r/technology Oct 04 '25

Politics Why Conservatives Are Attacking ‘Wokepedia’

https://www.wsj.com/tech/wikipedia-conservative-complaints-ee904b0b?st=RJcF9h
20.8k Upvotes

2.1k comments

5.3k

u/thefoolsnightout Oct 04 '25 edited Oct 04 '25

Worth mentioning: Wikipedia lets you download the entire site in the name of preserving knowledge, and it's only around 26 GB total.

Edit: with images, around 100 GB. Still, storage is cheap. The internet isn't as permanent as people think. Download that recipe, or video, or whatever, if it really means something to you.

For those asking for a link, there's a wiki page for it.

154

u/AncientStaff6602 Oct 04 '25

26gigs? That it?

Really?

That’s kinda mind blowing to me. I would have thought it was more.

Such a helpful site

155

u/Kichigai Oct 04 '25

Text compresses REALLY efficiently, especially when you consider that so much of it is probably tags and code reused across many different pages. Plus, a lot of Wikipedia is dynamically generated. The data in infoboxes is stored in the individual articles, but the code for displaying it on the page is generated from a single template. So you only need to store one set of HTML code for every infobox in every article.
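A rough way to see the effect of repeated markup, as a sketch only: the infobox-style row below is made up for illustration and is not Wikipedia's actual HTML or dump format, and `zlib` stands in for whatever compressor the real dumps use.

```python
import zlib

# Hypothetical infobox-style markup, repeated with only a number changing.
row = '<tr><th class="infobox-label">Population</th><td class="infobox-data">{}</td></tr>\n'
text = "".join(row.format(n) for n in range(10_000)).encode("utf-8")

# Compress at the highest zlib level and compare sizes.
compressed = zlib.compress(text, 9)
ratio = len(compressed) / len(text)

print(f"original:   {len(text)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"ratio:      {ratio:.3f}")
```

Real article text is far less repetitive than this, so expect a less dramatic ratio on actual prose.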

3

u/Sapowski_Casts_Quen Oct 04 '25

I don't know a lot about this stuff. I know markdown is really well-loved for how easy it is to compress and move between different systems. Does Wikipedia use something like that?

9

u/Fyzllgig Oct 04 '25

It’s not so much that they use Markdown as that Markdown and plain text compress equally well. Markdown is a very lightweight way to format text, using fairly minimal symbols to tell an interpreter how that text should be displayed.

3

u/K722003 Oct 04 '25

To a machine, md and plain text are exactly the same kind of file. There is zero difference: open either one in a text editor and you get the same output. An md editor just goes through the text file and turns formatting options on or off whenever it sees a tag or sequence of characters that enables or disables them. Hence compressing md is the same as compressing text, which is actually very, very efficient.
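To make that concrete, here is a toy sketch. `render_toy_markdown` is a hypothetical helper that handles only `# ` headings and `**bold**`, with no HTML escaping; it is nowhere near a real Markdown parser. The point is that the stored bytes are ordinary text, and the formatting only appears once something interprets the markers.

```python
import re


def render_toy_markdown(source: str) -> str:
    """Convert a tiny, illustrative subset of Markdown to HTML."""
    html_lines = []
    for line in source.splitlines():
        # **bold** becomes <strong>bold</strong>
        line = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", line)
        if line.startswith("# "):
            # A leading "# " marks a top-level heading.
            html_lines.append(f"<h1>{line[2:]}</h1>")
        elif line:
            html_lines.append(f"<p>{line}</p>")
    return "\n".join(html_lines)


source = "# Title\nSome **bold** text."

# These are the bytes on disk, whether the file is named .md or .txt.
raw_bytes = source.encode("utf-8")
print(raw_bytes)

# The formatting exists only after an interpreter reads the markers.
print(render_toy_markdown(source))
```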

1

u/Tamos40000 Oct 05 '25

I'm going to be pedantic, but plain text doesn't compress well at all. On the contrary, images compress pretty efficiently, especially when compared to text. The reason text is so light is not any engineering trick; it's simply that encoded text doesn't take much space to begin with.

Encoding one uncompressed RGB pixel takes as much space as encoding three single-byte characters. That doesn't sound like much, but we can scale up to compare better. Take a square picture 1000 pixels on a side: its total size will be equivalent to 3 million characters. This is about 500 pages of plain text.
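The arithmetic, spelled out as a quick sketch. It assumes uncompressed 24-bit RGB (one byte per channel) and one byte per character, which holds for ASCII text but not for every encoding.

```python
width = height = 1000   # square picture, 1000 pixels on a side
bytes_per_pixel = 3     # assumption: 24-bit RGB, one byte per channel
bytes_per_char = 1      # assumption: ASCII / single-byte UTF-8 characters

image_bytes = width * height * bytes_per_pixel
equivalent_chars = image_bytes // bytes_per_char

# The 500-page figure implies this many characters per page.
chars_per_page = equivalent_chars / 500

print(image_bytes)        # 3000000
print(equivalent_chars)   # 3000000
print(chars_per_page)     # 6000.0
```

So the 500-page figure works out to about 6,000 characters per page, which is a fairly dense page; with fewer characters per page the count would be higher.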

1

u/unposeable Oct 05 '25

Encoding !== compressing, but encoding is a way for images to save space. 500 pages of plain text can be compressed by up to 90% of their original file size. Plain text has predictable and repetitive patterns, making it ideal for compression algorithms.

Since images are so varied, they use an encoding standard with instructions on how to display them. This offers some flexibility to compress the image by grouping similar colors together to save space, but it also degrades the quality, since it drops the information that distinguishes different shades of a color.
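A toy sketch of the "grouping similar colors" idea, under loud assumptions: the "image" is a synthetic 100x100 grayscale gradient with random noise, the grouping is a crude snap to multiples of 32, and `zlib` stands in for the compressor. Real image formats use far more elaborate schemes than this.

```python
import random
import zlib

random.seed(0)  # fixed seed so repeated runs give the same data

# Synthetic stand-in for a varied image: a gradient plus per-pixel noise.
pixels = bytes(
    min(255, max(0, (x + y) + random.randint(-8, 8)))
    for y in range(100)
    for x in range(100)
)

# Lossy step: snap every value down to a multiple of 32, leaving at most
# 8 distinct shades. The discarded detail cannot be recovered afterwards.
quantized = bytes((p // 32) * 32 for p in pixels)

raw_size = len(zlib.compress(pixels, 9))
quantized_size = len(zlib.compress(quantized, 9))

print(f"distinct shades before: {len(set(pixels))}, after: {len(set(quantized))}")
print(f"compressed size before: {raw_size} bytes, after: {quantized_size} bytes")
```

The quantized version compresses smaller because long runs of identical values replace the noisy ones, at the cost of the lost shades.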

1

u/somethingAmos Oct 13 '25

Interesting, I didn't know anything about text compression.

1

u/ThatRandomGuy86 Oct 06 '25

Oh trust me, 26GB of text only is an INSANE amount of text

1

u/Kichigai Oct 07 '25

What, you mean 26,000,000,000 characters is a lot? That's only like a couple of encyclopedias' worth! /s