[useless thread] gzip compression
Assassinator
...

Posts: 6,646.6190
Threads: 176
Joined: 24th Apr 2007
Reputation: 8.53695
E-Pigs: 140.8363
Offline
Post: #11
RE: [useless thread] gzip compression
What I find really weird is that...

For ISOs of VNs, WinRAR seems to sometimes significantly outperform 7z....
(note: generally, for normal stuff, 7z is better like 95+% of the time).


Code:
Symphonic Rain (KOGADO)
          Info - http://tlwiki.tsukuru.info/index.php?title=File_Format:_KOGADO_.kgo)
SRTE.7z          2096189516
SRTE.rar          1916897291
SRTE.ISO          3262398464


Tsukihime and Kanon show even more significant gains.

Code:
Tsukihime (Nscripter, .sar/.nsa)
Tsukihime.7z          289793123
Tsukihime.rar          244258895
Tsukihime.ccd          3679
Tsukihime.img          354319392
Tsukihime.sub          14462016

EDIT: Tsukihime.uha          232598227

Code:
Kanon (RealLive)
Kanon.7z          554858479
Kanon.rar          472603721
KANON_SE.ISO          750782464
KANON_SE.MDS          4321
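
To make the three listings above easier to compare, here is a minimal sketch that turns the byte counts into ratios. The sizes are copied from the listings; treating the sum of the disc image files as the "original" size is an assumption of this sketch, as are the names `SIZES` and `ratios`.

```python
# Archive sizes in bytes, copied from the listings above.
# "original" is the sum of the disc image files (e.g. .img + .sub + .ccd).
SIZES = {
    "Symphonic Rain": {"original": 3262398464, "7z": 2096189516, "rar": 1916897291},
    "Tsukihime": {
        "original": 354319392 + 14462016 + 3679,
        "7z": 289793123,
        "rar": 244258895,
        "uha": 232598227,
    },
    "Kanon": {"original": 750782464 + 4321, "7z": 554858479, "rar": 472603721},
}


def ratios(entry):
    """Return {archiver: compressed size as a fraction of the original}."""
    original = entry["original"]
    return {name: size / original for name, size in entry.items() if name != "original"}


for title, entry in SIZES.items():
    line = ", ".join(f"{name} {r:.1%}" for name, r in ratios(entry).items())
    print(f"{title}: {line}")
```

A lower percentage means a smaller archive relative to the disc image.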



For most other stuff, though, 7z is usually slightly ahead.

Code:
Sharin no Kuni, Himawari no Shoujo
Syarin.7z          1077146895
Syarin.rar          1084181307
syarin.ISO          1424863232
syarin.MDS          4319

Cross Channel.7z          816060428
Cross Channel.rar          817821073
CC_DATA.bin          463587328
CC_DATA.cue          73
CC_SYSTEM.bin          515096576
CC_SYSTEM.cue          75

Sekien no Inganock.7z          1436668203
Sekien no Inganock.rar          1485071132
赫炎のインガノック.iso          1721294848
赫炎のインガノック.mds          4314

Swan Song.7z          1129835031
Swan Song.rar          1138918285
SWAN_DISC1.cdi          789239942
SWAN_DISC2.cdi          428196310

Saya no Uta.7z          410610053
Saya no Uta.rar          411282142
Saya no Uta.ccd          784
Saya no Uta.cue          75
Saya no Uta.img          423898608
Saya no Uta.sub          17301984




No idea, I'm not an expert on compression.  But I would guess probably something along the lines of their data formats being heavily biased towards RAR's method of compression (and/or heavily biased against 7z's method).  Upping the dictionary and word sizes to 256 (from 64) only added 1MB of additional compression on Tsukihime, so it's not that either.  Silly eroges.


And... WinRAR on "Best" is like almost (~80%?) as slow as 7z (LZMA2) on "ultra" for me, because RAR can't use all 4 cores.


EDIT: Seems like the UHA compression finally finished.  It beat both 7z and RAR for Tsukihime; not going to bother trying it on the others, takes too long.  UHA takes like 2x 7-Zip's time to compress.  Not even going to bother trying PAQ, that'll take like 2 days.

Tsukihime.uha          232598227

(This post was last modified: 31/08/2010 06:46 AM by Assassinator.)
31/08/2010 06:32 AM
ZiNgA BuRgA
Smart Alternative

Posts: 17,023.4213
Threads: 1,174
Joined: 19th Jan 2007
Reputation: -1.71391
E-Pigs: 446.0333
Offline
Post: #12
From Wiki:
Quote:7z's LZMA algorithm reaches a higher compression ratio than RAR, except for "multimedia" files like .wav and .bmp files where RAR uses specialized routines that outperform LZMA
I guess those VNs have a lot of uncompressed media.

Quote:Version 3 of RAR is based on Lempel-Ziv and prediction by partial matching (PPM) compression, specifically the PPMd implementation of PPMII by Dmitry Shkarin.
Ah, so that explains why RAR outperforms LZMA on text, though 7z's PPMd still seems to outperform RAR.

I believe LZMA just uses a dictionary (modified version of LZ77) » entropy (adaptive arithmetic coding) coding chain.  IDK what RAR uses, but apparently it's based off deflate, which is also a dictionary (LZ77) » entropy (huffman) coding chain.
Huffman is old and only really was used in older formats because arithmetic coding was patented (expired in 1999), so I would expect RAR to use arithmetic coding too.
Funny that I heard LZMA2 was supposed to beat RAR consistently.  I guess that's not the case.
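
The two coding chains described above can be tried side by side with Python's standard library, which ships deflate (`zlib`) and LZMA (`lzma`). This is only a sketch: the sample data is made up, and RAR is left out because the standard library has no RAR codec.

```python
import lzma
import zlib

# Hypothetical sample: repetitive text-like data, not taken from any VN.
sample = b"".join(
    b"line %d: the quick brown fox jumps over the lazy dog\n" % (i % 50)
    for i in range(20000)
)

results = {
    "deflate (LZ77 + Huffman)": zlib.compress(sample, 9),
    "LZMA (LZ77-style dictionary + range coder)": lzma.compress(sample, preset=9),
}

for name, blob in results.items():
    print(f"{name}: {len(sample)} -> {len(blob)} bytes")
```

How far apart the two land depends heavily on the input, so the numbers from this sample say little about disc images.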
(This post was last modified: 31/08/2010 06:25 PM by ZiNgA BuRgA.)
31/08/2010 06:24 PM
Assassinator
Post: #13
(31/08/2010 06:24 PM)ZiNgA BuRgA Wrote:  From Wiki:
Quote:7z's LZMA algorithm reaches a higher compression ratio than RAR, except for "multimedia" files like .wav and .bmp files where RAR uses specialized routines that outperform LZMA
I guess those VNs have a lot of uncompressed media.

Yes.  A VN consists of images, BGM, voice, text, videos and the engine + related files (well, the traditional VNs at least).  The engine and text take very little space, and the BGM/voice (assuming already compressed) and videos gain pretty much nothing from compression, so I assume it's mainly the images that are being compressed.

Most of the time, said images are probably either completely uncompressed or (probably more likely) compressed in a lossless format (eg. PNG).  The images probably take way more space than the BGM, especially in newer VNs with thousands of images (though the voice probably takes even more space).  And a special property is that they can be grouped into small image sequences with small amounts of temporal change between each other (eg. the background and everything else the same, but different facial expressions).  This is probably what allows the images some room for compression, even though already compressed images (eg. PNGs) usually don't really compress at all.

REVIEW EDIT: Pretty sure usually the voice takes the most space.
(This post was last modified: 31/08/2011 09:05 PM by Assassinator.)
31/08/2010 07:41 PM
ZiNgA BuRgA
Post: #14
(31/08/2010 07:41 PM)Assassinator Wrote:  And a special property is that they can be grouped into small image sequences with small amounts of temporal change between each other (eg. the background and everything else the same, but different facial expressions).  This is probably what allows the images some room for compression, even though already compressed images (eg. PNGs) usually don't really compress at all.
I think I discussed this with you, but it's probably unlikely, though possible.  Unlikely because entropy coding can be quite different across images, but possible depending on how blocks/chunks are allocated (and if they're identical, a dictionary coder can deduplicate them).
31/08/2010 10:22 PM
Assassinator
Post: #15
(31/08/2010 10:22 PM)ZiNgA BuRgA Wrote:  
(31/08/2010 07:41 PM)Assassinator Wrote:  And a special property is that they can be grouped into small image sequences with small amounts of temporal change between each other (eg. the background and everything else the same, but different facial expressions).  This is probably what allows the images some room for compression, even though already compressed images (eg. PNGs) usually don't really compress at all.
I think I discussed this with you, but it's probably unlikely, though possible.  Unlikely because entropy coding can be quite different across images, but possible depending on how blocks/chunks are allocated (and if they're identical, a dictionary coder can deduplicate them).

I sense a bit of misunderstanding resulting from sentence structuring, because what I said and what you're saying now are actually very similar.

What I was originally saying is: even though already compressed PNGs should not be able to compress any more, the similarity between pictures mentioned above means that the dictionary compressor may be able to take advantage of something the image compressor cannot (temporal similarities), thus resulting in additional gains in compression.

So pretty much, I'm saying that the fact that VNs compress at all means what you're describing is probably happening, at least to some degree.  Otherwise, since neither images, music nor videos are inherently compressible, you shouldn't be able to get any gains in compression.  EDIT: Wait, can WAVs (PCM) compress?  I'm under the impression of "no, not at all", but never really tried.



I know BMPs can, based both on experience and theory (a bitmap is a map of bits).  I'm not too sure about PNGs, but I'm pretty sure JPGs don't.  Anyway, it'll be easy to test this.  When I go home, I'll batch convert some image sets to different formats and 7z them.
(This post was last modified: 31/08/2010 10:50 PM by Assassinator.)
31/08/2010 10:37 PM
ZiNgA BuRgA
Post: #16
There is a lot of non-image data which is probably being compressed.
But I also suspect:
- mostly uncompressed media
- even if media is compressed, most games group stuff into large files which hold this compressed media, which usually have a lot of redundancy in them (long bits of null padding, structure is almost never compressed)

I can't say that there's a lot of compression across images, if they're already compressed, because entropy coders can affect the output a fair bit, even with a small change in input.  But it's still possible nonetheless.

And PNG is based off deflate, which you can usually compress a tiny bit more using better compression algorithms.  JPEG is probably in the same boat here.
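
The cross-image question can be poked at with a small experiment. The sketch below uses made-up data (two 200 KB blobs of 4-bit noise that differ only in a 2 KB patch) as a stand-in for two sprites with different facial expressions, and `zlib` as a rough stand-in for PNG's internal deflate step; none of this is taken from a real VN. The raw pair should shrink to roughly one image's worth when compressed as one stream, since the dictionary coder can point back at the first copy. Whether the deflated pair gains anything is exactly the open question, so the script only prints that figure.

```python
import lzma
import random
import zlib

rng = random.Random(0)

# Hypothetical stand-in for an uncompressed sprite: 200 KB of 4-bit noise,
# so it is compressible but has no internal repetition.
base = bytes(rng.randrange(16) for _ in range(200_000))

# Second "image": identical except for a small patch.
patched = bytearray(base)
patched[50_000:52_000] = bytes(rng.randrange(16) for _ in range(2_000))
variant = bytes(patched)


def lz(data):
    """Size in bytes of the LZMA-compressed data."""
    return len(lzma.compress(data, preset=6))


raw_separate = lz(base) + lz(variant)
raw_solid = lz(base + variant)

# Same pair, but each image is deflated on its own first.
packed = zlib.compress(base, 9) + zlib.compress(variant, 9)
packed_solid = lz(packed)

print("raw, compressed separately:", raw_separate)
print("raw, one solid stream:     ", raw_solid)
print("deflated first, then LZMA: ", len(packed), "->", packed_solid)
```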
(This post was last modified: 31/08/2010 10:49 PM by ZiNgA BuRgA.)
31/08/2010 10:49 PM
Assassinator
Post: #17
(31/08/2010 10:49 PM)ZiNgA BuRgA Wrote:  - mostly uncompressed media

Can't afford to have uncompressed videos.  As for audio, don't know.  From an edit which you probably missed - Wait, can WAVs (PCM) compress?  I'm under the impression of "no, not at all", but never really tried.  EDIT: Actually, now that I think of it, since it's BGM, probably yes.

Videos are almost always in MPEG1.  Don't know why.  Probably because a clean install of any Windows from the last 10 years can decode that stuff no problem, and even the shittiest computers can do it.

Here's an OP video that I have on me on this portable HDD, they're pretty much all like that.

Code:
General
Complete name                    : E:\Temporary Stash\VN OPs\sharin_movie.mpg
Format                           : MPEG-PS
File size                        : 190 MiB
Duration                         : 2mn 21s
Overall bit rate                 : 11.2 Mbps
Writing library                  : encoded by TMPGEnc 3.0 XPress Version. 3.0.1.13

Video
ID                               : 224 (0xE0)
Format                           : MPEG Video
Format version                   : Version 1
Format settings, BVOP            : Yes
Format settings, Matrix          : Default
Duration                         : 2mn 21s
Bit rate mode                    : Variable
Bit rate                         : 10.5 Mbps
Width                            : 800 pixels
Height                           : 600 pixels
Display aspect ratio             : 4:3
Frame rate                       : 30.000 fps
Resolution                       : 8 bits
Scan type                        : Progressive
Bits/(Pixel*Frame)               : 0.731
Stream size                      : 178 MiB (94%)
Writing library                  : TMPGEnc 3.0 XPress Version. 3.0.1.13

Audio
ID                               : 192 (0xC0)
Format                           : MPEG Audio
Format version                   : Version 1
Format profile                   : Layer 2
Duration                         : 2mn 21s
Bit rate mode                    : Constant
Bit rate                         : 256 Kbps
Channel(s)                       : 2 channels
Sampling rate                    : 44.1 KHz
Stream size                      : 4.33 MiB (2%)
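
As a sanity check on the dump above, the derived fields can be recomputed from the rounded figures MediaInfo prints. This is just arithmetic on the numbers shown; the raw RGB line is a hypothetical comparison, not something in the dump.

```python
# Figures copied from the MediaInfo dump above.
width, height, fps = 800, 600, 30.0
video_bitrate = 10.5e6    # bits per second (the rounded "10.5 Mbps")
duration_s = 2 * 60 + 21  # "2mn 21s"

bits_per_pixel_frame = video_bitrate / (width * height * fps)
stream_mib = video_bitrate * duration_s / 8 / 2**20

# Hypothetical: the same clip stored as uncompressed 24-bit RGB frames.
raw_rgb_mib = width * height * 3 * fps * duration_s / 2**20

print(f"bits/(pixel*frame): {bits_per_pixel_frame:.3f}")
print(f"video stream size:  {stream_mib:.0f} MiB")
print(f"raw RGB equivalent: {raw_rgb_mib:.0f} MiB")
```

The results land close to, but not exactly on, the reported 0.731 and 178 MiB, because the inputs used here are already rounded.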



(31/08/2010 10:49 PM)ZiNgA BuRgA Wrote:  - even if media is compressed, most games group stuff into large files which hold this compressed media, which usually have a lot of redundancy in them (long bits of null padding, structure is almost never compressed)

Even so, I can't really imagine redundancy making up even close to the 20% or so of compression that usually occurs.  Must be either the images or the WAVs (since the MPEG1 is not going to compress at all, and neither is the voice).



Again, an easy way to find out would be just to go home, extract an ISO, and compress the components separately.
(This post was last modified: 01/09/2010 02:22 AM by Assassinator.)
31/08/2010 11:06 PM
Assassinator
Post: #18
(31/08/2010 11:06 PM)Assassinator Wrote:  Again, an easy way to find out would be just to go home, extract an ISO, and compress the components separately.

Ok, tests for both are in.


VN data type compression test

Code:
Sharin no Kuni, Himawari no Shoujo
BGM     131 -> 128
data     53 -> 38
image     383 -> 174
voice     595 -> 557
video     189 -> 126  (WTF???)

Sekien no Inganock
images     614 -> 471
audio     416 -> 340
video     605 -> 553


The images compressed the most, as expected.  Now that I think about it, they're probably in BMP format.
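
For a rough breakdown of the Sharin no Kuni table, the sketch below computes how much each data type shrank and how much of the total saving it accounts for. The before/after values are copied from the table; the units are assumed to be MB since the post doesn't state them, and the names are made up for the example.

```python
# Sizes before -> after compression, copied from the Sharin no Kuni table above
# (units assumed to be MB; the post does not state them).
SHARIN = {
    "BGM": (131, 128),
    "data": (53, 38),
    "image": (383, 174),
    "voice": (595, 557),
    "video": (189, 126),
}


def saved_fraction(before, after):
    """Fraction of the original size removed by compression."""
    return 1 - after / before


total_before = sum(before for before, _ in SHARIN.values())
total_after = sum(after for _, after in SHARIN.values())

for kind, (before, after) in SHARIN.items():
    share = (before - after) / (total_before - total_after)
    print(f"{kind:6} {saved_fraction(before, after):6.1%} smaller, "
          f"{share:6.1%} of all bytes saved")
print(f"total  {saved_fraction(total_before, total_after):6.1%} smaller")
```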

What's surprising is that the goddamn video compressed.  I mean isn't pretty much any video codec supposed to do some form of entropy encoding at the end?  Then you shouldn't really be able to compress it more at all.   MPEG1, I are dissapoint!!!


Image format packing test (Yoake Mae Yori Ruriiro Na -Moonlight Cradle-)

Original is lossless BMP

Code:
BMP     405 -> 65
PNG     186 -> 157
JPG@95%       51 -> 41


As expected, BMP is the best for that.  PNG and JPG can't really compress much (entropy encoding fucking with it), but for JPG, it doesn't really matter since it's smaller than the compressed BMP set anyway.

PNG just sucks (never liked PNG much, this doesn't help).  It's also fucking slow to encode to (takes like 5x longer than the other 2 types, and that's not even using the slowest method).  Not to mention 55% gain over BMP is really quite disappointing.

01/09/2010 03:00 AM
Assassinator
Post: #19
(01/09/2010 03:00 AM)Assassinator Wrote:  PNG just sucks (never liked PNG much, this doesn't help).  It's also fucking slow to encode to (takes like 5x longer than the other 2 types, and that's not even using the slowest method).

Not to mention 55% gain over BMP is really quite disappointing.

Ok, just for amusement, here's the performance of x264 lossless, and a few other lossless formats.

Do note, however, that "lossless" here isn't exactly 100% lossless, since the images had to be converted from RGB to the YV12 colorspace (a lot of the video codecs don't accept RGB).  Decoding (and encoding) methods also vary, so this comparison is really broken, and the results should be taken with a tablespoon of salt.
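
To see why the colorspace conversion alone already throws data away, compare the storage per frame: YV12 keeps the luma plane at full resolution but stores the two chroma planes at half width and half height. A minimal sketch, where the 800x600 frame size is an assumption (it matches the OP video posted earlier, not necessarily this image set):

```python
def frame_bytes_rgb24(width, height):
    # 8 bits each for R, G and B at every pixel.
    return width * height * 3


def frame_bytes_yv12(width, height):
    # Full-resolution luma plane plus two chroma planes,
    # each subsampled to half width and half height.
    chroma_w, chroma_h = (width + 1) // 2, (height + 1) // 2
    return width * height + 2 * chroma_w * chroma_h


# 800x600 is an assumed frame size for illustration.
w, h = 800, 600
print("RGB24:", frame_bytes_rgb24(w, h), "bytes per frame")
print("YV12: ", frame_bytes_yv12(w, h), "bytes per frame")
```

So the video codecs start from half as many bytes per frame as the BMP and PNG figures, which is part of why the comparison isn't like for like.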


Original BMP 405MB
» PNG     186MB

Speed prioritizing lossless video codecs
» Lagarith     84MB (~60fps)
» FFVHuff     96MB (~70fps)

Compression prioritizing lossless video codecs
» FFV1      60MB (~16fps)
» x264 lossless     25MB (~40fps, multithreading probably helped x264 a lot here)


So yeah... x264 is beast as fuck.



Here's the quick hackjob batch script I wrote, which generates an avisynth script to bind all your bitmap images into a single video with that many frames to feed into the video encoder, if anyone wants to try this stuff themselves (I highly doubt anyone would, but anyway...)

Batch Script
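rem Start the AviSynth script with a comment line (overwrites any existing file).
echo #lets generate some shitty avisynth script, woot woot woot!!!> IMAGESOURCE.AVS
echo.

rem Delayed expansion lets the counter change inside the loops.
SETLOCAL EnableDelayedExpansion
set count=0

rem One clip per bitmap: load it, keep a single frame, convert to YV12.
for %%r in (*.bmp) do (
    set /A count=!count!+1
    echo picture!count!=imagesource^("%%r"^).trim^(9,9^).converttoyv12^(^)>> IMAGESOURCE.AVS
)

rem Join all clips into one video: picture1+\ picture2+\ ... pictureN
set /A count2=!count!-1
for /L %%i in (1,1,!count2!) do (
    echo picture%%i+\>> IMAGESOURCE.AVS
)
echo picture!count!>> IMAGESOURCE.AVS

pause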


FFV1 encoded with    mencoder.exe "input" -o "output" -of avi -ovc lavc -nosound -lavcopts vcodec=ffv1:vstrict=-2:coder=1:context=1
FFVHuff encoded with    mencoder.exe "input" -o "output" -of avi -ovc lavc -nosound -lavcopts vcodec=ffvhuff:vstrict=-2:pred=6:context=1
x264 encoded with    x264.exe --qp 0 --output "output" "input"
Lagarith encoded with VirtualDubMod using default settings.

(This post was last modified: 31/08/2011 09:11 PM by Assassinator.)
01/09/2010 04:44 AM
ZiNgA BuRgA
Post: #20
(31/08/2010 11:06 PM)Assassinator Wrote:  
(31/08/2010 10:49 PM)ZiNgA BuRgA Wrote:  - even if media is compressed, most games group stuff into large files which hold this compressed media, which usually have a lot of redundancy in them (long bits of null padding, structure is almost never compressed)

Even so, I can't really imagine redundancy making up even close to the 20% or so of compression that usually occurs.  Must be either the images or the WAVs (since the MPEG1 is not going to compress at all, and neither is the voice).
Depends on the file structure and how much meta-info there is (plus hashtables for fast file access etc).  The PSP's RCO format, for example, can have lots of smaller files, and has a fair bit of meta/structural info, along with hashtables to speed up finding files, so compressing the header with deflate can reduce the file size a fair bit.
There's also the ISO file format:
- there is a fair bit of redundancy in a filesystem
- internal fragmentation if there are a lot of files
IMG+CCD / BIN+CUE include RAW data, which, unless the game uses it, is probably just ECC data, which I would assume has fairly high entropy, so won't compress.

Also remember that long sequences of null bytes get compressed to practically nothing after dictionary coding.
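
The padding effect is easy to reproduce. The sketch below builds a made-up "disc image" out of 500 small incompressible files, each padded with null bytes to a whole 2048-byte sector, then runs LZMA over it. The file count, file sizes and the helper name `pad_to_sector` are all assumptions for illustration, and a real ISO has more structure than this.

```python
import lzma
import random

SECTOR = 2048  # sector size assumed here (the usual ISO 9660 data sector size)
rng = random.Random(1)


def pad_to_sector(payload):
    """Append null bytes so the payload ends on a sector boundary."""
    return payload + bytes(-len(payload) % SECTOR)


# Hypothetical image: 500 small random (incompressible) files,
# each padded to a whole number of sectors.
files = [
    bytes(rng.getrandbits(8) for _ in range(rng.randint(100, 1500)))
    for _ in range(500)
]
image = b"".join(pad_to_sector(f) for f in files)

payload_size = sum(len(f) for f in files)
compressed_size = len(lzma.compress(image))

print("payload bytes:", payload_size)
print("padded image: ", len(image))
print("after LZMA:   ", compressed_size)
```

The payload itself is random and can't shrink, so any saving in the output comes from the padding.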



(01/09/2010 03:00 AM)Assassinator Wrote:  The images compressed the most, as expected.  Now that I think about it, they're probably in BMP format.
They almost always are, from what I've experienced.

(01/09/2010 03:00 AM)Assassinator Wrote:  What's surprising is that the goddamn video compressed.  I mean isn't pretty much any video codec supposed to do some form of entropy encoding at the end?  Then you shouldn't really be able to compress it more at all.   MPEG1, I are dissapoint!!!
The video probably won't compress well, however the file structure (container) probably has a lot of redundancy.  Open one in a hex editor and you'll probably find a fair amount of null (0x00) bytes.  Not exactly sure what the padding is for, perhaps there's something to do with attaining CBR.
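
Instead of eyeballing a hex editor, the null bytes can be counted. A minimal sketch; `null_stats` is a name invented for this example, and the file path in the comment is only a placeholder:

```python
def null_stats(data):
    """Return (fraction of 0x00 bytes, longest run of 0x00 bytes) for a bytes object."""
    if not data:
        return 0.0, 0
    longest = current = 0
    for byte in data:
        if byte == 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return data.count(0) / len(data), longest


# Usage sketch; the path is only an example:
# with open("sharin_movie.mpg", "rb") as fh:
#     print(null_stats(fh.read()))

# Tiny synthetic input: 14 null bytes out of 32, longest run 12.
print(null_stats(b"\x00\x00\x01\xba" + bytes(12) + b"\xff" * 16))
```

A high fraction with long runs would point at padding; a low fraction would mean the saving comes from somewhere else.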

(01/09/2010 03:00 AM)Assassinator Wrote:  PNG just sucks (never liked PNG much, this doesn't help).  It's also fucking slow to encode to (takes like 5x longer than the other 2 types, and that's not even using the slowest method).  Not to mention 55% gain over BMP is really quite disappointing.
It's lossless, and there's always a size premium for lossless encoding, whether it's video (x264 lossless vs crf20), audio (FLAC vs AAC) or image.  And for the images you're testing, JPEG can probably throw off a lot of redundancy.
Also, you don't mention bit depth.  I believe PNG has the ability to use 16 bits per channel, along with a full alpha channel, neither of which JPEG can achieve.
(This post was last modified: 01/09/2010 03:32 PM by ZiNgA BuRgA.)
01/09/2010 03:31 PM