Project in Information-Theoretic Modeling: Details for problem 1
The task of problem 1 is to encode the file 'ex1_class.dat'
(containing the class variable x0) using the side-information file 'ex1_side.dat'
(consisting of four columns corresponding to the variables x1, x2, x3, x4) by arithmetic coding; see
http://dl.acm.org/citation.cfm?doid=214762.214771.
You may use your own implementation of arithmetic coding (which may be somewhat tricky) or download software from the net and get it running for this application.
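To make the coding step concrete, here is a minimal sketch of a binary arithmetic coder that uses exact rational arithmetic. It is an illustration only, not the method of the referenced paper: a practical coder would use fixed-precision integers with renormalization, because the exact fractions below grow with every symbol and become slow on long inputs. The function names `encode` and `decode` are hypothetical, and the sketch assumes that every probability lies strictly between 0 and 1 and that the decoder knows the number of symbols.

```python
from fractions import Fraction
import math


def encode(bits, probs_of_one):
    """Narrow [0, 1) once per bit, then name a point inside the final interval.

    probs_of_one[i] is the model's P(bit i = 1) as a Fraction strictly
    between 0 and 1 (assumption of this sketch).
    Returns (code, nbits): code / 2**nbits lies in the final interval.
    """
    low, width = Fraction(0), Fraction(1)
    for bit, p_one in zip(bits, probs_of_one):
        zero_width = width * (1 - p_one)  # lower part of the interval codes a 0
        if bit == 0:
            width = zero_width
        else:
            low += zero_width
            width -= zero_width
    # Smallest nbits with 2**-nbits <= width: a grid that fine always has
    # a point in [low, low + width).
    nbits = 0
    while Fraction(1, 2 ** nbits) > width:
        nbits += 1
    code = math.ceil(low * 2 ** nbits)
    return code, nbits


def decode(code, nbits, probs_of_one):
    """Replay the interval narrowing and read off which half contains the code."""
    value = Fraction(code, 2 ** nbits)
    low, width = Fraction(0), Fraction(1)
    bits = []
    for p_one in probs_of_one:
        zero_width = width * (1 - p_one)
        if value < low + zero_width:
            bits.append(0)
            width = zero_width
        else:
            bits.append(1)
            low += zero_width
            width -= zero_width
    return bits
```

In this setting the probabilities depend only on the side information, so the decoder can compute the full list `probs_of_one` before it starts decoding. The number of bits produced is the negative base-2 logarithm of the final interval width, rounded up.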
The data are i.i.d. and have been generated by a graphical model of the following structure: x1 and x2 are the parents of x0, and x0 is the parent of x3 and x4.
The relevant local probabilities are
P(x1=1) = 0.25
P(x2=1) = 0.55
P(x0=1|x1=0, x2=0) = 0.3
P(x0=1|x1=0, x2=1) = 0.9
P(x0=1|x1=1, x2=0) = 0.85
P(x0=1|x1=1, x2=1) = 0.45
P(x3=1|x0=0) = 0.35
P(x3=1|x0=1) = 0.6
P(x4=1|x0=0) = 0.4
P(x4=1|x0=1) = 0.2
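The local probabilities above determine the distribution an arithmetic coder needs for each row, namely P(x0 | x1, x2, x3, x4). Since P(x1) and P(x2) do not involve x0, the posterior is proportional to P(x0|x1, x2) P(x3|x0) P(x4|x0). The following sketch computes it, together with the expected codelength per row under the model (the conditional entropy of x0 given the side information). The names `bern`, `posterior_x0` and `expected_bits_per_row` are hypothetical helpers chosen for this illustration.

```python
from itertools import product
from math import log2

# Local probabilities copied from the list above; each entry is P(variable = 1 | ...).
P_X1 = 0.25
P_X2 = 0.55
P_X0 = {(0, 0): 0.3, (0, 1): 0.9, (1, 0): 0.85, (1, 1): 0.45}  # keyed by (x1, x2)
P_X3 = {0: 0.35, 1: 0.6}  # keyed by x0
P_X4 = {0: 0.4, 1: 0.2}   # keyed by x0


def bern(p_one, x):
    """Probability of outcome x for a binary variable with P(1) = p_one."""
    return p_one if x == 1 else 1.0 - p_one


def posterior_x0(x1, x2, x3, x4):
    """P(x0 = 1 | x1, x2, x3, x4) by Bayes' rule over the two values of x0."""
    weight = [
        bern(P_X0[(x1, x2)], x0) * bern(P_X3[x0], x3) * bern(P_X4[x0], x4)
        for x0 in (0, 1)
    ]
    return weight[1] / (weight[0] + weight[1])


def expected_bits_per_row():
    """Conditional entropy H(x0 | x1, x2, x3, x4) in bits, under the model."""
    total = 0.0
    for x0, x1, x2, x3, x4 in product((0, 1), repeat=5):
        joint = (
            bern(P_X1, x1) * bern(P_X2, x2) * bern(P_X0[(x1, x2)], x0)
            * bern(P_X3[x0], x3) * bern(P_X4[x0], x4)
        )
        total -= joint * log2(bern(posterior_x0(x1, x2, x3, x4), x0))
    return total
```

Multiplying `expected_bits_per_row()` by the number of rows gives the approximate size, in bits, that an ideal coder would need for the class file on average. The decoder program itself adds to the package size on top of that.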
Each file contains 100,000 rows (new, larger files up now).
You should deliver a .tar (or .tgz) package such that
- extracting it creates a directory, e.g. 'x'
- 'x' contains an executable named 'run' such that the command x/run can be executed
- after execution, 'x' contains the file 'ex1_class.dat', which is identical to the given class file
- the side-information file 'ex1_side.dat' can be assumed to lie in x's parent directory.
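The directory layout in the list above can be handled independently of the current working directory, as in the sketch below. It assumes that 'run' is a Python script, which is only one of many possible choices. The helper names `resolve_paths` and `read_side_rows` are hypothetical. The file format is also an assumption: the sketch reads one row per line with four whitespace-separated 0/1 values, so adjust the parsing if the actual files differ.

```python
import os


def resolve_paths(script_path):
    """Return (side_path, out_path) for a 'run' script located in directory x.

    The side information is expected in x's parent directory, and the
    output file is written into x itself.
    """
    x_dir = os.path.dirname(os.path.abspath(script_path))
    side_path = os.path.join(os.path.dirname(x_dir), "ex1_side.dat")
    out_path = os.path.join(x_dir, "ex1_class.dat")
    return side_path, out_path


def read_side_rows(path):
    """Read the side information as a list of integer tuples.

    ASSUMPTION: one row per line, values separated by whitespace.
    Blank lines are skipped.
    """
    rows = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:
                rows.append(tuple(int(value) for value in fields))
    return rows
```

Inside the script, `resolve_paths(__file__)` yields both paths, so the command x/run behaves the same wherever it is invoked from.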
Your score will be a function of the size of your package in bytes, which you should aim to minimize. I will not read your code, and you may use any software that is available on a standard department computer. Please remember that you will have to briefly explain in class what you did, and that you will need to turn in a final report at the end of the course. It is advisable to start this report now and keep it up to date, so that you do not have to rely on memory later.
The deliverables for the following problems will need to generate this same file 'ex1_class.dat', plus the ones required for the future tasks. You may therefore re-use any of the current code at no additional cost. If in retrospect you feel you have overlooked something crucial, you may also return to earlier problems in order to cut down codelength for the remaining, combined packages you deliver.
The deadline for your submission is Tue Nov 6, 9am.