bethlakshmi: (Default)
[personal profile] bethlakshmi
Went to class.. and mostly good things. (A) I am definitely NOT the most behind. At least three other folks are noticeably less done than me. One of the three has a major clue-gap. (B) I am not suffering from clue-impairment (IMO), I am suffering from Making Problem Too Hard Syndrome, I now have an easier way. (C) The teacher is pushing "means to an end" as an approach. If I can make it work, I'm golden. If that means hacking the system to do things an operating system normally wouldn't be able to do - that may be OK. Especially if I document why I did the travesty that I did.


So... did you get a dirty thrill for clicking on something this geeky?

Here (I think) is what was going wrong with this project. I'm trying to manage virtual memory in coordination with Disk IO. The system has:
- 1024 virtual pages
- 64 frames
- 12 disks with 1600 sectors each

(if you don't work with operating system development daily, here's an explanation)
It's a very cute little system. This means that the user program (ie, the test code), thinks it has 1024 pages. But in the CPU, there is only space for 64 pages at any one time. The O.S. makes the program *think* these 64 holding frames are actually 1024 pages, by switching data in and out of the frames and updating a page table that says what page goes to what frame, at the moment. The theory here is that the user program will only need < 64 pages at once. It may need a different < 64 pages in a few moments, but due to locality of reference it tends to hover around a set of pages for a little while. Since the memory for all these pages is really expensive, it's much more economical to have a limited amount of memory, and then use the processor to swap pages in an out of frames.

So... all good so far? Now then, what happens when the user program requests the 65th page? Well... we have to take a previously assigned page out of a frame, and put the new, 65th frame in there. But... we can't just throw out the previous page... because the program thinks that that page is still around somewhere. So...we need to save the page. But where, you ask? Well... to a slower type of memory. In this really simple case, the disk. I have 12 disks. Disks take a little while to write/read to/from since you have to spin the disks, find the sector, etc, etc. So.. there's a delay, which makes them less cool than frames... but there what I have for other storage. So... somehow I have to get data from frame, write data to disk, reassign frame. Let's say that I take frame #1, which used to hold page #1, and now I reassign it to page #65. Well... what happens if the user program wants page #1 again? It, after all, thinks there are 1024 pages... so it hasn't done anything wrong. But... page #1 has been replaced by page #65 in memory. So... the operating system has to go get page #1 from the disk, and then it has to put page #1 back in *a* frame. Not necessarily frame #1 - it can go anywhere, so long as the page table is updated with whereever it is. So we could put it in frame #2.... but if frame #2 had page #2 in it, we need to write page #2 to disk just like before. See how this starts to get sucky?
(end operating system explanation)

This is where things were running amok. The happy lecture on this topic suggests that a good way to manage this is to save frames to disks on a periodic basis, in CPU down time. And to line up those little requests one after another, so the disks can all chug merrily away using minimal CPU time (disk i/o happens in parellel to CPU usage). So... that's what I did. I have a little "save_frame" function that looks for all frames in need of saving. When the disk I/O is done, it signals an interrupt saying "I'm done" and I wrap up the processing, marking frames as not needing to be saved and thus available for reassignment.

That was all darling... until my interrupt handler jammed up. Depending on how and when I did my massive frame saving dance, I would process between 4 and 58 disk requests, and then my interrupts would stop. This was caused by a couple problems:
- 1 - I'm too ambitious - most students just save a frame when they run out of frames to assign. My clever little "let's save all the frames periodically" approach was what a regular O.S. might do, but not what a lazy student in grad school does, apparently. Just saving 1 frame at a time will reduce the complexity enormously.
- 2 - There is a bug in the interrupt handler that someone else found - after a while, interrupts just randomly stop. This has to do with the way the prof configured interrupt handling and multi-threading in the fake processor.
- 3 - The mix of faults and interrupts was Bad, Bad Juju. These are two ways the hardware bugs the operating system for attention. Interrupts are usually messages saying "this thing you asked for is done, master", Faults are usually "uh-oh! Problem!". When disk i/o is done, the disk throws an interrupt saying "I'm done, anything else?" - and then you can take the data and do something with it, and ask the disk to do another i/o job. When a page table doesn't have a frame for the page requested, a fault is thrown. That's not a big deal - it means it's time for the o.s. to get off it's ass and go get a frame. Both are regular occurences, but for our little processor, they are NOT two great tastes that taste great together. I was getting interrupts in my fault handler, because he stores interrupts and faults together in one big queue. The project libraries were just made multi-threaded, which means that the prof needs to rethink some of the ways he does signalling.
- 4 - there is no master page table. The O.S. doesn't have a way to touch each frame directly, the only memory i/o is through virtual pages, which are only available from within a given process. There is no master table in our fake processor, nor are there memory commands that reference memory in terms of frames instead of pages. Thankfully, the test code only uses one process - so you can remain in the process and use one page table. But... it's a klugey way to do it.

Have you fallen into a coma yet?

It gets better. Beyond that, I was corrupting my frame memory. We get the frame data in a really klugey way, and I didn't really understand how it worked. So I *think* I was corrupting my frame memory somehow, which was All Bad. Knowing now how to do this The Right Way, I am in a position to fix that problem, too. Or, at least, to use a simpler algorithm and thus be able to put together some "it's broken" examples that aren't so hard to trace through.

And.. if this doesn't work, I'm making my own "frame holder". I will take stuff out of my frames, store them in a C construct, and put them back when they are needed. Screw the disk I/O. But only as a last resort. I really do want to be a good doo-bee.

I'm sure at this point, that those who read my journal to hear about elegant ways to get partially naked have fallen into a stupor. If you read all the way through this, I applaud your geekiness.

We will now return to our regularly scheduled program.



Ah well... another fun day at work. Yesterday WAS actually kind of fun. I finally got to play with this peice of software my customer developed. I wrote a paper LAST WEEK on how we might integrate this software into our system. Do you see the logical flaw here? Wouldn't it be nice if the person writing the paper had actually used the software she was writing about? No... that's just crazy talk. I wrote the paper based on what my boss told me the software did, and based on what the two developers who had come by the week before to install the software said about it. They were not sucessful in said installation - we had connectivity issues which got resolved while I was writing said paper. This really is very pointy-hair boss-like.

But... I got to play yesterday. And, true to form, I took 10 minutes and got it to throw a nullPointerException. In coder-land this is really bad juju. If this were C/C++ the program would have crashed. In Java, it is rather like hiccuping and accidentally bringing up a hippopatamus. You kind of wonder if life goes on after something like that. Glad to see that my legendary I Can Break Anything skills are still in force. I did warn the customer that I could break *anything*, and maybe they didn't want me to play with their precious thingy. But they felt sure I couldn't do anything nasty to it. I really am why we can't have nice things.

That said, it really is a pretty swell thingy. I don't love its architecture - but it does what it's supposed to do pretty darn well. The fact that it took a pretty big licking from me, and kept on ticking really does say something. I was a little disappointed that I couldn't bring it to a Will Never Be the Same Again state. I'm like the gorilla with the suitcase. If the suitcase is still suitcase-shaped and closed with stuff in it when the gorilla is done with it, it's a pretty good suitcase. It doesn't matter if it has a few dings - after all, it was a GORILLA.

Fear the giant Lakshmi.
(deleted comment)

Date: 2005-12-06 04:35 pm (UTC)
From: [identity profile] multigeek.livejournal.com
Yup.

One thing that gets tricky with pre-writing pages to disk is avoiding race conditions where the application makes changes to the page while you're writing it out.

As to interrupts and faults, at some level a fault is just a type of interrupt. I've often seen the terms "fault," "exception," and "interrupt" used interchangeably. Generally interrupts are turned off when you enter the handler, but faults aren't maskable, so getting one while in an interrupt handler is bad. Fortunately, turning off interrupts stops the scheduler from running the application, so the only faults you can encounter in an interrupt handler are bugs in the handler. (In an SMP system, it works the same way, but each CPU handles its interrupts independently.)

Date: 2005-12-06 05:05 pm (UTC)
From: [identity profile] lakshmi-amman.livejournal.com
For things in my area of expertise - I'm all over this approach. The challenge, when I'm doing something new, is knowing when I've made it too complicated, versus when I need to make a leap of faith. There have been a few "leaps of faith" with this class - where I can't see if step 1 is working until I do something about steps 2, 3 and 4. I thought this was one of those times, and the simple, obvious mechanism just didn't occur to me.
(deleted comment)

Date: 2005-12-06 05:22 pm (UTC)
From: [identity profile] lakshmi-amman.livejournal.com
In this one, I bought into the concepts presented by book and lecture, that saving the frame out of band was a significant improvement. So significant, I ought to do it that way from the start. And - not knowing easy from hard in this area - I thought "might as well do it right the first time".

Admittedly - this project is suffering from not a lot of design work, too. Given that I'm learning the topics as I'm writing the project, I don't have the bird's eye view I usually have while doing design. But then, that's why your designers are typically people who have been on a project in that branch of technology before. :) I certainly didn't design my first PKI project!!

It's funny but I document the way you describe - I add comments as though a brand, shiny, new CS grad were gonna read it, having never worked in this venue before. Because.. well... I WAS that college grad, and fixing other people's complex code was exactly what I had to do my first 6 months on the job.

Date: 2005-12-08 04:33 pm (UTC)
From: [identity profile] learnedax.livejournal.com
It doesn't matter to your point whether you meant FIFO where you wrote LIFO there, but I think you did. LIFO would have to be the most perverse page replacement system, since you would only use one frame.

You don't mention the clock/second-chance algorithm (described here), which is common, pretty efficient, and pretty easy to implement. Possibly useful here.

On the other hand, it's usually a good idea to batch your page to disk write-outs, and if you're using an LRU system you can leverage it to know which pages to write out. Since the least recently touched are most likely to be overwritten soon, they're the key ones to clean up. Unfortunately, LRU is kind of a pain. A compromise system of some kind might be easier.
(deleted comment)

Date: 2005-12-08 04:51 pm (UTC)
From: [identity profile] learnedax.livejournal.com
Oh, yes, I agree with your main point, I sort of wandered afield into general commentary on the VM.

LIFO could be useful for testing, though it's still a lot of overhead vs. just giving yourself one frame to begin with. Granted, they amount to the same thing...

ow

Date: 2005-12-07 05:12 am (UTC)
From: [identity profile] hfcougar.livejournal.com
I followed what you were saying, but just barely. There are reasons I gave up on trying to become a programmer.

Re: ow

Date: 2005-12-07 03:56 pm (UTC)
From: [identity profile] lakshmi-amman.livejournal.com
Admittedly, this got jargon-heavy and assumed-concept heavy really fast. Thus the cut-tag label. :) If I were explaining this to someone who didn't do computers for a living, I'd find some neat analogy. And if it were a class, this info would be transmitted over the course of several weeks of several hour a week lectures. Not one big brain dump. :) After all, I just finished week 9 of a 10-week course. Here in week 10, I will be expected to cleverly show that I was awake for all 10 of those weeks.

Date: 2005-12-08 04:45 pm (UTC)
From: [identity profile] learnedax.livejournal.com
blah blah dirty thrill blah blah blah get partially naked blah blah blah took a big licking from me blah blah

I'm sorry, did you say something about operating systems?

(I don't usually think of the NPE in Java as *quite* that bad. It's definitely not a good thing, but it's not an uncommon occurence even in professional Java software. You are never supposed to get in a situation where they get thrown routinely and you ignore them, but sometimes it happens.)

February 2021

S M T W T F S
 123456
78910111213
14151617181920
212223 24252627
28      

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated Apr. 28th, 2026 12:38 am
Powered by Dreamwidth Studios