The Release Day of GenoPro 2007
December 18, 2006
- Website deployment
- Hard drive failure
- File System Corrupted
- Response from Peer1 Dedicated Hosting
- Ordering a new server
- Transferring files from old disks to the new server
- Remounting Databases on SQL Server
- Configuring MailEnable with IIS
- The Aftermath
The first version of GenoPro was released on June 25, 1998 and the launch day
for GenoPro version 2.0 was scheduled for December 15, 2006 at 7:00 am EST.
The release day of GenoPro 2.0 was anticipated by many, as the beta period of
GenoPro 2.0 lasted several years. GenoPro 2.0 had 20 major betas plus 70
sub-betas and as many private builds for our testers.
On that day, I woke up very early to get
ready to deploy the new website for GenoPro 2007. We had a brand new website
with every HTML page redesigned from scratch for GenoPro 2007. Until that day,
GenoPro was known as GenoPro 2.0 and the name
GenoPro 2007 had been kept secret. Our goal for the day was to
deploy the new website and spend the day relaxing, reviewing the website to
correct typos and/or broken links.
After having deployed the website, we decided to test our new payment
system. My brother Jean-Claude made a successful purchase by PayPal, and it was
my turn to purchase GenoPro with my credit card. I clicked on the link to
purchase GenoPro 2007; however, the
server responded very slowly. Meanwhile, I had lost my connection with the
remote terminal. I also noticed several POP3 errors with Outlook. All of a
sudden, the machine rebooted. After the reboot,
we looked in the Event Log and saw the following error:

The event was "The device \Device\Scsi\aarich1 did not respond within the
timeout period". This message is a sign of a hard drive failure: the drive was
unable to read or write a file. I ran a check disk (chkdsk.exe) and got several
errors on the file system:
C:\Documents and Settings\Daniel>chkdsk
The type of the file system is NTFS.
The volume is in use by another process. Chkdsk
might report errors when no corruption is present.
WARNING! F parameter not specified.
Running CHKDSK in read-only mode.
CHKDSK is verifying files (stage 1 of 3)...
Deleted corrupt attribute list entry
with type code 128 in file 166911.
Deleting corrupt attribute record (128, "")
from file record segment 140978.
Deleting corrupt attribute record (128, "")
from file record segment 150391.
Deleting corrupt attribute record (128, "")
from file record segment 153381.
Deleting corrupt attribute record (128, "")
from file record segment 165148.
Deleting corrupt attribute record (128, "")
from file record segment 180689.
Deleting corrupt attribute record (128, "")
from file record segment 289048.
Deleting corrupt attribute record (128, "")
from file record segment 340304.
Deleting corrupt attribute record (128, "")
from file record segment 396055.
Deleting corrupt attribute record (128, "")
from file record segment 543420.
File verification completed.
Deleting orphan file record segment 140978.
Deleting orphan file record segment 150391.
Deleting orphan file record segment 153381.
Deleting orphan file record segment 165148.
Deleting orphan file record segment 180689.
Deleting orphan file record segment 289048.
Deleting orphan file record segment 340304.
Deleting orphan file record segment 396055.
Deleting orphan file record segment 543420.
Errors found. CHKDSK cannot continue in read-only mode.
Every time I ran chkdsk, I got different errors. Of course, I also
ran chkdsk with the /F option, which required rebooting the machine, but I was
still getting file system errors.
C:\Documents and Settings\Daniel>chkdsk
The type of the file system is NTFS.
The volume is in use by another process. Chkdsk
might report errors when no corruption is present.
WARNING! F parameter not specified.
Running CHKDSK in read-only mode.
CHKDSK is verifying files (stage 1 of 3)...
File verification completed.
CHKDSK is verifying indexes (stage 2 of 3)...
Deleting index entry PID2592.TMP in index $I30 of file 22861.
Deleting index entry APISCR~1.LNK in index $I30 of file 30734.
Deleting index entry log.txt.lnk in index $I30 of file 30734.
Deleting index entry LOGTXT~1.LNK in index $I30 of file 30734.
Deleting index entry MSHist012006121420061215 in index $I30 of file 30747.
Deleting index entry MSHIST~4 in index $I30 of file 30747.
Deleting index entry Dc14436.txt in index $I30 of file 33957.
Deleting index entry _7639.NEW in index $I30 of file 68902.
Deleting index entry E5625224C2264A509139B1A08DC34201.MAI in index $I30 of file 68927.
Deleting index entry E56252~1.MAI in index $I30 of file 68927.
Deleting index entry A20D7148DA574F65A520138AB3A853F4.MAI in index $I30 of file 93605.
Deleting index entry A20D71~1.MAI in index $I30 of file 93605.
Deleting index entry _198363.NEW in index $I30 of file 153023.
Deleting index entry _5811.NEW in index $I30 of file 174196.
Deleting index entry 6881CAEBDDB24375BB619F19630F039C.MAI in index $I30 of file 185580.
Deleting index entry 6881CA~1.MAI in index $I30 of file 185580.
Deleting index entry 974B2BBEE95C498891BC727DB56229A5.MAI in index $I30 of file 185580.
Deleting index entry 974B2B~1.MAI in index $I30 of file 185580.
Deleting index entry EB997F92A97F4BF69A8252E609426FDB.MAI in index $I30 of file 185580.
Deleting index entry EB997F~1.MAI in index $I30 of file 185580.
Deleting index entry _15192.NEW in index $I30 of file 185901.
Index verification completed.
Errors found. CHKDSK cannot continue in read-only mode.
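Since each read-only run reported different errors, one way to compare two runs is to extract the file record numbers from the saved output and see which ones overlap. The following is only a minimal sketch: the function name `affected_files` is made up for this illustration, and it assumes that the numbers after "file record segment", "in file" and "of file" all refer to the same kind of record.

```python
import re

# Matches the three phrasings seen in the chkdsk output quoted above.
# Assumption: all three numbers identify the same kind of file record.
_RECORD = re.compile(r"\b(?:file record segment|in file|of file) (\d+)")


def affected_files(chkdsk_output: str) -> set[int]:
    """Return the file record numbers mentioned in a chkdsk report."""
    # Collapse whitespace so messages wrapped across two lines still match.
    text = " ".join(chkdsk_output.split())
    return {int(number) for number in _RECORD.findall(text)}


run1 = '''Deleting corrupt attribute record (128, "")
from file record segment 140978.
Deleting orphan file record segment 140978.'''
run2 = "Deleting index entry PID2592.TMP in index $I30 of file 22861."

first, second = affected_files(run1), affected_files(run2)
print("reported in both runs:", sorted(first & second))
print("reported in only one run:", sorted(first ^ second))
```

If the two sets barely overlap from one run to the next, as was the case here, that points toward unreliable reads rather than a fixed set of damaged files.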
After a reboot, we got the following errors in the Event Log:

This time the error was from the Adaptec Storage Manager complaining about a
missing drive. Our server was running in a RAID 1 configuration, having two
identical drives. One drive was the backup of the other drive in case of a
failure. For the first time, the Adaptec Storage Manager was showing an error.

We rebooted the machine with a check disk
and got another error. This time, the NTFS file system structure was
corrupted and unusable.
At this point, I called the technical
support of Peer1. I have been doing business with
Peer1 for over 5 years, and they have always had great service with a short
turnaround time. I told the technical representative about the problem
and suggested it was a hardware problem.
A few minutes later, Peer1 called me back suggesting they replace the defective
drive. Of course, changing a defective drive would require a server shutdown.
Since this machine was running in a RAID 1 configuration, it would take several
hours to copy the data from the old drive to the new drive. What I like about
Peer1 is that they are quick to resolve problems and have never billed me extra,
whether the request is for a hard reboot when the machine freezes, cleaning
a virus, or replacing defective hardware.
I was hoping the disk replacement would solve the problems. For sure, a
bad disk can give "timeouts" and
be responsible for file system errors. Around 2:00 pm, I got a phone call that
the drive had been replaced and about 75% of the data was copied. It would take
another hour before the machine could be operational. Once the machine was
online, I ran the Adaptec Storage Manager and noticed the device was performing
a disk verification. I decided to not do anything until the operation was
complete. The disk verification took a good two hours.
By now, it was late in the afternoon, around 4:30 pm. I ran a chkdsk and got
more file system errors, including file security corruption. I am not sure if
the file security corruption had anything to do with our database, but our SQL
database was unable to start. The SQL Server service was reporting an "Access
Denied" error on the master database.
We tried many options to recover the database, including re-installing SQL
server. We had no success. We would get disk timeouts and many errors. We had
the Adaptec Storage Manager opened and noticed both drives were inaccessible.
We took a screenshot because we had the feeling we may never see this error
again. Indeed.

A few minutes later, we lost connection with the machine. The server had
crashed. I called Peer1 technical support requesting a hard reboot for the
server (this is done by "pulling the plug" and restarting the machine).
At this time, it was already 7:00 pm and we were starving since we had eaten
little at lunch. I decided to take my brother out to a restaurant. After
all, it was his birthday.
While waiting for our food to arrive, we talked about getting a new
machine. We had had this in mind for several months, and the file system errors
made us want to format the drive and re-install everything.
While eating, I got a phone call from Peer1 technical support informing me
they were unable to reboot the machine. Somehow the disk controller on the
motherboard was defective and they could not reboot the machine. The technical
support representative told me they had no extra motherboards for my type of
server. He told me they could take my order and build a new server for me
overnight. We quickly finished
our meals and headed home to see the packages at
http://www.dedicatedhosting.com/hosting/
Peer1 could give us the Basic hosting plan at $199 per month, which was
cheaper than what I was currently paying. This basic hosting plan was already
superior to our current hosting plan; however, I wanted a hot-swappable RAID 1
configuration. As a result, I took the Professional hosting plan with a dual
hyper-threaded processor. This hosting plan costs a few extra dollars per
month; however, it is
far more powerful and reliable since a defective disk can be changed while the
machine is running.
On Friday at 8:50 pm, I was on the phone with Peer1 giving the specifications
for the new server, such as the hard disk partitions and the software to
install. The
sales representative put me on hold to make calls to confirm they had all the
hardware components to build the new machine. A few minutes later, he told me
the new machine would be hosted in Miami (Florida) instead of Atlanta (Georgia)
because they had the components there. I asked if I would get the same service
regarding bandwidth quality and technical support as I used to get in Atlanta.
He told me the service would be the same, as most of the technical support is
done in Atlanta anyway.
By the way, Peer1 has really good, reliable bandwidth.
Sometimes we take those things for granted, but when a server is dropping pings,
with losses reaching 50% on a daily basis, it is a real problem. Before switching to Peer1,
GenoPro was hosted with HiSpeed.net. Every week I had to call the technical
guy, begging him to reboot his switch(es) because my server was inaccessible.
HiSpeed.net would always come up with some ridiculous story, but strangely
enough, 5 minutes after my call, the server was accessible again, yet the
machine hosting GenoPro had never rebooted. Clearly it was a problem
with the switch relaying the information from the server to the
Internet. Also, HiSpeed.net billed me for 22,000 GB of transfer for my first month. HiSpeed.net was selling bandwidth for $10 per
GB, bringing my first invoice to the grand total of $220,399. I never had the money
to pay this. At that time, I was living with my sister to save on housing
costs. Previously, I was paying $19.99 per month for hosting
GenoPro.com and was asked to find another ISP because GenoPro had too
many visitors. I tried to reason with the guy at HiSpeed.net,
telling him it was impossible to have that
much transfer for a website like GenoPro.com. I had compiled the logs from IIS
plus an external counter proving the number of visitors was about the same as
the previous month, and the previous month had about 2 GB of transfer.
Instead of using common sense, HiSpeed.net offered to let me make
multiple payments to pay off the invoice and mentioned the possibility of
negotiating the amount. It took about 3 months to get this matter resolved, after
the moron admitted
he had mis-configured his switch. A month later, I cancelled my contract
with HiSpeed.net because I was fed up with the downtime. The guy from
HiSpeed.net told me I was not allowed to cancel since I had signed a 12-month contract.
I told him he had not respected his side of the contract because my server was always
unreachable. I had many charts to prove countless repeated periods of
several hours of downtime. I suggested he
sue me to make me pay the remaining months. Of course, I had already switched to
another ISP before
giving HiSpeed.net the finger!
The sales representative from Peer1 told me the machine would be ready by
Saturday morning. I asked about the data from the old disks. The sales
representative told me they could send the hard disks by mail from Atlanta to
Miami but it would take a few days. I told him not to worry; I would call the
technical support and ask them to mount those drives on another machine and give
me remote access.
On Saturday morning at 6:00 am, I called the technical support to find out how to
log in to the new machine. They told me they were finishing the configuration
and the server would be ready in less than one hour. Fifteen minutes later, I
got a call that the machine was ready. After looking at the new server, I realized I
really needed to access the data from the previous disks. We had backups of our
websites and databases, but we were missing many other things such as email
accounts, mail messages received during the day, SSL encryption certificates,
and many other things. Besides, we had nearly one million files to transfer
from our backups. The website
http://familytrees.genopro.com
alone has over 500,000 files. Using our cable-modem Internet connection, it
would take several days to upload those files to the server. The machine
is also hosting other websites, including my mom and sister's baby store (http://www.merehelene.com/)
and the scrapbooking store of my aunts Claire and Sue and my cousin Micheline (http://www.scrapbookerie.com/).
I called Peer1 technical support (in Atlanta) with a special request to have
access to the data from the old machine. The guy told me his team would try to mount the drives
on another computer and give me a password for that machine. While at it,
I asked why the new server had a network link speed of 10 Mbps Half-Duplex,
instead of 100 Mbps Full-Duplex. The technical representative told me to wait,
as he would call Miami to get an answer. A few minutes later, he told me
the network speed had been set to 100 Mbps Full-Duplex, but the machine would be offline for
about 5 minutes. (I have no idea why it requires 5 minutes of downtime to
change a link speed; however, I can tell you
the new server downloaded SQL Server 2005 from Microsoft.com at the full 100 Mbps.
It took less than 3 minutes to download the whole 230 MB install package.)
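As a rough sanity check on the figures above, the ideal transfer time for a given link speed is simple arithmetic. This sketch assumes 1 MB = 8 megabits and ignores protocol overhead; the function names are made up for the illustration.

```python
def transfer_seconds(size_mb: float, link_mbps: float) -> float:
    """Ideal transfer time: size in megabits divided by the link speed."""
    return size_mb * 8 / link_mbps


def average_mbps(size_mb: float, seconds: float) -> float:
    """Average throughput needed to move size_mb in the given time."""
    return size_mb * 8 / seconds


package_mb = 230  # size of the install package mentioned above

# Best case on a 100 Mbps link: roughly 18 seconds.
print(round(transfer_seconds(package_mb, 100), 1))

# Best case on the original 10 Mbps setting: roughly 184 seconds,
# which is just over 3 minutes.
print(round(transfer_seconds(package_mb, 10), 1))

# Finishing in under 3 minutes (180 s) needs more than about 10.2 Mbps.
print(round(average_mbps(package_mb, 180), 1))
```

In other words, a download of 230 MB in under 3 minutes would not have been possible at the old 10 Mbps setting, which is consistent with the link speed having been raised.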
A few hours later, I got a phone call that the machine was ready, with a warning to be
careful because the file system on the old drives was corrupted and unstable. I
started copying the most important files in case the machine died. Since we were
transferring files by FTP, we had to compress them into a big .rar file to keep
the timestamps. The FTP protocol is slow at transferring numerous files and does
not keep the timestamps. After the transfer, I took a last screenshot of the
old machine.

As you can see, both drives from the RAID configuration are available, but
this time they are un-RAIDed, and therefore accessible as C and D. When
Peer1 rebuilt the server, they put in both drives, each having 99.6 GB of free space;
however, we compressed almost 9 GB of stuff on the C drive before transferring it
to the new server. Since we had no problems with drive C, we kept drive D
intact as the original backup.
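The idea of bundling many files into a single archive before an FTP transfer, so that timestamps survive and only one file is uploaded, can be sketched as follows. We used a .rar file; this sketch swaps in Python's standard `tarfile` module instead, and the `pack`, `unpack` and `demo` names, the `site` folder and the `index.htm` file are all made up for the illustration.

```python
import os
import tarfile
import tempfile


def pack(source_dir: str, archive_path: str) -> None:
    """Bundle source_dir into one gzip-compressed tar archive."""
    name = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=name)  # modification times are stored


def unpack(archive_path: str, target_dir: str) -> None:
    """Extract the archive; tarfile restores file modification times."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(target_dir)


def demo() -> bool:
    """Round-trip one file and report whether its timestamp survived."""
    old_time = 1166000000  # a timestamp in mid-December 2006
    with tempfile.TemporaryDirectory() as work:
        site = os.path.join(work, "site")
        os.mkdir(site)
        page = os.path.join(site, "index.htm")
        with open(page, "w") as f:
            f.write("<html></html>")
        os.utime(page, (old_time, old_time))

        archive = os.path.join(work, "site.tar.gz")
        pack(site, archive)  # this single file is what would be uploaded
        restored = os.path.join(work, "restored")
        unpack(archive, restored)
        restored_page = os.path.join(restored, "site", "index.htm")
        return int(os.path.getmtime(restored_page)) == old_time


print(demo())
```

One archive also avoids the per-file overhead that makes FTP slow with hundreds of thousands of small files.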
It is Monday and our websites are online. The old server is currently idle,
yet accessible and stable. Being curious, I asked the technical
representative if they had really changed the drive; he looked at the internal
notes and confirmed the drive had been changed. I think the disk
change was unnecessary since the problem was related to the controller on the
motherboard, but at that time, the problem appeared to be a disk failure.
Nevertheless, our file system had been corrupted and we needed a re-install.
Why not get a new machine if we have to re-install everything?
I feel very pleased about the support I got from
Peer1.com. Usually, a big company has
poor service, but this is not the case with Peer1. I got better quality and
quantity of technical support than I expected. A few weeks ago, the
technical support team helped me with a virus problem, something that had
nothing to do with their core business, which is web hosting. I feel
confident doing business with Peer1.
By the way, I took a few screenshots of the new machine. You can see
there are 4 CPUs in the Task Manager and 2 network cards. We plan to use
the second network card (1 Gbps) to connect to a server dedicated for the SQL
database.

The server crash had great timing with the release day of GenoPro 2007.
Obviously, reconfiguring a new server was not in my plans for the weekend, but in
the end, everything is fine. During the few hours GenoPro.com was
available on Friday, we received emails from people wishing to purchase GenoPro
2007 and from teachers wishing to apply to our
academic program for GenoPro 2007.
This is encouraging because GenoPro 2007 had never been announced before and was
online for only a few hours during that day. Before that day, GenoPro 2007
was known as GenoPro 2.0.
We plan to start working on GenoPro 2008 soon. Our goal for this spring
is creating a multi-lingual version of GenoPro.
Update, December 23, 2006
It has been a full week since we got our new server. During the whole
week, including today, we have been fixing missing shortcuts, hardcoded IP
addresses, paths, passwords, and other misc configuration settings forgotten
over the years. The 80-20 percent rule applies when configuring a web server.